Chapter 1 — What Is a Large Language Model?
This is Part 1 of a series walking through LLM Primer I: How Generative AI Works — a mechanism-first guide to the technology behind modern AI. Over the next twelve posts I'll preview each chapter of the book, share the frameworks that organize the material, and explain why I wrote it the way I did.
The question that sounds simple, and isn't
If you ask a hundred people what a large language model is, you'll get a hundred answers, and most of them will be incomplete in some interesting way. "It's an AI." "It's a chatbot." "It's a search engine that talks back." "It's the thing that wrote my report last night."
None of those are wrong, exactly. But they're descriptions of what an LLM does, not what it is. Chapter 1 is about what an LLM is, the question most introductions skip over and most marketing copy actively obscures. If you can't answer that question accurately, every other claim about LLMs becomes harder to evaluate.
The three words, taken seriously
The book opens by taking the term LLM apart, word by word, because each word carries weight that gets ignored once "LLM" becomes shorthand.
Large doesn't mean physically big. It means the system has on the order of billions of internal numerical settings, called parameters, that were adjusted during training. It also means the training itself used enormous amounts of text and enormous amounts of computing power. All three quantities (parameters, data, and compute) have to grow together for the model to actually get better. Scaling up just one of them tends to disappoint.
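To make "parameters" a little more concrete, here is a minimal sketch that counts the adjustable numbers in two small fully connected layers. The layer widths are made up for illustration and are not taken from any real model. The only point is that parameter counts grow quickly with layer width.

```python
def dense_layer_params(n_inputs, n_outputs):
    """One weight per input-output pair, plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

# Hypothetical layer widths, chosen only for illustration.
layer_sizes = [512, 2048, 512]

# Sum the parameters of each consecutive pair of layers.
total_params = sum(
    dense_layer_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])
)
print(f"{total_params:,} parameters")
```

Even this toy stack has about two million parameters. A model with billions of them is the same idea repeated at far greater width and depth.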
Language sounds obvious, but it has a specific meaning here. The model doesn't understand grammar or meaning the way you do. It works with sequences of small pieces of text called tokens, which are often word fragments rather than whole words. From the model's perspective, every prompt is a sequence of numbers, and every reply is built one number at a time: the next number, and the next, and the next.
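As a rough illustration of "text in, numbers out," here is a toy tokenizer. The vocabulary and the greedy longest-match rule are assumptions made for this sketch. Real tokenizers learn much larger vocabularies from data and use more careful splitting rules.

```python
# Toy vocabulary, invented for this example only.
TOY_VOCAB = {"un": 0, "believ": 1, "able": 2, " ": 3, "token": 4, "s": 5}

def toy_tokenize(text, vocab):
    """Split text into the longest pieces found in vocab and return their ids."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return ids

token_ids = toy_tokenize("unbelievable tokens", TOY_VOCAB)
print(token_ids)
```

The word "unbelievable" comes out as three separate ids, one per fragment. That list of integers, not the letters, is what a model would actually receive.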
Model is the most loaded word of the three. A model in this sense isn't a database that stores facts. It isn't a person who knows things. It's a trained mathematical function — a pattern-recognizer — that produces likely continuations of the text it was given. When the model "knows" the capital of France, it doesn't look up the fact. It produces "Paris" because, given the rest of the prompt, "Paris" is the most probable next token according to the patterns it absorbed from training data.
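The "most probable next token" idea can be sketched in a few lines. The scores below are invented for the example, since a real model computes them from its parameters, and picking the single highest-probability token is only one of several ways to choose. The conversion from scores to probabilities uses the standard softmax function.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for the token that follows "The capital of France is".
made_up_scores = {" Paris": 9.0, " Lyon": 5.0, " London": 4.0, " banana": 0.5}

probs = softmax(made_up_scores)
next_token = max(probs, key=probs.get)  # pick the most probable candidate
print(next_token, round(probs[next_token], 3))
```

Nothing in this sketch looks anything up. "Paris" wins only because it was given the highest score, which is the behavior the paragraph above describes.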
That distinction matters more than it sounds. It explains why LLMs hallucinate. It explains why they can be confidently wrong. It explains why they're so good at generating fluent text and so unreliable when asked to be authoritative about facts. The book returns to this distinction repeatedly because it's the single most useful frame for predicting how any LLM will behave in any situation.
How we got here, in a paragraph
Chapter 1 also walks through how language modeling actually evolved, because the modern LLM is the latest chapter in a story that goes back decades. For a long time, computers handled language either by following hand-written grammar rules or by counting how often certain word pairs showed up in books. Both approaches plateaued. The breakthrough was having the system learn patterns directly from huge amounts of text, instead of being told what the rules are. The ideas underneath today's LLMs are older than people think; what's new is the scale at which they're now applied.
I won't spoil the specific architectural breakthrough that changed everything; that's Chapters 3 and 4. But I'll say this: the transition from "looking up word counts" to "learning patterns" is the single most important shift in the history of natural language processing, and understanding it makes sense of everything that came after.
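For a feel of the older word-pair counting approach, here is a minimal sketch of a bigram counter. The tiny corpus and the function names are made up for illustration, and historical systems were considerably more elaborate.

```python
from collections import Counter, defaultdict

def train_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of word, or None if it was never seen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"
counts = train_bigram_counts(corpus)
print(predict_next(counts, "the"))
print(predict_next(counts, "dog"))
```

The second call returns None because "dog" never appeared in the corpus. A pure lookup table has nothing to say about anything it hasn't literally counted, which is one reason this approach ran out of room.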
Three myths I take seriously enough to debunk
The chapter ends by addressing three persistent misconceptions about what LLMs are doing. I take them seriously because each of them, if you believe it, will lead you to make bad decisions about when to trust an LLM and when not to.
The first myth is that LLMs understand the way humans do. They don't. They produce outputs that look like understanding because they were trained on text written by people who do understand. The second is that LLMs are databases of facts. They aren't. Facts are distributed across billions of weights, which is why models can confidently produce plausible-but-false statements. The third is that bigger models are always smarter. They aren't. Scale interacts with data quality, training method, and architectural choices, and the largest available model isn't always the right tool for the job.
What Chapter 1 sets up
By the end of the chapter, you have a working definition of what an LLM is and isn't, a sense of how the field got here, and a clear-eyed view of the most common misconceptions. That's not a small payoff for a single chapter. It's the foundation that makes the rest of the book possible to read.
If you read Chapter 1 and nothing else, you'll come away able to reason about LLMs more accurately than most of the headlines about them. That alone is, for many readers, enough to make the book worth the price.
Next up — Chapter 2: Probability, Tokens, and Text. Tomorrow we get specific about what those "tokens" really are, why the model is fundamentally a probability machine, and how next-token prediction — the one thing the model actually does — becomes everything else it can do.