The LLM Primer Series

A seven-volume field guide to generative AI by Sho Shimoda, now complete.

What this series is

The LLM Primer Series is a structured, mechanism-first treatment of large language models — written for engineers, technical product managers, curious practitioners, and anyone willing to read carefully. Each volume takes one layer of the field in depth, holding technical precision while keeping the explanations clear enough to actually use.

The series is designed to work two ways. You can read Volume I as a complete foundation and stop there. Or you can follow the whole arc — fundamentals, mathematics, retrieval, cognition design, production engineering, scaling, and security — and finish with an end-to-end working understanding of how to build with large language models responsibly.

All seven volumes are now published and available on Amazon. Together they form a single, coherent map of the LLM engineering stack, written by the same author in the same voice, from the first token through to the last defensive control on a production system. A standalone companion volume, Physical AI, extends the map into embodied systems and robotics.

All volumes are written by Sho Shimoda, CTO of Receipt Roller Inc., who builds and operates AI systems in production and writes about them in language clear enough to be followed.

Who this is for: engineers and architects looking for a durable understanding; product managers and executives who have to decide which AI to build; and curious practitioners and students who want to understand the technology behind the headlines. The series is written so you do not need a mathematical background to read it, but it keeps enough technical precision that an experienced engineer will not feel their time is being wasted.

How to read this page

Each volume below lists its full table of contents, organized by Part, with a chapter-by-chapter walkthrough article linked for every chapter. If you want the deepest treatment, read the book; the walkthroughs summarize each chapter and give you a place to think through the mechanics on the web.

Appendices are listed for transparency, but they are book-only content: reference sheets, worksheets, exercises with solutions, and similar material that belongs at the back of the book rather than in standalone walkthroughs. To get the appendices, read the book.

The seven volumes

Volume I — How Generative AI Works

A clear, practical guide to the fundamentals of large language models.

The plain-language on-ramp to the whole series. Starting from zero — tokens, training, and the simple act of predicting the next word — it builds an honest, jargon-free picture of what a large language model is, how it is trained, and why it behaves the way it does, assuming no prior background. It is the foundation every later volume sits on.

Available now on Amazon. View LLM Primer I on Amazon →

Series introduction: A chapter-by-chapter walkthrough of LLM Primer I — Series introduction & index

Part I — Concepts and foundations

Chapter 1 —	What is a large language model?
Chapter 2 —	Probability, tokens, and text
Chapter 3 —	Neural networks for language

Part II — How LLMs work

Chapter 4 —	The Transformer architecture
Chapter 5 —	Training large models
Chapter 6 —	Fine-tuning and adaptation
Chapter 7 —	Beyond next-token prediction

Part III — Practical perspectives

Chapter 8 —	Using LLMs in applications
Chapter 9 —	Performance, scaling, and costs
Chapter 10 —	Safety, ethics, and trust

Part IV — Advanced topics

Chapter 11 —	Frontier research
Chapter 12 —	Building your own LLM system

Appendices (book only)

A —	LLM glossary
B —	The mathematics behind attention
C —	Prompting reference sheet
D —	Tools and libraries
E —	Further reading

Volume II — Language Models Through Mathematics

A mathematically rigorous yet readable tour of how the machinery actually works.

A rigorous but readable walk through the internals — attention, optimization dynamics, loss landscapes, and scaling behavior, explained through the mathematics that holds them up. Every equation that matters is derived in full, each one wrapped in a story, an analogy, and a worked numerical example. For readers who want the mathematics that the first volume keeps in its sidebars.

Available now on Amazon. View LLM Primer II on Amazon →

Series introduction: A chapter-by-chapter walkthrough of LLM Primer II — Series introduction & index

Part I — Mathematical intuition

Chapter 1 —	Mathematical intuition for language models
Chapter 2 —	LLMs in context
Chapter 3 —	Mathematical tools

Part II — Anatomy of a Transformer

Chapter 4 —	Attention
Chapter 5 —	Position, order, and sequence structure
Chapter 6 —	Transformer blocks
Chapter 7 —	Efficiency and Transformer variants

Part III — Training, alignment, and evaluation

Chapter 8 —	How models learn
Chapter 9 —	Training at scale
Chapter 10 —	The mathematics of post-training and alignment
Chapter 11 —	Evaluation, calibration, and inference

Part IV — Applications, limits, and practice

Chapter 12 —	Real-world LLM applications
Chapter 13 —	Limitations, risks, and open problems
Chapter 14 —	Practical knowledge for engineers

Appendices (book only)

LLM mathematics reference sheet
A statistical perspective on LLMs
Questions people ask
Worked derivations
Exercises with solutions
Index of symbols
A complete forward pass, in numbers
A timeline of the ideas

Volume III — Enhancing Enterprise AI with RAG

A practitioner's walkthrough of the full retrieval-augmented generation stack.

Retrieval-augmented generation in practical terms — parsing, chunking, vector storage, retrieval, security, evaluation, and continuous updates — the architecture of anchoring a model in your own documents so you can get enterprise answers that are reliable and current. The volume to read if your job is shipping AI features that have to stay up to date and cite their sources.

Available now on Amazon. View LLM Primer III on Amazon →

Series introduction: LLM Primer III — Series introduction & index

Part I — RAG foundations

Chapter 1 —	The evolution of RAG architecture

Part II — Ingestion, parsing, and chunking

Chapter 2 —	Intelligent document parsing
Chapter 3 —	Advanced chunking frameworks

Part III — Vector databases and retrieval optimization

Chapter 4 —	Choosing the right vector database
Chapter 5 —	Architecting the retrieval pipeline

Part IV — Security, privacy, and access control

Chapter 6 —	Threat models and RAG vulnerabilities
Chapter 7 —	Implementing access control
Chapter 8 —	Data anonymization in the RAG pipeline

Part V — Evaluation, monitoring, and maintenance

Chapter 9 —	The RAG evaluation triad
Chapter 10 —	Leading evaluation frameworks
Chapter 11 —	Continuous updates and pipeline optimization

Appendices (book only)

A —	Essential mathematical formulas for optimizing RAG
B —	Sample system prompts for anonymization and evaluation
C —	Vector database and tooling decision matrices
D —	Reference datasets for evaluating RAG

Volume IV — Designing AI Cognition with MCP

Engineering context, tools, and memory for reliable AI agents.

The architecture that surrounds the model — the Model Context Protocol, orchestration patterns, attention and memory budgets, and the security model for agentic systems. Structured context modeling and orchestration: how to shape the model's reasoning by engineering the context and the situations it sees, rather than modifying the model itself. The volume to read if you are building agentic systems — tool inventories, long-running loops, cross-session memory, and the discipline of designing what the model is allowed to look at.

Available now on Amazon. View LLM Primer IV on Amazon →

Series introduction: LLM Primer IV chapter-by-chapter walkthrough — Series introduction & index

Part I — The paradigm shift in AI integration

Chapter 1 —	The AI integration crisis and the rise of agentic architecture
Chapter 2 —	Unveiling the Model Context Protocol (MCP)

Part II — Core MCP mechanics

Chapter 3 —	Server primitives — exposing context and capabilities
Chapter 4 —	Client primitives — agentic behaviors and control
Chapter 5 —	Transport protocols and discovery

Part III — Multi-agent orchestration patterns

Chapter 6 —	Fundamental orchestration strategies
Chapter 7 —	Advanced collaborative and dynamic patterns
Chapter 8 —	Deployment architecture layouts

Part IV — Designing cognition: context and memory

Chapter 9 —	Managing the attention budget
Chapter 10 —	Memory for long-horizon tasks

Part V — Securing agentic flows

Chapter 11 —	Attack surfaces and protocol vulnerabilities
Chapter 12 —	Protocol hardening and defenses

Part VI — Production engineering and scale

Chapter 13 —	Frameworks and cloud integration
Chapter 14 —	Benchmarking, testing, and performance

Appendices (book only)

A —	MCP quick reference and cheat sheet
B —	Implementation blueprints and code examples
C —	Production readiness and security checklists
D —	Advanced specifications and Standard Enhancement Proposals (SEPs)
E —	Benchmarks and performance data
F —	Official resources and ecosystem links

Volume V — Building Real-World LLM Applications

Designing, evaluating, and operating LLM systems in production.

A systems-focused guide from prototype to production — prompt engineering, retrieval, agents and tool calling, evaluation loops, observability, security, and serving economics — for turning a capable model into a reliable product. The volume that turns architectural understanding into deployed services with real users sitting on top of them.

Available now on Amazon. View LLM Primer V on Amazon →

Series introduction: A chapter-by-chapter walkthrough of LLM Primer V — Series introduction & index

Part I — Foundations of AI engineering

Chapter 1 —	The discipline of AI engineering
Chapter 2 —	Foundation models and prompt engineering

Part II — Building agentic and retrieval capabilities

Chapter 3 —	Retrieval-augmented generation (RAG)
Chapter 4 —	AI agents and tool calling

Part III — Quality assurance and observability

Chapter 5 —	Evaluating LLM applications
Chapter 6 —	AI observability and tracing

Part IV — Security, scale, and optimization

Chapter 7 —	LLM security and guardrails
Chapter 8 —	Optimizing performance, serving, and cost

Appendices (book only)

A —	Production readiness and security checklists
B —	Tool and framework selection matrices
C —	Protocols, streaming, and structured outputs
D —	Rate-limiting architecture and cost management
E —	Glossary of AI engineering metrics and terms

Volume VI — Scaling AI Systems

Architecting low-latency LLM inference for production scale.

Inference at scale, cost modeling, and infrastructure: token generation mechanics, KV cache management, GPU and specialized silicon, quantization and distillation, batching and speculative decoding, disaggregated serving, and the economics of systems that have to answer millions of times a day. The volume to read once your AI system has grown beyond a single server and now needs to behave like a real piece of infrastructure.

Available now on Amazon. View LLM Primer VI on Amazon →

Series introduction: A chapter-by-chapter walkthrough of LLM Primer VI — Series introduction & index

Part I — Foundations of LLM inference

Chapter 1 —	The mechanics of token generation
Chapter 2 —	The key-value (KV) cache challenge

Part II — The hardware substrate

Chapter 3 —	Data center GPUs for generative AI
Chapter 4 —	Specialized AI silicon and ASICs

Part III — Model-level optimization (compression)

Chapter 5 —	Demystifying quantization
Chapter 6 —	Pruning and knowledge distillation

Part IV — System- and engine-level optimizations

Chapter 7 —	Advanced batching strategies
Chapter 8 —	Next-generation KV cache management
Chapter 9 —	Speculative decoding

Part V — Serving frameworks and orchestration

Chapter 10 —	The LLM engine layer
Chapter 11 —	The platform and orchestration layer
Chapter 12 —	Disaggregated serving and Kubernetes
Chapter 13 —	Autoscaling and cold-start mitigation

Part VI — Application-level economics and TCO

Chapter 14 —	Token economics and API pricing
Chapter 15 —	Serverless APIs versus dedicated infrastructure
Chapter 16 —	Cost-reduction strategies in production

Appendices (book only)

A —	Reference of mathematical formulas and cost modeling
B —	Hardware specifications and accelerators guide
C —	Deployment configurations and code snippets
D —	Benchmarking methodology and metric definitions

Volume VII — AI Security

Defending LLM systems against prompt injection, jailbreaks, and adversarial threats.

The series finale. Defensive design for systems that have to be treated as security-relevant infrastructure — adversarial risks, prompt injection, RAG poisoning, supply-chain integrity, incident response, access control, governance, bias and fairness, and the discipline of building a secure AI organization. The volume to read when your AI system has to be treated as production infrastructure that adversaries will actively probe.

Available now on Amazon. View LLM Primer VII on Amazon →

Series introduction: A chapter-by-chapter walkthrough of LLM Primer VII — Series introduction & index

Part I — Foundations of AI security

Chapter 1 —	Why AI security is different
Chapter 2 —	Threat modeling for LLM systems
Chapter 3 —	Data security and privacy

Part II — Prompt and interaction security

Chapter 4 —	Prompt injection and jailbreaks
Chapter 5 —	Input validation and output filtering
Chapter 6 —	Risks of retrieval-augmented generation

Part III — Model robustness and reliability

Chapter 7 —	Hallucinations and reliability
Chapter 8 —	Adversarial attacks on models
Chapter 9 —	Model integrity and supply-chain risks

Part IV — System-level security architecture

Chapter 10 —	Designing secure LLM architectures
Chapter 11 —	Observability, logging, and incident response
Chapter 12 —	Access control and identity

Part V — Governance, ethics, and compliance

Chapter 13 —	Regulatory landscape
Chapter 14 —	Bias, fairness, and responsible AI
Chapter 15 —	Building a secure AI organization

Part VI — Advanced topics

Chapter 16 —	Secure fine-tuning and adaptation
Chapter 17 —	Future threats and emerging defenses

Appendices (book only)

A —	AI security checklist for production systems
B —	Sample threat model template
C —	Secure prompt design patterns
D —	Incident response template for LLM applications
E —	Recommended tools and frameworks

Physical AI — a companion volume

Engineering Embodied Intelligence for the Real World. A standalone sister volume by the same author that extends the LLM Primer map into embodied systems: perception, planning, control, safety, and the engineering discipline of putting intelligence into things that move. Where the seven-volume series treats language models as software systems that generate text, Physical AI treats them as one component in a larger stack that also has sensors, actuators, and physical consequences. It stands on its own; you do not have to read the LLM Primer Series first, though the two are designed to complement each other.