The LLM Primer Series
A field guide to generative AI, built one volume at a time. Seven books, each focused on a different layer of working with large language models.
What this series is
The LLM Primer Series is a structured, mechanism-first treatment of large language models — written for engineers, technical product managers, curious professionals, and anyone willing to read carefully. Each volume covers one slice of the field in depth, with the technical precision intact and the explanations grounded enough that you can use them.
The series is designed to work two ways. You can read Volume I as a complete foundation and stop there. Or you can follow the full progression — foundations, mathematics, retrieval, context design, production engineering, scaling, security — and end up with a comprehensive working knowledge of how to build with LLMs responsibly.
Every volume is written by Sho Shimoda, CTO of Receipt Roller Inc., who builds and runs production AI systems and writes about them in plain enough language that anyone can follow along.
Who this is for: Engineers and architects who want durable understanding. Product managers and executives who have to decide what AI to build. Curious professionals and students who want to understand the technology underneath the headlines. You don't need a math background to read the series, but it keeps enough technical precision that an experienced engineer won't find it a waste of time.
How to read this page
Each volume below lists its full table of contents, organized by Part. We will publish a chapter-by-chapter walkthrough article for every chapter in the series. Chapters that already have a walkthrough are linked; chapters whose walkthroughs are forthcoming appear in plain text.
Appendices are listed for transparency but are book-only content — reference material, cheat sheets, exercises with solutions, and other matter that belongs at the back of the book rather than in a separate walkthrough. To get the appendices, read the book.
Volume I — How Generative AI Works
A Clear and Practical Guide to the Foundations of Large Language Models.
The plain-language on-ramp to the whole series. Starting from zero — tokens, training, and the simple act of predicting the next word — it builds an honest, jargon-free picture of what a large language model is, how it is trained, and why it behaves as it does, assuming no prior background. It is the foundation every later volume builds upon.
Read on Amazon: LLM Primer I — How Generative AI Works
Series introduction: A Chapter-by-Chapter Walkthrough of LLM Primer I — Series Introduction & Index
Part I — Concepts & Foundations
Part III — Practical Perspectives
Part IV — Advanced Topics
Appendices (book only)
| A — | LLM Glossary |
| B — | Mathematics Behind Attention |
| C — | Prompting Cheat Sheet |
| D — | Tools & Libraries |
| E — | Recommended Reading |
Volume II — Language Models Through Mathematics
Exploring the Inner Workings of AI with Mathematical Insight.
A mathematically rigorous yet readable tour of the inner workings of AI: attention, optimization dynamics, loss landscapes, and scaling behavior, explained through the mathematics that makes them work. Every equation that matters is fully derived — each one wrapped in a story, an analogy, and a worked numerical example. For readers who want the math that the first volume keeps in the sidebars.
Read on Amazon: LLM Primer II — Language Models Through Mathematics
Series introduction: LLM Primer II — Language Models Through Mathematics: Series Introduction & Index
Part I — Mathematical Foundations for Understanding LLMs
Part II — The Mathematics of Transformers
Part III — Optimization and Large-Scale Training
Part IV — Applications, Limitations, and the Road Ahead
Appendices (book only)
| The LLM Math Cheat Sheet |
| A Statistical Perspective on LLMs |
| Questions People Ask |
| Worked Derivations |
| Exercises, with Solutions |
| Symbol Index |
| A Full Forward Pass, by the Numbers |
| A Timeline of the Ideas |
Volume III — Enhancing Enterprise AI with RAG
A Practical Guide to Building Retrieval-Augmented Generation Systems for the Enterprise.
Practical retrieval-augmented generation — vector databases, chunking strategies, and the architecture of grounding a model in your own documents for reliable, up-to-date enterprise answers. The volume to read if your job is to ship AI features that have to stay current and have to cite their sources.
Part I — Foundations of Retrieval-Augmented Generation
| Chapter 1 — | The Evolution of RAG Architecture |
Part II — Data Ingestion, Parsing, and Chunking
| Chapter 2 — | Intelligent Document Parsing |
| Chapter 3 — | Advanced Chunking Frameworks |
Part III — Vector Databases and Retrieval Optimization
| Chapter 4 — | Selecting the Right Vector Database |
| Chapter 5 — | Architecting the Retrieval Pipeline |
Part IV — Security, Privacy, and Access Control
| Chapter 6 — | RAG Threat Models and Vulnerabilities |
| Chapter 7 — | Implementing Access Control |
| Chapter 8 — | Data Anonymization in the RAG Pipeline |
Part V — Evaluation, Monitoring, and Maintenance
| Chapter 9 — | The RAG Evaluation Triad |
| Chapter 10 — | Leading Evaluation Frameworks |
| Chapter 11 — | Continuous Updates and Pipeline Optimization |
Appendices (book only)
| A — | Essential Mathematical Formulas for RAG Optimization |
| B — | Sample System Prompts for Data Anonymization and Evaluation |
| C — | Vector Database and Tool Decision Matrices |
| D — | Benchmark Datasets for RAG Evaluation |
Volume IV — Designing AI Cognition with MCP
Engineering Context, Tools, and Memory for Reliable AI Agents.
Structured context modeling and orchestration: how to shape a model's reasoning by engineering the context and situations it sees, rather than the model itself. The volume to read if you're building agentic systems — tool inventories, long-running loops, memory across sessions, and the discipline of designing what the model gets to see.
Part I — The Paradigm Shift in AI Integration
| Chapter 1 — | The AI Integration Crisis and the Rise of Agentic Architecture |
| Chapter 2 — | Unveiling the Model Context Protocol (MCP) |
Part II — Core Mechanics of the Model Context Protocol
| Chapter 3 — | Server Primitives — Exposing Context and Capabilities |
| Chapter 4 — | Client Primitives — Agentic Behaviors and Control |
| Chapter 5 — | Transport Protocols and Discovery |
Part III — Multi-Agent Orchestration Patterns
| Chapter 6 — | Fundamental Orchestration Strategies |
| Chapter 7 — | Advanced Collaborative and Dynamic Patterns |
| Chapter 8 — | Architectural Deployment Layouts |
Part IV — Designing AI Cognition: Context and Memory
| Chapter 9 — | Managing the Attention Budget |
| Chapter 10 — | Long-Horizon Task Memory |
Part V — Securing Agentic Workflows
| Chapter 11 — | Attack Surfaces and Protocol Vulnerabilities |
| Chapter 12 — | Protocol Hardening and Defenses |
Part VI — Production Engineering and Scale
| Chapter 13 — | Frameworks and Cloud Integration |
| Chapter 14 — | Benchmarking, Testing, and Performance |
Appendices (book only)
| A — | MCP Quick Reference & Cheat Sheet |
| B — | Implementation Blueprints & Code Examples |
| C — | Production Readiness & Security Checklists |
| D — | Advanced Specifications & Standard Enhancement Proposals (SEPs) |
| E — | Benchmarks & Performance Data |
| F — | Official Resources & Ecosystem Links |
Volume V — Building Real-World LLM Applications
Designing, Evaluating, and Operating LLM Systems in Production.
A systems-focused guide from prototype to production — API design, evaluation loops, monitoring, and integration — turning a capable model into a dependable product. The volume that turns architectural understanding into shipping services with real users on them.
Part I — Foundations of AI Engineering
| Chapter 1 — | The Discipline of AI Engineering |
| Chapter 2 — | Foundation Models & Prompt Engineering |
Part II — Building Agentic and Retrieval Capabilities
| Chapter 3 — | Retrieval-Augmented Generation (RAG) |
| Chapter 4 — | AI Agents and Tool Calling |
Part III — Quality Assurance and Observability
| Chapter 5 — | Evaluating LLM Applications |
| Chapter 6 — | AI Observability and Tracing |
Part IV — Security, Scale, and Optimization
| Chapter 7 — | LLM Security and Guardrails |
| Chapter 8 — | Optimizing Performance, Serving, and Cost |
Appendices (book only)
| A — | The Production Readiness & Security Checklists |
| B — | Tooling and Framework Selection Matrices |
| C — | Protocols, Streaming, and Structured Outputs |
| D — | Rate Limiting and Cost Management Architecture |
| E — | Glossary of AI Engineering Metrics and Terms |
Volume VI — Scaling AI Systems
Architecting Low-Latency LLM Inference for Production Scale.
Architecting high-performance inference: distributed serving, latency optimization, and cost modeling for systems that must answer millions of times a day. The volume to read when your AI system has grown past one server and now needs to behave like a real piece of infrastructure.
Part I — The Foundations of LLM Inference
| Chapter 1 — | The Mechanics of Token Generation |
| Chapter 2 — | The Key-Value (KV) Cache Challenge |
Part II — The Hardware Substrate
| Chapter 3 — | Data Center GPUs for Generative AI |
| Chapter 4 — | Specialized AI Silicon and ASICs |
Part III — Model-Level Optimization (Compression)
| Chapter 5 — | Demystifying Quantization |
| Chapter 6 — | Pruning and Knowledge Distillation |
Part IV — System and Engine-Level Optimizations
| Chapter 7 — | Advanced Batching Strategies |
| Chapter 8 — | Next-Generation KV Cache Management |
| Chapter 9 — | Speculative Decoding |
Part V — Serving Frameworks and Orchestration
| Chapter 10 — | The LLM Engine Layer |
| Chapter 11 — | The Platform and Orchestration Layer |
| Chapter 12 — | Disaggregated Serving and Kubernetes |
| Chapter 13 — | Autoscaling and Cold-Start Mitigation |
Part VI — Application-Level Economics and TCO
| Chapter 14 — | Token Economics and API Pricing |
| Chapter 15 — | Serverless APIs vs. Dedicated Infrastructure |
| Chapter 16 — | Cost-Cutting Strategies in Production |
Appendices (book only)
| A — | Mathematical Formulas and Cost Modeling Reference |
| B — | Hardware and Accelerator Specifications Guide |
| C — | Deployment Configurations and Code Snippets |
| D — | Benchmarking Methodology and Metrics Definitions |
Volume VII — AI Security
Defending LLM Systems Against Prompt Injection, Jailbreaks, and Adversarial Threats.
Designing safe and robust AI: adversarial risks, prompt injection, governance frameworks, and defensive design for systems deployed in the real world. The volume to read when your AI system has to be treated as security-relevant infrastructure.
Part I — Foundations of AI Security
| Chapter 1 — | Why AI Security Is Different |
| Chapter 2 — | Threat Modeling for LLM Systems |
| Chapter 3 — | Data Security and Privacy |
Part II — Prompt and Interaction Security
| Chapter 4 — | Prompt Injection and Jailbreaks |
| Chapter 5 — | Input Validation and Output Filtering |
| Chapter 6 — | Retrieval-Augmented Generation Risks |
Part III — Model Robustness and Reliability
| Chapter 7 — | Hallucinations and Reliability |
| Chapter 8 — | Adversarial Attacks on Models |
| Chapter 9 — | Model Integrity and Supply Chain Risks |
Part IV — System-Level Security Architecture
| Chapter 10 — | Designing Secure LLM Architectures |
| Chapter 11 — | Observability, Logging, and Incident Response |
| Chapter 12 — | Access Control and Identity |
Part V — Governance, Ethics, and Compliance
| Chapter 13 — | Regulatory Landscape |
| Chapter 14 — | Bias, Fairness, and Responsible AI |
| Chapter 15 — | Building a Secure AI Organization |
Part VI — Advanced Topics
| Chapter 16 — | Secure Fine-Tuning and Adaptation |
| Chapter 17 — | Future Threats and Emerging Defenses |
Appendices (book only)
| A — | AI Security Checklist for Production Systems |
| B — | Sample Threat Model Template |
| C — | Secure Prompt Design Patterns |
| D — | Incident Response Template for LLM Applications |
| E — | Recommended Tools and Frameworks |
How this page grows
This page will be updated as each volume of the series is published, and as walkthrough articles for each chapter go live. Volumes III through VII each have their full tables of contents above; the walkthrough articles for those chapters will be added as they are written.
Bookmark this page if you want to follow the series as it unfolds. Or subscribe to the channel feed to get each new post the day it lands.
Start with Volume I. Twelve chapters, fully revised for 2026, with diagrams, plain-English sidebars, code examples, and a complete treatment of how generative AI actually works.
Grab LLM Primer I on Amazon →
Then go deeper with Volume II. The mathematics underneath the machinery — every equation derived, every idea wrapped in a story, with worked examples, exercises with solutions, a math cheat sheet, and a full glossary.
Grab LLM Primer II on Amazon →