The LLM Primer Series
A field guide to generative AI, built one volume at a time. Seven books, each focused on a different layer of working with large language models.
What this series is
The LLM Primer Series is a structured, mechanism-first treatment of large language models, written for engineers, technical product managers, curious professionals, and anyone willing to read carefully. Each volume covers one slice of the field in depth, keeping the technical precision intact while grounding the explanations well enough that you can put them to use.
The series is designed to work two ways. You can read Volume I as a complete foundation and stop there. Or you can follow the full progression — foundations, mathematics, retrieval, context design, production engineering, scaling, security — and end up with a comprehensive working knowledge of how to build with LLMs responsibly.
Every volume is written by Sho Shimoda, CTO of Receipt Roller Inc., who builds and runs production AI systems and writes about them in plain enough language that anyone can follow along.
How to read this page
Each volume below lists its full table of contents, organized by Part. We will publish a walkthrough article for every chapter in the series. Chapters that already have a walkthrough are linked; chapters whose walkthroughs are forthcoming appear in plain text.
Appendices are listed for transparency but are book-only content — reference material, cheat sheets, exercises with solutions, and other matter that belongs at the back of the book rather than in a separate walkthrough. To get the appendices, read the book.
Volume I — How Generative AI Works (Available now)
A Clear and Practical Guide to the Foundations of Large Language Models.
The plain-language on-ramp to the whole series. Starting from zero — tokens, training, and the simple act of predicting the next word — it builds an honest, jargon-free picture of what a large language model is, how it is trained, and why it behaves as it does, assuming no prior background. It is the foundation every later volume builds upon.
Read on Amazon: LLM Primer I — How Generative AI Works
Series introduction: A Chapter-by-Chapter Walkthrough of LLM Primer I — Series Introduction & Index
Part I — Concepts & Foundations
Chapter 1 — What Is a Large Language Model?
Chapter 2 — Probability, Tokens, and Text
Chapter 3 — Neural Networks for Language
Part II — How LLMs Work
Chapter 4 — The Transformer Architecture
Chapter 5 — Training Large Models
Chapter 6 — Fine-Tuning & Adaptation
Chapter 7 — Beyond Next-Token Prediction
Part III — Practical Perspectives
Chapter 8 — Using LLMs in Applications
Chapter 9 — Performance, Scaling, and Costs
Chapter 10 — Safety, Ethics, & Trust
Part IV — Advanced Topics
Chapter 11 — Cutting-Edge Research
Chapter 12 — Building Your Own LLM System
Appendices (book only)
- Appendix A — LLM Glossary
- Appendix B — Mathematics Behind Attention
- Appendix C — Prompting Cheat Sheet
- Appendix D — Tools & Libraries
- Appendix E — Recommended Reading
Volume II — Language Models Through Mathematics (Available now)
Exploring the Inner Workings of AI with Mathematical Insight.
A mathematically rigorous yet readable tour of the inner workings of AI: attention, optimization dynamics, loss landscapes, and scaling behavior, explained through the mathematics that makes them work. Every equation that matters is fully derived — each one wrapped in a story, an analogy, and a worked numerical example. For readers who want the math that the first volume keeps in the sidebars.
Read on Amazon: LLM Primer II — Language Models Through Mathematics
Series introduction: LLM Primer II — Language Models Through Mathematics: Series Introduction & Index
Part I — Mathematical Foundations for Understanding LLMs
Chapter 1 — Mathematical Intuition for Language Models
Chapter 2 — LLMs in Context: Concepts and Background
Chapter 3 — Mathematical Tools for Language Models
Part II — The Mathematics of Transformers
Chapter 4 — Attention: The Core Mechanism
Chapter 5 — Position, Order, and Sequence Structure
Chapter 6 — Transformer Blocks and Representation Power
Chapter 7 — Efficiency and Transformer Variants
Part III — Optimization and Large-Scale Training
Chapter 8 — How Models Learn
Chapter 9 — Training at Scale
Chapter 10 — Post-Training and Alignment Mathematics
Chapter 11 — Evaluation, Calibration, and Inference
Part IV — Applications, Limitations, and the Road Ahead
Chapter 12 — Real-World Applications of LLMs
Chapter 13 — Limitations, Risks, and Open Challenges
Chapter 14 — Practical Knowledge for Engineers
Appendices (book only)
- The LLM Math Cheat Sheet
- A Statistical Perspective on LLMs
- Questions People Ask
- Worked Derivations
- Exercises, with Solutions
- Symbol Index
- A Full Forward Pass, by the Numbers
- A Timeline of the Ideas
Volume III — Enhancing Enterprise AI with RAG
A Practical Guide to Building Retrieval-Augmented Generation Systems for the Enterprise.
Practical retrieval-augmented generation — vector databases, chunking strategies, and the architecture of grounding a model in your own documents for reliable, up-to-date enterprise answers. The volume to read if your job is to ship AI features that have to stay current and have to cite their sources.
Part I — Foundations of Retrieval-Augmented Generation
Chapter 1 — The Evolution of RAG Architecture
Part II — Data Ingestion, Parsing, and Chunking
Chapter 2 — Intelligent Document Parsing
Chapter 3 — Advanced Chunking Frameworks
Part III — Vector Databases and Retrieval Optimization
Chapter 4 — Selecting the Right Vector Database
Chapter 5 — Architecting the Retrieval Pipeline
Part IV — Security, Privacy, and Access Control
Chapter 6 — RAG Threat Models and Vulnerabilities
Chapter 7 — Implementing Access Control
Chapter 8 — Data Anonymization in the RAG Pipeline
Part V — Evaluation, Monitoring, and Maintenance
Chapter 9 — The RAG Evaluation Triad
Chapter 10 — Leading Evaluation Frameworks
Chapter 11 — Continuous Updates and Pipeline Optimization
Appendices (book only)
- Appendix A — Essential Mathematical Formulas for RAG Optimization
- Appendix B — Sample System Prompts for Data Anonymization and Evaluation
- Appendix C — Vector Database and Tool Decision Matrices
- Appendix D — Benchmark Datasets for RAG Evaluation
Volume IV — Designing AI Cognition with MCP
Engineering Context, Tools, and Memory for Reliable AI Agents.
Structured context modeling and orchestration: how to shape a model's reasoning by engineering the context and situations it sees, rather than the model itself. The volume to read if you're building agentic systems — tool inventories, long-running loops, memory across sessions, and the discipline of designing what the model gets to see.
Part I — The Paradigm Shift in AI Integration
Chapter 1 — The AI Integration Crisis and the Rise of Agentic Architecture
Chapter 2 — Unveiling the Model Context Protocol (MCP)
Part II — Core Mechanics of the Model Context Protocol
Chapter 3 — Server Primitives — Exposing Context and Capabilities
Chapter 4 — Client Primitives — Agentic Behaviors and Control
Chapter 5 — Transport Protocols and Discovery
Part III — Multi-Agent Orchestration Patterns
Chapter 6 — Fundamental Orchestration Strategies
Chapter 7 — Advanced Collaborative and Dynamic Patterns
Chapter 8 — Architectural Deployment Layouts
Part IV — Designing AI Cognition: Context and Memory
Chapter 9 — Managing the Attention Budget
Chapter 10 — Long-Horizon Task Memory
Part V — Securing Agentic Workflows
Chapter 11 — Attack Surfaces and Protocol Vulnerabilities
Chapter 12 — Protocol Hardening and Defenses
Part VI — Production Engineering and Scale
Chapter 13 — Frameworks and Cloud Integration
Chapter 14 — Benchmarking, Testing, and Performance
Appendices (book only)
- Appendix A — MCP Quick Reference & Cheat Sheet
- Appendix B — Implementation Blueprints & Code Examples
- Appendix C — Production Readiness & Security Checklists
- Appendix D — Advanced Specifications & Standard Enhancement Proposals (SEPs)
- Appendix E — Benchmarks & Performance Data
- Appendix F — Official Resources & Ecosystem Links
Volume V — Building Real-World LLM Applications
Designing, Evaluating, and Operating LLM Systems in Production.
A systems-focused guide from prototype to production — API design, evaluation loops, monitoring, and integration — turning a capable model into a dependable product. The volume that turns architectural understanding into shipping services with real users on them.
Part I — Foundations of AI Engineering
Chapter 1 — The Discipline of AI Engineering
Chapter 2 — Foundation Models & Prompt Engineering
Part II — Building Agentic and Retrieval Capabilities
Chapter 3 — Retrieval-Augmented Generation (RAG)
Chapter 4 — AI Agents and Tool Calling
Part III — Quality Assurance and Observability
Chapter 5 — Evaluating LLM Applications
Chapter 6 — AI Observability and Tracing
Part IV — Security, Scale, and Optimization
Chapter 7 — LLM Security and Guardrails
Chapter 8 — Optimizing Performance, Serving, and Cost
Appendices (book only)
- Appendix A — The Production Readiness & Security Checklists
- Appendix B — Tooling and Framework Selection Matrices
- Appendix C — Protocols, Streaming, and Structured Outputs
- Appendix D — Rate Limiting and Cost Management Architecture
- Appendix E — Glossary of AI Engineering Metrics and Terms
Volume VI — Scaling AI Systems
Architecting Low-Latency LLM Inference for Production Scale.
Architecting high-performance inference: distributed serving, latency optimization, and cost modeling for systems that must answer millions of times a day. The volume to read when your AI system has grown past one server and now needs to behave like a real piece of infrastructure.
Part I — The Foundations of LLM Inference
Chapter 1 — The Mechanics of Token Generation
Chapter 2 — The Key-Value (KV) Cache Challenge
Part II — The Hardware Substrate
Chapter 3 — Data Center GPUs for Generative AI
Chapter 4 — Specialized AI Silicon and ASICs
Part III — Model-Level Optimization (Compression)
Chapter 5 — Demystifying Quantization
Chapter 6 — Pruning and Knowledge Distillation
Part IV — System and Engine-Level Optimizations
Chapter 7 — Advanced Batching Strategies
Chapter 8 — Next-Generation KV Cache Management
Chapter 9 — Speculative Decoding
Part V — Serving Frameworks and Orchestration
Chapter 10 — The LLM Engine Layer
Chapter 11 — The Platform and Orchestration Layer
Chapter 12 — Disaggregated Serving and Kubernetes
Chapter 13 — Autoscaling and Cold-Start Mitigation
Part VI — Application-Level Economics and TCO
Chapter 14 — Token Economics and API Pricing
Chapter 15 — Serverless APIs vs. Dedicated Infrastructure
Chapter 16 — Cost-Cutting Strategies in Production
Appendices (book only)
- Appendix A — Mathematical Formulas and Cost Modeling Reference
- Appendix B — Hardware and Accelerator Specifications Guide
- Appendix C — Deployment Configurations and Code Snippets
- Appendix D — Benchmarking Methodology and Metrics Definitions
Volume VII — AI Security
Defending LLM Systems Against Prompt Injection, Jailbreaks, and Adversarial Threats.
Designing safe and robust AI: adversarial risks, prompt injection, governance frameworks, and defensive design for systems deployed in the real world. The volume to read when your AI system has to be treated as security-relevant infrastructure.
Part I — Foundations of AI Security
Chapter 1 — Why AI Security Is Different
Chapter 2 — Threat Modeling for LLM Systems
Chapter 3 — Data Security and Privacy
Part II — Prompt and Interaction Security
Chapter 4 — Prompt Injection and Jailbreaks
Chapter 5 — Input Validation and Output Filtering
Chapter 6 — Retrieval-Augmented Generation Risks
Part III — Model Robustness and Reliability
Chapter 7 — Hallucinations and Reliability
Chapter 8 — Adversarial Attacks on Models
Chapter 9 — Model Integrity and Supply Chain Risks
Part IV — System-Level Security Architecture
Chapter 10 — Designing Secure LLM Architectures
Chapter 11 — Observability, Logging, and Incident Response
Chapter 12 — Access Control and Identity
Part V — Governance, Ethics, and Compliance
Chapter 13 — Regulatory Landscape
Chapter 14 — Bias, Fairness, and Responsible AI
Chapter 15 — Building a Secure AI Organization
Part VI — Advanced Topics
Chapter 16 — Secure Fine-Tuning and Adaptation
Chapter 17 — Future Threats and Emerging Defenses
Appendices (book only)
- Appendix A — AI Security Checklist for Production Systems
- Appendix B — Sample Threat Model Template
- Appendix C — Secure Prompt Design Patterns
- Appendix D — Incident Response Template for LLM Applications
- Appendix E — Recommended Tools and Frameworks
How this page grows
This page will be updated as each volume of the series is published and as the walkthrough article for each chapter goes live. The full tables of contents for Volumes III through VII already appear above; walkthrough articles for their chapters will be added as they are written.
Bookmark this page to follow the series as it unfolds, or subscribe to the channel feed to get each new post the day it lands.