The LLM Primer Series — A Field Guide to Generative AI, Built One Volume at a Time

The LLM Primer Series — a seven-volume field guide to generative AI by Sho Shimoda. Each volume covers a different layer of working with large language models, from foundations to scaling to security. This is the landing page: an overview of the whole series, plus the live chapter-by-chapter walkthrough of the first volume.


The LLM Primer Series

A field guide to generative AI, built one volume at a time. Seven books, each focused on a different layer of working with large language models.


What this series is

The LLM Primer Series is a structured, mechanism-first treatment of large language models — written for engineers, technical product managers, curious professionals, and anyone willing to read carefully. Each volume covers one slice of the field in depth, with the technical precision intact and the explanations grounded enough that you can use them.

The series is designed to work two ways. You can read Volume I as a complete foundation and stop there. Or you can follow the full progression — foundations, mathematics, retrieval, context design, production engineering, scaling, security — and end up with a comprehensive working knowledge of how to build with LLMs responsibly.

Every volume is written by Sho Shimoda, CTO of Receipt Roller Inc., who builds and runs production AI systems and writes about them in plain enough language that anyone can follow along.

Who this is for: Engineers and architects who want durable understanding. Product managers and executives who have to decide what AI to build. Curious professionals and students who want to understand the technology underneath the headlines. You don't need a math background to read the series, yet it keeps enough technical precision that an experienced engineer won't feel their time is wasted.

How to read this page

Each volume below lists its full table of contents, organized by Part. We will publish a chapter-by-chapter walkthrough article for every chapter in the series. Chapters that already have a walkthrough are linked; chapters whose walkthroughs are forthcoming appear in plain text.

Appendices are listed for transparency but are book-only content — reference material, cheat sheets, exercises with solutions, and other matter that belongs at the back of the book rather than in a separate walkthrough. To get the appendices, read the book.


Volume I — How Generative AI Works

A Clear and Practical Guide to the Foundations of Large Language Models.

The plain-language on-ramp to the whole series. Starting from zero — tokens, training, and the simple act of predicting the next word — it builds an honest, jargon-free picture of what a large language model is, how it is trained, and why it behaves as it does, assuming no prior background. It is the foundation every later volume builds upon.

Read on Amazon: LLM Primer I — How Generative AI Works

Series introduction: A Chapter-by-Chapter Walkthrough of LLM Primer I — Series Introduction & Index

Part I — Concepts & Foundations

Chapter 1 — What Is a Large Language Model?
Chapter 2 — Probability, Tokens, and Text
Chapter 3 — Neural Networks for Language

Part II — How LLMs Work

Chapter 4 — The Transformer Architecture
Chapter 5 — Training Large Models
Chapter 6 — Fine-Tuning & Adaptation
Chapter 7 — Beyond Next-Token Prediction

Part III — Practical Perspectives

Chapter 8 — Using LLMs in Applications
Chapter 9 — Performance, Scaling, and Costs
Chapter 10 — Safety, Ethics, & Trust

Part IV — Advanced Topics

Chapter 11 — Cutting-Edge Research
Chapter 12 — Building Your Own LLM System

Appendices (book only)

A — LLM Glossary
B — Mathematics Behind Attention
C — Prompting Cheat Sheet
D — Tools & Libraries
E — Recommended Reading

Volume II — Language Models Through Mathematics

Exploring the Inner Workings of AI with Mathematical Insight.

A mathematically rigorous yet readable tour of the inner workings of AI: attention, optimization dynamics, loss landscapes, and scaling behavior, explained through the mathematics that makes them work. Every equation that matters is fully derived — each one wrapped in a story, an analogy, and a worked numerical example. For readers who want the math that the first volume keeps in the sidebars.

Read on Amazon: LLM Primer II — Language Models Through Mathematics

Series introduction: LLM Primer II — Language Models Through Mathematics: Series Introduction & Index

Part I — Mathematical Foundations for Understanding LLMs

Chapter 1 — Mathematical Intuition for Language Models
Chapter 2 — LLMs in Context: Concepts and Background
Chapter 3 — Mathematical Tools for Language Models

Part II — The Mathematics of Transformers

Chapter 4 — Attention: The Core Mechanism
Chapter 5 — Position, Order, and Sequence Structure
Chapter 6 — Transformer Blocks and Representation Power
Chapter 7 — Efficiency and Transformer Variants

Part III — Optimization and Large-Scale Training

Chapter 8 — How Models Learn
Chapter 9 — Training at Scale
Chapter 10 — Post-Training and Alignment Mathematics
Chapter 11 — Evaluation, Calibration, and Inference

Part IV — Applications, Limitations, and the Road Ahead

Chapter 12 — Real-World Applications of LLMs
Chapter 13 — Limitations, Risks, and Open Challenges
Chapter 14 — Practical Knowledge for Engineers

Appendices (book only)

The LLM Math Cheat Sheet
A Statistical Perspective on LLMs
Questions People Ask
Worked Derivations
Exercises, with Solutions
Symbol Index
A Full Forward Pass, by the Numbers
A Timeline of the Ideas

Volume III — Enhancing Enterprise AI with RAG

A Practical Guide to Building Retrieval-Augmented Generation Systems for the Enterprise.

Practical retrieval-augmented generation — vector databases, chunking strategies, and the architecture of grounding a model in your own documents for reliable, up-to-date enterprise answers. The volume to read if your job is to ship AI features that have to stay current and have to cite their sources.

LLM Primer III — Enhancing Enterprise AI with RAG

Part I — Foundations of Retrieval-Augmented Generation

Chapter 1 — The Evolution of RAG Architecture

Part II — Data Ingestion, Parsing, and Chunking

Chapter 2 — Intelligent Document Parsing
Chapter 3 — Advanced Chunking Frameworks

Part III — Vector Databases and Retrieval Optimization

Chapter 4 — Selecting the Right Vector Database
Chapter 5 — Architecting the Retrieval Pipeline

Part IV — Security, Privacy, and Access Control

Chapter 6 — RAG Threat Models and Vulnerabilities
Chapter 7 — Implementing Access Control
Chapter 8 — Data Anonymization in the RAG Pipeline

Part V — Evaluation, Monitoring, and Maintenance

Chapter 9 — The RAG Evaluation Triad
Chapter 10 — Leading Evaluation Frameworks
Chapter 11 — Continuous Updates and Pipeline Optimization

Appendices (book only)

A — Essential Mathematical Formulas for RAG Optimization
B — Sample System Prompts for Data Anonymization and Evaluation
C — Vector Database and Tool Decision Matrices
D — Benchmark Datasets for RAG Evaluation

Volume IV — Designing AI Cognition with MCP

Engineering Context, Tools, and Memory for Reliable AI Agents.

Structured context modeling and orchestration: how to shape a model's reasoning by engineering the context and situations it sees, rather than the model itself. The volume to read if you're building agentic systems — tool inventories, long-running loops, memory across sessions, and the discipline of designing what the model gets to see.

LLM Primer IV — Designing AI Cognition with MCP

Part I — The Paradigm Shift in AI Integration

Chapter 1 — The AI Integration Crisis and the Rise of Agentic Architecture
Chapter 2 — Unveiling the Model Context Protocol (MCP)

Part II — Core Mechanics of the Model Context Protocol

Chapter 3 — Server Primitives — Exposing Context and Capabilities
Chapter 4 — Client Primitives — Agentic Behaviors and Control
Chapter 5 — Transport Protocols and Discovery

Part III — Multi-Agent Orchestration Patterns

Chapter 6 — Fundamental Orchestration Strategies
Chapter 7 — Advanced Collaborative and Dynamic Patterns
Chapter 8 — Architectural Deployment Layouts

Part IV — Designing AI Cognition: Context and Memory

Chapter 9 — Managing the Attention Budget
Chapter 10 — Long-Horizon Task Memory

Part V — Securing Agentic Workflows

Chapter 11 — Attack Surfaces and Protocol Vulnerabilities
Chapter 12 — Protocol Hardening and Defenses

Part VI — Production Engineering and Scale

Chapter 13 — Frameworks and Cloud Integration
Chapter 14 — Benchmarking, Testing, and Performance

Appendices (book only)

A — MCP Quick Reference & Cheat Sheet
B — Implementation Blueprints & Code Examples
C — Production Readiness & Security Checklists
D — Advanced Specifications & Standard Enhancement Proposals (SEPs)
E — Benchmarks & Performance Data
F — Official Resources & Ecosystem Links

Volume V — Building Real-World LLM Applications

Designing, Evaluating, and Operating LLM Systems in Production.

A systems-focused guide from prototype to production — API design, evaluation loops, monitoring, and integration — turning a capable model into a dependable product. The volume that turns architectural understanding into shipping services with real users on them.

LLM Primer V — Building Real-World LLM Applications

Part I — Foundations of AI Engineering

Chapter 1 — The Discipline of AI Engineering
Chapter 2 — Foundation Models & Prompt Engineering

Part II — Building Agentic and Retrieval Capabilities

Chapter 3 — Retrieval-Augmented Generation (RAG)
Chapter 4 — AI Agents and Tool Calling

Part III — Quality Assurance and Observability

Chapter 5 — Evaluating LLM Applications
Chapter 6 — AI Observability and Tracing

Part IV — Security, Scale, and Optimization

Chapter 7 — LLM Security and Guardrails
Chapter 8 — Optimizing Performance, Serving, and Cost

Appendices (book only)

A — The Production Readiness & Security Checklists
B — Tooling and Framework Selection Matrices
C — Protocols, Streaming, and Structured Outputs
D — Rate Limiting and Cost Management Architecture
E — Glossary of AI Engineering Metrics and Terms

Volume VI — Scaling AI Systems

Architecting Low-Latency LLM Inference for Production Scale.

Architecting high-performance inference: distributed serving, latency optimization, and cost modeling for systems that must answer millions of times a day. The volume to read when your AI system has grown past one server and now needs to behave like a real piece of infrastructure.

LLM Primer VI — Scaling AI Systems

Part I — The Foundations of LLM Inference

Chapter 1 — The Mechanics of Token Generation
Chapter 2 — The Key-Value (KV) Cache Challenge

Part II — The Hardware Substrate

Chapter 3 — Data Center GPUs for Generative AI
Chapter 4 — Specialized AI Silicon and ASICs

Part III — Model-Level Optimization (Compression)

Chapter 5 — Demystifying Quantization
Chapter 6 — Pruning and Knowledge Distillation

Part IV — System and Engine-Level Optimizations

Chapter 7 — Advanced Batching Strategies
Chapter 8 — Next-Generation KV Cache Management
Chapter 9 — Speculative Decoding

Part V — Serving Frameworks and Orchestration

Chapter 10 — The LLM Engine Layer
Chapter 11 — The Platform and Orchestration Layer
Chapter 12 — Disaggregated Serving and Kubernetes
Chapter 13 — Autoscaling and Cold-Start Mitigation

Part VI — Application-Level Economics and TCO

Chapter 14 — Token Economics and API Pricing
Chapter 15 — Serverless APIs vs. Dedicated Infrastructure
Chapter 16 — Cost-Cutting Strategies in Production

Appendices (book only)

A — Mathematical Formulas and Cost Modeling Reference
B — Hardware and Accelerator Specifications Guide
C — Deployment Configurations and Code Snippets
D — Benchmarking Methodology and Metrics Definitions

Volume VII — AI Security

Defending LLM Systems Against Prompt Injection, Jailbreaks, and Adversarial Threats.

Designing safe and robust AI: adversarial risks, prompt injection, governance frameworks, and defensive design for systems deployed in the real world. The volume to read when your AI system has to be treated as security-relevant infrastructure.

LLM Primer VII — AI Security

Part I — Foundations of AI Security

Chapter 1 — Why AI Security Is Different
Chapter 2 — Threat Modeling for LLM Systems
Chapter 3 — Data Security and Privacy

Part II — Prompt and Interaction Security

Chapter 4 — Prompt Injection and Jailbreaks
Chapter 5 — Input Validation and Output Filtering
Chapter 6 — Retrieval-Augmented Generation Risks

Part III — Model Robustness and Reliability

Chapter 7 — Hallucinations and Reliability
Chapter 8 — Adversarial Attacks on Models
Chapter 9 — Model Integrity and Supply Chain Risks

Part IV — System-Level Security Architecture

Chapter 10 — Designing Secure LLM Architectures
Chapter 11 — Observability, Logging, and Incident Response
Chapter 12 — Access Control and Identity

Part V — Governance, Ethics, and Compliance

Chapter 13 — Regulatory Landscape
Chapter 14 — Bias, Fairness, and Responsible AI
Chapter 15 — Building a Secure AI Organization

Part VI — Advanced Topics

Chapter 16 — Secure Fine-Tuning and Adaptation
Chapter 17 — Future Threats and Emerging Defenses

Appendices (book only)

A — AI Security Checklist for Production Systems
B — Sample Threat Model Template
C — Secure Prompt Design Patterns
D — Incident Response Template for LLM Applications
E — Recommended Tools and Frameworks

How this page grows

This page will be updated as each volume of the series is published, and as walkthrough articles for each chapter go live. Volumes III through VII each have their full tables of contents above; the walkthrough articles for those chapters will be added as they are written.

Bookmark this page if you want to follow the series as it unfolds. Or subscribe to the channel feed to get each new post the day it lands.


Start with Volume I. Twelve chapters, fully revised for 2026, with diagrams, plain-English sidebars, code examples, and a complete treatment of how generative AI actually works. Grab LLM Primer I on Amazon →
Then go deeper with Volume II. The mathematics underneath the machinery — every equation derived, every idea wrapped in a story, with worked examples, exercises with solutions, a math cheat sheet, and a full glossary. Grab LLM Primer II on Amazon →

SHO
Sho is the CTO of Receipt Roller Inc. He builds AI solutions and writes to make large language models more understandable, sharing both practical uses and behind-the-scenes insights.