Introduction to LLMs

This page provides an easy-to-understand guide to LLMs (Large Language Models), from the basics to real-world applications, for AI enthusiasts.


Total of 16 articles available.

Chapter 10 — Post-Training and Alignment Mathematics

Tenth post of the LLM Primer II walkthrough. The mathematics that civilizes a brilliant but feral next-word predictor into a helpful assistant — supervised fine-tuning, reward modeling, RLHF on a KL leash, and the elegant DPO derivation that collapses the whole pipeline into a single supervised loss.

2026-03-12

Chapter 9 — Training at Scale

Ninth post of the LLM Primer II walkthrough. How data preprocessing quietly shapes everything that follows, the mathematics of mini-batch learning and parallelism, and the surprisingly subtle question of how to keep a training run numerically stable across thousands of GPUs.

2026-03-11

Chapter 8 — How Models Learn

Eighth post of the LLM Primer II walkthrough. Why over-parameterized models generalize at all, the implicit bias of gradient-based optimization, the empirical scaling laws that forecast capability before training, and the open mathematical questions that still surround LLM theory.

2026-03-10

Chapter 3 — Mathematical Tools for Language Models

Third post of the LLM Primer II walkthrough. The probability and statistics you actually need for language modeling, the slice of linear algebra that matters, and embeddings as the first place those two tools meet inside an LLM.

2026-03-05

Chapter 1 — Mathematical Intuition for Language Models

First post of the LLM Primer II walkthrough. Mathematical notation without intimidation, probability for language generation explained from scratch, and entropy as a way to measure uncertainty — the trio that makes the rest of the book readable.

2026-03-03

LLM Primer II — Language Models Through Mathematics: Series Introduction & Index

Kicking off the chapter-by-chapter walkthrough of Book II in the LLM Primer series — Language Models Through Mathematics. How the book is organized, what each chapter delivers, and the schedule for the fourteen posts that follow, March 3 through March 16.

2026-03-02

Chapter 9 — Performance, Scaling, and Costs: The Real Engineering Trade-offs

Chapter 9 of the LLM Primer I series. The operational realities of running LLMs at scale — model size vs capability, the latency–throughput trade-off, cost economics, quantization, and edge deployment. Why frontier-tier models are often the wrong choice even when you can afford them.

2026-02-26

Chapter 5 — Training Large Models: What Actually Goes Into a Frontier Model

Chapter 5 of the LLM Primer I series. How frontier LLMs are actually trained — the data pipeline, the loss function, the months of GPU time, and why "training" is now an industrial-scale engineering problem more than a research problem. Demystifies what those hundred-million-dollar training runs are paying for.

2026-02-22

Chapter 4 — The Transformer Architecture: Inside the Engine of Modern AI

Chapter 4 of the LLM Primer I series. A tour of the Transformer block — how self-attention, positional encoding, and stacked layers combine to produce the architecture every modern LLM is built on. Includes a clear explanation of why scaling Transformers works, and what it costs.

2026-02-21

A Chapter-by-Chapter Walkthrough of LLM Primer I — Series Introduction & Index

Introduction and index for the twelve-part chapter-by-chapter walkthrough of LLM Primer I: How Generative AI Works. One post per day, Feb 18 through March 1, 2026. Read them in order or pick the chapter that matters most to you. All twelve are listed and linked here.

2026-02-17

The LLM Primer Series — A Field Guide to Generative AI, Built One Volume at a Time

The LLM Primer Series — a seven-volume field guide to generative AI by Sho Shimoda. Each volume covers a different layer of working with large language models, from foundations to scaling to security. This is the landing page: an overview of the whole series, plus the live chapter-by-chapter walkthrough of the first volume.

2026-02-15

1.3 Entropy and Information: Quantifying Uncertainty

A clear, intuitive exploration of entropy, information, and uncertainty in Large Language Models. Learn how information theory shapes next-token prediction, why entropy matters for creativity and coherence, and how cross-entropy connects probability to learning. This section concludes Chapter 1 and prepares readers for the conceptual foundations in Chapter 2.

2025-09-06

Understanding LLMs – A Mathematical Approach to the Engine Behind AI

A preview from Chapter 7.4: Discover why large language models inherit bias, the real-world risks that bias creates, strategies for mitigating it, and the growing role of AI governance.

2025-09-01

3.2 LLM Training Steps: Forward Propagation, Backward Propagation, and Optimization

Explore the key steps in training Large Language Models (LLMs), including initialization, forward propagation, loss calculation, backward propagation, and hyperparameter tuning. Learn how these processes help optimize model performance.

2024-09-13

3.0 How to Train Large Language Models (LLMs): Data Preparation, Steps, and Fine-Tuning

Learn the key techniques for training Large Language Models (LLMs), including data preprocessing, forward and backward propagation, fine-tuning, and transfer learning. Optimize your model’s performance with efficient training methods.

2024-09-11

2.2 Understanding the Attention Mechanism in Large Language Models (LLMs)

Learn about the core attention mechanism that powers Large Language Models (LLMs). Discover the concepts of self-attention, scaled dot-product attention, and multi-head attention, and how they contribute to NLP tasks.

2024-09-09