Introduction to LLM

This page provides an easy-to-understand guide to LLMs (Large Language Models), from basics to applications, for AI enthusiasts.


Chapter 9 — Speculative Decoding

Ninth post of the LLM Primer VI walkthrough. The draft-verify paradigm — EAGLE, Medusa, MTP, Lookahead, N-gram — and the verification bottleneck that decides real speedup.

2026-05-01

Chapter 6 — Pruning and Knowledge Distillation

Sixth post of the LLM Primer VI walkthrough. Structured vs unstructured pruning, 2:4 sparsity on Hopper, and the distillation lineage from soft probabilities to Patient Knowledge Distillation and MiniLLM.

2026-04-28

Chapter 5 — Demystifying Quantization

Fifth post of the LLM Primer VI walkthrough. From BF16 to INT4 to Blackwell FP4 — quantization algorithms (AWQ, GPTQ, GGUF, SmoothQuant), NVIDIA ModelOpt, and when quantization is safe versus lossy.

2026-04-27

LLM Primer VI — Series Introduction & Index

Kicking off the chapter-by-chapter walkthrough of Book VI in the LLM Primer series — Scaling AI Systems. Why inference is the discipline that decides whether an LLM app survives real users, and the schedule for the sixteen posts that follow, April 23 through May 8.

2026-04-22

Chapter 9 — Performance, Scaling, and Costs: The Real Engineering Trade-offs

Chapter 9 of the LLM Primer I series. The operational realities of running LLMs at scale — model size vs capability, the latency–throughput trade-off, cost economics, quantization, and edge deployment. Why frontier-tier models are often the wrong choice even when you can afford them.

2026-02-26