Chapter 13 — Frameworks and Cloud Integration

Fourteenth post of the chapter-by-chapter walkthrough of LLM Primer IV: Designing AI Cognition with MCP. In which nobody builds production MCP from raw protocol, the honest question of 2025–2026 becomes which framework to standardize on, and the answer turns out to depend less on features than on which cloud the rest of the system lives in.

Why this chapter exists

By the time a team has wired up authentication, transport, session state, error retry, structured logging, and the dozen smaller details that separate a demo from an operable service, what began as "we'll just speak MCP over HTTP" has become a small framework of its own — usually a worse one than the frameworks that already exist. The engineering question is which of the public frameworks to standardize on, what each one gets right, and how they connect to the cloud services that hold the state. This chapter walks the landscape methodically: Strands with Amazon Bedrock; the AWS services that surround it for state and retrieval; Microsoft's Agent Framework, LangChain, and Semantic Kernel as the other production options; and the integration patterns the reference architectures have converged on. The aim is not to crown a winner but to describe what each framework is actually for.

One line: framework choice in 2026 is mostly a function of which cloud the rest of the system lives in, because MCP travels between them — and that portability is precisely why standardizing on MCP at the tool boundary was the point of the protocol.

13.1 Strands Agents and Amazon Bedrock

Strands is the open-source agent framework AWS released in 2025 and now runs inside Amazon Q, AWS Support, and the AWS Glue assistant. The framing is deliberate: Strands is model-driven, which means the loop that decides what to do next is the model's own tool-calling loop, not a planner-graph the framework imposes on top. The framework's job is to make that loop reliable in production — to handle tool invocations, manage session state, plug in MCP servers as first-class tool sources through MCPClient, and route everything through Bedrock's hosted model catalog. The model layer is pluggable — Anthropic's API, OpenAI, Ollama, LiteLLM — but Bedrock is the default and the audit story.

The multi-agent story is where Strands earns its production reputation. Three composition patterns map cleanly onto the orchestration shapes from earlier chapters: Agents-as-Tools (the simplest shape, an agent wrapped as a tool another agent can call); Swarm (peer-to-peer with shared working memory); and Graph (deterministic topology where the model fills in each node). The patterns nest in production. The operational tax is paid in observability, and Strands pays it by emitting OpenTelemetry spans for every agent invocation, every tool call, every model call — so a four-level composition reconstructs from logs without per-level instrumentation. Bedrock adds the access-boundary story: IAM controls which models a given agent can invoke; CloudTrail logs the use. Bedrock Guardrails handle content filtering at the gateway, Knowledge Bases give managed retrieval, and AgentCore — AWS's 2025 primitive set — formalizes memory, identity, and runtime concerns for teams that want a fully managed runtime rather than a self-hosted one.

13.2 AWS as the state layer

An agent that runs for hours and remembers what happened yesterday needs storage that outlives the process. The pattern that has settled into production: runtime state (current session, task ledger, in-flight tool calls) in DynamoDB for fast keyed access; artifact state (generated documents, intermediate work) in S3 with a session-prefixed key structure that gives both natural IAM boundaries and free versioning; semantic state (long-term memory, prior conversations) in a vector store — OpenSearch Service for hot working memory, the newer S3 Vectors for cold or archival memory. The two-tier separation between S3 and DynamoDB is not premature optimization. It keeps the DynamoDB item under 400 KB, avoids read-amplification on every step, and lets each layer scale independently.

The choice of state layer determines the failure mode of the entire system. An agent whose state is only in process memory loses everything on restart. An agent whose memory is durable but whose index is out of sync recalls references to documents it can no longer find. Production deployments treat these as separate consistency contracts: DynamoDB transactions for state atomicity, S3 strong-read-after-write for the artifact contract, an indexing pipeline that publishes documents to S3 before upserting their vectors so retrieval cannot point at something that does not yet exist. The security boundary deserves the same care. AWS's own Strands guidance recommends per-session credentials via STS rather than reusing the runtime role for end-user-data tool calls — AgentCore Identity automates this — so that the AWS-level identity behind a destructive action is the actual end user, not a shared agent role. In regulated environments that is the only acceptable answer.

13.3 Microsoft Agent Framework, LangChain, and Semantic Kernel

The Microsoft and open-source side of the landscape settled differently. The Microsoft Agent Framework arrived in late 2025 as the explicit merger of Semantic Kernel and AutoGen — SK's plugin model and .NET integration with AutoGen's multi-agent patterns and Python-first DX. MCP integration is built in through MCPStdioTool and MCPStreamableHTTPTool; Azure AI Foundry is the hosted home, equivalent to Bedrock in the AWS world. The distinguishing feature versus Strands is that the conversation graph between agents is an explicit, replayable object — which matters enormously for evaluation, because a failed conversation can be rerun with a modified prompt and compared turn by turn. The cost is more structure than Strands imposes; the right trade in mature deployments, the wrong trade in exploratory work.

LangChain in 2026 is a different animal from LangChain in 2023. The original chains abstraction has become secondary; the primary surface is LangGraph for orchestration and LangSmith for observability and evaluation. The strengths are breadth — every model, database, and tool seems to have an adapter — and LangSmith's evaluation maturity. The weakness practitioners name honestly is surface area: the framework's high-level abstractions are excellent for the first ninety percent of the work, and the last ten percent is usually done by peeling back layers until the team understands what each one is doing. Teams that plan for this transition ship faster than teams that fight the abstractions. Semantic Kernel remains the framework for .NET teams; the [KernelFunction] plugin model fits .NET service hosts cleanly. A pattern that has emerged across all three: thin agent, thick MCP — capabilities live behind the protocol boundary, frameworks become thin proxies, and the same MCP server gets consumed by Strands on AWS, Microsoft Agent Framework on Azure, and LangChain on a developer's laptop with no porting work. That is the trajectory the protocol was designed to enable.

13.4 The production integration patterns

Three patterns have settled into the 2025–2026 reference architectures. The gateway-and-state-layer pattern puts a managed model gateway (Bedrock, AI Foundry, LiteLLM) in front of the model layer and a separate durable state layer behind. The gateway is where auth, rate limiting, content filtering, and audit live; the state layer is where durability lives. This pattern is the production default; new deployments should adopt it unless they have a specific reason not to, because teams that skip the gateway and call the model provider directly regret it within a quarter. The MCP service mesh pattern treats MCP servers as microservices and exposes them through a mesh that handles mTLS, retries, and circuit breaking — worth the cost only at scale (more than ten servers) or in regulated environments that require network-layer attestation. The managed-everything pattern hands the runtime, memory, identity, and tool registry to a cloud-managed agent service (Bedrock AgentCore, Azure AI Foundry Agent Service, Vertex AI Agent Builder); it wins for internal automation and copilots where the agent is not the primary product, and loses for teams where the agent is the product and the operational levers matter. Two patterns that did not win, and are worth naming as negative examples: everything-in-the-LLM-context (agents whose memory is an ever-growing system prompt) and framework-as-orchestrator (a rigid DAG with the model filling in node parameters). Both die in production for the reasons Chapter 9 walked through, and the lesson the field absorbed is that the right balance between framework structure and model freedom shifts further toward the model as the model gets more capable.

Worth holding onto: the framework decision tree is shorter than it looks. On AWS, Strands plus Bedrock is the default. On Azure, Microsoft Agent Framework plus AI Foundry is the equivalent. Multi-cloud, LangChain with the caveat that production peels back its abstractions. None of these choices are permanent, because the MCP layer underneath travels between them. That is the actual point of standardizing on MCP at the tool boundary — the frameworks become host hardware, the tools become USB-C devices, and the team can change either side without rewriting the other.

What Chapter 13 sets up

The frameworks and cloud services let teams ship MCP-based agents without rewriting the protocol stack each time. What they do not, by themselves, tell teams is whether the resulting system actually works — whether the agent solves the tasks it was built for, how it behaves under load, where the performance cliffs are, and how to compare two competing architectures honestly. Chapter 14 takes that question head-on: it walks the MCP-Universe Benchmark, the two systemic failure modes the benchmark uncovered (long-context degradation and unknown-tool exploration), and the throughput side where the gap between a shared session pool and the naive session-per-request pattern is roughly an order of magnitude.

Next — Chapter 14: Benchmarking, Testing, and Performance. MCP-Universe on real servers, the long-context and unknown-tools mitigations that work, the ten-times throughput gap between session-per-request and shared session pools, and where the series goes next.

Want the full picture? The book walks the Strands minimal viable agent and its multi-agent composition patterns with worked code, treats the AWS state-layer consistency contracts in operational detail, gives the honest framework decision tree across AWS, Azure, and multi-cloud, and explains why two specific patterns died in production. View LLM Primer IV on Amazon →