Deterministic AI agents: LangGraph + MCP on AWS

Key takeaways

Single-prompt ReAct agents and RAG chatbots stall in production because prompt engineering alone can't guarantee deterministic execution paths.
Model the agent as a LangGraph StateGraph: every LLM call is a node, and code-governed edges — not the model — decide the next step.
Validate state with Pydantic at each transition, and route parsing failures to a human-in-the-loop escalation instead of letting the model invent an action.
Decouple tools with the Model Context Protocol (MCP) and run the agent on AWS Lambda + DynamoDB for cheap, scalable, schema-checked execution.

A production AI agent is not a chatbot with tools bolted on. A chatbot answers; an agent acts — and acting against real ERP, CRM, and payment systems demands the same predictability you’d expect from any other piece of backend software.

The short answer: standard chatbot pilots fail because prompt engineering alone can’t guarantee deterministic paths. Without a hard state machine and structured input/output validation, a conversational LLM will eventually hallucinate, drop execution context, or fall into an infinite tool-calling loop on any workflow with more than a couple of dependencies. The fix isn’t a better prompt — it’s constraining execution with LangGraph state machines and standardizing tool access with the Model Context Protocol (MCP).

Why do conversational AI pilots fail to reach production?

When we build AI for DACH mid-market enterprises, consistency is non-negotiable. If an agent adjusts a customer order or pulls ERP inventory, it cannot operate on a “best-effort” conversational basis. This is the same wall that stops most pilots short of production; we covered the operational side of it in our post on why AI pilots fail before production.

Traditional single-prompt agents use a loop of Thought → Action → Observation (the ReAct framework). That’s fine for simple Q&A, but it degrades fast once multiple API dependencies exist: nothing structurally prevents the model from looping, skipping a step, or fabricating a tool call. To make agents reliable, we treat their execution as a finite state machine — transitions between nodes are governed by code, not by prompt instructions.

Single-prompt ReAct

The model decides its own next action each turn. Works for Q&A; on multi-step workflows it loops, drops context, or invents tool calls. Hard to test, harder to guarantee.

LangGraph state machine

Each LLM call is a node; code-governed edges decide the next step from validated state. Deterministic, testable, and safe to run read/write commands against enterprise APIs.

The deterministic agent architecture on AWS

For low latency and scalable infrastructure, we host agents on AWS Serverless — AWS Lambda for execution, Amazon DynamoDB for state persistence — and decouple tool execution behind the open-source Model Context Protocol.

Figure: Deterministic state-chart workflow. Every LLM call maps to a node, and code-governed edges decide the next step from validated state; the model fills nodes but doesn't choose the path, so a parsing failure routes to a human instead of a hallucinated write.

With the workflow organized as a state-chart, every LLM call corresponds to a specific node in the graph. The edges between nodes determine what happens next from the structured state data, rather than letting the model dynamically invent its next action.
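The state-chart idea itself doesn't depend on a framework. Below is a minimal, stdlib-only sketch (not LangGraph, and not the production code): the model is a stubbed callable that returns raw text, `parse_verification` turns that text into validated state or raises, and the supervisor picks the next step from the validated state alone. The `TXN-` format (six uppercase letters or digits) and every name here are assumptions for illustration.

```python
import json
import re
from dataclasses import dataclass
from typing import Callable

# Assumption: "TXN-XXXXXX" means six uppercase letters or digits.
TXN_PATTERN = re.compile(r"TXN-[A-Z0-9]{6}")


@dataclass(frozen=True)
class Verification:
    transaction_id: str
    requires_refund: bool


def parse_verification(raw: str) -> Verification:
    """Turn raw model text into validated state, or raise ValueError."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    txn = data.get("transaction_id")
    if not isinstance(txn, str) or not TXN_PATTERN.fullmatch(txn):
        raise ValueError(f"invalid transaction_id: {txn!r}")
    refund = data.get("requires_refund", False)
    if not isinstance(refund, bool):
        raise ValueError("requires_refund must be a boolean")
    return Verification(txn, refund)


def supervisor(model: Callable[[str], str], message: str) -> str:
    """Node: the model only fills in state; code decides the next step."""
    try:
        verification = parse_verification(model(message))
    except ValueError:
        # Unparseable or invalid output never reaches a write path.
        return "escalate_support"
    return "verify_payment" if verification.requires_refund else "end"
```

Because the routing is ordinary code, each path can be unit-tested with a stubbed model: conversational filler such as `"Sure, refunded!"` fails JSON parsing and routes to `"escalate_support"`.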

Building the state-chart with LangGraph

Here’s a state-validated supervisor node in Python. The pattern guarantees the agent can only transition to the next state if the current state passes schema validation.

First, define the tool payload as a Pydantic model and the shared graph state as a TypedDict:

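from typing import TypedDict, Annotated, Sequence, Literal
from pydantic import BaseModel, Field
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

# Schema-level tool requirements
class TransactionVerification(BaseModel):
    # Assumption: "X" is an uppercase letter or digit. The pattern turns the
    # format into a validation rule instead of a hint the model may ignore.
    transaction_id: str = Field(
        description="Must match exact format TXN-XXXXXX",
        pattern=r"^TXN-[A-Z0-9]{6}$",
    )
    requires_refund: bool = Field(default=False)
    audit_notes: str

# Execution graph state
class AgentState(TypedDict):
    # add_messages is LangGraph's message reducer: node outputs are appended
    # to the history (updated by message ID) instead of overwriting it.
    messages: Annotated[Sequence[BaseMessage], add_messages]
    verification_payload: TransactionVerification
    next_step: Literal["verify_payment", "escalate_support", "end"]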

Then define the validation node and wire up the routing:

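from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

# Assumption: any LangChain chat model that supports with_structured_output
# works here; ChatOpenAI and the model name are placeholders.
llm = ChatOpenAI(model="gpt-4o-mini")

# Bind the model to the Pydantic schema once, outside the node.
verifier = llm.with_structured_output(TransactionVerification)

def supervisor_node(state: AgentState) -> dict:
    """Evaluate the request and pick the path from validated schema."""
    last_message = state["messages"][-1].content

    # Structured output binds the model to the Pydantic schema, so it
    # must return parseable state instead of conversational filler.
    try:
        structured_output = verifier.invoke(last_message)

        return {
            "verification_payload": structured_output,
            "next_step": "verify_payment" if structured_output.requires_refund else "end",
        }
    except Exception:
        # Schema parsing or validation failed: fall back to a safe human escalation path.
        return {"next_step": "escalate_support"}

def route_to_node(state: AgentState) -> str:
    # The edge is plain code reading validated state; the model never picks it.
    return state["next_step"]

def payment_execution_node(state: AgentState) -> dict:
    # Placeholder: execute the refund through the payment tool using
    # state["verification_payload"].
    return {"next_step": "end"}

def escalation_node(state: AgentState) -> dict:
    # Placeholder: hand the case to a human reviewer.
    return {"next_step": "end"}

workflow = StateGraph(AgentState)
workflow.add_node("supervisor", supervisor_node)
# Every routing target must exist as a node, or compile() will reject the graph.
workflow.add_node("payment_execution_node", payment_execution_node)
workflow.add_node("escalation_node", escalation_node)

workflow.set_entry_point("supervisor")
workflow.add_conditional_edges(
    "supervisor",
    route_to_node,
    {
        "verify_payment": "payment_execution_node",
        "escalate_support": "escalation_node",
        "end": END,
    },
)
workflow.add_edge("payment_execution_node", END)
workflow.add_edge("escalation_node", END)

app = workflow.compile()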

State machines vs DAGs: when LangGraph is the right tool

A frequent architecture question at this point: if we want deterministic execution, why not just use a DAG — Step Functions, Airflow, a plain pipeline? The answer is cycles.

A DAG is the right shape when work flows strictly forward: extract, transform, load, done. Agent workflows are rarely that shape. A verification fails and the agent needs to re-ask the user; a tool call times out and needs a bounded retry; a human reviewer sends the case back one step. Each of those is a loop — legal in a state machine, impossible in an acyclic graph without contortions.

That’s the actual division of labour, and it cuts both ways:

Straight pipeline, no mid-run decisions by a model? Use a plain DAG or Step Functions. An agent framework adds surface area without adding value — this is the most common over-engineering we see in pilots.
Conditional routing, bounded retries, clarification loops, human-in-the-loop checkpoints? That’s a state machine. LangGraph’s StateGraph gives you exactly that: nodes as discrete steps, typed shared state, and cyclic edges whose conditions live in your code — so every loop has an explicit exit and a testable bound.

The discipline that makes cycles safe is the same one that makes the whole agent safe: the model fills in state, code decides transitions, and every loop carries a counter the state schema enforces.
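Here is what a loop with a state-enforced counter can look like, as a stdlib-only sketch rather than LangGraph code: a tool node that may time out, a routing function that loops back while an attempt counter in state is below an assumed `MAX_ATTEMPTS`, and an explicit exit to escalation once the bound is hit. `TimeoutError` stands in for whatever your tool client raises; all names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 3  # assumed bound for this sketch


@dataclass(frozen=True)
class RetryState:
    attempts: int = 0
    result: Optional[str] = None


def call_tool_node(state: RetryState, tool: Callable[[], str]) -> RetryState:
    """Node: one tool attempt. Code increments the counter, never the model."""
    try:
        return replace(state, attempts=state.attempts + 1, result=tool())
    except TimeoutError:
        return replace(state, attempts=state.attempts + 1)


def route_after_tool(state: RetryState) -> str:
    """Cyclic edge with an explicit exit: loop back, escalate, or finish."""
    if state.result is not None:
        return "end"
    if state.attempts < MAX_ATTEMPTS:
        return "call_tool"  # the cycle an acyclic graph can't express
    return "escalate_support"


def run(tool: Callable[[], str]) -> Tuple[str, RetryState]:
    """Drive the loop; it terminates because attempts only ever grows."""
    state, step = RetryState(), "call_tool"
    while step == "call_tool":
        state = call_tool_node(state, tool)
        step = route_after_tool(state)
    return step, state
```

In LangGraph, `route_after_tool` would be the function passed to `add_conditional_edges`, mapping `"call_tool"` back to the tool node. LangGraph's `recursion_limit` run setting also caps the total number of steps, but that is a global backstop; a counter in state gives each loop its own explicit, testable bound.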

LangGraph vs Pydantic AI: which layer does what?

The second recurring question: LangGraph or Pydantic AI? Framed as an either-or choice, it’s mostly a category error, because they sit at different layers.

Pydantic AI is an agent framework built around one idea we agree with completely: every model interaction should have typed, validated input and output. It gives you a clean harness for a single agent — tools, dependencies, retries on validation failure — with the schema discipline built in rather than bolted on.

LangGraph is a graph orchestrator. Its unit isn’t the model call — it’s the workflow: which step runs next, under which validated condition, with which shared state, and where a human checkpoint interrupts the run.

The practical decision rule we use:

One agent, a handful of tools, request-response shape → Pydantic AI on its own is a lean, defensible choice.
Multi-step workflows with branching, cycles, escalation and resumable state → LangGraph for the control flow, with Pydantic models validating state at every transition — exactly the pattern in the code above.

The two compose: nothing stops a LangGraph node from hosting a Pydantic AI agent internally. What matters is that the outer control flow stays in code — whichever harness fills the nodes.
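The composition can be sketched without either library. In this stdlib-only sketch, the inner harness is any callable that returns typed output or raises, standing in for something like a Pydantic AI agent run (that wiring is assumed, not shown). `make_node` adapts it into a node that only writes state, and `route` keeps the control flow in code. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class Verification:
    transaction_id: str
    requires_refund: bool


# Hypothetical inner harness: returns typed output or raises on bad output.
InnerAgent = Callable[[str], Verification]


def make_node(inner: InnerAgent) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Adapt any inner agent into a node that writes state and nothing else."""

    def node(state: Dict[str, Any]) -> Dict[str, Any]:
        try:
            return {"verification": inner(state["request"])}
        except Exception:
            return {"verification": None}  # the router sends this to a human

    return node


def route(state: Dict[str, Any]) -> str:
    """Outer control flow, identical whichever harness fills the node."""
    verification: Optional[Verification] = state.get("verification")
    if verification is None:
        return "escalate_support"
    return "verify_payment" if verification.requires_refund else "end"
```

Swapping the inner harness changes what fills `verification`, never which transitions are possible.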

Standardizing integration with MCP

To keep agents from being tightly coupled to specific database schemas or API integrations, we use the Model Context Protocol. MCP is a universal bridge that separates the agent core (the brain) from external services (the hands). Instead of bespoke integration code per tool, the agent connects to MCP servers that expose tools with a declared input schema:

{
  "name": "fetch_erp_inventory",
  "description": "Pulls current physical stock counts from the SAP ERP system",
  "inputSchema": {
    "type": "object",
    "properties": {
      "sku": {
        "type": "string",
        "pattern": "^SKU-[0-9]{4}$"
      }
    },
    "required": ["sku"]
  }
}

With tools decoupled this way, the agent layer on AWS Lambda doesn’t need to know the inner workings of your SAP or CRM; it only handles the standard JSON payloads defined by the MCP tool contract. Swap the backing system, keep the contract, and the agent is unchanged.
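On the client side, the contract is just data. A minimal stdlib sketch, assuming the agent checks arguments before sending them: `check_arguments` validates against a small subset of JSON Schema (`required`, `type: string`, `pattern`), and `build_tools_call` wraps the arguments in the JSON-RPC 2.0 `tools/call` request that MCP clients send. A real client would use an MCP SDK and a complete JSON Schema validator; the function names here are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

# The tool contract as declared by the MCP server (the example above).
FETCH_ERP_INVENTORY: Dict[str, Any] = {
    "name": "fetch_erp_inventory",
    "description": "Pulls current physical stock counts from the SAP ERP system",
    "inputSchema": {
        "type": "object",
        "properties": {"sku": {"type": "string", "pattern": "^SKU-[0-9]{4}$"}},
        "required": ["sku"],
    },
}


def check_arguments(tool: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Check args against a small JSON Schema subset: required, string type, pattern."""
    schema = tool["inputSchema"]
    properties = schema.get("properties", {})
    errors = [
        f"missing required field: {name}"
        for name in schema.get("required", [])
        if name not in args
    ]
    for name, value in args.items():
        prop = properties.get(name)
        if prop is None:
            continue  # JSON Schema allows extra keys unless additionalProperties is false
        if prop.get("type") == "string":
            if not isinstance(value, str):
                errors.append(f"{name} must be a string")
            elif "pattern" in prop and re.search(prop["pattern"], value) is None:
                # JSON Schema patterns are unanchored; re.search mirrors that.
                errors.append(f"{name} does not match {prop['pattern']}")
    return errors


def build_tools_call(tool: Dict[str, Any], args: Dict[str, Any], request_id: int) -> str:
    """Serialize the JSON-RPC 2.0 tools/call request an MCP client sends."""
    errors = check_arguments(tool, args)
    if errors:
        raise ValueError("; ".join(errors))
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool["name"], "arguments": args},
        }
    )
```

A malformed SKU such as `SKU-42` raises here, before any request reaches the MCP server or the ERP behind it.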

Deployment and cost control on AWS Serverless

To minimize cold starts and optimize cost, we configure the agent on AWS Lambda along three axes:

1. Provisioned concurrency. Keep 1–2 execution instances warm for latency-sensitive, user-facing paths so interactive sessions don't pay for a cold start, as long as concurrent demand stays within the provisioned count. Leave background paths on-demand to stay cheap.
2. State persistence in DynamoDB. Persist conversational state and session memory in DynamoDB instead of in Lambda memory, which doesn't survive across instances. Long-running memory becomes a cheap key lookup, and any Lambda instance can resume a session.
3. Schema validation at the edge. Attach a request validator with a JSON Schema model to the AWS API Gateway (REST API) method so malformed JSON is rejected before it invokes a Lambda. Invalid requests never spend downstream compute or model tokens: the cheapest request is the one you never run.
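The second and third axes meet in the Lambda handler. A minimal sketch, assuming an API Gateway proxy integration (so `event["body"]` is a JSON string that has already passed request validation) and a sessions table keyed by `session_id`: state is loaded, updated, and written back on every invocation, so any instance can resume the session. `InMemoryTable` is a test stand-in that mimics the `get_item`/`put_item` call shape of a boto3 DynamoDB `Table`; the table design, the `expires_at` TTL attribute, and the 24-hour retention are assumptions.

```python
import json
import time
from typing import Any, Callable, Dict

SESSION_TTL_SECONDS = 24 * 3600  # assumed retention for this sketch


class InMemoryTable:
    """Test stand-in mimicking the get_item/put_item shape of a boto3 DynamoDB Table."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def get_item(self, Key: Dict[str, str]) -> Dict[str, Any]:
        item = self._items.get(Key["session_id"])
        return {"Item": item} if item is not None else {}

    def put_item(self, Item: Dict[str, Any]) -> Dict[str, Any]:
        self._items[Item["session_id"]] = Item
        return {}


def load_state(table: Any, session_id: str) -> Dict[str, Any]:
    item = table.get_item(Key={"session_id": session_id}).get("Item")
    # boto3 rejects Python floats, so state is stored as one JSON string.
    return json.loads(item["state"]) if item else {"messages": []}


def save_state(table: Any, session_id: str, state: Dict[str, Any]) -> None:
    table.put_item(
        Item={
            "session_id": session_id,  # partition key (assumed table design)
            "state": json.dumps(state),
            # Epoch seconds; DynamoDB TTL deletes the item some time after this.
            "expires_at": int(time.time()) + SESSION_TTL_SECONDS,
        }
    )


def make_handler(table: Any) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    # In Lambda, pass a real table created once at module scope, e.g.
    # boto3.resource("dynamodb").Table(os.environ["SESSIONS_TABLE"]) (hypothetical env var).
    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        body = json.loads(event["body"])  # assumed already validated by API Gateway
        state = load_state(table, body["session_id"])
        state["messages"].append(body["message"])
        # ... run one graph step here, e.g. app.invoke(...) ...
        save_state(table, body["session_id"], state)
        return {"statusCode": 200, "body": json.dumps({"turns": len(state["messages"])})}

    return handler
```

Because the handler holds no state between calls, two separately created handlers (two "instances") continue the same session. LangGraph's own checkpointer interface is worth evaluating before hand-rolling persistence like this.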

Engineering reliable agent platforms

With LangGraph’s deterministic routing and MCP’s structured tools, you move past fragile proof-of-concept chatbots to agents that are predictable, safe, and maintainable — operations you can put in front of a customer or an ERP write API without holding your breath.

We design and ship custom, production-grade AI agent systems in under 90 days, with EU AI Act alignment built in from day one rather than bolted on before launch.


// SOURCES

LangGraph documentation — LangChain, 2025
Model Context Protocol — Introduction — Anthropic, 2025
Configuring provisioned concurrency for Lambda — Amazon Web Services, 2025
Pydantic — Models — Pydantic, 2025

Frequently asked questions

Why do RAG chatbots fail in production?
Standard RAG chatbots and single-prompt ReAct agents rely on prompt engineering to control behavior, which can't guarantee deterministic paths. As soon as a workflow spans several API dependencies, the model tends to hallucinate, drop execution context, or fall into infinite tool-calling loops. Production workflows — adjusting an order, pulling ERP stock — need hard state-machine constraints, not best-effort conversation.
What is a LangGraph StateGraph?
A LangGraph StateGraph models an agent as a finite state machine: each node is a discrete step (often a single LLM call), and the edges between nodes are governed by your code based on the current validated state. Instead of letting the model freely choose its next action, conditional edges route execution along predefined, testable paths.
What is the Model Context Protocol (MCP)?
MCP is an open protocol that standardizes how an AI agent connects to external tools and data. It separates the agent's core logic from external services: instead of writing bespoke integration code per tool, the agent talks to MCP servers that expose tools with a declared JSON input schema. The agent only handles standard payloads and never needs to know the internals of your SAP, CRM, or database.
How do you stop an AI agent from hallucinating actions?
Constrain it structurally. Bind the model to a Pydantic (or equivalent) output schema so it must return parseable, validated state rather than free-form text, gate every state transition on that validation, and provide a human-in-the-loop escalation path for when validation fails. The model fills in nodes; it never decides the control flow on its own.
How do you control AWS costs for serverless AI agents?
Run the agent on AWS Lambda with provisioned concurrency only for latency-sensitive, user-facing paths; persist conversational state in DynamoDB instead of holding it in expensive in-memory sessions; and validate JSON schemas at API Gateway so malformed requests are rejected before they ever invoke a Lambda or spend model tokens.
LangGraph vs Pydantic AI — which one should we use?
They solve different layers of the problem. Pydantic AI is an agent framework centred on typed, validated model input/output for a single agent; LangGraph is a graph orchestrator for multi-step control flow with branches, cycles and human-in-the-loop checkpoints. For a single agent calling a few tools with strong typing, Pydantic AI alone can be enough. For workflows with conditional routing, retries and escalation paths, LangGraph governs the control flow — with Pydantic models validating the state inside each node. In practice we combine them rather than choose one.

