#ai

Moving past chatbots: deterministic, schema-validated AI agents

Why RAG chatbots stall in production, and how to build deterministic, schema-validated AI agents on AWS with LangGraph state machines and MCP.

Published 2026-06-10 · 6 min read

Key takeaways

  1. Single-prompt ReAct agents and RAG chatbots stall in production because prompt engineering alone can't guarantee deterministic execution paths.
  2. Model the agent as a LangGraph StateGraph: every LLM call is a node, and code-governed edges — not the model — decide the next step.
  3. Validate state with Pydantic at each transition, and route parsing failures to a human-in-the-loop escalation instead of letting the model invent an action.
  4. Decouple tools with the Model Context Protocol (MCP) and run the agent on AWS Lambda + DynamoDB for cheap, scalable, schema-checked execution.

A production AI agent is not a chatbot with tools bolted on. A chatbot answers; an agent acts — and acting against real ERP, CRM, and payment systems demands the same predictability you’d expect from any other piece of backend software.

The short answer: standard chatbot pilots fail because prompt engineering alone can’t guarantee deterministic paths. Without a hard state machine and structured input/output validation, a conversational LLM will eventually hallucinate, drop execution context, or fall into an infinite tool-calling loop on any workflow with more than a couple of dependencies. The fix isn’t a better prompt — it’s constraining execution with LangGraph state machines and standardizing tool access with the Model Context Protocol (MCP).

Why do conversational AI pilots fail to reach production?

When building AI for DACH mid-market enterprises, consistency is non-negotiable. If an agent adjusts a customer order or pulls ERP inventory, it cannot operate on a “best-effort” conversational basis. This is the same wall that stops most pilots short of production — we wrote about the operational side of that in why AI pilots fail before production.

Traditional single-prompt agents use a loop of Thought → Action → Observation (the ReAct framework). That’s fine for simple Q&A, but it degrades fast once multiple API dependencies exist: nothing structurally prevents the model from looping, skipping a step, or fabricating a tool call. To make agents reliable, we treat their execution as a finite state machine — transitions between nodes are governed by code, not by prompt instructions.
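To make the contrast concrete, here is a minimal, library-free sketch of code-governed transitions. The state names, the `TRANSITIONS` table, `MAX_STEPS`, and the `run` helper are hypothetical illustrations and not part of LangGraph. The only point is that the legal next steps live in code, and a step budget bounds every run.

```python
from typing import Callable, Dict, List, Set

# Hypothetical workflow states and the moves that code allows between them.
TRANSITIONS: Dict[str, Set[str]] = {
    "fetch_order": {"check_inventory", "escalate"},
    "check_inventory": {"adjust_order", "escalate"},
    "adjust_order": {"done"},
    "escalate": {"done"},
}

MAX_STEPS = 10  # hard upper bound, so a run cannot loop forever


def run(start: str, propose_next: Callable[[str], str]) -> List[str]:
    """Walk the machine; `propose_next` stands in for a model suggestion."""
    path = [start]
    state = start
    for _ in range(MAX_STEPS):
        proposed = propose_next(state)
        allowed = TRANSITIONS[state]
        # The model may only propose; code decides whether the move is legal.
        if proposed not in allowed:
            proposed = "escalate" if "escalate" in allowed else "done"
        path.append(proposed)
        state = proposed
        if state == "done":
            return path
    raise RuntimeError("step budget exhausted")
```

A proposer that keeps suggesting an illegal step is redirected to `escalate` rather than being allowed to loop, which is the property a prompt alone cannot guarantee.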

Single-prompt ReAct

The model decides its own next action each turn. Works for Q&A; on multi-step workflows it loops, drops context, or invents tool calls. Hard to test, harder to guarantee.

LangGraph state machine

Each LLM call is a node; code-governed edges decide the next step from validated state. Deterministic, testable, and safe to run read/write commands against enterprise APIs.

The deterministic agent architecture on AWS

For low latency and scalable infrastructure, we host agents on AWS Serverless — AWS Lambda for execution, Amazon DynamoDB for state persistence — and decouple tool execution behind the open-source Model Context Protocol.

Deterministic state-chart workflow

Code-governed edges decide the next step — the model fills nodes, it doesn't choose the path.

Figure: a deterministic AI agent state-chart with LangGraph and MCP. A user prompt enters a LangGraph supervisor node that validates state against a Pydantic schema. On success, a solid arrow leads to an MCP tool server, which loops back to the supervisor with a state update until the run resolves. On schema-validation failure, a dashed arrow routes down to a human-in-the-loop escalation node.
Every LLM call maps to a node. Transitions are decided by validated state in code, not by the model inventing its next action — so a parsing failure routes to a human instead of a hallucinated write.

Organized as a state-chart, every LLM call corresponds to a specific node in the graph. The edges between nodes determine what happens next based on the structured state data, rather than letting the model dynamically invent its next action.

Building the state-chart with LangGraph

Here’s a state-validated supervisor node in Python. The pattern guarantees the agent can only transition to the next state if the current state passes schema validation.

First, define the payload schema with Pydantic and the shared graph state as a TypedDict:

from typing import Annotated, Literal, Sequence, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from pydantic import BaseModel, Field

# Schema-level tool requirements
class TransactionVerification(BaseModel):
    # Note: `description` guides the model; on its own it does not enforce the format.
    transaction_id: str = Field(description="Must match exact format TXN-XXXXXX")
    requires_refund: bool = Field(default=False)
    audit_notes: str

# Execution graph state
class AgentState(TypedDict):
    # add_messages is the reducer that appends new messages to the history.
    # A plain string such as "append" is not a reducer.
    messages: Annotated[Sequence[BaseMessage], add_messages]
    verification_payload: TransactionVerification
    next_step: Literal["verify_payment", "escalate_support", "end"]
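The `description` on `transaction_id` tells the model what format to produce, but it is metadata and does not reject a malformed ID. A minimal sketch of enforcing the format, assuming Pydantic v2 and assuming that "XXXXXX" means exactly six digits (the source does not say). `StrictTransactionVerification` and `parse_payload` are hypothetical names used only for illustration.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class StrictTransactionVerification(BaseModel):
    # Assumption: "XXXXXX" means exactly six digits; adjust the regex if not.
    transaction_id: str = Field(
        pattern=r"^TXN-\d{6}$",
        description="Must match exact format TXN-XXXXXX",
    )
    requires_refund: bool = False
    audit_notes: str


def parse_payload(raw: dict) -> Optional[StrictTransactionVerification]:
    """Return a validated payload, or None so the caller can escalate."""
    try:
        return StrictTransactionVerification.model_validate(raw)
    except ValidationError:
        return None
```

With the pattern in place, a payload such as `{"transaction_id": "TXN-12", ...}` fails validation in code instead of relying on the model to follow the description.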

Then define the validation node and wire up the routing:

from langgraph.graph import StateGraph, END

# `llm` is assumed to be a chat model instance that supports
# with_structured_output(); it is not defined in this snippet.

def supervisor_node(state: AgentState) -> dict:
    """Evaluate the request and pick the path from validated schema."""
    last_message = state["messages"][-1].content

    # Structured output binds the model to the Pydantic schema, so it
    # must return parseable state instead of conversational filler.
    try:
        structured_output = llm.with_structured_output(
            TransactionVerification
        ).invoke(last_message)

        return {
            "verification_payload": structured_output,
            "next_step": "verify_payment" if structured_output.requires_refund else "end",
        }
    except Exception:
        # Parsing or validation failed: fall back to a safe human escalation path.
        return {"next_step": "escalate_support"}

def route_to_node(state: AgentState) -> str:
    # Routing reads validated state only; the model never picks the edge.
    return state["next_step"]

workflow = StateGraph(AgentState)
workflow.add_node("supervisor", supervisor_node)
# Add other operational nodes here...
# "payment_execution_node" and "escalation_node" must be registered with
# add_node() before compile(), because the edge map below targets them.

workflow.set_entry_point("supervisor")
workflow.add_conditional_edges(
    "supervisor",
    route_to_node,
    {
        "verify_payment": "payment_execution_node",
        "escalate_support": "escalation_node",
        "end": END,
    },
)

app = workflow.compile()
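Because the routing decision is plain code, it can be unit-tested without calling a model. The sketch below mirrors the supervisor's decision logic with the model call injected as a function. `Verification`, `decide_next_step`, and the three `fake_*` extractors are hypothetical stand-ins, not LangGraph or LangChain APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verification:
    """Stand-in for the TransactionVerification payload."""
    transaction_id: str
    requires_refund: bool
    audit_notes: str


def decide_next_step(text: str, extract: Callable[[str], Verification]) -> dict:
    """Validated payload picks the path; an extraction failure escalates."""
    try:
        payload = extract(text)
    except ValueError:
        return {"next_step": "escalate_support"}
    return {
        "verification_payload": payload,
        "next_step": "verify_payment" if payload.requires_refund else "end",
    }


def fake_refund(_: str) -> Verification:
    return Verification("TXN-000001", True, "duplicate charge")


def fake_no_refund(_: str) -> Verification:
    return Verification("TXN-000002", False, "charge is valid")


def fake_unparseable(_: str) -> Verification:
    raise ValueError("model output did not match the schema")
```

Each branch of the graph then has a corresponding test case, and the escalation path is exercised as deliberately as the happy path.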

Standardizing integration with MCP

To keep agents from being tightly coupled to specific database schemas or API integrations, we use the Model Context Protocol. MCP is a universal bridge that separates the agent core (the brain) from external services (the hands). Instead of bespoke integration code per tool, the agent connects to MCP servers that expose tools with a declared input schema:

{
  "name": "fetch_erp_inventory",
  "description": "Pulls current physical stock counts from the SAP ERP system",
  "inputSchema": {
    "type": "object",
    "properties": {
      "sku": {
        "type": "string",
        "pattern": "^SKU-[0-9]{4}$"
      }
    },
    "required": ["sku"]
  }
}
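The declared input schema is what lets the agent layer reject a bad tool call before it reaches the backing system. The sketch below is a hand-rolled check covering only the keywords used in the contract above (`required`, `type: string`, `pattern`); it is not a full JSON Schema validator, and `check_arguments` is a hypothetical helper rather than part of any MCP SDK. It is also stricter than JSON Schema's default in that it flags undeclared arguments.

```python
import re
from typing import Any, Dict, List

# The tool contract from above, expressed as a Python dict.
FETCH_ERP_INVENTORY: Dict[str, Any] = {
    "name": "fetch_erp_inventory",
    "description": "Pulls current physical stock counts from the SAP ERP system",
    "inputSchema": {
        "type": "object",
        "properties": {"sku": {"type": "string", "pattern": "^SKU-[0-9]{4}$"}},
        "required": ["sku"],
    },
}


def check_arguments(tool: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    schema = tool["inputSchema"]
    problems: List[str] = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
        elif "pattern" in spec and isinstance(value, str):
            if not re.search(spec["pattern"], value):
                problems.append(f"{name} does not match {spec['pattern']}")
    return problems
```

In practice a maintained JSON Schema library is the safer choice; the sketch only shows where the contract check sits relative to the tool call.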

By decoupling these steps, the agent layer on AWS Lambda doesn’t need to know the inner workings of your SAP or CRM — it only handles the standard JSON payloads defined by the MCP tool contract. Swap the backing system, keep the contract, and the agent is unchanged.

Deployment and cost control on AWS Serverless

To minimize cold starts and optimize cost, we configure the agent on AWS Lambda along three axes:

  1. Provisioned concurrency

    Keep 1–2 execution instances warm for latency-sensitive, user-facing paths so interactive sessions rarely hit a cold start. Traffic beyond the provisioned instances still falls back to on-demand. Leave background paths on-demand to stay cheap.

  2. State persistence in DynamoDB

    Persist conversational state and session memory in DynamoDB rather than holding it in Lambda memory, which does not carry over between instances. Long-running memory becomes a cheap key lookup, and any Lambda instance can resume a session.

  3. Schema validation at the edge

    Reject malformed JSON at Amazon API Gateway before it invokes a Lambda. Invalid requests never spend downstream compute or model tokens: the cheapest request is the one you never run.
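The state-persistence item above can be sketched as a pair of helpers that take any object exposing `put_item(Item=...)` and `get_item(Key=...)`, which is the shape of a boto3 DynamoDB `Table` resource. The partition key name `session_id`, the single `state` attribute, and the helper names are assumptions for illustration. `InMemoryTable` is a test double so the example runs without AWS credentials.

```python
import json
from typing import Any, Dict, Optional


class InMemoryTable:
    """Test double exposing the two Table methods used below."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def put_item(self, Item: Dict[str, Any]) -> None:
        self._items[Item["session_id"]] = dict(Item)

    def get_item(self, Key: Dict[str, Any]) -> Dict[str, Any]:
        item = self._items.get(Key["session_id"])
        # Like DynamoDB, omit the "Item" key when nothing is found.
        return {"Item": dict(item)} if item else {}


def save_session(table: Any, session_id: str, state: Dict[str, Any]) -> None:
    # Store the state as one JSON string to sidestep DynamoDB type mapping.
    table.put_item(Item={"session_id": session_id, "state": json.dumps(state)})


def load_session(table: Any, session_id: str) -> Optional[Dict[str, Any]]:
    response = table.get_item(Key={"session_id": session_id})
    item = response.get("Item")
    return json.loads(item["state"]) if item else None
```

In a Lambda handler the double would be swapped for a real table resource. Serializing the whole state into one attribute keeps the sketch short, but DynamoDB items are limited to 400 KB, so long message histories may need trimming or a different layout.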

Engineering reliable agent platforms

With LangGraph’s deterministic routing and MCP’s structured tools, you move past fragile proof-of-concept chatbots to agents that are predictable, safe, and maintainable — operations you can put in front of a customer or an ERP write API without holding your breath.

We design and ship custom, production-grade AI agent systems in under 90 days, with EU AI Act alignment built in from day one rather than bolted on before launch.

// SOURCES

  1. LangGraph documentation — LangChain, 2025
  2. Model Context Protocol — Introduction — Anthropic, 2025
  3. Configuring provisioned concurrency for Lambda — Amazon Web Services, 2025
  4. Pydantic — Models — Pydantic, 2025

Frequently asked questions

  • Why do RAG chatbots fail in production?
    Standard RAG chatbots and single-prompt ReAct agents rely on prompt engineering to control behavior, which can't guarantee deterministic paths. As soon as a workflow spans several API dependencies, the model tends to hallucinate, drop execution context, or fall into infinite tool-calling loops. Production workflows — adjusting an order, pulling ERP stock — need hard state-machine constraints, not best-effort conversation.
  • What is a LangGraph StateGraph?
    A LangGraph StateGraph models an agent as a finite state machine: each node is a discrete step (often a single LLM call), and the edges between nodes are governed by your code based on the current validated state. Instead of letting the model freely choose its next action, conditional edges route execution along predefined, testable paths.
  • What is the Model Context Protocol (MCP)?
    MCP is an open protocol that standardizes how an AI agent connects to external tools and data. It separates the agent's core logic from external services: instead of writing bespoke integration code per tool, the agent talks to MCP servers that expose tools with a declared JSON input schema. The agent only handles standard payloads and never needs to know the internals of your SAP, CRM, or database.
  • How do you stop an AI agent from hallucinating actions?
    Constrain it structurally. Bind the model to a Pydantic (or equivalent) output schema so it must return parseable, validated state rather than free-form text, gate every state transition on that validation, and provide a human-in-the-loop escalation path for when validation fails. The model fills in nodes; it never decides the control flow on its own.
  • How do you control AWS costs for serverless AI agents?
    Run the agent on AWS Lambda with provisioned concurrency only for latency-sensitive, user-facing paths; persist conversational state in DynamoDB instead of holding it in expensive in-memory sessions; and validate JSON schemas at API Gateway so malformed requests are rejected before they ever invoke a Lambda or spend model tokens.

// contact: hello@saloid.com · gräfelfing · de