
# BLUEPRINT-AWS-AGENTS

Agentic workflows on AWS

Planner/supervisor LangGraph with pluggable teams and tools, single-table DynamoDB state, and a sync/async split that keeps chat latency bounded while long jobs run on SQS.

// authored by Oleks Saloid · published · last reviewed

A blueprint is a pattern already running in production. The numbers below describe the system that produced the evidence, not a prospective engagement. We adapt blueprints to each client's stack and scale.

# THE-PROBLEM

why this pattern exists.

Most 'AI agent' systems are a loose prompt loop wired to a handful of tools, and they fail the moment you need determinism, multi-tenancy, or auditability. You end up unable to explain why an agent did what it did, unable to extend the system without redeploying the runtime, and unable to move long-running work off the synchronous request path.

A production-shaped agent platform needs to isolate LLM non-determinism behind a validated contract, let non-core contributors register new agents and tools without touching infrastructure, and keep sync and async execution paths architecturally distinct. The data model needs to tolerate schema drift (new team shapes, new tool types) without table-by-table migrations.

# ARCHITECTURE

how it's built.

  1. Two-phase planner/supervisor

    A planner agent converts each user request into a strictly-typed JSON plan, validated with a Zod schema. A supervisor agent compiles that plan into a LangGraph StateGraph at runtime, sequentially invoking a generic teamWorker Lambda for each step and threading the previous step's output into the next step's context.

  2. Pluggable team model

    Teams are first-class, user-extensible artifacts stored in DynamoDB. A buildTeam factory supports three shapes: agent (tool-using ReAct executors), llm (prompt-template workers), and custom (dynamically imported graph modules). Operators register new capabilities by editing data, not code.

  3. Typed tool registry

    Tools extend shared base classes for file, static-web, and HTTP/API integrations, and all of them return a common ToolResult schema. As a result, integrations as different as the Facebook Marketing API, Google Search, Amazon APIs, and local file I/O present an identical success/error shape to the supervisor.

  4. Sync + async execution paths

    Synchronous chat flows through API Gateway behind a Cognito JWT authorizer, an API-key requirement, and a resource-policy IP allowlist. Long-running agent work moves onto SQS and is drained by a dedicated taskRunner Lambda sized for 15-minute executions.

  5. Single-table DynamoDB

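    One table with PK/SK composite keys and two GSIs backs chat sessions, plans, tool-call traces, team definitions, and prompts. Replacing per-entity schemas with key-designed access patterns keeps latency predictable and absorbs new item shapes without migrations. Point-in-time recovery (PITR), TTL-based retention, and KMS customer-managed key (CMK) encryption keep the data plane operationally simple.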

  6. Per-group IAM + per-resource CMK encryption

    Dedicated API Lambdas (bas-api, bas-admin-api, teams-api, tasks-api, chats-api) bind to purpose-built IAM roles. Customer-managed KMS keys encrypt the data plane end-to-end — DynamoDB, SQS, and S3 each have their own CMK and scoped key policy.
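A minimal sketch of the shared-result idea from the typed tool registry, in dependency-free TypeScript. The names ToolResult, BaseTool, execute, run, InMemoryFileTool and HttpGetTool are hypothetical, and the result shape is an assumption rather than the blueprint's actual ToolResult schema. It shows the contract: subclasses implement only the integration-specific call, while the base class turns both success and failure into one shape the supervisor can consume.

```typescript
// Hypothetical sketch: every tool reports success or failure in one shape.
// ToolResult, BaseTool and the concrete tools below are illustrative names,
// not the blueprint's actual classes or schema.

type ToolResult<T> =
  | { ok: true; tool: string; data: T }
  | { ok: false; tool: string; error: { message: string } };

abstract class BaseTool<Input, Output> {
  readonly name: string;

  constructor(name: string) {
    this.name = name;
  }

  // Subclasses implement only the integration-specific call and may throw.
  protected abstract execute(input: Input): Promise<Output>;

  // The supervisor calls run(), which converts thrown errors into values.
  async run(input: Input): Promise<ToolResult<Output>> {
    try {
      const data = await this.execute(input);
      return { ok: true, tool: this.name, data };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, tool: this.name, error: { message } };
    }
  }
}

// File-style tool over an in-memory map, standing in for local file I/O.
class InMemoryFileTool extends BaseTool<{ path: string }, string> {
  private readonly files: Map<string, string>;

  constructor(files: Map<string, string>) {
    super("read_file");
    this.files = files;
  }

  protected async execute({ path }: { path: string }): Promise<string> {
    const content = this.files.get(path);
    if (content === undefined) throw new Error(`not found: ${path}`);
    return content;
  }
}

// HTTP-style tool with an injected fetcher, so the sketch runs offline.
class HttpGetTool extends BaseTool<{ url: string }, string> {
  private readonly fetcher: (url: string) => Promise<string>;

  constructor(fetcher: (url: string) => Promise<string>) {
    super("http_get");
    this.fetcher = fetcher;
  }

  protected execute({ url }: { url: string }): Promise<string> {
    return this.fetcher(url);
  }
}
```

Because failures come back as values rather than exceptions, the supervisor can record them in the tool-call trace and decide whether to continue, instead of letting one integration's error abort the whole run.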

# KEY-PRIMITIVE

the load-bearing idea.

Schema-validated plan as a deterministic compiled StateGraph

The pivotal decision in this blueprint is separating planning from execution. The LLM runs once to produce a plan validated against a Zod schema; that plan is then compiled into a LangGraph StateGraph whose steps and ordering are fixed before execution starts, even though individual workers may still call an LLM. This eliminates a whole class of runtime drift in control flow, makes every step auditable (plans, tool-call traces, and step outputs all persist to DynamoDB), and lets the supervisor dispatch work to workers through a predictable contract. It's the difference between 'an agent that sometimes works' and a platform you can build on.
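The plan-then-execute split can be sketched without any dependencies. In this hypothetical example, a hand-written parsePlan stands in for the Zod schema and executePlan stands in for the compiled StateGraph; the Plan shape, the team names and the Worker signature are assumptions for illustration. It shows the contract: the planner's raw JSON is rejected unless it matches the expected shape, every team reference is resolved before the first step runs, and each step receives the previous step's output.

```typescript
// Hypothetical plan-then-execute sketch with no dependencies.
// parsePlan stands in for a Zod schema; executePlan stands in for the
// StateGraph the supervisor compiles. All names here are illustrative.

interface PlanStep {
  id: string;
  team: string; // name of a registered team
  instruction: string;
}

interface Plan {
  goal: string;
  steps: PlanStep[];
}

// A worker receives its instruction plus the previous step's output (null for the first step).
type Worker = (instruction: string, previousOutput: string | null) => Promise<string>;

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Phase 1: accept the planner's raw JSON only if it matches the expected shape.
function parsePlan(raw: string): Plan {
  const value: unknown = JSON.parse(raw);
  if (!isRecord(value)) throw new Error("invalid plan: not an object");
  const goal = value.goal;
  const rawSteps = value.steps;
  if (typeof goal !== "string" || !Array.isArray(rawSteps) || rawSteps.length === 0) {
    throw new Error("invalid plan: expected a goal and a non-empty steps array");
  }
  const steps = rawSteps.map((s: unknown, i: number): PlanStep => {
    if (!isRecord(s)) throw new Error(`invalid plan: step ${i} is not an object`);
    const { id, team, instruction } = s;
    if (typeof id !== "string" || typeof team !== "string" || typeof instruction !== "string") {
      throw new Error(`invalid plan: step ${i} is missing id, team or instruction`);
    }
    return { id, team, instruction };
  });
  return { goal, steps };
}

// Phase 2: run the validated steps in plan order, threading each output forward.
async function executePlan(plan: Plan, teams: Record<string, Worker>): Promise<string[]> {
  // Resolve every team before any step runs, so a bad reference fails fast.
  const resolved = plan.steps.map((step) => {
    const worker = teams[step.team];
    if (!worker) throw new Error(`unknown team: ${step.team}`);
    return { step, worker };
  });

  const outputs: string[] = [];
  let previous: string | null = null;
  for (const { step, worker } of resolved) {
    previous = await worker(step.instruction, previous);
    outputs.push(previous);
  }
  return outputs;
}
```

A LangGraph.js version could add one StateGraph node per plan step and chain them with edges, so the graph's shape comes from the validated plan rather than from the model at run time.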

# TECH-STACK

what runs it.

AWS Lambda · API Gateway · DynamoDB (single-table + 2 GSIs, PITR, TTL, CMK) · SQS · S3 · Cognito (User Pool + Resource Server) · KMS · Secrets Manager · EventBridge · CloudWatch · CloudFormation · Serverless Framework (multi-stage, cross-stack exports) · Node.js 24 · LangChain.js · LangGraph · Zod · AWS SDK v3 · React · Vite · Structured JSON logging with request correlation

# PRODUCTION-EVIDENCE

what we've measured.

Production-grade multi-tenant platform — infrastructure-as-code, end-to-end encryption, extensible by teams.

We shipped this as a general-purpose platform for designing and orchestrating LLM agents as composable workflows. The entire environment is codified with the Serverless Framework plus modular CloudFormation fragments (IAM, policies, Cognito, DynamoDB, SQS, S3, KMS), with per-stage overrides and cross-stack exports. Runtime secrets are externalized to AWS Secrets Manager via a small config loader, so credentials stay out of function environments and rotating them requires no redeploy.
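A small config loader of the kind described above might look like the following sketch. It is an assumption, not the blueprint's loader: secrets are fetched at run time and cached per Lambda container with a TTL, so a rotated value is picked up after at most ttlMs without a redeploy. The fetchSecret function is injected so the sketch runs offline; in a real Lambda it would likely wrap `SecretsManagerClient.send(new GetSecretValueCommand({ SecretId }))` from `@aws-sdk/client-secrets-manager` and return the response's `SecretString`. createConfigLoader, getConfig and the JSON-object secret format are hypothetical.

```typescript
// Hypothetical config loader: secrets are read at run time instead of being
// placed in function environment variables, and cached with a TTL so a
// rotated secret is picked up after at most ttlMs, without a redeploy.

// Stand-in for a Secrets Manager call; resolves to the secret's string value.
type SecretFetcher = (secretId: string) => Promise<string>;

type SecretConfig = Record<string, string>;

function createConfigLoader(
  fetchSecret: SecretFetcher,
  ttlMs: number = 5 * 60 * 1000,
  now: () => number = Date.now,
): (secretId: string) => Promise<SecretConfig> {
  const cache = new Map<string, { value: SecretConfig; expiresAt: number }>();

  return async function getConfig(secretId: string): Promise<SecretConfig> {
    const cached = cache.get(secretId);
    if (cached && cached.expiresAt > now()) return cached.value;

    // Assumes the secret is stored as a JSON object of string values.
    const value = JSON.parse(await fetchSecret(secretId)) as SecretConfig;
    cache.set(secretId, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

Creating the loader once at module scope lets warm invocations reuse the cache, while the TTL bounds how long a rotated credential can stay stale.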

Plan then execute: 2-phase · Team shapes: 3 · DynamoDB table: 1 · KMS encryption: E2E

want one of these in production?

30-min discovery call. we adapt the blueprint, we don't resell it.


// or write: hello@saloid.com · gräfelfing · de