# BLUEPRINT-AWS-AGENTS
Agentic workflows on AWS
Planner/supervisor LangGraph with pluggable teams and tools, single-table DynamoDB state, and a sync/async split that keeps chat latency bounded while long jobs run on SQS.
# THE-PROBLEM
why this pattern exists.
Most 'AI agent' systems are a loose prompt loop wired to a handful of tools, and they fail the moment you need determinism, multi-tenancy, or auditability. You end up unable to explain why an agent did what it did, unable to extend the system without redeploying the runtime, and unable to move long-running work off the synchronous request path.
A production-shaped agent platform needs to isolate LLM non-determinism behind a validated contract, let non-core contributors register new agents and tools without touching infrastructure, and keep sync and async execution paths architecturally distinct. The data model needs to tolerate schema drift (new team shapes, new tool types) without table-by-table migrations.
# ARCHITECTURE
how it's built.
- Two-phase planner/supervisor
  A planner agent converts each user request into a strictly-typed JSON plan, validated with a Zod schema. A supervisor agent compiles that plan into a LangGraph StateGraph at runtime, sequentially invoking a generic teamWorker Lambda for each step and threading the previous step's output into the next step's context.
- Pluggable team model
  Teams are first-class, user-extensible artifacts in DynamoDB. A buildTeam factory supports three shapes: agent (tool-using ReAct executors), llm (prompt-template workers), and custom (dynamically-imported graph modules). Operators register new capabilities by editing data, not code.
- Typed tool registry
  Tools share base classes for file, static-web, and HTTP/API integrations and a common ToolResult schema. Integrations as different as the Facebook Marketing API, Google Search, Amazon APIs, and local file I/O present an identical success/error shape to the supervisor.
- Sync + async execution paths
  Synchronous chat flows through API Gateway behind a Cognito JWT authorizer, an API-key requirement, and a resource-policy IP allowlist. Long-running agent work moves onto SQS and is drained by a dedicated taskRunner Lambda sized for 15-minute executions.
- Single-table DynamoDB
  One table with PK/SK composite keys and two GSIs backs chat sessions, plans, tool-call traces, team definitions, and prompts. The design trades schema rigidity for predictable latency; point-in-time recovery (PITR), TTL-based retention, and KMS CMK encryption keep the data plane operationally simple.
- Per-group IAM + per-CMK encryption
  Dedicated API Lambdas (bas-api, bas-admin-api, teams-api, tasks-api, chats-api) bind to purpose-built IAM roles. Customer-managed KMS keys encrypt the data plane end-to-end: DynamoDB, SQS, and S3 each have their own CMK and scoped key policy.
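
As a rough illustration of the "Typed tool registry" item above, a shared result shape could look like the following TypeScript sketch. Everything named here is an assumption: the field names (`ok`, `tool`, `data`, `error.code`), the `BaseTool` class, and the toy `EchoTool` are hypothetical, since the blueprint names a common ToolResult schema and shared base classes but does not show their definitions.

```typescript
// Hypothetical ToolResult shape: the blueprint does not publish the real fields.
type ToolResult<T> =
  | { ok: true; tool: string; data: T }
  | { ok: false; tool: string; error: { code: string; message: string } };

// Assumed base class: subclasses implement execute(); run() normalizes the
// outcome so callers only ever see a ToolResult, never a thrown exception.
abstract class BaseTool<I, O> {
  readonly name: string;

  constructor(name: string) {
    this.name = name;
  }

  protected abstract execute(input: I): Promise<O>;

  async run(input: I): Promise<ToolResult<O>> {
    try {
      const data = await this.execute(input);
      return { ok: true, tool: this.name, data };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, tool: this.name, error: { code: "TOOL_FAILED", message } };
    }
  }
}

// Toy tool standing in for a real integration (HTTP API, file I/O, and so on).
class EchoTool extends BaseTool<{ text: string }, string> {
  constructor() {
    super("echo");
  }

  protected async execute(input: { text: string }): Promise<string> {
    if (input.text === "") throw new Error("text must not be empty");
    return input.text.toUpperCase();
  }
}
```

With a discriminated union like this, a caller can branch on `result.ok` and the compiler narrows to either `data` or `error`, which is one way to give very different integrations the same success/error shape.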
# KEY-PRIMITIVE
the load-bearing idea.
Schema-validated plan as a deterministic compiled StateGraph
The pivotal decision in this blueprint is separating planning from execution. The planner LLM runs once per request to produce a plan validated against a Zod schema; that plan is then compiled into a LangGraph StateGraph and executed in a fixed, deterministic step order. This removes LLM-driven control-flow drift at runtime, makes every step auditable (plans, tool-call traces, and step outputs all persist to DynamoDB), and lets the supervisor dispatch each step to a worker under a predictable contract. It's the difference between 'an agent that sometimes works' and a platform you can build on.
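
The plan-then-execute split can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the blueprint's implementation: the plan fields (`id`, `team`, `instruction`) are hypothetical, `parsePlan` is a hand-written, dependency-free stand-in for the Zod validation, and `executePlan` replaces the compiled LangGraph StateGraph with a plain sequential loop that threads each step's output into the next.

```typescript
// Hypothetical plan shape; the blueprint's real Zod schema is not shown.
interface PlanStep {
  id: string;
  team: string;
  instruction: string;
}
interface Plan {
  steps: PlanStep[];
}

// Dependency-free stand-in for schema validation (the source uses Zod here).
function parsePlan(raw: string): Plan {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("plan must be a JSON object");
  }
  const steps = (data as { steps?: unknown }).steps;
  if (!Array.isArray(steps) || steps.length === 0) {
    throw new Error("plan.steps must be a non-empty array");
  }
  return {
    steps: steps.map((item: unknown, i: number): PlanStep => {
      if (typeof item !== "object" || item === null) {
        throw new Error(`steps[${i}] must be an object`);
      }
      const step = item as Record<string, unknown>;
      const read = (key: string): string => {
        const value = step[key];
        if (typeof value !== "string" || value === "") {
          throw new Error(`steps[${i}].${key} must be a non-empty string`);
        }
        return value;
      };
      return { id: read("id"), team: read("team"), instruction: read("instruction") };
    }),
  };
}

// A worker receives its step plus the previous step's output (null for the first step).
type TeamWorker = (step: PlanStep, previous: string | null) => Promise<string>;

interface StepTrace {
  stepId: string;
  input: string | null;
  output: string;
}

// Runs steps strictly in plan order; no LLM decides control flow at this stage.
async function executePlan(plan: Plan, workers: Map<string, TeamWorker>): Promise<StepTrace[]> {
  const trace: StepTrace[] = [];
  let previous: string | null = null;
  for (const step of plan.steps) {
    const worker = workers.get(step.team);
    if (!worker) throw new Error(`unknown team: ${step.team}`);
    const output = await worker(step, previous);
    trace.push({ stepId: step.id, input: previous, output });
    previous = output;
  }
  return trace;
}
```

Because the plan is rejected before any worker runs, a malformed planner response fails fast, and the returned trace is the kind of per-step record that could be persisted for auditing.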
# TECH-STACK
what runs it.
# PRODUCTION-EVIDENCE
what we've measured.
Production-grade multi-tenant platform — infrastructure-as-code, end-to-end encryption, extensible by teams.
We shipped this as a general-purpose platform for designing and orchestrating LLM agents as composable workflows. The entire environment is codified with the Serverless Framework plus modular CloudFormation fragments (IAM, policies, Cognito, DynamoDB, SQS, S3, KMS) and per-stage overrides with cross-stack exports. Runtime secrets are externalized to AWS Secrets Manager via a small config loader — credentials stay out of function environments, rotation requires no redeploy.
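
One way a small config loader can make rotation work without a redeploy is to fetch secrets at runtime and cache them for a short time. The TypeScript sketch below is an assumption about how such a loader might look, not the one shipped with the platform: `createSecretLoader`, the TTL-based cache, and the injected `fetchSecret` function are all hypothetical. In a real deployment the fetcher would wrap a Secrets Manager GetSecretValue call; it is a parameter here so the sketch carries no SDK dependency.

```typescript
// Injected fetcher (assumption): in production this would call Secrets Manager.
type SecretFetcher = (secretId: string) => Promise<string>;

interface CachedSecret {
  value: string;
  expiresAt: number;
}

// Hypothetical loader: caches each secret for ttlMs, then refetches on next use,
// so a rotated value is picked up without redeploying the function.
function createSecretLoader(
  fetchSecret: SecretFetcher,
  ttlMs: number,
  now: () => number = Date.now,
): (secretId: string) => Promise<string> {
  const cache = new Map<string, CachedSecret>();
  return async (secretId: string): Promise<string> => {
    const hit = cache.get(secretId);
    if (hit !== undefined && hit.expiresAt > now()) {
      return hit.value;
    }
    const value = await fetchSecret(secretId);
    cache.set(secretId, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

The TTL bounds how long a warm function can keep using a rotated-out credential, at the cost of one extra fetch per secret per TTL window.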
- Plan then execute: 2-phase
- Team shapes: 3
- DynamoDB table: 1
- KMS encryption: E2E
want one of these in production?
30-min discovery call. we adapt the blueprint, we don't resell it.
or write: hello@saloid.com · gräfelfing · de