
Deploying serverless AI agents on AWS with Terraform and securing them with HCP Vault

Learn how to build secure, serverless AI agents on AWS using the Strands SDK, Terraform, and Vault — complete with authentication, tool orchestration, and infrastructure as code.

By: Oscar Medina | Anton Aleksandrov | Debasis Rath

»Introduction

Agentic AI is reshaping how modern applications interact with users, systems, and data. Unlike traditional LLM-powered chat interfaces, agentic systems can reason through goals, call external tools, and make real-time decisions — offloading complex workflows from users to intelligent software.

In this post, you’ll explore how to use Terraform and HCP Vault to build an agentic AI system using the Strands Agents SDK and run it on AWS serverless services. From identity and state management to tool orchestration via Model Context Protocol (MCP), you’ll discover how to design agents that are secure, scalable, and easy to operate — with Terraform managing infrastructure as code, and Vault handling secrets and issuing tokens for secure authentication. You’ll learn how to:

  • Build agents using the Strands Agents SDK

  • Authenticate and authorize users securely with HCP Vault

  • Integrate with remote tools via Model Context Protocol (MCP)

  • Deploy the end-to-end Terraform + Vault solution

The earliest AI assistants were stateless and transactional, responding to one prompt at a time, with no memory or awareness of user context. Over time, they evolved with system prompts and context windows and began incorporating enterprise knowledge through techniques like Retrieval-Augmented Generation (RAG). But even with these improvements, they remained passive - capable of responding but not reasoning or acting.

Evolving RAG Architecture

Agentic AI introduced a major change. Unlike traditional AI assistants, agentic systems reason through goals, dynamically invoke tools, and adapt their behaviors in real time. They're built to execute, not just generate. But making that leap — from prompt to production — requires more than just model logic and a collection of tools.

Agentic Systems Loop

Running agentic systems in the real world requires more than just a model - you need secure APIs, proper authentication and authorization, persistent state management, internal and external tool integration, and end-to-end system observability. Terraform helps you to manage all of this as code, making it easy to replicate and scale your setup across environments. By codifying components like compute, identity, secrets, and monitoring, Terraform gives you a predictable and repeatable foundation for deploying reliable, secure, composable, and governable agentic systems.

»Building agents using Strands Agents SDK

Strands Agents is an open-source SDK that lets you define agents in Python with minimal boilerplate. These agents natively know how to maintain state, orchestrate tools, and execute multi-step reasoning loops. Since they're just Python apps, you can run them anywhere. The following code snippet illustrates creating an AI agent using Strands: you specify the configuration, and the agentic loop logic is encapsulated within the Agent class.

from strands import Agent

agent = Agent(
    model=my_model,
    system_prompt="you're a travel agent...",
    tools=[my_tools]
)

agent("Book me a trip to Paris")

When running these agents on AWS, you can use serverless services like Amazon Bedrock AgentCore (preview), AWS Lambda, or AWS Fargate to reduce infrastructure management overhead and stay focused on delivering business value. Regardless of which AWS service you use to run your agents, Terraform and Vault can help automate infrastructure provisioning, manage resource lifecycle, and securely manage secrets.
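As a sketch of what a Lambda-hosted agent could look like, the handler below wires an incoming API Gateway event to an agent callable. The `build_agent` factory and the event shape are illustrative assumptions, not part of the Strands SDK, so the handler itself stays framework-agnostic and testable:

```python
import json


def make_handler(build_agent):
    """Return a Lambda handler that forwards the user's prompt to an agent.

    `build_agent` is an injected factory (e.g. one that constructs a
    Strands Agent) so the handler carries no framework dependency.
    """
    def handler(event, context):
        body = json.loads(event.get("body") or "{}")
        prompt = body.get("prompt", "")
        if not prompt:
            return {"statusCode": 400,
                    "body": json.dumps({"error": "missing prompt"})}
        agent = build_agent()   # fresh agent per invocation; state lives in S3
        result = agent(prompt)  # the agentic loop runs here
        return {"statusCode": 200,
                "body": json.dumps({"response": str(result)})}
    return handler
```

In a real deployment the factory would construct a Strands `Agent`; in tests, a stub such as `make_handler(lambda: (lambda p: f"echo: {p}"))` can stand in for it.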

In agentic systems, reasoning doesn't happen in a vacuum. Agents must retain memory, apply business logic, invoke tools, and interact with an LLM, based on user intent. While the Strands Agents SDK abstracts the agentic loop at the application level, it is important to properly define the system architecture using infrastructure as code.

A great example of this principle is externalizing agent state. In cloud-native architectures, where compute is stateless by design, externalizing agent state is key as it promotes resiliency and allows you to scale your agents horizontally.

Externalizing Agent State within Cloud Native Architectures

A common approach for persisting state is using durable storage such as Amazon S3, as illustrated below.

from strands import Agent
from strands.session.s3_session_manager import S3SessionManager

# Persist conversation state in S3 so any agent instance can resume it
session_manager = S3SessionManager(
    session_id=f"session_for_user_{user.id}",
    bucket=SESSION_STORE_BUCKET_NAME,
    prefix="agent_sessions"
)

agent = Agent(
    session_manager=session_manager
)

This allows any agent instance to reload conversation history on demand - delivering a seamless, stateful user experience while keeping the infrastructure itself stateless, resilient, and horizontally scalable.

»Automated provisioning with Terraform

You can codify this setup as a Terraform module, encapsulating provisioning the S3 bucket used for session storage and defining precise IAM permissions for the agent to read and write sessions securely. The result is a declarative, auditable, and repeatable state management layer that aligns with your overall IaC approach.

resource "aws_s3_bucket" "agent_state" {...}

resource "aws_iam_role" "agent" {...}

resource "aws_lambda_function" "agent" {
    role = aws_iam_role.agent.arn
    environment {
        variables = {
            STATE_BUCKET = aws_s3_bucket.agent_state.bucket
        }
    }
    # ... redacted ...
}

resource "aws_iam_policy" "agent" {...}

resource "aws_iam_role_policy_attachment" "agent" {
    role       = aws_iam_role.agent.name
    policy_arn = aws_iam_policy.agent.arn
}

»Authentication and authorization using HCP Vault

Many organizations prefer a cloud-agnostic, centralized secrets and identity management solution - especially when operating in multi-cloud or hybrid environments. HCP Vault offers a robust solution that supports fine-grained access control, dynamic secrets, and secure identity brokering. This section outlines how to configure HCP Vault for user authentication and authorization in your serverless agentic system.

For agents operating in an enterprise environment, the user's identity, along with their authorization to take specific actions, must be securely determined and validated (based on assigned roles and policies) before a trusted agent can act. As depicted in the following diagram, once the user has successfully authenticated with Vault and received a JWT, that JWT can be used when making requests to API Gateway and validated by a Lambda authorizer.

For enterprise AI agents to operate safely, they must know who the user is and what they’re allowed to do. This goes beyond basic identity validation - AI agents often act on behalf of users, so they might need to enforce role-based access controls, support auditing, and comply with corporate policies.

Once user authentication is complete and a JWT is obtained, you can use Amazon API Gateway with a Lambda authorizer for token validation. As in the previous section, bootstrapping this setup can also be automated with IaC and Terraform.

User authentication via HCP Vault

»HCP Vault integration overview

1. Set up HCP Vault

  • Create a new HCP Vault cluster via the HCP Portal.

  • Enable the userpass or OIDC auth method depending on your identity provider:

    • For internal users: userpass

    • For enterprise SSO: OIDC with Okta, Azure AD, etc.

  • Enable the JWT auth method for validating tokens issued to users:

     vault auth enable jwt

2. Configure JWT authentication in Vault

Define a JWT auth role that maps user tokens to Vault policies:

vault write auth/jwt/role/agent-users \
    role_type="jwt" \
    bound_audiences="agent-app" \
    user_claim="sub" \
    policies="agent-user-policy" \
    ttl="15m"

Upload the public key or JWKS endpoint of your identity provider to Vault:

vault write auth/jwt/config \
    oidc_discovery_url="https://your-idp.com" \
    bound_issuer="https://your-idp.com"

3. Issue JWTs to users

Use your identity provider (e.g., Cognito, Auth0, or Okta) to issue JWTs to users.

These tokens will be passed to your API Gateway and validated by a Lambda authorizer.

The Lambda authorizer provided in the sample project looks for the following claims to extract the username: username, preferred_username, email, and sub. Make sure the access token is issued in JWT format and at least one of these claims is present. If your Vault instance does not issue JWT-formatted access tokens, update the authorizer code with your custom token extraction and validation logic.
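The fallback order described above can be sketched as a small pure function over the decoded token claims (the actual sample authorizer may structure this differently):

```python
# Claim lookup order used to derive a username from a decoded JWT payload
USERNAME_CLAIMS = ("username", "preferred_username", "email", "sub")


def extract_username(claims: dict) -> str:
    """Return the first available identity claim, mirroring the lookup
    order described above; raise if none is present."""
    for key in USERNAME_CLAIMS:
        value = claims.get(key)
        if value:
            return value
    raise ValueError("no username claim found in token")
```

A token carrying only `sub` still yields an identity, while an empty claims dict is rejected rather than silently mapped to an anonymous user.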

 

»Terraform integration

Use the official Vault Terraform provider to manage:

  • JWT auth method configuration

  • Roles and policies

  • Secrets engines (e.g. dynamic credentials for DBs or cloud providers)

resource "vault_jwt_auth_backend" "jwt" {
  path = "jwt"
}

resource "vault_jwt_auth_backend_role" "agent_users" {
  backend         = vault_jwt_auth_backend.jwt.path
  role_name       = "agent-users"
  bound_audiences = ["agent-app"]
  user_claim      = "sub"
  token_policies  = ["agent-user-policy"]
  token_ttl       = 900 # seconds (15 minutes)
}

»Security best practices

  • Use short-lived tokens (e.g., 15 minutes) to reduce risk of token leakage.

  • Validate all JWT claims in the Lambda authorizer, especially iss, aud, and exp.

  • Avoid forwarding user tokens to downstream services (see “Confused Deputy” section).

  • Use Vault’s dynamic secrets to issue short-lived credentials for agents accessing external systems.
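The claim checks from the list above can be sketched as a small function over the decoded token payload. This is a sketch only: it assumes signature verification against the identity provider's JWKS (e.g. via a JWT library) has already succeeded, and it checks only iss, aud, and exp:

```python
import time


def validate_claims(claims: dict, expected_issuer: str,
                    expected_audience: str) -> bool:
    """Check iss, aud, and exp on an already-signature-verified JWT payload."""
    if claims.get("iss") != expected_issuer:
        return False
    # aud may be a single string or a list of audiences
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_audience not in audiences:
        return False
    # exp is a UNIX timestamp; reject expired tokens
    return claims.get("exp", 0) > time.time()
```

Rejecting on any failed check (rather than logging and continuing) keeps the authorizer fail-closed, which matters when the agent acts on the user's behalf.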

»Integrating remote tools via Model Context Protocol (MCP)

The Model Context Protocol (MCP) introduces a clean separation between agents and the tools they use. Rather than hard coding tool logic into the agent, your agent becomes a lightweight client that connects to one or more MCP-compliant servers - each exposing tools, resources, and reusable prompts over a standardized protocol.

In modern architectures, tools often live across teams, domains, or even organizations. Treating tools like microservices promotes modularity, ownership, and reuse. One team can evolve a tool independently while another consumes it without needing to manage its implementation.

From an infrastructure perspective, this means treating MCP endpoints as pluggable, versioned dependencies. This is where Terraform becomes crucial by provisioning required resources, such as MCP endpoint configuration.

# terraform/modules/mcp/outputs.tf
output "mcp_endpoint" {
    value = "https://mcp-server-endpoint.com/mcp"
}

# terraform/modules/agent/main.tf
variable "mcp_endpoint" {}

resource "aws_lambda_function" "agent" {
    environment {
        variables = {
            MCP_ENDPOINT = var.mcp_endpoint
        }
    }
}

# terraform/main.tf
module "mcp" {
    source = "./modules/mcp"
}

module "agent" {
    source       = "./modules/agent"
    mcp_endpoint = module.mcp.mcp_endpoint
}

This decoupling also enables governance: tools can enforce their own access controls, audit usage, and evolve independently - while agents remain lean, configurable, and composable.

»Identity propagation to MCP servers

In agentic systems, the agent often needs to securely communicate with external services, such as MCP servers, to retrieve tools, prompt templates, or other domain-specific capabilities. While it may seem easiest to simply reuse the user’s access token for securing requests with these services, doing so introduces a classic security risk - the confused deputy problem.

In this scenario, the agent is a trusted service capable of performing privileged actions. If it blindly forwards the user's token to the MCP server, it can be tricked into misusing its authority - acting on behalf of a user who might not have direct access to those tools. Worse, if users discover that the MCP server accepts their token directly, they could bypass the agent entirely, leading to unauthorized access, inconsistent auditing, and broken trust boundaries.

User can directly access MCP with their token

A more secure and scalable approach is to separate trust domains. The user authenticates with the agent, providing a token that the agent validates and uses to extract identity, roles, or other relevant context. When the agent calls the MCP server, it authenticates using its own identity - typically a short-lived service token, a Vault-issued credential, or a JWT obtained via the OAuth2 client credentials flow. User context is then forwarded in a controlled, non-authoritative way, such as a custom header containing the user ID. This ensures the MCP server trusts only the agent, not the end user, and the agent retains full control over enforcing user-level policy.

To enforce this model securely, both the agent and MCP server could leverage JWT claims. The agent should validate the user token’s iss (issuer), sub (subject), and aud (audience) fields to ensure it was issued by a trusted identity provider, intended for the agent, and represents a valid user. Similarly, the MCP server should reject any JWTs that aren’t explicitly issued to the agent as the subject and intended for the MCP server as the audience. These validations prevent token misuse and make it harder for malicious actors to impersonate legitimate callers.

User Token not accepted directly by MCP Server
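On the MCP server side, the trust-domain rule above can be sketched as a check that the caller is the agent itself, with the end-user identity arriving only as non-authoritative context in a header. The subject and audience identifiers below are hypothetical, and signature verification of the agent's token is assumed to have happened already:

```python
def authorize_mcp_call(claims: dict, headers: dict,
                       agent_subject: str = "agent-service",
                       mcp_audience: str = "mcp-server") -> str:
    """Accept only agent-issued tokens; the user id is context, not authority."""
    if claims.get("sub") != agent_subject:
        raise PermissionError("caller is not the trusted agent")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if mcp_audience not in audiences:
        raise PermissionError("token not intended for this MCP server")
    # The user id is forwarded by the agent for auditing/scoping only;
    # it never grants access by itself
    return headers.get("x-user-id", "unknown")
```

Because the user's own token never satisfies the `sub` check, a user who bypasses the agent and calls the MCP server directly is rejected, preserving the trust boundary.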

»Securing secrets & IAM policies using Vault and Terraform

From the IaC perspective, this setup can also be codified using Terraform. You can provision secrets via HashiCorp Vault, configure IAM policies to tightly control access, and inject required environment variables into the Lambda runtime. This gives you full control over token management, secret distribution, and runtime configuration - all declaratively and securely.

import os

from strands import Agent
from strands.tools.mcp import MCPClient
from mcp.client.streamable_http import streamablehttp_client

# Authenticate to the MCP server with the agent's own token;
# the user id is forwarded only as non-authoritative context
mcp_client = MCPClient(lambda: streamablehttp_client(
    url=os.environ["MCP_ENDPOINT"],
    headers={
        "Authorization": f"Bearer {os.environ['MCP_TOKEN']}",
        "x-user-id": user_id
    }
))

with mcp_client:
    tools = mcp_client.list_tools_sync()
    agent = Agent(tools=tools)

This model ensures the MCP server only accepts calls from trusted agents, not directly from users. It avoids replay attacks, enforces separation of concerns, and maintains clean authorization boundaries — critical for building scalable, secure agentic systems.

»Observability

Observability is critical for operating agentic systems in production. These workloads go far beyond simple request-response metrics. In addition to traditional metrics like uptime, error rates, and latency, agentic workflows introduce new telemetry dimensions - tracking tokens consumed, reasoning cycles, tool invocations, and decision loop timing. These insights are essential not only for debugging but also for understanding system performance and controlling cost.
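As one way to capture these agent-specific dimensions, the sketch below folds per-invocation counters into a CloudWatch-style `MetricData` payload. The metric names and the `Agent` dimension are illustrative assumptions, and actual emission (e.g. via `boto3`'s `put_metric_data`) is left out:

```python
def build_metric_data(invocation: dict, agent_name: str = "TravelAgent") -> list:
    """Map agent telemetry (tokens, tool calls, loop timing) to metric datums."""
    dims = [{"Name": "Agent", "Value": agent_name}]
    # Hypothetical telemetry keys and their CloudWatch units
    metric_units = {
        "tokens_consumed": "Count",
        "tool_invocations": "Count",
        "reasoning_cycles": "Count",
        "loop_duration_ms": "Milliseconds",
    }
    return [
        {"MetricName": name, "Value": float(invocation[name]),
         "Unit": unit, "Dimensions": dims}
        for name, unit in metric_units.items()
        if name in invocation
    ]
```

Keeping the payload construction pure makes it easy to unit test and to swap the delivery path (CloudWatch, OTEL exporter, or a third-party backend) without touching agent code.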

When deploying agents on AWS services like Lambda or ECS, you get seamless integration with Amazon CloudWatch for logs, metrics, and tracing. With Terraform, you can automate creation of log groups, dashboards, metric filters, alarms, and even subscription filters as code, ensuring your observability setup is consistent, versioned, and reproducible across environments.

For example, you can use Terraform to provision aws_cloudwatch_log_group for capturing agent logs, build aws_cloudwatch_dashboard widgets to visualize metrics like token usage and invocation latency, and configure aws_cloudwatch_metric_alarm rules to notify you of anomalies such as unexpected tool call durations or unusually high cost drivers.

In addition, the Strands Agents SDK provides out-of-the-box observability using OpenTelemetry (OTEL). It automatically generates spans for each agent loop, including calls to the LLM, tool invocations, and memory operations. It also exports detailed metrics like token counts, tool execution times, and reasoning duration. These OTEL-compatible metrics can be sent to any backend - CloudWatch, Prometheus, or a third-party observability platform.

By provisioning observability infrastructure alongside your agentic app infrastructure with Terraform, you ensure your team gets real-time visibility into how your agents reason, adapt, and perform under load - without manual setup or drift across environments. This ensures observability is a first-class citizen baked into your system from day one – not added as an afterthought.

»Terraform + Vault solution overview

Follow the instructions in this GitHub repo to deploy a sample project implementing the practices described above. The solution uses HashiCorp Vault and Terraform to provide the security and deployment automation layers. Augmented with AWS serverless services, the Strands Agents SDK, and a remote MCP server deployed as Lambda functions, this approach illustrates an efficient and secure deployment of agentic AI solutions, as depicted below.

The sample assumes you already have a Vault instance provisioned and configured to issue JWT access tokens via userpass authentication and an OIDC identity provider.

GitHub Repo Solution Architecture

»Conclusion

Agentic systems bring powerful new capabilities to AI — reasoning, taking actions, and integrating with real-world systems. But to run them reliably in production, you need infrastructure and tools that are modular, secure, scalable, and easy to manage.

Terraform provides a consistent, traceable, and code-driven way to provision everything from compute and IAM to secrets and observability. Vault complements this by handling sensitive configuration like access credentials, enabling your users to securely authenticate with agents, and agents to securely authenticate and interact with external systems without exposing credentials in code or pipelines. And serverless services like Lambda, Fargate, and S3 let you scale effortlessly while minimizing operational overhead.

Together, Terraform, Vault, and AWS serverless form a robust foundation for building agentic AI systems - automated, composable, and ready for the real world. Whether you’re deploying a single agent or orchestrating an entire ecosystem of tools via MCP, this combination gives you full control, strong security, and enterprise-grade scalability while minimizing operational complexity.

»Useful resources

Oscar Medina is a Technical Field Strategy Director at HashiCorp. Oscar drives initiatives that reflect customers' real-world use cases and patterns. He is an advocate for multi-cloud initiatives and cloud-agnostic frameworks.

Anton Aleksandrov is a Principal Solutions Architect for AWS Serverless and Event-Driven architectures. Having over two decades of hands-on engineering and architecture experience, Anton works with major ISV and SaaS customers to design highly scalable, innovative, and secure cloud solutions.

Debasis Rath is a Senior Serverless Specialist Solutions Architect at AWS. Debasis specializes in helping large enterprises adopt serverless and event-driven architectures to modernize their applications and accelerate their pace of innovation.
