How to Build Custom AI Agents with OpenAI’s Responses API: From Single-Turn Chat to Multi-Step Autonomous Workflows

A practical, end-to-end tutorial for developers: environment setup, a basic agent, tool integration, multi-step reasoning, error handling, and deploying to production with Python examples.


Contents

  • Introduction: What is an AI agent?
  • 1. Setup: development environment and libraries
  • 2. Building a minimal agent with the Responses API
  • 3. Adding tool use: a tool registry, secure execution, and tool results
  • 4. Multi-step reasoning and plan-execute-observe loops
  • 5. Robustness: validation, retries, and error handling
  • 6. Production readiness: deployment, monitoring, and governance
  • Appendix: utility code, Dockerfile, and testing suggestions

Introduction: What is an AI agent?

In this tutorial you’ll learn how to build customizable AI agents using OpenAI’s Responses API. In the context of this guide, an “agent” is a software component that:

  • Accepts user input (a question, task, or request)
  • Plans how to complete the task, possibly breaking it into steps
  • Invokes external tools (search, a calculator, internal APIs, or custom code) when needed
  • Observes the results of tools and iterates until it completes the task
  • Returns a structured and user-friendly answer

This guide uses Python examples and assumes you have a valid OpenAI API key and a reasonable familiarity with Python development. Where appropriate, you’ll see full working snippets and suggestions for production hardening.

1. Setup: development environment and libraries

Before building an agent, prepare a development environment. The minimal requirements:

  • Python 3.10 or later (3.11 recommended)
  • A virtual environment (venv, pipenv, or poetry)
  • An OpenAI API key set as an environment variable (OPENAI_API_KEY)
  • Basic libraries: openai (the official OpenAI Python SDK; use a recent release that includes the Responses API), requests, and optionally pydantic for input/output validation

Create a project and virtual environment

# Create a project directory and a venv
mkdir my-agent && cd my-agent
python -m venv .venv
source .venv/bin/activate        # macOS / Linux
.venv\Scripts\activate           # Windows

# Upgrade pip and install dependencies
pip install --upgrade pip
pip install openai requests pydantic

Set your API key in your shell environment (never hard-code it in source). On macOS/Linux:

export OPENAI_API_KEY="sk-..."

On Windows (PowerShell):

$env:OPENAI_API_KEY="sk-..."

Note: the OpenAI Python SDK changes over time. Current versions expose an OpenAI client class that reads your key from the environment; this guide uses that pattern:

from openai import OpenAI

client = OpenAI()  # will read OPENAI_API_KEY from the environment

Be careful with secrets and credentials. Use environment variables or a secrets manager (HashiCorp Vault, AWS Secrets Manager, etc.) in production.


2. Building a minimal agent with the Responses API

We’ll start by creating a minimal agent that receives a user query and returns a direct answer using the Responses API. The agent will be modular: it will have an Agent class responsible for orchestration and a simple “LLM” wrapper for calls to the Responses API.

Minimal LLM wrapper

This wrapper centralizes API calls (you can add logging, retries, or telemetry later).

from openai import OpenAI

class LLM:
    """Thin wrapper around the Responses API; add logging, retries, or telemetry here later."""

    def __init__(self, client=None, model="gpt-4o-mini"):
        self.client = client or OpenAI()
        self.model = model

    def generate(self, prompt, max_tokens=512, temperature=0.2):
        resp = self.client.responses.create(
            model=self.model,
            input=prompt,
            max_output_tokens=max_tokens,
            temperature=temperature,
        )
        # Current SDK versions expose `output_text`, a convenience property that
        # concatenates every output_text segment in the response.
        text = getattr(resp, "output_text", None)
        if text:
            return text
        # Fallback: walk `resp.output` manually. Output items and their content
        # parts are typed objects, so use attribute access, not dict indexing.
        output_text = ""
        for item in getattr(resp, "output", None) or []:
            for part in getattr(item, "content", None) or []:
                if getattr(part, "type", None) == "output_text":
                    output_text += part.text
        return output_text

Simple agent orchestration

The agent will accept a user prompt, call the LLM, and return the result. This is intentionally minimal and suitable for simple Q&A and small tasks.

class SimpleAgent:
    def __init__(self, llm: LLM):
        self.llm = llm

    def handle(self, user_input: str) -> str:
        prompt = f"You are an assistant. Answer concisely and clearly:\n\nUser: {user_input}\nAssistant:"
        return self.llm.generate(prompt)

# Usage
if __name__ == "__main__":
    client = OpenAI()
    llm = LLM(client=client)
    agent = SimpleAgent(llm)
    print(agent.handle("What's a good plan to learn web development in 3 months?"))

That gets you started: a request goes to the Responses API and a text answer returns. Next, we’ll allow the agent to “use tools” — functions that perform actions like web search, database queries, or calculators.
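SimpleAgent treats every request independently. For multi-turn chat, the Responses API can chain turns through the previous_response_id parameter, so you don't have to resend the whole history yourself. The sketch below is a hypothetical ConversationAgent (not part of any library); it assumes a client that behaves like OpenAI() and that responses are stored server-side (the API default). Because instructions are not carried over from a previous response, it resends them on every turn.

```python
from typing import Any, Dict, Optional


class ConversationAgent:
    """Hypothetical multi-turn agent that chains turns via previous_response_id.

    `client` is expected to behave like openai.OpenAI(); it is injected so the
    class can be exercised with a stub in tests.
    """

    def __init__(self, client: Any, model: str = "gpt-4o-mini",
                 instructions: str = "You are an assistant. Answer concisely and clearly."):
        self.client = client
        self.model = model
        self.instructions = instructions
        self.last_response_id: Optional[str] = None  # None until the first turn

    def send(self, user_input: str) -> str:
        kwargs: Dict[str, Any] = {
            "model": self.model,
            # Instructions are not inherited from the previous response, so send them each turn
            "instructions": self.instructions,
            "input": user_input,
        }
        if self.last_response_id is not None:
            # Continue from the stored response instead of resending the history
            kwargs["previous_response_id"] = self.last_response_id
        resp = self.client.responses.create(**kwargs)
        self.last_response_id = resp.id
        return resp.output_text

    def reset(self) -> None:
        """Start a fresh conversation."""
        self.last_response_id = None
```

Usage would look like agent = ConversationAgent(OpenAI()), then agent.send("Suggest a 3-month web development plan") followed by agent.send("Make week 1 more detailed"); the second turn sees the first.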

3. Adding tool use

Agents become much more powerful when they can call external tools to retrieve facts, perform calculations, or connect to internal systems. We’ll implement:

  • A tool registry to register and describe tools
  • A simple protocol for the agent to request a tool call
  • Secure execution and result feeding back to the LLM

There are multiple patterns to enable “tool use”. Some frameworks offer built-in tool/function calling. Here, to stay framework-agnostic and explicit, we’ll use a “structured JSON plan” approach: ask the LLM to return a JSON plan describing steps, where each step can ask the agent to call a named tool with arguments.

Tool registry

from typing import Callable, Dict, Any
import json

class Tool:
    def __init__(self, name: str, description: str, func: Callable[..., Any]):
        self.name = name
        self.description = description
        self.func = func

class ToolRegistry:
    def __init__(self):
        self.tools: Dict[str, Tool] = {}

    def register(self, tool: Tool):
        if tool.name in self.tools:
            raise ValueError(f"Tool {tool.name} already registered")
        self.tools[tool.name] = tool

    def call(self, name: str, args: dict):
        if name not in self.tools:
            raise ValueError(f"Unknown tool: {name}")
        return self.tools[name].func(**args)

    def describe(self):
        # Return a plain description for the LLM prompt
        return [
            {"name": t.name, "description": t.description} for t in self.tools.values()
        ]
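describe() sends only names and descriptions, so the model has to guess argument names such as query or expr. One option, sketched here with a hypothetical describe_params helper, is to derive each tool's parameter list from its function signature using Python's inspect module and include it in the tool description.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional


def _type_name(annotation: Any) -> Optional[str]:
    """Readable name for a parameter annotation (None when unannotated)."""
    if annotation is inspect.Parameter.empty:
        return None
    if isinstance(annotation, str):  # string annotations, e.g. under postponed evaluation
        return annotation
    return getattr(annotation, "__name__", str(annotation))


def describe_params(func: Callable[..., Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: list a tool function's parameters for the LLM prompt."""
    params = []
    for name, p in inspect.signature(func).parameters.items():
        if p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD):
            continue  # *args / **kwargs can't be described meaningfully
        entry: Dict[str, Any] = {
            "name": name,
            "type": _type_name(p.annotation),
            "required": p.default is p.empty,
        }
        if p.default is not p.empty:
            entry["default"] = p.default
        params.append(entry)
    return params
```

In ToolRegistry.describe(), each entry could then carry "parameters": describe_params(t.func).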

Example tools: web_search and calculator

For demonstration, we'll use a placeholder web search function. Scraping Bing or Google results is not recommended; in real usage, replace the stub with an official search API.

def web_search_stub(query: str, top_k: int = 3):
    """
    Stubbed web search. In production, call an official search API (e.g., with requests).
    Returns a list of dicts like: [{"title": "...", "snippet": "...", "url": "..."}]
    """
    # Placeholder: return a deterministic fake result for testing
    return [
        {"title": f"Result {i+1} for {query}", "snippet": f"Snippet {i+1}", "url": f"https://example.com/{i+1}"}
        for i in range(top_k)
    ]

import ast
import operator as op

# Whitelisted AST operator nodes mapped to their Python implementations
_ALLOWED_OPERATORS = {
    ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv,
    ast.Pow: op.pow, ast.USub: op.neg, ast.UAdd: op.pos,
}
_MAX_EXPONENT = 100  # guard against inputs like 9**9**9 that would hang the process

def calculator(expr: str):
    """Safely evaluate basic arithmetic. Never call eval() on model-generated text."""
    def eval_(node):
        # ast.Constant replaces ast.Num, which is deprecated since Python 3.8
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPERATORS:
            left, right = eval_(node.left), eval_(node.right)
            if isinstance(node.op, ast.Pow) and abs(right) > _MAX_EXPONENT:
                raise ValueError("Exponent too large")
            return _ALLOWED_OPERATORS[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED_OPERATORS:
            return _ALLOWED_OPERATORS[type(node.op)](eval_(node.operand))
        raise ValueError("Unsupported expression")

    try:
        return eval_(ast.parse(expr, mode="eval").body)
    except (SyntaxError, ValueError, ZeroDivisionError, OverflowError) as e:
        raise ValueError(f"Could not evaluate expression safely: {e}") from e

Registering tools and prompting the model for plans

We’ll ask the model to produce a JSON plan. The model is given a list of available tools and a schema for the plan. The agent parses the plan, runs the tools, and returns the result. This approach gives you full control over how tools are executed (sandboxing, validation, logging).

import json

tool_registry = ToolRegistry()
tool_registry.register(Tool(name="web_search", description="Search the web and return top results", func=web_search_stub))
tool_registry.register(Tool(name="calculator", description="Evaluate basic arithmetic expressions", func=calculator))

class ToolUsingAgent:
    def __init__(self, llm: LLM, tools: ToolRegistry):
        self.llm = llm
        self.tools = tools

    def _build_tool_prompt(self, user_input: str) -> str:
        tools_list = json.dumps(self.tools.describe(), indent=2)
        prompt = f"""
You are an assistant that can plan actions and call tools. Available tools:
{tools_list}

Respond with a JSON object describing your plan. Schema:
{{
  "plan": [
    {{
      "action": "think" | "call_tool" | "finish",
      "thought": "short internal reasoning (optional)",
      "tool": "tool_name (if action is call_tool)",
      "args": {{ ... }} (if action is call_tool),
      "note": "optional textual note for the user (optional)"
    }}
  ]
}}

User request: {user_input}

Remember:
- If you need to fetch facts, use "call_tool" with the web_search tool.
- If you need to compute, use "call_tool" with the calculator tool.
- Conclude with an action "finish" that summarizes the result for the user.
"""
        return prompt

    @staticmethod
    def _extract_json(raw: str) -> dict:
        """Parse the model output as JSON, tolerating extra text around the object."""
        import re
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Fall back to the outermost {...} span (e.g., if the model added prose or code fences)
            m = re.search(r"\{.*\}", raw, flags=re.S)
            if m:
                return json.loads(m.group(0))
            raise ValueError("LLM did not return a JSON plan:\n" + raw)

    def handle(self, user_input: str) -> str:
        prompt = self._build_tool_prompt(user_input)
        raw = self.llm.generate(prompt, max_tokens=800, temperature=0.0)
        plan_json = self._extract_json(raw)

        # Execute the plan sequentially, collecting tool observations
        observations = []
        draft_answer = None
        for step in plan_json.get("plan", []):
            action = step.get("action")
            if action == "think":
                # Internal reasoning: log it for debugging, never show it to the user
                print("LLM thought:", step.get("thought"))
            elif action == "call_tool":
                tool_name = step.get("tool")
                args = step.get("args", {})
                result = self.tools.call(tool_name, args)
                print(f"Tool {tool_name} returned:", result)
                observations.append({"tool": tool_name, "args": args, "result": result})
            elif action == "finish":
                draft_answer = step.get("note")
            else:
                raise ValueError(f"Unknown action: {action}")

        if not observations:
            return draft_answer or "No final summary provided by agent."

        # The plan (including its "finish" note) was written before any tool ran,
        # so make one more call to produce an answer grounded in the actual results.
        summary_prompt = (
            f"User request: {user_input}\n\n"
            f"Tool observations:\n{json.dumps(observations, indent=2)}\n\n"
            "Using only these observations, write a concise final answer for the user."
        )
        return self.llm.generate(summary_prompt, max_tokens=400, temperature=0.0)

# Example usage
if __name__ == "__main__":
    client = OpenAI()
    llm = LLM(client=client)
    agent = ToolUsingAgent(llm, tool_registry)
    print(agent.handle("What is the population of Japan and what is 123 * 456?"))

This plan-execute loop is explicit, auditable, and simple to reason about. The LLM produces a plan; your code validates and executes it. This pattern keeps the execution environment safe because you control which tools are callable and how arguments are validated.

For production use, consider the Responses API's built-in function calling: you pass tool definitions (name, description, and JSON Schema parameters) in the tools parameter, and the model returns structured function_call items that your client maps to tools. The manual JSON-plan approach is intentionally portable across SDKs and versions.
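A sketch of that mapping, under stated assumptions: tool definitions follow the flat Responses API function shape (type, name, description, parameters), and each function_call output item carries name, arguments (a JSON string), and call_id. The helper names to_function_tool and run_function_calls are hypothetical; check the current API reference before relying on exact field names.

```python
import json
from typing import Any, Callable, Dict, List


def to_function_tool(name: str, description: str,
                     properties: Dict[str, Any], required: List[str]) -> Dict[str, Any]:
    """Build a function-tool definition in the flat shape used by the Responses API."""
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def run_function_calls(output_items: List[Any],
                       funcs: Dict[str, Callable[..., Any]]) -> List[Dict[str, str]]:
    """Execute each function_call item and build matching function_call_output items."""
    outputs = []
    for item in output_items:
        if getattr(item, "type", None) != "function_call":
            continue  # skip messages, reasoning items, and other output types
        func = funcs.get(item.name)
        if func is None:
            payload = {"error": f"Unknown tool: {item.name}"}
        else:
            try:
                payload = {"result": func(**json.loads(item.arguments))}
            except Exception as exc:  # report failures to the model instead of crashing
                payload = {"error": str(exc)}
        outputs.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": json.dumps(payload, default=str),
        })
    return outputs
```

In a loop, you would pass tools=[to_function_tool(...)] to client.responses.create, run run_function_calls(response.output, funcs), and send the returned items back as input on a follow-up call with previous_response_id=response.id, repeating until the response contains no function calls.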

If you previously used the Assistants API, note that state management, tool use, and conversation threading work differently in the Responses API, so plan the migration before porting existing agents.

4. Multi-step reasoning and plan-execute-observe loops

Many tasks require multiple steps: gather facts, analyze, compute, and produce a final answer. The plan-execute-observe loop (the idea behind ReAct-style agents) lets an agent alternate between LLM planning and tool execution until a stopping criterion is met.

Design pattern

  1. Prompt LLM to produce a plan (one or more actions)
  2. Validate the plan format and tool arguments
  3. Execute the first actionable step(s)
  4. Collect observations from tools and append them to a context
  5. Re-prompt the LLM with the updated context to get the next plan
  6. Repeat until the LLM signals completion or a max step limit is reached

Avoid exposing the model’s chain-of-thought or internal deliberations to end users. You can keep “thoughts” in the plan for debugging but exclude them from the final output.

Example: multi-step agent loop

import json

class MultiStepAgent:
    def __init__(self, llm: LLM, tools: ToolRegistry, max_steps: int = 6):
        self.llm = llm
        self.tools = tools
        self.max_steps = max_steps

    def run(self, user_input: str):
        step = 0
        context = {"user_input": user_input, "observations": []}
        while step < self.max_steps:
            prompt = self._build_prompt(context)
            raw = self.llm.generate(prompt, max_tokens=800, temperature=0.0)
            plan = self._parse_plan(raw)
            if not plan:
                raise RuntimeError("No plan returned from LLM")
            # Process first actionable item
            next_action = plan[0]
            action = next_action.get("action")
            if action == "call_tool":
                tool_name = next_action.get("tool")
                args = next_action.get("args", {})
                # Validate args (ensure types, size limits etc.)
                obs = self._safe_call(tool_name, args)
                context["observations"].append({"tool": tool_name, "args": args, "result": obs})
                step += 1
                continue
            elif action == "finish":
                return next_action.get("note") or "Done."
            elif action == "think":
                # treat as internal; add to debug logs
                context.setdefault("internal_thoughts", []).append(next_action.get("thought"))
                step += 1
                continue
            else:
                raise RuntimeError(f"Unknown action from plan: {action}")
        raise RuntimeError("Max steps reached without finishing")

    def _build_prompt(self, context: dict):
        tools_list = json.dumps(self.tools.describe(), indent=2)
        observations_text = json.dumps(context["observations"], indent=2)
        return f"""
You are an assistant that outputs a plan for action in JSON form. Tools: {tools_list}
User request: {context['user_input']}
Past observations: {observations_text}

Return a JSON object of the form {{"plan": [...]}}, where each plan item is one of:
- {{ "action": "call_tool", "tool": "tool_name", "args": {{...}} }}
- {{ "action": "think", "thought": "internal note" }}
- {{ "action": "finish", "note": "final answer for the user" }}
Only the first item is executed before you are asked again, so put the next action first.
"""

    def _parse_plan(self, raw: str):
        import re
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            # Attempt to extract the outermost JSON object from surrounding text
            m = re.search(r"\{.*\}", raw, flags=re.S)
            if not m:
                return []
            try:
                parsed = json.loads(m.group(0))
            except json.JSONDecodeError:
                return []
        # Accept either {"plan": [...]} or a bare list of steps
        if isinstance(parsed, list):
            return parsed
        if isinstance(parsed, dict):
            plan = parsed.get("plan", [])
            return plan if isinstance(plan, list) else []
        return []

    def _safe_call(self, tool_name: str, args: dict):
        # Validate the tool name and argument shape before executing anything
        if tool_name not in self.tools.tools:
            raise ValueError(f"Attempt to call unknown tool: {tool_name}")
        if not isinstance(args, dict):
            raise ValueError("Tool args must be a JSON object")
        # Example validation: limit string length
        for k, v in args.items():
            if isinstance(v, str) and len(v) > 2000:
                raise ValueError(f"Argument {k!r} too large")
        # Run tool and return observation
        return self.tools.call(tool_name, args)

# Example usage
if __name__ == "__main__":
    client = OpenAI()
    llm = LLM(client=client)
    agent = MultiStepAgent(llm=llm, tools=tool_registry)
    print(agent.run("Find three recent news articles about electric vehicles and summarize their points."))

Note: the agent re-prompts the LLM at each step with the growing observation log, so every tool call can be validated before it runs, and you decide which results are fed back.

Limit token growth: every observation added to the prompt increases token usage and cost. Use summarization, vector stores, or truncated histories to keep contexts manageable.
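A simple way to cap prompt growth is to keep only the most recent observations that fit a budget. This sketch uses JSON character length as a rough stand-in for tokens (an assumption; use a tokenizer if you need exact counts), and trim_observations is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def trim_observations(observations: List[Dict[str, Any]],
                      max_chars: int = 4000) -> List[Dict[str, Any]]:
    """Keep the newest observations whose serialized size fits within max_chars."""
    kept: List[Dict[str, Any]] = []
    total = 0
    for obs in reversed(observations):  # walk newest first
        size = len(json.dumps(obs, default=str))
        if total + size > max_chars:
            break  # everything older than this is dropped
        kept.append(obs)
        total += size
    kept.reverse()  # restore chronological order for the prompt
    return kept
```

In MultiStepAgent._build_prompt, you could serialize trim_observations(context["observations"]) instead of the full list; the oldest results are dropped first.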

5. Robustness: validation, retries, and error handling

Real-world systems require robust error handling for network failures, rate limits, partial tool failures, and unexpected model outputs. This section outlines practical patterns for production-grade reliability.

API call retries and exponential backoff

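Use exponential backoff with jitter for transient errors (HTTP 429, 5xx responses, and network failures). Note that the official Python SDK already retries connection errors and 408, 409, 429, and 5xx responses twice by default, configurable with OpenAI(max_retries=...). If you add your own retry layer, consider lowering that setting so the two layers don't multiply attempts. Below is an example decorator that retries with exponential backoff and jitter: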

import time
import random
from functools import wraps

def retry_on_exception(max_attempts=5, base_delay=0.5, backoff=2.0, jitter=0.1, retry_on=(Exception,)):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return f(*args, **kwargs)
                except retry_on as e:
                    attempt += 1
                    if attempt >= max_attempts:
                        raise
                    delay = base_delay * (backoff ** (attempt - 1))
                    delay = delay * (1 + random.uniform(-jitter, jitter))
                    time.sleep(delay)
        return wrapper
    return decorator

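# Example usage: wrap the LLM generate call, retrying only errors likely to be transient
import openai
from openai import OpenAI

class LLM:
    def __init__(self, client=None, model="gpt-4o-mini"):
        self.client = client or OpenAI()
        self.model = model

    @retry_on_exception(
        max_attempts=4, base_delay=0.5, backoff=2.0,
        # Rate limits (429), network/timeout failures, and 5xx server errors
        retry_on=(openai.RateLimitError, openai.APIConnectionError, openai.InternalServerError),
    )
    def generate(self, prompt, max_tokens=512, temperature=0.2):
        resp = self.client.responses.create(
            model=self.model,
            input=prompt,
            max_output_tokens=max_tokens,
            temperature=temperature
        )
        # Parsing as in section 2; output_text covers the common case
        return resp.output_text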

Validation of model outputs

LLMs sometimes return malformed JSON, hallucinated tool names, or unexpected types. Validate everything before executing.

def validate_plan(plan: list, allowed_tools: list) -> None:
    if not isinstance(plan, list):
        raise ValueError("Plan should be a list")
    for i, step in enumerate(plan):
        if not isinstance(step, dict) or "action" not in step:
            raise ValueError(f"Step {i} missing action")
        action = step["action"]
        if action == "call_tool":
            if "tool" not in step or "args" not in step:
                raise ValueError(f"Step {i} missing tool or args")
            if not isinstance(step["args"], dict):
                raise ValueError(f"Step {i} args must be a JSON object")
            if step["tool"] not in allowed_tools:
                raise ValueError(f"Step {i} requests unknown tool: {step['tool']}")
        elif action == "finish":
            if "note" in step and len(step["note"]) > 10000:
                raise ValueError("Finish note too long")
        elif action == "think":
            continue
        else:
            raise ValueError(f"Step {i} invalid action: {action}")

Handling tool errors

Tools might fail (network errors, timeouts, exceptions). Decide for each tool whether failure should abort, retry, or be skipped. Log failures and provide meaningful diagnostics to the user.

import time

def safe_execute_tool(registry: ToolRegistry, tool_name: str, args: dict, max_retries=2):
    attempts = 0
    while attempts <= max_retries:
        try:
            return registry.call(tool_name, args)
        except Exception as e:
            attempts += 1
            # Log
            print(f"Tool {tool_name} failed on attempt {attempts}: {e}")
            if attempts > max_retries:
                # Decide: propagate a structured error to the agent
                return {"error": str(e)}
            time.sleep(0.5 * attempts)

Dealing with hallucinations and factuality

LLM responses can be plausible but incorrect. To reduce hallucinations:

  • Prefer retrieving facts via trusted tools (official APIs, databases).
  • Ask the model to cite sources and verify with tool calls (e.g., web_search).
  • Use temperature=0 when generating structured plans for more consistent (though not strictly deterministic) outputs.
  • Validate facts against a ground-truth source where possible.

Timeouts and circuit breakers

Implement timeouts on network calls and a circuit breaker to stop repeatedly calling a failing downstream service.

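import threading

def call_with_timeout(func, args=(), kwargs=None, timeout=10):
    """Run func in a worker thread and stop waiting after `timeout` seconds.

    Python cannot kill a thread, so a timed-out call keeps running in the
    background; daemon=True only keeps it from blocking interpreter exit.
    Prefer native timeouts where available (e.g., OpenAI(timeout=...) or
    requests.get(url, timeout=...)).
    """
    result = {}
    kwargs = kwargs or {}
    def target():
        try:
            result['value'] = func(*args, **kwargs)
        except Exception as e:
            result['error'] = e

    thread = threading.Thread(target=target, daemon=True)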
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        raise TimeoutError("Operation timed out")
    if 'error' in result:
        raise result['error']
    return result.get('value')
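call_with_timeout covers timeouts; the circuit-breaker half of the advice can be sketched as a small class (hypothetical, not from a library, and not thread-safe as written). After a configurable number of consecutive failures it rejects calls for a cool-down period, then lets a trial call through. The clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Hypothetical minimal circuit breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("Circuit open: skipping call to failing service")
            # Half-open: allow a trial call; if it fails, the breaker re-opens at once
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit
        return result
```

The two patterns compose: breaker.call(call_with_timeout, web_search_stub, args=("EV news",), timeout=5) applies the timeout and counts a TimeoutError as a failure.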

Logging, tracing, and observability

Maintain structured logs for:

  • Prompts sent to the LLM (avoid logging PII in plain text; consider redaction)
  • Tool calls and results
  • Plan steps and validation outcomes
  • Errors, stack traces, and retry attempts

For privacy and security, never log unredacted sensitive user data in plaintext. Use tokenization, pseudonymization, or redaction before storing logs that include user content.
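A minimal sketch of redaction before logging, using regular expressions for two easy cases (email addresses and OpenAI-style sk- keys) and emitting one JSON object per log line. The patterns and the redact/log_event helpers are illustrative assumptions, not a complete PII detector; production systems usually need a dedicated detection tool.

```python
import json
import logging
import re
from typing import Any

logger = logging.getLogger("agent")

# Illustrative patterns only; real PII detection needs far broader coverage.
_REDACTIONS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"), "[API_KEY]"),
]


def redact(text: str) -> str:
    """Replace each matched pattern with a placeholder."""
    for pattern, replacement in _REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


def log_event(event: str, **fields: Any) -> str:
    """Emit one structured JSON log line, redacting every string field."""
    record = {"event": event}
    for key, value in fields.items():
        record[key] = redact(value) if isinstance(value, str) else value
    line = json.dumps(record, default=str)
    logger.info(line)
    return line
```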

6. Production readiness: deployment, monitoring, and governance

Transitioning from a demo to production requires additional considerations: secure secrets management, scalability, rate-limiting, testing, monitoring, and compliance.

Packaging and containerization

Containerize your agent for reproducible deployments. A minimal Dockerfile:

FROM python:3.11-slim

WORKDIR /app

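# Copy the dependency list first so the install layer is cached between code changes.
# (If you use Poetry instead, copy pyproject.toml and poetry.lock and install with Poetry.)
COPY requirements.txt /app/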
RUN pip install --no-cache-dir -r requirements.txt

COPY . /app

ENV PYTHONUNBUFFERED=1

CMD ["python", "server.py"]

If your agent is exposed via HTTP, add a small web server (FastAPI or Flask) that receives requests and forwards them to your agent instance; remember to add fastapi and uvicorn to requirements.txt. An example FastAPI app skeleton:

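import logging

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

logger = logging.getLogger("agent")
app = FastAPI()

class RequestInput(BaseModel):
    user_input: str

class ResponseOutput(BaseModel):
    answer: str

# Assume agent is created globally
# agent = MultiStepAgent(...)

# A plain `def` endpoint runs in FastAPI's threadpool, so blocking LLM calls don't stall the event loop
@app.post("/api/agent", response_model=ResponseOutput)
def run_agent(payload: RequestInput):
    try:
        result = agent.run(payload.user_input)
        return {"answer": result}
    except Exception:
        # Log the full error server-side; don't leak internals to the client
        logger.exception("Agent run failed")
        raise HTTPException(status_code=500, detail="Agent failed to complete the request")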

Scaling and concurrency

LLM calls are often the slowest part. To scale:

  • Use a job queue (Celery, RabbitMQ, or managed queues) for long-running tasks
  • Run multiple worker instances and autoscale based on queue depth
  • Cache frequent queries and tool results (use Redis or a CDN)
  • Use rate limiting to avoid overloading the LLM or downstream services
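For the caching point, a minimal in-process TTL cache keyed by tool name and arguments might look like the sketch below. TTLCache is hypothetical and single-process; with several workers you would move the same idea to Redis. Only cache tools whose results are safe to reuse (search, lookups), never tools with side effects.

```python
import json
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-process cache for tool results with a time-to-live."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (stored_at, value)

    @staticmethod
    def make_key(tool_name: str, args: Dict[str, Any]) -> str:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} share one entry
        return tool_name + ":" + json.dumps(args, sort_keys=True)

    def get_or_call(self, tool_name: str, args: Dict[str, Any],
                    func: Callable[..., Any]) -> Any:
        key = self.make_key(tool_name, args)
        hit = self._store.get(key)
        if hit is not None and self.clock() - hit[0] < self.ttl:
            return hit[1]  # fresh cached result
        value = func(**args)
        self._store[key] = (self.clock(), value)
        return value
```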

Authentication and authorization

Expose your agent behind an authenticated API. Use OAuth, JWTs, API gateway tokens, or a managed identity provider. For internal tools, enforce least privilege for the agent’s service account.

Secrets and key management

Store the OpenAI API key and any tool credentials in environment variables or a secrets manager (HashiCorp Vault, AWS Secrets Manager, etc.), never in source code.

Testing

Test the agent at several levels:

  • Unit tests for tool functions and argument validation
  • Integration tests mocking the LLM responses (do not call the live API in CI)
  • End-to-end tests in a staging environment using limited API keys
  • Automated behavioral tests to ensure the agent doesn't violate policies

Monitoring and SLOs

Track metrics:

  • Request rate, latency per request, and 95/99th percentile latencies
  • LLM token usage and approximate cost per request
  • Error rates per tool and LLM call
  • Observability traces to correlate LLM prompts, tool calls, and responses
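To track token usage and approximate cost per request, you can read the usage block returned with each response. This sketch assumes the Responses API usage fields input_tokens and output_tokens; the record_usage helper and UsageRecord type are hypothetical, and prices are parameters because they change: fill them in from current OpenAI pricing for your model.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class UsageRecord:
    input_tokens: int
    output_tokens: int
    estimated_cost_usd: float


def record_usage(resp: Any, input_price_per_1m: float,
                 output_price_per_1m: float) -> UsageRecord:
    """Estimate request cost from a response's usage block (prices per 1M tokens)."""
    usage = resp.usage
    cost = (usage.input_tokens * input_price_per_1m
            + usage.output_tokens * output_price_per_1m) / 1_000_000
    return UsageRecord(usage.input_tokens, usage.output_tokens, cost)
```

Emitting these records as metrics lets you chart cost per request next to latency and error rates.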

Cost controls

LLMs can be expensive. Implement:

  • Rate limiting per user and per service
  • Cost-aware routing: prefer cheaper models for low-risk queries
  • Token budgeting and summarization strategies to limit prompt sizes
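Per-user rate limiting is often done with a token bucket: each user has a bucket of request tokens that refills at a steady rate. A minimal in-memory sketch follows; PerUserRateLimiter is hypothetical, and a multi-instance deployment would keep the buckets in a shared store such as Redis.

```python
import time
from typing import Callable, Dict, List


class PerUserRateLimiter:
    """Hypothetical token bucket: `capacity` requests per user, refilled at `rate` per second."""

    def __init__(self, capacity: float = 10, rate: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self._buckets: Dict[str, List[float]] = {}  # user_id -> [tokens, last_refill_time]

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(user_id, [self.capacity, now])
        tokens = min(self.capacity, tokens + (now - last) * self.rate)  # refill since last check
        if tokens >= 1:
            self._buckets[user_id] = [tokens - 1, now]  # spend one token
            return True
        self._buckets[user_id] = [tokens, now]
        return False
```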

Safety, content filtering, and governance

Ensure your agent complies with policies and legal requirements:

  • Apply content filters for disallowed content
  • Use policy checks for privacy-sensitive requests
  • Log policy-relevant decisions and maintain an audit trail

If your agent can take irreversible actions, require multi-step confirmations and human approvals for high-risk operations.
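One way to enforce human approval is to route tool calls through a gate that checks a list of high-risk tools before executing. In this sketch, gated_call, ApprovalRequired, and the approve callback are hypothetical; in practice approve might post to a review queue and wait for a decision.

```python
from typing import Any, Callable, Dict, Set


class ApprovalRequired(Exception):
    """Raised when a high-risk tool call is not approved."""


def gated_call(tool_name: str, args: Dict[str, Any], func: Callable[..., Any],
               high_risk: Set[str],
               approve: Callable[[str, Dict[str, Any]], bool]) -> Any:
    """Run a tool, asking `approve` first when the tool is marked high risk."""
    if tool_name in high_risk and not approve(tool_name, args):
        raise ApprovalRequired(f"{tool_name} was not approved")
    return func(**args)
```

Low-risk tools pass straight through, so the approval cost is paid only where an action cannot be undone.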

Deployment checklist

Before going live, verify:

  • Secrets are stored securely
  • Rate limits and throttles are configured
  • Prompts and logs do not leak PII
  • Monitoring and alerts are in place
  • Automated tests and a rollback plan exist


Appendix: utility code, Dockerfile, testing suggestions

This appendix collects practical snippets and suggestions you can copy into a repo.

Complete, minimal agent server using FastAPI

# server.py
import logging

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import OpenAI

# Assumes the classes and tools from earlier sections are saved in agent.py
from agent import LLM, MultiStepAgent, Tool, ToolRegistry, calculator, web_search_stub

logger = logging.getLogger("agent")
app = FastAPI()
client = OpenAI()
llm = LLM(client=client)
tool_registry = ToolRegistry()
tool_registry.register(Tool("web_search", "Search the web", web_search_stub))
tool_registry.register(Tool("calculator", "Compute arithmetic", calculator))
agent = MultiStepAgent(llm=llm, tools=tool_registry)

class RequestPayload(BaseModel):
    user_input: str

class ResponsePayload(BaseModel):
    answer: str

@app.post("/api/agent", response_model=ResponsePayload)
def run(payload: RequestPayload):
    try:
        ans = agent.run(payload.user_input)
        return {"answer": ans}
    except Exception:
        # Log details server-side; return a generic message to the client
        logger.exception("Agent run failed")
        raise HTTPException(status_code=500, detail="Agent failed to complete the request")

# Run with: uvicorn server:app --host 0.0.0.0 --port 8000

Dockerfile (example)

# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
ENV PYTHONUNBUFFERED=1
EXPOSE 8000
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8000"]

Testing tips

  • Mock LLM responses in unit tests by replacing LLM.generate with a deterministic stub.
  • Record cassettes of tool calls (VCR-like) to simulate external services.
  • Use property-based tests to exercise input validation logic.
  • Run end-to-end flows in staging with low-privilege API keys and strict logging/monitoring enabled.
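The first tip can be sketched with the standard library's unittest.mock: a Mock whose generate method returns scripted outputs in order, plus a helper that serializes plan steps in the JSON shape the agents in this guide expect. Both make_scripted_llm and plan are hypothetical helpers.

```python
import json
from typing import Iterable
from unittest.mock import Mock


def make_scripted_llm(outputs: Iterable[str]) -> Mock:
    """Stand-in for LLM whose generate() returns the given strings, one per call."""
    llm = Mock()
    llm.generate.side_effect = list(outputs)  # raises StopIteration if the script runs out
    return llm


def plan(*steps: dict) -> str:
    """Serialize plan steps as {"plan": [...]} JSON text."""
    return json.dumps({"plan": list(steps)})
```

Passing make_scripted_llm([plan({"action": "call_tool", "tool": "calculator", "args": {"expr": "123 * 456"}}), plan({"action": "finish", "note": "56088"})]) as the llm argument to MultiStepAgent lets a test exercise the full loop without network calls, and llm.generate.call_args_list shows exactly which prompts were sent.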

Security considerations

  • Use input sanitization to prevent injection attacks in tools that execute commands or queries.
  • Avoid running arbitrary code returned by the model. Always map actions to pre-defined, validated functions.
  • Implement privilege checks for tool calls that can access sensitive data.

If you want to adopt more advanced agent orchestration frameworks or existing agent libraries, evaluate them for security and policy compliance before integrating them into production.



Closing notes

Building custom AI agents with OpenAI’s Responses API involves two key responsibilities:

  1. Designing clear and verifiable interactions between the LLM and your code (plans, tool calls, validations)
  2. Engineering robust, secure infrastructure around the LLM calls (retries, monitoring, secrets, testing)

The examples in this guide are intentionally explicit and framework-agnostic. They trace a path from prototype to production agents:

  • Start simple: a minimal agent that uses the Responses API
  • Introduce tools with a registry and explicit JSON plans
  • Use a step-by-step plan-execute-observe loop with validation at each step
  • Harden with retries, validation, logging, and secure deployment practices


Author: ChatGPT AI Hub — Practical developer tutorials and code examples.

Remember: always respect user privacy and data-retention rules when using LLMs in production, in line with your organization's privacy practices and OpenAI's usage policies.

