OpenAI Responses API and the old assistant migration clock

Why This Became Real Work

A lot of internal AI tools started small. Someone needed faster answers to policy questions, ticket summaries, document search, or routine drafts, and an assistant was built around the older OpenAI patterns. That was reasonable at the time. It looks different now. OpenAI's Assistants FAQ now points builders first to the Responses API, OpenAI-hosted tools, the Agents SDK, and tracing, and says the improvements from the Assistants beta were carried into Responses. The migration guide is more direct: Responses is the future direction for building agents on OpenAI. For a business, that changes the framing. What used to feel like a contained feature is now a legacy integration with a sunset path to plan around.

The dates make that clearer. The FAQ says OpenAI released the building blocks of its Agents platform on March 11, 2025 and described a target Assistants sunset in the first half of 2026 after feature parity. The current migration guide says that, as of August 26, 2025, the Assistants API is deprecated with a sunset date of August 26, 2026. The exact deadline matters, but the bigger issue is direction. New capabilities, current guidance, and pricing all center on Responses. Once the platform moves that decisively, staying on the old pattern is no longer a neutral maintenance choice.

Why This Is More Than an Endpoint Rename

This is not just a switch from one endpoint to another. OpenAI presents Responses as a unified interface for agent-style applications, with built-in tools including web search, file search, computer use, code interpreter, and remote MCP servers. It is designed for multi-turn interactions where teams can pass previous responses forward, keep state with store: true, or reuse encrypted reasoning items in stateless workflows. The migration guide also changes the data model. Responses uses Items rather than the older message-centric structure, and tool calls and tool outputs become separate units linked by call_id. Multi-turn chains can be carried with previous_response_id.

OpenAI does note that Assistant-like and Thread-like objects now exist in Responses. That helps with the landing, but it does not remove the migration work. Teams still have to map assistants, threads, runs, prompts, tool wrappers, and retry logic into the response and item model. Most internal helpers also accumulate quiet assumptions over time: where conversation state lives, how long jobs are allowed to run, how retrieval is attached, and what the system should do when a tool call fails. Those assumptions are usually where a "simple migration" stops being simple.

Where Old Internal Assistants Usually Break

State and job handling are usually the first fault line. Older assistant builds often leaned on persistent threads and runs so application code could stay thin. Responses makes that choice explicit. You can store state across turns, disable storage with store: false, or support Zero Data Retention-style workflows with encrypted reasoning items that are decrypted in memory and then discarded. OpenAI also added background mode. In the Responses feature update, it says reasoning models can take several minutes on complex problems and that background mode exists to handle those tasks asynchronously and more reliably. If an internal assistant does research, analysis, or multi-step operations, migration is where a team has to decide what still belongs in a synchronous request and what should be treated as a managed background job.

Retrieval is the next pressure point. The Assistants FAQ describes fixed file search defaults: 800-token chunks, 400-token overlap, text-embedding-3-large at 256 dimensions, and up to 20 chunks added to context. It also lists hard limits: one vector store per assistant, one vector store per thread, no control over chunking or embedding settings, no image parsing inside documents, and no structured retrieval over CSV or JSONL. The newer Responses announcement adds file search updates that support searches across multiple vector stores and attribute filtering with arrays. That is a real operational difference. It affects how a company partitions knowledge, applies permissions, and verifies that the assistant still pulls the right evidence once production traffic and real users are involved.

Tooling is the third place older implementations tend to feel exposed. The Responses update adds remote MCP server support, which OpenAI describes as a way to connect models to tools hosted on any MCP server with just a few lines of code. That increases what an internal assistant can do, but it also changes the governance conversation. Once the platform encourages broader tool orchestration inside a single request, it is harder to defend loose controls around which tools are available, how they are approved, and how their usage is monitored.

Why This Is Also a Budget and Governance Project

The pricing model makes it clear that this is not just a developer task. OpenAI's pricing page lists web search at $10 per 1,000 calls and says search content tokens are free. The same page also breaks out container pricing and notes a March 31, 2026 change to per-20-minute session pricing. The Assistants FAQ documents Code Interpreter at $0.03 per session and file search storage at $0.10 per GB per day. The newer Responses feature update lists Code Interpreter at $0.03 per container, file search at $0.10 per GB of vector storage per day plus $2.50 per 1,000 tool calls, and no additional cost for the remote MCP tool beyond output tokens from the API. None of that is unmanageable, but it does mean migration has to be treated as an LLM operations and budgeting exercise, not just a code rewrite.

The practical budget question is straightforward: which workflows are allowed to call paid tools, how often, with what ceilings, and where is the spend visible? OpenAI's pricing page also points to monthly budgets, email thresholds, usage tracking, and project-level billing restrictions. Those controls belong in scope. A helper that looks inexpensive in development can become hard to explain in production when model tokens, retrieval storage, tool-call billing, and long-running execution are all mixed together without clear ownership.

Governance needs the same discipline. The Assistants FAQ says data and files sent to the API are not used to train OpenAI's models, but it also says data uploaded to the Assistants API is stored indefinitely until manually deleted. Responses introduces more explicit state choices and adds reasoning summaries at no additional cost, which OpenAI positions as useful for debugging and auditing. For a business system, that matters. If the assistant touches policies, contracts, customer information, or operational documents, migration should cover retention rules, deletion paths, and observability design alongside feature-parity testing.

What a Good Migration Project Actually Includes

A sensible Responses migration usually has five workstreams:

Audit the current assistant setup, including assistants, threads, files, prompts, tool definitions, long-running jobs, and known failure modes.
Map legacy concepts into Responses, especially stored state, Items, tool calls, retrieval flows, and any use of previous_response_id or background execution.
Refactor the operating controls: logging, cost ceilings, usage monitoring, retention rules, and deletion workflows.
Re-test retrieval and tool behavior, particularly where document permissions, file search quality, or long-running analysis affect user trust.
Cut over in a controlled way by comparing outputs, validating costs, and removing brittle assumptions only after the new path proves stable.

That is where Greg's service angle is useful. He can audit an existing assistant, map threads, files, and tools to Responses, refactor job handling and retrieval, add cost and logging controls, and test the cutover so the feature holds up beyond the prototype stage. The value is not "prompt magic." It is disciplined migration work, clearer operations, and fewer expensive surprises after launch.

The strategic point is simple. OpenAI's FAQ, migration guide, feature rollout, and pricing all point in the same direction: Responses is where new agent capability is landing. If a business still relies on an older internal assistant pattern, doing nothing does not preserve stability. It increases the odds that workflows, costs, and governance drift out of step with the platform. That is why this shift turns old internal assistants into a paid migration project.