8/12/2025

No More Timeouts: How to Build Long-Running MCP Tools That Actually Finish the Job

Alright, let's talk about something that's probably driven you up a wall if you've been building with the Model Context Protocol (MCP). You've designed this brilliant, multi-step tool. It does deep research, it crunches data, it generates a masterpiece of a report. You fire it up in your AI application, you watch the little spinner go... & then... BAM. Timeout. The workflow dies, the user is left hanging, & you're left staring at a frustratingly generic error message.
Honestly, it's one of the biggest bottlenecks when you move from simple, quick-fire tools to creating genuinely powerful, agentic workflows. Most standard MCP hosts & clients just aren't built for tasks that need to think for more than 30 or 60 seconds. But here's the thing: the most valuable work often takes longer than that. Whether it's in-depth market analysis, complex code generation, or a deep dive into a user's support history, the real magic happens when an AI can take its time to do the job right.
So how do you build MCP tools that can run for minutes, or even longer, without everything falling apart? Turns out, you don't need to reinvent the wheel, but you do need to think a little differently about how you structure your server & manage your tasks. We're going to dive deep into the strategies that separate the hobbyist tools from the production-grade, resilient ones. We'll cover everything from clever async workarounds to full-on durable execution that can survive crashes, network hiccups, & pretty much anything else you throw at it.

The Core of the Problem: Why Timeouts Happen in the First Place

Before we jump into solutions, let's quickly break down why this is even an issue. In a basic MCP setup, the communication is often synchronous. Your AI application (like Claude or a custom agent) sends a request to your MCP server, & then it waits. And waits. And waits.
This request-response loop is simple & it works great for tools like `getCurrentWeather` or `lookUpStockPrice`. But for your `conductDeepAnalysisOnQuantumComputingTrends` tool, the AI application is holding the line open, waiting for a final answer. Most systems have a built-in timeout, a safety mechanism to prevent a single, stuck request from hogging resources forever. If your tool takes 55 seconds to run but the client's timeout is set to 30 seconds, it's going to fail every single time.
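Just to make that concrete, here's a minimal sketch of the blocking version using FastMCP from the official MCP Python SDK. The tool name & the 55-second sleep are made up for illustration, but the shape of the problem is the same: the client holds the connection open for the entire run.

```python
# Illustrative sketch: a single blocking tool (name & timings are hypothetical)
import asyncio
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("research-server")

@mcp.tool()
async def conduct_deep_analysis(topic: str) -> str:
    """The client waits on this one call for the entire run."""
    await asyncio.sleep(55)  # stand-in for ~55 seconds of real research work
    return f"Finished deep analysis on {topic}"

if __name__ == "__main__":
    mcp.run()  # with a 30-second client timeout, this tool fails every time
```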
Some folks' first instinct is to just crank up the timeout value. A GitHub pull request for the N8N platform even discussed making this timeout user-configurable, which is a step in the right direction. It gives you some breathing room. But it's not a real solution; it's a band-aid. What if a task sometimes takes 2 minutes but other times takes 10? Do you set the timeout to 15 minutes & just hope for the best? It’s inefficient & brittle. A misbehaving server could still cause everything to hang for an unacceptably long time.
This is where a more robust architecture comes in.

The Asynchronous Hand-Off: A Smarter Way to Work

The first major leap forward is to stop making the AI wait. Instead of a single, long-running request, you break the interaction into several smaller, quicker ones. This is the asynchronous approach, & it's a game-changer for long-running tasks.
Here’s the core idea: when the AI application wants to run your long-running tool, your MCP server doesn't actually run it right away. Instead, it immediately hands back a task ID.
It's like dropping off a roll of film to be developed (if you remember those days). You don't stand at the counter for an hour. You get a claim ticket, you leave, & you come back later to check on the progress.
A fantastic Medium article by a developer named JIN breaks down a really elegant architecture for this. It involves three key components:
  1. The MCP Server: This is your main orchestration layer. It exposes a few different tools to the AI application, not just one. Instead of a single `do_research` tool, you'd have a suite of them.
  2. The TaskManager: This is the brain behind the async execution. It lives on your server & is responsible for keeping track of all the tasks that are running in the background (there's a minimal sketch of one right after this list).
  3. The DeepResearch Agent (or your long-running logic): This is the actual worker, built on something like LangGraph or your own custom code, that performs the heavy lifting.
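To make the TaskManager part less abstract, here's a minimal sketch of what one could look like in Python. The class & field names are my own, not from the article, & a production version would add locking, cleanup, & persistence:

```python
# Hypothetical TaskManager sketch: tracks background asyncio tasks by ID
import asyncio
import uuid
from dataclasses import dataclass

@dataclass
class TaskRecord:
    task: asyncio.Task
    status: str = "running"   # "running" | "completed" | "failed" | "cancelled"
    result: str | None = None

class TaskManager:
    def __init__(self) -> None:
        self.tasks: dict[str, TaskRecord] = {}

    def start(self, coro) -> str:
        """Launch a coroutine in the background & hand back a 'claim ticket'."""
        task_id = str(uuid.uuid4())
        task = asyncio.create_task(self._run(task_id, coro))
        self.tasks[task_id] = TaskRecord(task=task)
        return task_id

    async def _run(self, task_id: str, coro) -> None:
        record = self.tasks[task_id]
        try:
            record.result = await coro
            record.status = "completed"
        except asyncio.CancelledError:
            record.status = "cancelled"
            raise
        except Exception:
            record.status = "failed"

    def status(self, task_id: str) -> str:
        return self.tasks[task_id].status

    def cancel(self, task_id: str) -> None:
        self.tasks[task_id].task.cancel()
```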

A New Set of Tools

The magic of this setup is in the tools you expose through your MCP server. Instead of one big, blocking tool, you provide several smaller, non-blocking ones that work together (there's a rough code sketch right after this list):
  • `start_research(topic)`: When the AI calls this, your MCP server tells the `TaskManager` to kick off the research task in the background (using something like `asyncio.create_task()` in Python). It then IMMEDIATELY returns a unique `task_id` to the AI. The AI now has its "claim ticket."
  • `query_research(task_id)`: The AI can call this at any time to get a quick status update. Is the task "running," "completed," or "failed"? This is a fast, synchronous call.
  • `wait_research(task_id)`: This is the clever part. The server doesn't just block. It enters a loop, checking the task's progress every couple of seconds. As the background task completes steps ("Generating queries," "Searching web," "Analyzing results"), it reports this progress. The `wait_research` tool can then stream these updates back to the AI using a feature like `ctx.report_progress()`. The Medium article calls this "pseudo-streaming," & it's a great way to give the user a sense of progress without true, complex streaming infrastructure.
  • `cancel_research(task_id)`: A crucial but often overlooked tool. This allows the AI (or the user) to kill a task that's taking too long or is no longer needed.
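Here's what that suite of tools could look like wired together, as a rough sketch rather than the article's exact code. It reuses the hypothetical TaskManager from earlier, FastMCP's `Context.report_progress()` for the pseudo-streaming part, & a placeholder `run_deep_research()` coroutine standing in for the real LangGraph agent:

```python
# Illustrative sketch of the four non-blocking tools
import asyncio
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("research-server")
manager = TaskManager()  # the hypothetical TaskManager sketched earlier

async def run_deep_research(topic: str) -> str:
    """Placeholder for the real LangGraph agent doing the heavy lifting."""
    await asyncio.sleep(120)
    return f"Full report on {topic}"

@mcp.tool()
async def start_research(topic: str) -> str:
    """Kick off the work in the background & immediately return a claim ticket."""
    return manager.start(run_deep_research(topic))

@mcp.tool()
async def query_research(task_id: str) -> str:
    """Fast, non-blocking status check."""
    return manager.status(task_id)

@mcp.tool()
async def wait_research(task_id: str, ctx: Context) -> str:
    """Poll every couple of seconds & pseudo-stream progress back to the client."""
    elapsed = 0
    while manager.status(task_id) == "running":
        await asyncio.sleep(2)
        elapsed += 2
        # In a real tool you'd report the agent's current step here, not just elapsed time
        await ctx.report_progress(progress=elapsed)
    return manager.tasks[task_id].result or manager.status(task_id)

@mcp.tool()
async def cancel_research(task_id: str) -> str:
    """Kill a task that's taking too long or is no longer needed."""
    manager.cancel(task_id)
    return "cancelled"
```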
This async pattern completely solves the timeout problem. Each individual MCP call (`start`, `query`, `wait`) is super fast. The long-running work happens completely disconnected from the request-response cycle. The AI can even go do other things & then come back to check on the task later. In a real-world test, a user could start a research task, interrupt the client (Ctrl+C), & the task would keep running on the server, ready to be queried later. PRETTY COOL.
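If you're curious what that "claim ticket" flow looks like from the caller's side, here's a rough sketch using the stdio client from the official MCP Python SDK. The server file name & topic are placeholders:

```python
# Illustrative client-side flow: start, poll, then wait for the result
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    params = StdioServerParameters(command="python", args=["research_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # 1. Get the claim ticket -- returns almost instantly
            started = await session.call_tool(
                "start_research", {"topic": "quantum computing trends"}
            )
            task_id = started.content[0].text

            # 2. Cheap status checks, any time
            status = await session.call_tool("query_research", {"task_id": task_id})
            print("status:", status.content[0].text)

            # 3. Block on this one call only when you actually want the answer
            result = await session.call_tool("wait_research", {"task_id": task_id})
            print(result.content[0].text)

asyncio.run(main())
```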

Going Fully Durable: The Unbreakable Workflow

The async approach is fantastic, & for many use cases, it's all you need. But what happens if your server crashes while a task is running? Or what if you need to deploy a new version of your code? With a simple `asyncio` setup, that background task is gone forever.
This is where we graduate to enterprise-grade resilience with something called a durable execution engine. Think of it as a supervisor for your tasks that has a perfect memory & is virtually indestructible. The most prominent player in this space is an open-source platform called Temporal.
An incredible article on moving from "AI Hype to Durable Reality" makes a powerful case for why this is the future for serious, agentic AI systems. Companies like Netflix & even OpenAI use Temporal for their own complex, asynchronous operations, like scaling image generation for ChatGPT.
Here’s how it works: instead of your `TaskManager` just launching a background task on the same server, it submits the task as a "Workflow" to the Temporal engine. The Temporal engine then takes over completely.
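To give you a feel for it, here's a hypothetical sketch of the research task expressed as a Temporal Workflow using the temporalio Python SDK. The activity names, timeouts, & retry settings are illustrative, not prescriptive:

```python
# Hypothetical Temporal workflow: each activity gets timeouts & automatic retries
from datetime import timedelta
from temporalio import activity, workflow
from temporalio.common import RetryPolicy

@activity.defn
async def gather_sources(topic: str) -> list[str]:
    # A real implementation would call search APIs & fetch documents
    return [f"placeholder source about {topic}"]

@activity.defn
async def synthesize_report(sources: list[str]) -> str:
    # A real implementation would run the LLM analysis over the gathered material
    return f"Report based on {len(sources)} sources"

@workflow.defn
class DeepResearchWorkflow:
    @workflow.run
    async def run(self, topic: str) -> str:
        retry = RetryPolicy(maximum_attempts=5, backoff_coefficient=2.0)
        sources = await workflow.execute_activity(
            gather_sources, topic,
            start_to_close_timeout=timedelta(minutes=5),
            retry_policy=retry,
        )
        return await workflow.execute_activity(
            synthesize_report, sources,
            start_to_close_timeout=timedelta(minutes=10),
            retry_policy=retry,
        )
```

A separate worker process registers this workflow & its activities with the Temporal service; if that worker dies halfway through, another one replays the history & resumes from the last completed activity.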
This gives you some almost unbelievable superpowers:
  • Durability & Resilience: A Temporal Workflow will run to completion, no matter what. If the server running the code crashes, the workflow's state is preserved. As soon as a new worker comes online, it will pick up EXACTLY where the last one left off. Network outage? Database flap? The workflow just pauses & automatically retries when things are stable again. It turns fragile processes into crash-proof, replayable workflows.
  • Scalability: With a simple async setup, running hundreds of concurrent research tasks could overwhelm your single MCP server. With Temporal, the MCP server stays thin & lightweight. It just forwards tasks to the Temporal "worker fleet," which can be scaled independently across as many machines as you need. A sudden burst of requests just gets fanned out automatically.
  • Visibility: Temporal gives you a detailed, auditable history of every single step your workflow took. You can see exactly what happened, when it happened, what the inputs & outputs were. This is invaluable for debugging complex, multi-step processes.
In this "Durable Tools" pattern, every single tool you expose via MCP can be implemented as a Temporal Workflow. Your MCP server becomes a simple, stateless client that just starts, signals, & queries these ultra-reliable workflows. It's the ultimate separation of concerns, letting your MCP server focus on communication while the workflow engine handles the heavy lifting of execution.

Weaving It All Together: Practical Best Practices

So, whether you're starting with the async hand-off or going all-in with a durable engine like Temporal, there are a few key best practices to keep in mind:
  1. Break It Down: As a user on Reddit wisely noted, it's always a good idea to break down your monolithic tasks into smaller, logical chunks. This isn't just about avoiding timeouts; it's about better progress reporting. Instead of a single "running" status for 10 minutes, you can report "Step 1: Gathering sources," "Step 2: Analyzing data," "Step 3: Synthesizing report."
  2. Granular Progress is Key: Users get antsy staring at a static loading screen. The "pseudo-streaming" approach of reporting meaningful progress updates is CRUCIAL for a good user experience. LangGraph nodes provide natural checkpoints for this.
  3. Handle Your Errors Gracefully: What happens if a third-party API your tool relies on goes down? What if the model fails to generate a valid response? Your long-running task needs robust error handling. With a durable engine, you can implement automatic retries with back-off for transient failures.
  4. Manage Concurrency: If you're running your own models locally, be careful about how many concurrent tasks you kick off. You might need to implement a queueing system to prevent your resources from being overwhelmed (there's a small sketch of one approach right after this list).
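On that last point, a dead-simple way to add queueing to the asyncio-based setup is a semaphore. A minimal sketch, assuming the cap of 3 concurrent tasks is something you'd tune for your own hardware:

```python
# Hypothetical concurrency cap: at most 3 research tasks run at once;
# the rest wait in line instead of overwhelming a local model
import asyncio

MAX_CONCURRENT_TASKS = 3
semaphore = asyncio.Semaphore(MAX_CONCURRENT_TASKS)

async def run_with_limit(coro):
    async with semaphore:  # queued here until a slot frees up
        return await coro

# Inside TaskManager.start(), wrap the work before scheduling it:
# task = asyncio.create_task(self._run(task_id, run_with_limit(coro)))
```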

Bringing It to the Real World: AI Chatbots That Don't Give Up

This all might sound a bit abstract, but it has HUGE implications for the quality of AI we can deploy, especially in customer-facing applications.
Think about a customer service chatbot on an e-commerce site. A simple question like "What's your return policy?" is easy. But what about a question like, "I've had three different issues with my last two orders, can you analyze my support tickets & tell me what's going wrong & what you can do to fix it for good?"
A standard chatbot would choke on that. The request requires multiple steps: fetch order history, retrieve support ticket data, run a sentiment analysis, synthesize the findings, & then formulate a solution. This is a classic long-running task. A simple, synchronous tool would time out & fail, leaving the customer more frustrated than when they started.
This is where building on a robust, asynchronous foundation becomes a business necessity. This is EXACTLY the kind of problem that platforms designed for sophisticated conversational AI are built to solve. For instance, when you're building with Arsturn, you're leveraging a system designed for these complex, real-world interactions. Arsturn helps businesses create custom AI chatbots trained on their own data, but the magic is in how it handles the conversation flow.
When a user asks a complex question, an Arsturn-powered chatbot can kick off a durable, long-running workflow on the backend without missing a beat. The chatbot can provide an immediate, conversational response like, "That's a great question. Let me dig into your account history & support tickets to get a complete picture. This might take a minute, I'll let you know as soon as I have an answer."
The user isn't left staring at a dead-end error. The system is working for them in the background, reliably executing the multi-step analysis we discussed. When the workflow is complete, the chatbot can proactively deliver the comprehensive answer. This is how you move from a simple Q&A bot to a true AI assistant. It's how Arsturn helps businesses build no-code AI chatbots that not only provide instant support 24/7 but also boost conversions & provide deeply personalized customer experiences by reliably handling even the most demanding background tasks.

Wrapping It Up

So there you have it. The secret to building long-running MCP tools that don't timeout isn't about finding a magic "no-timeout" flag. It's about fundamentally rethinking your architecture.
Start with the asynchronous hand-off pattern. It's powerful, relatively straightforward to implement, & will solve the immediate problem for most use cases. Use a `TaskManager` & a suite of non-blocking tools (`start`, `query`, `wait`) to give your AI a resilient way to interact with slow processes.
And when you're ready to build truly bulletproof, enterprise-grade systems, look into durable execution engines like Temporal. By offloading your workflows to a system designed for resilience & scale, you can build agents that are not only powerful but also incredibly reliable.
It’s a shift in mindset, for sure. But once you stop thinking in terms of single, blocking calls & start thinking in terms of durable, asynchronous tasks, you can build AI tools that can tackle ANY challenge, no matter how long it takes.
Hope this was helpful! Let me know what you think.

Copyright © Arsturn 2025