FastAPI under load: async, sync, and DI
The three settings that matter between the first deploy and the hundredth.
FastAPI is fast enough that tuning it feels almost wrong. Most services I've worked on sustain their first few thousand requests per second without anyone changing a setting. Then something shifts: a new sync dependency, a slow upstream, a burst of concurrent clients. Throughput stalls even though the process sits near idle on the CPU graphs. The event loop is blocked, or the threadpool is full, or every request is rebuilding a client it didn't need to rebuild.
This is a short collection of the FastAPI patterns that survive that kind of traffic. Nothing here is exotic. The defaults are reasonable for a prototype and slightly wrong for a production service.
async and sync endpoints
FastAPI supports both async def and def endpoints in the same application. They aren't interchangeable.
- async def endpoints run directly on the event loop. They're cheap to multiplex, and the framework can interleave thousands of in-flight requests on a single worker.
- def endpoints are dispatched via anyio.to_thread.run_sync. Each call takes a worker from a bounded threadpool. When the pool is full, further sync calls queue.
The failure mode most teams hit first is calling a blocking library from inside an async def: time.sleep, requests, psycopg2 in sync mode, a synchronous SDK from a cloud provider. Any of these stalls the event loop for the duration of the call, and every other in-flight request on that worker waits. Latency jumps in steps rather than degrading gracefully.
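The stall can be reproduced with the standard library alone. The sketch below is not FastAPI code: the two handler coroutines are hypothetical stand-ins for async def endpoints, and the 50 ms delay stands in for an I/O call. It only illustrates how one blocking call serializes every task on the loop.

```python
import asyncio
import time

DELAY = 0.05   # seconds each simulated I/O call takes
REQUESTS = 5   # concurrent in-flight "requests"


async def blocking_handler() -> None:
    # time.sleep never yields to the event loop, so every other task waits.
    time.sleep(DELAY)


async def cooperative_handler() -> None:
    # asyncio.sleep suspends this task and lets the loop run the others.
    await asyncio.sleep(DELAY)


async def timed(handler) -> float:
    # Run REQUESTS copies of the handler concurrently and time the batch.
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(REQUESTS)))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    return await timed(blocking_handler), await timed(cooperative_handler)


blocking_s, cooperative_s = asyncio.run(main())
print(f"blocking: {blocking_s:.3f}s  cooperative: {cooperative_s:.3f}s")
```

The blocking batch takes roughly REQUESTS × DELAY because the calls run one after another; the cooperative batch takes roughly one DELAY.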
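A related problem is less common but worth naming: a blocking call hidden inside a dependency of an async def handler. A dependency declared with plain def is run in the threadpool, so it spends a thread token on every request even though the handler is async. A dependency declared with async def that calls a blocking library stalls the event loop exactly as a handler would. Either way the handler reports as async, observability counts it as async, and the cost is paid silently on every request.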
Rule of thumb:
- If every I/O call in the handler has a native async driver, the endpoint should be async def.
- If the handler needs a sync library (most often a legacy SDK), keep the whole handler def and let the threadpool own it.
- Don't mix. A partially-async handler that blocks once is worse than a fully sync one that blocks predictably.
The threadpool limit
Sync endpoints and sync dependencies share a process-wide threadpool, managed by AnyIO. The default limit is 40.
Forty is a reasonable ceiling for a service that does mostly async work and occasionally drops into a sync library. It isn't reasonable for a service whose endpoints are sync and spend most of their time waiting on a database or an upstream HTTP call. Once the limit is reached, requests queue, latency rises, and the process still looks idle on CPU graphs.
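The queuing effect is easy to see in miniature. This sketch uses the standard library's ThreadPoolExecutor as a stand-in for the AnyIO-managed pool, with a deliberately small limit of 4; the names and numbers are illustrative assumptions, not FastAPI internals.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4   # stand-in for the threadpool limit
CALLS = 12      # concurrent sync "requests"
WAIT = 0.05     # seconds each call spends waiting on I/O


def sync_call(submitted_at: float) -> float:
    time.sleep(WAIT)  # idle wait: no CPU is used while the thread is held
    return time.perf_counter() - submitted_at  # latency as the caller sees it


with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    start = time.perf_counter()
    futures = [pool.submit(sync_call, start) for _ in range(CALLS)]
    latencies = sorted(f.result() for f in futures)

print(f"fastest: {latencies[0]:.3f}s  slowest: {latencies[-1]:.3f}s")
```

Twelve calls on four threads run in three waves, so the slowest caller waits about three times as long as the fastest while the CPU does almost nothing.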
Raise it in the app's lifespan, before the first request is served:
    from contextlib import asynccontextmanager

    import anyio
    from fastapi import FastAPI


    @asynccontextmanager
    async def lifespan(app: FastAPI):
        limiter = anyio.to_thread.current_default_thread_limiter()
        limiter.total_tokens = 200
        yield


    app = FastAPI(lifespan=lifespan)
Pick the number based on downstream capacity, not ambition. If the database pool holds 50 connections, 200 threads will happily line up to wait for those 50 connections. The bottleneck has moved rather than disappeared. Size the threadpool to roughly match the peak concurrency the downstream can absorb.
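One way to put numbers on this, assuming Little's law (concurrency ≈ arrival rate × time in the system) is a fair model of the sync path. The function name and the sample figures are hypothetical; treat the result as a starting point to validate under load rather than a rule.

```python
import math


def threadpool_limit(peak_rps: int, sync_latency_ms: int, downstream_capacity: int) -> int:
    """Threads needed for peak sync traffic, capped by what the downstream can absorb."""
    # Little's law: concurrent calls = arrival rate * time each call holds a thread.
    needed = math.ceil(peak_rps * sync_latency_ms / 1000)
    return max(1, min(needed, downstream_capacity))


# 400 sync requests/s at 100 ms each need 40 threads; a 50-connection pool allows that.
modest = threadpool_limit(400, 100, 50)
# 2000 requests/s at 100 ms would need 200 threads, but only 50 connections exist.
capped = threadpool_limit(2000, 100, 50)
print(modest, capped)
```

In the second case the extra 150 threads would only queue for connections, which is the situation the paragraph above warns about.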
Instrument it. A gauge on limiter.statistics().tasks_waiting tells you whether the ceiling is being hit. If it's always zero, the limit is fine. If it climbs under load, you either need more threads or fewer sync calls.
DI patterns that keep requests cheap
FastAPI's Depends system resolves dependencies per request. That's the right default for request-scoped values. It's the wrong default for anything expensive to construct.
Hoist long-lived resources into the lifespan
Database pools, HTTP clients, Kafka producers, and Redis connections should be constructed once at startup and torn down at shutdown. Hand them to requests through app.state or a dependency that reads from it.
    from contextlib import asynccontextmanager

    import httpx
    from fastapi import Depends, FastAPI, Request


    @asynccontextmanager
    async def lifespan(app: FastAPI):
        app.state.http = httpx.AsyncClient(
            timeout=5.0,
            limits=httpx.Limits(max_connections=100),
        )
        yield
        await app.state.http.aclose()


    app = FastAPI(lifespan=lifespan)


    def get_http(request: Request) -> httpx.AsyncClient:
        return request.app.state.http


    @app.get("/status")
    async def status(http: httpx.AsyncClient = Depends(get_http)):
        r = await http.get("https://upstream/health")
        return {"upstream": r.status_code}
Two things are happening here. The HTTP client's connection pool is reused across all requests, so there's no TLS handshake on the hot path. And the dependency itself is trivial: one attribute lookup, no I/O.
Cache settings and construction-heavy objects
Settings objects, config parsers, and anything that reads from disk or the environment at build time should be memoized. functools.lru_cache is enough:
    from functools import lru_cache

    from pydantic_settings import BaseSettings


    class Settings(BaseSettings):
        database_url: str
        api_key: str


    @lru_cache
    def get_settings() -> Settings:
        return Settings()
This pattern drops into Depends(get_settings) directly. The first request constructs the object; every subsequent request returns the cached instance.
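The caching behavior can be checked without FastAPI. In this sketch a frozen dataclass stands in for the pydantic-settings class, and the default values are made-up placeholders.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    # Stand-in for the BaseSettings subclass; values are read from the environment.
    database_url: str
    api_key: str


@lru_cache
def get_settings() -> Settings:
    return Settings(
        database_url=os.environ.get("DATABASE_URL", "postgresql://localhost/app"),
        api_key=os.environ.get("API_KEY", "dev-key"),
    )


first = get_settings()
second = get_settings()
same_object = first is second             # the second call is served from the cache
hits = get_settings.cache_info().hits

get_settings.cache_clear()                # e.g. between tests that change the environment
rebuilt = get_settings() is not first     # a fresh instance is built after clearing
print(same_object, hits, rebuilt)
```

The cache lives for the life of the process, so environment changes made after the first call are not picked up until cache_clear() is called. In tests, app.dependency_overrides is the FastAPI-side alternative to clearing the cache.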
Keep request-scoped dependencies small
Authentication, authorization, request IDs, and the current database session belong in per-request dependencies. Anything else doesn't.
Test: if the dependency would return the same value for every request in the same process, it doesn't need to be per-request. Hoist it.
Use yield for teardown, not construction
    async def db_session(request: Request):
        async with request.app.state.db_pool.acquire() as conn:
            yield conn
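The async with releases the connection back to the pool once the request has finished with it, even if the handler raises. Whether that cleanup runs before or after the response is sent has changed between FastAPI releases, so check the behavior of the version you deploy. This pattern is for checking resources out of a pool and returning them, not for constructing new ones on each request.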
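The acquire, use, release order can be traced with a fake pool. FakePool and handle_request are hypothetical stand-ins; handle_request only approximates how a framework drives a yield dependency by wrapping the generator in an async context manager.

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []


class FakePool:
    """Hypothetical stand-in for a real connection pool."""

    @asynccontextmanager
    async def acquire(self):
        events.append("acquire")
        try:
            yield "conn"
        finally:
            events.append("release")


pool = FakePool()


async def db_session():
    # Same shape as the dependency above, minus the Request lookup.
    async with pool.acquire() as conn:
        yield conn


async def handle_request() -> None:
    # Enter the dependency, run the handler body, then resume the
    # generator so the code after its yield can clean up.
    async with asynccontextmanager(db_session)() as conn:
        events.append(f"handler uses {conn}")


asyncio.run(handle_request())
print(events)
```

Nothing is constructed per request here: the pool exists once, and the dependency only borrows from it.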
Be deliberate about Pydantic on hot paths
Pydantic v2 is fast, but it isn't free. On endpoints that accept large or deeply nested payloads, profile before assuming the validation cost is negligible. For internal service-to-service endpoints where the caller is another service under your control, a Body(...) with a lightweight TypedDict or a raw dict is sometimes the right call. For anything external-facing, keep the validation.
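A quick way to measure rather than guess, assuming Pydantic v2 is installed. The Order and Item models and the 1,000-item payload are invented for illustration; substitute a representative payload from the endpoint being profiled.

```python
import timeit

from pydantic import BaseModel


class Item(BaseModel):
    sku: str
    quantity: int
    tags: list[str]


class Order(BaseModel):
    order_id: int
    items: list[Item]


# Hypothetical payload: one order with 1,000 nested items.
payload = {
    "order_id": 1,
    "items": [
        {"sku": f"sku-{i}", "quantity": i, "tags": ["a", "b", "c"]}
        for i in range(1_000)
    ],
}

runs = 20
validate_s = timeit.timeit(lambda: Order.model_validate(payload), number=runs) / runs
raw_s = timeit.timeit(lambda: payload["items"][0]["sku"], number=runs) / runs
print(f"validate: {validate_s * 1e3:.2f} ms per payload, raw access: {raw_s * 1e6:.2f} us")
```

The absolute numbers depend on the machine and the model shape; what matters is how the validation time compares with the endpoint's latency budget.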
A minimal pattern that scales
The shape ends up small:
- async def handlers by default. Drop to def only when a sync library forces it.
- Threadpool limit raised in the lifespan, sized to downstream capacity.
- Long-lived clients constructed in the lifespan, stored on app.state, exposed via thin dependencies.
- Settings memoized with lru_cache.
- Request-scoped dependencies reserved for values that vary per request.
- Pool sizes chosen as a reasoned fraction of the threadpool limit, not picked from a tutorial.
None of it is clever, but it's the difference between a service that scales linearly with workers and one that starts queuing at half its theoretical throughput.
Closing
FastAPI defaults are tuned for the first deploy, not the hundredth. The three settings that matter most for sustained throughput are the async/sync boundary, the threadpool ceiling, and the scope of dependency construction. Get them right early and they don't show up again until the service is an order of magnitude bigger.