httpx.AsyncClient in long-running processes

Four defaults that cost file descriptors, latency, and pool capacity.

httpx is a widely used async HTTP client in Python. AsyncClient is stateful: it owns a connection pool, a TLS context, and the timeouts and limits attached to them. Its defaults are sized for short scripts. Four of them cause most of the damage in code that runs longer than a script: per-request client construction, missing aclose on shutdown, a 5-second default on every timeout phase, and a 100-connection pool ceiling.

None of them surface as a misconfiguration error. They show up as TLS handshakes on the hot path, sockets piling up in TIME_WAIT, requests queueing inside the client, and cleanup that never runs at SIGTERM.

This is the setup I use for any process that outlives a single request: services, workers, daemons, long-lived scripts.

Two wrong ways to own the client

Per-request

import httpx

async def fetch_user(user_id: str):
    # A new client (pool + TLS context) is built and torn down on every call.
    async with httpx.AsyncClient() as client:
        r = await client.get(f"https://upstream/users/{user_id}")
        return r.json()

Each call builds its own connection pool and TLS context, opens a connection to the upstream, and closes everything on exit. Two costs follow.

The first is latency. TCP and TLS handshakes are paid on every request. Depending on RTT and cipher choice, that adds tens to low hundreds of milliseconds to every call.

The second is socket pressure. Connections closed by the client enter TIME_WAIT on the local side and hold their 4-tuple (local IP, local port, remote IP, remote port) until the kernel's TIME_WAIT timeout expires (60 seconds on Linux). Under sustained RPS against a single upstream, TIME_WAIT entries accumulate faster than the kernel reaps them, and the process eventually runs out of ephemeral local ports for new connections to that upstream. Separately, every in-flight call holds its own socket, so a burst of concurrent calls can hit EMFILE (too many open files) against the per-process descriptor limit.
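To check whether TIME_WAIT build-up is actually happening, one option on Linux is to count TIME_WAIT rows in /proc/net/tcp. The sketch below is a minimal parser, not part of httpx; the function name and the sample rows are made up for illustration, and it assumes the IPv4 table (IPv6 connections live in /proc/net/tcp6 with the same column layout).

```python
from typing import Optional

TCP_TIME_WAIT = "06"  # state code Linux prints in the "st" column for TIME_WAIT


def count_time_wait(proc_net_tcp: str, remote_port: Optional[int] = None) -> int:
    """Count TIME_WAIT rows in /proc/net/tcp-formatted text.

    If remote_port is given, only rows whose remote port matches are counted.
    """
    count = 0
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_TIME_WAIT:
            continue
        if remote_port is not None:
            # rem_address is HEXIP:HEXPORT, e.g. 0100007F:01BB for 127.0.0.1:443
            port = int(fields[2].rsplit(":", 1)[1], 16)
            if port != remote_port:
                continue
        count += 1
    return count


# Hypothetical sample in the kernel's format: two TIME_WAIT rows to port 443,
# one ESTABLISHED (01) row to port 8080.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:C350 0100007F:01BB 06 00000000:00000000 03:00000A5B 00000000     0        0 0
   1: 0100007F:C351 0100007F:01BB 06 00000000:00000000 03:00000A60 00000000     0        0 0
   2: 0100007F:C352 0100007F:1F90 01 00000000:00000000 00:00000000 00000000  1000        0 12345
"""
```

On a live host the call would look like count_time_wait(open("/proc/net/tcp").read(), remote_port=443). A count that climbs with request rate and falls about a minute after traffic stops points at per-request connections.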

Global, never closed

import httpx

client = httpx.AsyncClient()  # module-level, never closed

async def fetch_user(user_id: str):
    r = await client.get(f"https://upstream/users/{user_id}")
    return r.json()

The pool is reused, which removes both costs above. But nothing closes it. At process shutdown the kernel tears down the keepalive connections without a TLS close_notify ever being sent, and any in-flight streams are abandoned mid-read. Python emits ResourceWarning for the unclosed sockets when they're garbage-collected; the default warning filter hides those in production but pytest surfaces them, so the symptom is usually first noticed as warning noise on every test teardown.

The fix in both cases is to tie the client's lifetime to the process, not to a request.

Two defaults that bite throughput

5-second default timeout

httpx.AsyncClient() applies a 5-second timeout to each phase of the request: connect, read, write, and pool acquisition. The read timeout is the one that bites. It limits how long the client waits for the next chunk of response data, so it cuts off any upstream that is slow to start answering or pauses mid-body: paginated exports, model inference, LLM completions, vendor APIs that occasionally stall. The failure surfaces as ReadTimeout on the client side with nothing in the upstream's logs to correlate against.

Set timeouts per phase instead:

timeout = httpx.Timeout(
    connect=5.0,
    read=30.0,
    write=10.0,
    pool=5.0,
)

connect covers a downed host. read should be the upstream's p99 with headroom. write covers large request bodies. pool is how long a request waits for a free connection inside the client before raising PoolTimeout. Keep it short so pool exhaustion fails fast rather than stacking requests behind the cap.

100-connection pool

Defaults are max_connections=100, max_keepalive_connections=20, keepalive_expiry=5.0. A service fanning out to a single upstream at high concurrency hits the cap. At the cap, requests don't fail immediately; they wait inside the client for a free connection and raise PoolTimeout only once the pool timeout expires. Latency rises while CPU stays flat and the upstream looks fine.

Size the pool to the concurrency you expect to serve. For 500 concurrent calls against an upstream, max_connections=600. For short, idempotent calls, raise max_keepalive_connections and keepalive_expiry so TLS sessions are reused across requests instead of being torn down after five seconds of idle.

Response lifecycle and where memory goes

A regular await client.get(url) reads the body into response.content before returning and releases the connection back to the pool. There is no leak on the response itself; if the Response object is unreferenced it gets collected and its bytes go with it.

Streaming is different. With client.stream(), httpx returns control to you with the connection still checked out and the body unread:

async def fetch_export():
    async with client.stream("GET", "https://upstream/export") as response:
        async for chunk in response.aiter_bytes():
            process(chunk)

The async with is what releases the connection back to the pool, on normal exit and on exceptions alike. Skip it, or use client.send(request, stream=True) and miss aclose() on some exit path (an early return, an exception the caller swallows), and the connection stays checked out indefinitely. It still counts against max_connections. Repeat that mistake under load and the pool runs dry; unrelated code paths start blocking on pool timeouts because the budget is gone.

Memory grows along two distinct paths, neither of which involves httpx buffering bytes faster than the consumer reads them:

A Response retained after the request completes (stored in a cache, attached to a long-lived task, captured by a closure) keeps its .content bytes alive. For non-streaming responses to large endpoints, this scales with however many Response objects you hold.
A streaming Response partially consumed and then abandoned holds its partial buffer plus the connection. For long-lived streams (server-sent events, log tails, anything where the upstream keeps writing) the consumer paces the read through TCP backpressure, so a slow consumer doesn't grow buffers without bound. A stalled consumer holds the connection open and whatever it has already buffered in memory until the upstream times out or you remember to close.

The close contract is the same when you bypass the context manager:

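async def fetch_export():
    request = client.build_request("GET", "https://upstream/export")
    response = await client.send(request, stream=True)
    try:
        async for chunk in response.aiter_bytes():
            process(chunk)
    finally:
        # Runs on success, early return, and exceptions alike.
        await response.aclose()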

Either form works. Any path that opens a stream must close it on every exit, including exceptions.

What this means for a service

In a service, the four defaults compound. A shared pool means a few leaked streams degrade unrelated handlers. The 5-second read timeout cuts requests that would have completed in 6 or 7. Per-request construction multiplies handshakes by request rate and accumulates TIME_WAIT sockets. A missing aclose makes SIGTERM a partial leak instead of a clean exit.

Construct the client in the application's lifespan, configure it for the upstream, and inject it into handlers:

import httpx
from contextlib import asynccontextmanager
from fastapi import Depends, FastAPI, Request


@asynccontextmanager
async def lifespan(app: FastAPI):
    timeout = httpx.Timeout(connect=5.0, read=30.0, write=10.0, pool=5.0)
    limits = httpx.Limits(
        max_connections=600,
        max_keepalive_connections=200,
        keepalive_expiry=30.0,
    )
    app.state.http = httpx.AsyncClient(timeout=timeout, limits=limits)
    try:
        yield
    finally:
        await app.state.http.aclose()


app = FastAPI(lifespan=lifespan)


def get_http(request: Request) -> httpx.AsyncClient:
    return request.app.state.http


@app.get("/users/{user_id}")
async def get_user(user_id: str, http: httpx.AsyncClient = Depends(get_http)):
    r = await http.get(f"https://upstream/users/{user_id}")
    r.raise_for_status()
    return r.json()

The same boundary exists outside FastAPI. Celery has worker_init and worker_shutdown. A bare asyncio.run() daemon has the entry coroutine. APScheduler has its scheduler lifecycle. The mechanism is different in each; what matters is constructing the client before any work runs and closing it after the last.

Two refinements worth adding once the basic shape is in place:

One client per upstream when SLAs differ. Sharing a client couples failure modes, since a slow vendor consumes connection slots that a fast vendor needs. Separate clients keep their pool budgets independent.
For request-scoped headers or tracing context, use client.build_request() and client.send(). Building a new AsyncClient to attach a header reintroduces per-request construction.

Closing

The corrected setup is one AsyncClient per upstream, attached to the process lifespan, with httpx.Timeout and httpx.Limits set explicitly for the upstream's p99 and the service's actual concurrency, and every stream closed on every exit. The defaults are sized for scripts; anything longer-lived needs those four choices made on purpose.
