Arrow IPC over HTTP
Streaming columnar data straight into pandas, polars, and duckdb.
Internal data services default to JSON because JSON is the obvious choice. For a numeric payload (a panel of prices or a matrix of features) that default is expensive. A 100k-row × 10-column float panel serializes to roughly 40 MB of JSON, takes close to a second to parse on the client, and arrives with every type flattened to a string. Timestamps come back as strings, Decimal as float, integers as floats once they round-trip through the encoder.
Apache Arrow is the columnar in-memory format that pandas, polars, duckdb, and most modern data tooling already use internally. Arrow IPC is its wire format. Using it over HTTP means the thing on the wire and the thing in the client's process are the same shape, and the parse step effectively disappears.
What Arrow IPC is
Arrow IPC has two wire formats:
- Stream format. Record batches written one after another, no footer. Good for streaming responses.
- File format. Record batches plus a footer with offsets, seekable. Good for static files.
For HTTP, the stream format is usually what you want. The content types are application/vnd.apache.arrow.stream and application/vnd.apache.arrow.file respectively.
The key property is deserialization cost. When a client reads an Arrow stream, it reconstructs an Arrow Table by pointing at the bytes it already received. There's no row-by-row parse. Handing that table to polars or duckdb is another zero-copy step, because they all share the same columnar memory layout.
There's also Arrow Flight, a gRPC-based protocol purpose-built for this. It's the right tool if you control both ends of the wire and want bidirectional streaming, authentication via gRPC interceptors, and server discovery. For most internal data services, plain Arrow IPC over HTTP is the simpler call. It goes through the same load balancers, auth layers, and CDNs as every other HTTP endpoint, and clients are requests.get(...) away.
Serving it from FastAPI
import io
import pyarrow as pa
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
app = FastAPI()
def build_table() -> pa.Table:
return pa.table({
"ts": pa.array([1, 2, 3], type=pa.timestamp("ns")),
"symbol": ["SPY", "QQQ", "IWM"],
"close": [502.1, 438.7, 208.3],
})
def arrow_stream(table: pa.Table):
sink = io.BytesIO()
opts = pa.ipc.IpcWriteOptions(compression="lz4")
with pa.ipc.new_stream(sink, table.schema, options=opts) as writer:
for batch in table.to_batches(max_chunksize=64_000):
writer.write_batch(batch)
yield sink.getvalue()
sink.seek(0)
sink.truncate()
@app.get("/prices")
async def prices():
table = build_table()
return StreamingResponse(
arrow_stream(table),
media_type="application/vnd.apache.arrow.stream",
)
Why a sync generator from an async endpoint
The shape is worth pausing on: async def handler, sync def generator, StreamingResponse. That's deliberate.
Starlette's StreamingResponse accepts both sync and async iterables. When you pass a sync generator, Starlette runs each __next__ call in the threadpool via iterate_in_threadpool. The event loop is never blocked, even though the generator itself is sync.
That matches the shape of the work. Arrow serialization (batching, LZ4 compression, writing to BytesIO) is CPU-bound Python with no await surface. A sync generator is the honest signature. Wrapping it in async def would add syntax without adding concurrency; the GIL still serializes the work.
The endpoint handler stays async def because routing is async and table construction (either build_table() or a duckdb fetch_arrow_table()) returns fast. If the data source is itself async, like streaming rows from asyncpg or polling an async client, the generator should be async def and yield between awaits. The StreamingResponse call is the same.
Rule of thumb: match the generator's sync/async-ness to the data source, not to the endpoint. Starlette will bridge the two.
Other details worth naming
StreamingResponseoverResponsemeans the client can begin decoding before the server has finished writing. For large payloads that matters; for small ones it's harmless.- Yielding inside the batch loop (with
sink.seek(0)+truncate()to reset the buffer) is what makes this stream. Yielding once after thewithblock exits would buffer the entire response before sending a byte. max_chunksize=64_000is a reasonable default for batch size. Too small and the per-batch overhead dominates; too large and the client can't start work until the batch completes.compression="lz4"is the right default for internal traffic. LZ4 costs almost nothing on both sides and typically shrinks numeric payloads by 3-5×. ZSTD compresses more but is slower; use it when bandwidth is the bottleneck rather than CPU.
Consuming it
The client side is short in every language that matters.
pandas:
import requests
import pyarrow as pa
r = requests.get("http://service/prices", stream=True)
reader = pa.ipc.open_stream(r.raw)
df = reader.read_all().to_pandas()
polars:
import polars as pl
import requests
r = requests.get("http://service/prices")
df = pl.read_ipc_stream(r.content)
duckdb:
import duckdb
import pyarrow as pa
import requests
r = requests.get("http://service/prices", stream=True)
table = pa.ipc.open_stream(r.raw).read_all()
con = duckdb.connect()
con.register("prices", table)
result = con.sql("SELECT symbol, avg(close) FROM prices GROUP BY 1").df()
For browser clients there's apache-arrow on npm, but the library isn't small (~150 KB gzipped) and the cost-benefit tips against it once payloads are under a few megabytes. Ship JSON there and keep Arrow for service-to-service traffic.
A concrete example: price panel service
A typical shape for a hedge-fund data service is a minute-bar endpoint. Quants hit it from notebooks, strategies hit it from research jobs, and risk systems read it from batch pipelines. Same endpoint, different client libraries on the receiving end.
Data sits in partitioned Parquet on S3. DuckDB reads it directly. The handler assembles an Arrow table and streams it.
import io
import duckdb
import pyarrow as pa
from fastapi import FastAPI, Query
from fastapi.responses import StreamingResponse
from contextlib import asynccontextmanager
@asynccontextmanager
async def lifespan(app: FastAPI):
app.state.db = duckdb.connect()
app.state.db.sql("INSTALL httpfs; LOAD httpfs;")
yield
app.state.db.close()
app = FastAPI(lifespan=lifespan)
def fetch_panel(db, symbols: list[str], start: str, end: str) -> pa.Table:
placeholders = ",".join(f"'{s}'" for s in symbols)
return db.sql(f"""
SELECT ts, symbol, open, high, low, close, volume
FROM read_parquet('s3://bars/minute/*/*.parquet')
WHERE symbol IN ({placeholders})
AND ts BETWEEN TIMESTAMP '{start}' AND TIMESTAMP '{end}'
ORDER BY ts, symbol
""").fetch_arrow_table()
def arrow_stream(table: pa.Table):
sink = io.BytesIO()
opts = pa.ipc.IpcWriteOptions(compression="lz4")
with pa.ipc.new_stream(sink, table.schema, options=opts) as writer:
for batch in table.to_batches(max_chunksize=64_000):
writer.write_batch(batch)
yield sink.getvalue()
sink.seek(0)
sink.truncate()
@app.get("/panel")
async def panel(
symbols: list[str] = Query(...),
start: str = Query(...),
end: str = Query(...),
):
table = fetch_panel(app.state.db, symbols, start, end)
return StreamingResponse(
arrow_stream(table),
media_type="application/vnd.apache.arrow.stream",
)
On the quant side, consuming that endpoint is four lines:
import polars as pl
import requests
params = {"symbols": ["SPY", "QQQ"], "start": "2026-01-01", "end": "2026-01-31"}
df = pl.read_ipc_stream(requests.get("http://panel-service/panel", params=params).content)
No schema file or custom deserializer needed. ts arrives as a Datetime, close as a Float64, symbol as a dictionary-encoded string.
Wire shape
client ─── GET /panel?symbols=...&start=...&end=... ───▶ FastAPI
│
├─ duckdb.fetch_arrow_table()
│ (S3 Parquet → Arrow in RAM)
│
├─ RecordBatchStreamWriter
│ ├─ schema
│ ├─ batch 1 (64k rows)
│ ├─ batch 2 (64k rows)
│ └─ EOS
│
client ◀── application/vnd.apache.arrow.stream ────────┘
│
└─ pl.read_ipc_stream(body) ──▶ DataFrame (zero-copy)
Size and latency
Same payload, 250 symbols × 390 minute bars × 8 numeric columns, on a 1 Gbps internal link. Numbers are illustrative of the common case, not benchmarks of any particular setup:
size parse total round-trip
JSON (records) ~45 MB 900 ms 1.4 s
JSON (split) ~30 MB 620 ms 0.9 s
CSV ~18 MB 420 ms 0.8 s
Arrow IPC (uncompressed) ~9 MB 15 ms 0.25 s
Arrow IPC (lz4) ~4.2 MB 25 ms 0.20 s
The parse column is the one that moves hardest. Network savings are incidental. What matters is the client spending milliseconds rather than seconds turning bytes back into a DataFrame.
When not to use it
- Tiny payloads. Below a few kilobytes the HTTP overhead dominates and JSON is fine.
- Browser clients where the Arrow.js bundle is larger than the data it would decode.
- Debugging endpoints.
curl | jqdoesn't work on Arrow. Add a?format=jsonquery parameter for debuggability; the branch is cheap. - Public APIs where callers can't be assumed to have an Arrow client. Offer both content types via
Acceptheader negotiation if you need to support them.
Closing
Arrow IPC over HTTP is a pragmatic middle ground. The protocol, transport, and auth all stay the same; only the response body changes. Once the data platform speaks Arrow end-to-end, serialization drops off the latency budget.