One endpoint, many formats

HTTP content negotiation is still doing work nobody remembers giving it.

Most data services end up with parallel endpoints: /panel, /panel.csv, /panel.parquet, /panel.arrow. Each one serves the same data through a different encoder, so you end up maintaining four routes for what's really one resource. The browser wants an HTML rendering, the Python client wants Parquet, Spark wants CSV, and the only thing any of them agree on is the schema.

HTTP already has a way to fold all of that into a single URL. It's called content negotiation, it has been part of HTTP/1.1 since the late 1990s, and most services don't bother with it.

The mechanism

Most HTTP clients send an Accept header: a weighted preference list of the media types the client is willing to receive.

Chrome (trimmed; the real header also lists a few image types):

accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Read that as: "ideally HTML or XHTML, otherwise XML at 0.9 quality, otherwise literally anything at 0.8."

curl, by default:

accept: */*

"I'll take anything you've got."

Python's requests:

accept: */*

Same.
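To make the "weighted preference list" reading concrete, here is a minimal sketch that splits the browser header shown above into (media type, q) pairs and orders them by weight. The function name weighted is made up for this illustration, and the parsing is deliberately naive: it assumes a well-formed header and ignores parameters other than q.

```python
def weighted(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media_type, q) pairs, highest q first."""
    out = []
    for part in header.split(","):
        media_type, _, params = part.strip().partition(";")
        q = 1.0  # a missing q parameter means full weight
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key == "q":
                q = float(value)  # assumes a well-formed number
        out.append((media_type.strip(), q))
    # sorted() is stable, so entries with equal q keep the client's order
    return sorted(out, key=lambda item: -item[1])


browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
for media_type, q in weighted(browser):
    print(f"{q:.1f}  {media_type}")
# 1.0  text/html
# 1.0  application/xhtml+xml
# 0.9  application/xml
# 0.8  */*
```

A bare */* parses to a single entry with weight 1.0, which is why curl and requests end up with whatever the server prefers.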

The server's job is to look at the Accept header, pick the best representation it can serve, and set a matching Content-Type on the response. One URL, multiple representations, and the client chooses.

It's the same trick csvbase uses. The same URL renders as a table in the browser, comes back as CSV under curl, and lands as a DataFrame when pandas reads it.

What this buys you

One URL per resource. Every tool treats the data as the same thing and decodes it differently.
Documentation stays small. GET /panel covers every client; the format lives in the header rather than in the path.
Clients stop caring about paths. pd.read_csv("http://service/panel") works because pandas states no format preference (no Accept header, or at most */*) and the server falls back to CSV.
Caches still work. Send Vary: Accept on the response and HTTP caches add the request's Accept header to the cache key, so each representation caches independently.

Building it into FastAPI

FastAPI doesn't ship a negotiation helper, but the primitives are all there. A small dependency function handles the matching:

from fastapi import Header, HTTPException

SUPPORTED = [
    "application/vnd.apache.arrow.stream",
    "application/parquet",
    "text/csv",
    "application/json",
]


def negotiate(accept: str | None = Header(default=None)) -> str:
    # A missing header or a bare */* means the client has no preference.
    if not accept or accept.strip() == "*/*":
        return "text/csv"
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 marks a type the client refuses
        if media_type == "*/*":
            return "text/csv"
        if media_type in SUPPORTED:
            return media_type
        if media_type.endswith("/*"):
            # Only a real wildcard such as text/* may match by prefix;
            # text/html must not be quietly answered with text/csv.
            prefix = media_type[:-1]
            match = next((s for s in SUPPORTED if s.startswith(prefix)), None)
            if match:
                return match
    raise HTTPException(status_code=406, detail="no acceptable representation")


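def parse_accept(header: str) -> list[tuple[str, float]]:
    items = []
    for part in header.split(","):
        piece, *params = part.strip().split(";")
        # Media types are case-insensitive; normalise before comparing.
        piece = piece.strip().lower()
        if not piece:
            continue  # tolerate stray commas
        q = 1.0
        for p in params:
            k, _, v = p.strip().partition("=")
            if k.strip().lower() == "q":
                try:
                    q = float(v)
                except ValueError:
                    pass  # malformed weight: keep the default of 1.0
        items.append((piece, q))
    # Stable sort: entries with equal q keep the order the client sent.
    items.sort(key=lambda x: -x[1])
    return items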

406 Not Acceptable is the right failure mode when a client insists on a media type you don't serve. In practice very few clients hit it, since the */* fallback covers almost everything.

Picking the default for Accept: */* is a judgment call. CSV is a reasonable one for a data service, since it has the widest tool support and the fewest surprises. Arrow IPC is the faster default if your clients are mostly Python data tools. Pick one and document it.
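The matcher above walks the client's list and returns the first hit, which is simple and covers most clients. A stricter variant scores each supported type against the most specific range that matches it, so an exclusion such as text/csv;q=0 is honored even when */* is also present. The sketch below is one way to do that in plain Python; parse_ranges, specificity, quality and pick are names invented for this example, and returning None stands in for raising the 406.

```python
from typing import Optional

SUPPORTED = [
    "application/vnd.apache.arrow.stream",
    "application/parquet",
    "text/csv",
    "application/json",
]


def parse_ranges(header: str) -> list[tuple[str, float]]:
    ranges = []
    for part in header.split(","):
        name, *params = [p.strip() for p in part.split(";")]
        if not name:
            continue
        q = 1.0
        for p in params:
            key, _, value = p.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 1.0  # malformed weight: treat as unweighted
        ranges.append((name.lower(), q))
    return ranges


def specificity(media_range: str, candidate: str) -> int:
    """0 = no match, 1 = */*, 2 = type/*, 3 = exact match."""
    if media_range == candidate:
        return 3
    if media_range == "*/*":
        return 1
    if media_range.endswith("/*") and candidate.startswith(media_range[:-1]):
        return 2
    return 0


def quality(candidate: str, ranges: list[tuple[str, float]]) -> float:
    # The most specific matching range decides the weight for this candidate.
    best = max(ranges, key=lambda r: specificity(r[0], candidate), default=None)
    if best is None or specificity(best[0], candidate) == 0:
        return 0.0
    return best[1]


def pick(header: Optional[str], default: str = "text/csv") -> Optional[str]:
    if not header or not header.strip():
        return default
    ranges = parse_ranges(header)
    scored = [(quality(c, ranges), c) for c in SUPPORTED]
    best_q = max(q for q, _ in scored)
    if best_q <= 0:
        return None  # nothing acceptable: the caller would answer 406
    winners = [c for q, c in scored if q == best_q]
    # Prefer the documented default on a tie, otherwise server order.
    return default if default in winners else winners[0]


print(pick("*/*"))                   # text/csv
print(pick("text/csv;q=0, */*"))     # application/vnd.apache.arrow.stream
print(pick("text/html"))             # None
```

The trade-off is a little more code in exchange for predictable behaviour on unusual headers. For the common cases (no header, */*, or one exact type) both approaches give the same answer.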

The endpoint

Dispatch on the negotiated media type and reuse the same underlying query:

import io
import duckdb
import pyarrow as pa
import pyarrow.parquet as pq
from fastapi import Depends, FastAPI, Query
from fastapi.responses import StreamingResponse, Response

app = FastAPI()


def fetch_panel(symbols: list[str], start: str, end: str) -> pa.Table:
    # Bind every user-supplied value as a query parameter; never splice
    # query-string input into the SQL text.
    placeholders = ",".join("?" for _ in symbols)
    query = f"""
        SELECT ts, symbol, open, high, low, close, volume
        FROM read_parquet('s3://bars/minute/*/*.parquet')
        WHERE symbol IN ({placeholders})
          AND ts BETWEEN CAST(? AS TIMESTAMP) AND CAST(? AS TIMESTAMP)
        ORDER BY ts, symbol
    """
    return duckdb.execute(query, [*symbols, start, end]).fetch_arrow_table()


def table_to_csv(table: pa.Table) -> bytes:
    from pyarrow import csv
    buf = io.BytesIO()
    csv.write_csv(table, buf)
    return buf.getvalue()


def table_to_parquet(table: pa.Table) -> bytes:
    buf = io.BytesIO()
    pq.write_table(table, buf, compression="zstd")
    return buf.getvalue()


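def table_to_arrow(table: pa.Table):
    sink = io.BytesIO()
    opts = pa.ipc.IpcWriteOptions(compression="lz4")
    with pa.ipc.new_stream(sink, table.schema, options=opts) as writer:
        for batch in table.to_batches(max_chunksize=64_000):
            writer.write_batch(batch)
            # Hand over what has been written so far, then reuse the buffer.
            yield sink.getvalue()
            sink.seek(0)
            sink.truncate()
    # Closing the writer appends the end-of-stream marker. Flush it, along
    # with anything still buffered when the table had no batches.
    tail = sink.getvalue()
    if tail:
        yield tail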


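# Plain def on purpose: fetch_panel blocks, so FastAPI should run this
# handler in its threadpool instead of on the event loop.
@app.get("/panel")
def panel(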
    symbols: list[str] = Query(...),
    start: str = Query(...),
    end: str = Query(...),
    media_type: str = Depends(negotiate),
):
    table = fetch_panel(symbols, start, end)
    headers = {"vary": "accept"}

    if media_type == "application/vnd.apache.arrow.stream":
        return StreamingResponse(
            table_to_arrow(table),
            media_type=media_type,
            headers=headers,
        )
    if media_type == "application/parquet":
        return Response(
            table_to_parquet(table),
            media_type=media_type,
            headers=headers,
        )
    if media_type == "text/csv":
        return Response(
            table_to_csv(table),
            media_type=media_type,
            headers=headers,
        )
    # JSON fallback
    return Response(
        table.to_pandas().to_json(orient="records").encode(),
        media_type="application/json",
        headers=headers,
    )

Vary: Accept matters. Without it a shared cache will serve whichever representation it saw first to every later client, regardless of their own Accept header, and negotiation turns into a lottery.
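The lottery is easy to reproduce with a toy model. The sketch below is not a real HTTP cache: ToyCache and make_origin are invented for illustration, and the cache only does one thing a shared cache does, which is build its key from the URL plus whichever request headers an earlier response named in Vary.

```python
def make_origin(send_vary: bool):
    """Pretend server: JSON if asked for it, CSV otherwise."""
    def origin(url, request_headers):
        accept = request_headers.get("accept", "*/*")
        media_type = "application/json" if accept == "application/json" else "text/csv"
        headers = {"content-type": media_type}
        if send_vary:
            headers["vary"] = "accept"
        return headers
    return origin


class ToyCache:
    def __init__(self, origin):
        self.origin = origin
        self.vary_for = {}  # url -> header names learned from earlier responses
        self.entries = {}   # (url, selected request header values) -> response

    def _key(self, url, request_headers):
        names = self.vary_for.get(url, ())
        return (url, tuple(request_headers.get(n, "") for n in names))

    def get(self, url, request_headers):
        key = self._key(url, request_headers)
        if key in self.entries:
            return self.entries[key]  # cache hit
        response = self.origin(url, request_headers)
        vary = response.get("vary", "")
        self.vary_for[url] = tuple(
            n.strip().lower() for n in vary.split(",") if n.strip()
        )
        self.entries[self._key(url, request_headers)] = response
        return response


without_vary = ToyCache(make_origin(send_vary=False))
without_vary.get("/panel", {"accept": "text/csv"})
print(without_vary.get("/panel", {"accept": "application/json"})["content-type"])
# text/csv  (the second client gets the first client's format)

with_vary = ToyCache(make_origin(send_vary=True))
with_vary.get("/panel", {"accept": "text/csv"})
print(with_vary.get("/panel", {"accept": "application/json"})["content-type"])
# application/json
```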

The file-extension escape hatch

The IANA media type registry doesn't cover everything, and tooling lags behind what it does cover. application/parquet is a common de-facto choice for Parquet but isn't a registered type. Arrow's IPC stream type is registered, yet some stacks still don't recognize it, and JSONL has no registered type. Some tools also send Accept: text/html no matter what they want back; Apache Spark is the usual offender.

The pragmatic fix is a file-extension override:

EXT_TO_MEDIA = {
    ".csv": "text/csv",
    ".parquet": "application/parquet",
    ".arrow": "application/vnd.apache.arrow.stream",
    ".json": "application/json",
}


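from fastapi import Header, HTTPException


@app.get("/panel{ext:path}")
def panel(
    ext: str = "",
    symbols: list[str] = Query(...),
    start: str = Query(...),
    end: str = Query(...),
    accept: str | None = Header(default=None),
):
    if ext:
        # Check the extension before negotiating. If negotiate ran as a
        # dependency, a client sending only Accept: text/html would get a
        # 406 before the override had a chance to apply.
        if ext not in EXT_TO_MEDIA:
            raise HTTPException(status_code=404, detail="unknown format")
        media_type = EXT_TO_MEDIA[ext]
    else:
        media_type = negotiate(accept)
    # ... same dispatch as before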

Now both paths work, and the extension wins whenever it's present:

GET /panel with Accept: text/csv returns CSV
GET /panel with Accept: application/vnd.apache.arrow.stream returns Arrow
GET /panel.parquet returns Parquet regardless of Accept
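The precedence rule is small enough to pull out into a pure function and test on its own. This is a sketch under assumptions: resolve is a hypothetical helper, and negotiated stands for whatever the Accept matching already chose.

```python
import posixpath

EXT_TO_MEDIA = {
    ".csv": "text/csv",
    ".parquet": "application/parquet",
    ".arrow": "application/vnd.apache.arrow.stream",
    ".json": "application/json",
}


def resolve(path: str, negotiated: str) -> tuple[str, str]:
    """Return (resource_path, media_type); a known extension beats Accept."""
    base, ext = posixpath.splitext(path)
    if not ext:
        return path, negotiated
    if ext not in EXT_TO_MEDIA:
        # Unknown extension: let the caller turn this into a 404.
        raise LookupError(f"unknown format {ext!r}")
    return base, EXT_TO_MEDIA[ext]


print(resolve("/panel", "text/csv"))          # ('/panel', 'text/csv')
print(resolve("/panel.parquet", "text/csv"))  # ('/panel', 'application/parquet')
```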

The extension version is also what you paste into a chat message. "Grab it as .parquet" is a one-line instruction. "Set your Accept header to application/parquet" is not.

What it looks like from the client

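A browser sends an Accept that prefers HTML and ends with a */* catch-all. If the data is structured and small enough you can add text/html to the supported types and serve a rendered preview. Without that, the */* entry means the negotiator above hands the browser the default format; returning a 406 with a link to the CSV version instead requires special-casing requests that lead with text/html.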

curl:

# default: whatever the server's fallback is
curl -s http://service/panel > panel.csv

# explicit:
curl -s -H 'accept: application/parquet' http://service/panel > panel.parquet

# or via extension:
curl -s http://service/panel.parquet > panel.parquet

pandas:

import pandas as pd

# No explicit Accept preference, so the server's default applies (CSV in our config)
df = pd.read_csv("http://service/panel")

# Explicit format via extension:
df = pd.read_parquet("http://service/panel.parquet")

polars:

import polars as pl
import requests

# Arrow IPC via Accept
r = requests.get(
    "http://service/panel",
    headers={"accept": "application/vnd.apache.arrow.stream"},
)
df = pl.read_ipc_stream(r.content)

One endpoint serves four client stacks with four different wire formats on the way out, and nothing on the server is duplicated past the encoder functions themselves.
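The client side mirrors the server: a generic client can dispatch on the response's Content-Type, not on the URL it requested. A minimal standard-library sketch, covering only CSV and JSON; the decode function and the sample rows are made up for illustration.

```python
import csv
import io
import json


def decode(content_type: str, body: bytes) -> list[dict]:
    """Pick a decoder from the response Content-Type, ignoring parameters."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/csv":
        # csv.DictReader yields every value as a string
        return list(csv.DictReader(io.StringIO(body.decode("utf-8"))))
    if media_type == "application/json":
        return json.loads(body)
    raise ValueError(f"no decoder for {media_type!r}")


print(decode("text/csv; charset=utf-8", b"symbol,close\nAAA,101.5\n"))
# [{'symbol': 'AAA', 'close': '101.5'}]
print(decode("application/json", b'[{"symbol": "AAA", "close": 101.5}]'))
# [{'symbol': 'AAA', 'close': 101.5}]
```

The two outputs also show why the schema should be the contract. CSV carries no types, so the same column arrives as a string in one format and a number in the other.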

When this is the wrong tool

Public APIs with versioned contracts. If you need to guarantee a response shape, put the format in the URL. /v1/panel.json is unambiguous in a way that Accept isn't.
Resources that differ in shape per format. If the CSV version is a subset of the Arrow version, with different columns or a different schema, they aren't the same resource and separate URLs make that explicit.
CDNs that ignore Vary. Some caching layers drop Vary: Accept or fold it in lossy ways. Verify before relying on it.
Clients you can't control that send the wrong Accept header. Spark is the canonical offender, which is why the extension escape hatch exists.
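For the CDN case, "verify" can be as simple as requesting the same URL twice with different Accept headers and checking that each response comes back with the type that was asked for. The sketch below assumes exact media types in the Accept header and uses only the standard library; fetch_content_type and cache_mismatches are hypothetical names, and the fetch parameter exists so the check can be exercised without a network.

```python
import urllib.request


def fetch_content_type(url: str, accept: str) -> str:
    """Return the media type of the response to a GET with this Accept."""
    req = urllib.request.Request(url, headers={"Accept": accept})
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get_content_type()


def cache_mismatches(url, pairs, fetch=fetch_content_type) -> list[str]:
    """Return one message per wrong answer; an empty list means it looks fine.

    pairs is a list of (accept_header, expected_content_type).
    """
    problems = []
    # Two passes, so the second one has a chance of being served from cache.
    for _ in range(2):
        for accept, expected in pairs:
            got = fetch(url, accept)
            if got != expected:
                problems.append(f"asked for {accept}, got {got}")
    return problems


# Offline demonstration with a fake cache that replays its first answer.
first_seen = {}

def lossy_cache(url, accept):
    return first_seen.setdefault(url, accept)

pairs = [("text/csv", "text/csv"), ("application/parquet", "application/parquet")]
print(cache_mismatches("http://service/panel", pairs, fetch=lossy_cache))
# ['asked for application/parquet, got text/csv', 'asked for application/parquet, got text/csv']
```

Run it against the public hostname, not the origin, so the request actually passes through the caching layer you are worried about.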

Closing

Content negotiation already works in browsers, in curl, and in every HTTP library you're likely to pull in. It mostly doesn't get used because people forget the Accept header is there. One URL per resource is a cleaner API surface than one URL per format, and HTTP handed us the machinery for it two decades ago.

Turn it on and send the Vary header. Keep the extension fallback for clients that can't set headers. Every tool ends up getting the format it wants without the URL having to encode that choice.
