The end of glue code

Why most of data engineering is glue, and why that is about to change.

Most of what data engineers actually do is glue.

By glue I mean the small pieces of code that translate between systems. The cron pulling vendor data into your warehouse. The reader importing a partner's CSV. The PDF parser nobody will ever replace with a form. The bridge between two SaaS tools whose schemas almost match but don't. The reconciliation script for two reports that should have been wired together and never were. Not the systems on the resume; the brittle code connecting them.

Not every team works this way. Stripe doesn't spend most of its hours on glue. Anthropic doesn't either. Plenty of product companies put their effort into the surface customers see. But for funds, alt-data shops, BI platforms, e-commerce intelligence firms, and regulated industries running on legacy stacks, glue is the work. Those organizations are most of the data-engineering market.

The economics are funny. Each piece is small. Each is owned by someone, but barely. None of it is a project. The cost is spread across thousands of tasks that never appear on a roadmap. CFOs know the headcount; nobody knows the line item.

It is normal for a team to accumulate hundreds of these without anyone deciding to, and to find a handful of senior engineers permanently absorbed in patching whatever broke this month.

The next decade will be mostly about removing this layer.

What changed

Agents have existed for a few years. Most engineers I respect are wary of the breathless versions of the story, and they should be. The change is narrower than the marketing claims. On routine work (pulling tabular data from a non-hostile source, parsing a moderately structured PDF, bridging two SaaS tools), the smallest unit of glue has dropped from engineer-weeks to engineer-hours. On hard work (multi-factor auth, real-time bidirectional sync, integrations whose business logic is itself disputed), agent-driven approaches still trail hand-coded ones, sometimes badly. The median case is most of the volume, which is why an asymmetric improvement still moves the line.

The mechanism is plain. Work that used to be "write code that interprets this data source" has been replaced, on the median case, by "describe what you want from this data source." The first version is expensive and brittle; the second is cheaper and easier to inspect when it goes wrong.

PDF extraction fits this shape. So does reading legacy spreadsheets where every business unit invented its own column names, bridging two SaaS tools whose schemas almost line up, or reconciling a quarterly report between two systems nobody got around to wiring together. Different surfaces, the same underlying problem: a semi-structured source, a destination wanting something specific, and a piece of code in the middle whose job is to interpret. The interpretation step is the part getting cheaper.

Runtime, skill, evaluation

When teams take the problem seriously, the same three pieces show up regardless of what kind of glue is being replaced.

The first is a runtime built for agents rather than for tests. Most existing libraries expose primitives designed for human engineers writing deterministic test suites: mechanical, low-level, fine-grained. Hand those primitives to an agent and it makes too many round trips because each step is too small. A runtime designed for agents exposes higher-level operations: "show me what's actionable here," "wait until this region stops changing," "execute this multi-step flow atomically and tell me what changed." For document extraction, that's a layout-aware extractor that surfaces semantic chunks instead of glyph coordinates. For SaaS APIs, it's a wrapper mediating between the surface the API offers and the surface the agent expects. It doesn't need to be intelligent, just careful.
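To make "wait until this region stops changing" concrete, here is a minimal sketch of one such higher-level operation. The function name, its parameters, and the idea of passing in a `read` callable are my assumptions for illustration, not any particular library's API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_stable(
    read: Callable[[], T],
    quiet_reads: int = 3,
    interval_s: float = 0.2,
    timeout_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll `read` until it returns the same value `quiet_reads` times in a row.

    `read` is a hypothetical callable that snapshots the region of interest
    (rendered text, a table, an API response). Everything here is illustrative.
    """
    deadline = time.monotonic() + timeout_s
    last = read()
    streak = 1  # consecutive identical snapshots seen so far
    while streak < quiet_reads:
        if time.monotonic() > deadline:
            raise TimeoutError("region did not settle before the timeout")
        sleep(interval_s)
        current = read()
        if current == last:
            streak += 1
        else:
            # The region changed: restart the count from this snapshot.
            last, streak = current, 1
    return last
```

The agent makes one call and gets back a settled snapshot, instead of issuing a dozen fine-grained polls and deciding for itself when to stop.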

The second piece is that the script becomes a description. A few paragraphs of structured instructions per source: what to look at, what to ignore, what success looks like, what the known traps are. Something between a runbook and a prompt. Call it a skill. Skills are version-controlled, reviewed in pull requests, and authored faster than scripts, sometimes by people who aren't the engineers who'd otherwise write them. A team that previously maintained ten thousand integrations ends up maintaining ten thousand skills, at lower cost per unit.
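One way to picture a skill is as a small structured record that renders to instruction text. The sketch below is an assumption about shape only: the `Skill` class, its field names, and the example source are hypothetical, chosen to mirror the four parts named above (what to look at, what to ignore, what success looks like, known traps).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    """A reviewable description of one source (hypothetical structure)."""

    source: str
    look_at: list[str]
    ignore: list[str]
    success: str
    known_traps: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Flatten the skill into the instruction text handed to an agent."""
        lines = [f"Source: {self.source}", "Look at:"]
        lines += [f"- {item}" for item in self.look_at]
        lines.append("Ignore:")
        lines += [f"- {item}" for item in self.ignore]
        lines.append(f"Success: {self.success}")
        if self.known_traps:
            lines.append("Known traps:")
            lines += [f"- {item}" for item in self.known_traps]
        return "\n".join(lines)


# An invented example source, for illustration only.
vendor_prices = Skill(
    source="vendor-prices-pdf (hypothetical)",
    look_at=["the pricing table on the first page"],
    ignore=["marketing copy", "footnotes"],
    success="one row per SKU with a numeric price and an ISO currency code",
    known_traps=["prices sometimes use a comma as the decimal separator"],
)

print(vendor_prices.render())
```

Because the skill is plain data, a change to it shows up as a readable diff in a pull request, which is most of the point.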

The third is evaluation, which is where most teams skip a step and come to regret it. Agents drift, skills decay, and sources change in ways nobody notices. Without something catching this on a schedule, you have a system that looks like it works until suddenly it doesn't. The failure mode is worse than the hand-coded version's. When a Python script breaks, it usually breaks loudly: stack trace, missing data, someone gets paged. When an agent silently extracts plausible-but-wrong values, the bad data flows into your warehouse and may not surface for weeks. For a fund, that can be catastrophic; plausible-but-wrong is exactly the shape of data you might trade on. Evaluation runs on a cadence and asks whether the system is still producing what you expect, surfacing drift before it becomes downstream damage. You can run a hand-coded pipeline in production without it. You can't run an agent-driven one without it.
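The simplest form of evaluation I can think of is a golden set: a handful of hand-verified values per source, re-checked against fresh agent output on a schedule. The sketch below assumes that setup; `DriftReport`, `evaluate`, `should_alert`, and the sample keys are all hypothetical names, and a real system would likely need fuzzier comparisons than strict equality.

```python
from dataclasses import dataclass


@dataclass
class DriftReport:
    checked: int
    mismatches: dict[str, tuple[object, object]]  # key -> (expected, actual)

    @property
    def mismatch_rate(self) -> float:
        return len(self.mismatches) / self.checked if self.checked else 0.0


def evaluate(golden: dict[str, object], extracted: dict[str, object]) -> DriftReport:
    """Compare fresh agent output against hand-verified golden values.

    A key missing from `extracted` counts as a mismatch with actual=None.
    """
    mismatches = {
        key: (expected, extracted.get(key))
        for key, expected in golden.items()
        if extracted.get(key) != expected
    }
    return DriftReport(checked=len(golden), mismatches=mismatches)


def should_alert(report: DriftReport, max_rate: float = 0.0) -> bool:
    """Alert when the mismatch rate exceeds the tolerated rate (assumed default: none)."""
    return report.mismatch_rate > max_rate


# Invented sample values, for illustration only.
golden = {"sku-1": 19.99, "sku-2": 5.00, "sku-3": 12.50, "sku-4": 7.25}
extracted = {"sku-1": 19.99, "sku-2": 5.00, "sku-3": 125.0}  # one wrong, one missing

report = evaluate(golden, extracted)
if should_alert(report):
    print(f"drift: {len(report.mismatches)} of {report.checked} golden values differ")
```

The value `125.0` is the case that matters: it is a perfectly plausible price, so nothing downstream would flag it. Only the comparison against a known-good value does.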

Prior art

This isn't the first attempt to make glue declarative. Diffbot has been doing automatic article extraction since 2012. A long line of low-code integration platforms preceded this generation. RPA (Robotic Process Automation) promised something almost identical for enterprise integration in the 2010s, absorbed billions of dollars, and produced mostly cautionary tales. Anyone claiming this generation is different has to explain what's different.

Previous attempts encoded the interpretation step in advance. Diffbot worked on news because news articles share enough structure for a fixed extractor to generalize; outside that domain it struggled. RPA worked on stable enterprise UIs and broke whenever a system updated. The new ingredient is that the agent interprets each source at the moment it runs, inside a runtime designed to make interpretation tractable. Whether that's a durable advance or a more flexible version of the same wrong bet is something we'll know in three years.

The arithmetic

The economics, when this works, are real, but the headline numbers oversell. A team that kept sixty engineers busy on glue can probably keep fifteen doing the same volume. The other forty-five don't disappear cleanly. Some get redeployed to higher-value work the team always wanted to reach. Some get let go. Which happens depends on the organization, not the technology, and anyone selling this transformation as cleanly redeployment-shaped is selling something. The savings on engineering capacity are partly transferred to inference costs, which at production volume aren't trivial. Total cost of ownership lands lower than hand-coded for most data operations. The gap is narrower than the headline arithmetic implies.
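The gap between headline and net can be written out in a few lines. Only the sixty and the fifteen come from the paragraph above. The inference figure is a made-up placeholder, expressed in engineer-equivalents so that no dollar amounts have to be invented, and the function name is hypothetical.

```python
def glue_savings(
    before: int, after: int, inference_engineer_equivalents: float
) -> tuple[int, float]:
    """Return (headline, net) savings, both in engineer-equivalents per year.

    `inference_engineer_equivalents` is annual inference spend divided by the
    fully loaded cost of one engineer. It is an input you must supply.
    """
    headline = before - after
    net = headline - inference_engineer_equivalents
    return headline, net


# 60 and 15 are from the text; 10.0 is a placeholder, not a measurement.
headline, net = glue_savings(before=60, after=15, inference_engineer_equivalents=10.0)
print(headline, net)
```

The sketch leaves out severance, retraining, and the cost of building the evaluation layer, all of which narrow the gap further.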

Build versus buy is worth taking seriously. Vendors are converging on roughly this architecture, domain by domain. For most teams, buying a managed version is the right call. Building internally makes sense when control over the runtime matters: deep integration with internal systems, data residency requirements, or sources sensitive enough that routing agent traffic through a third party isn't acceptable. Funds and regulated environments fit that profile. Most teams don't.

What stays hard

Authentication that needs a human in the loop. Phone-tap MFA, hardware keys, compliance flows where someone has to physically click "I attest." Agents don't have phones, hands, or attestable identities, and that won't change quickly.

The legal layer. ToS, GDPR, CCPA, jurisdictional data law. None of this changes because the technology did. If anything, agent-driven traffic is easier for plaintiffs to characterize as bad-faith automation than a hand-tuned script running at human pace.

Sources where the schema itself is fluid and disputed. Some integrations are hard not because the data is hostile but because what the data means isn't agreed on. No agent solves disagreement.

Adversarial sources with active anti-automation engineering. The economics of a fight between an automated client and a vendor-funded detection team don't change because the client is an agent. They might get worse.

These were hard for hand-coded glue too. The new architecture doesn't claim to solve them. It claims that the long, expensive middle of the distribution is moving fast enough to change the cost structure of the work.

Stages

The realization tends to land in stages. The runtime gets taken seriously enough to invest in. The skill layer follows once writing skills feels worth it. The evaluation layer usually arrives later, after someone gets bitten by silent drift. Some teams skip a stage and pay for it. A few have run the full play and have real numbers. I don't have the data for industry-wide claims, only my own experience and the experience of teams I've watched, but the trajectory looks consistent.

Whether this holds up in three years is a real question. RPA didn't. Maybe this won't either. The teams worth watching are the ones whose engineers are doing more glue than they should be, which is most teams in 2026. Sooner or later they'll feel the pressure to figure it out, usually in this order: runtime, then skill layer, then evaluation, then everything else.
