The Data Context Layer: Why Data Engineering Agents Need More Than Code and Databases

Sponsor, Workshop track, confirmed

Day: Day 1 — Workshop Day
Time: 2:20pm-4:20pm
Room: Track 2
Track: Track 2

Accessible with the Engineering + Workshops pass and above.

About this session

Modern AI agents typically understand either code or databases. Code-focused agents reason over files, dependencies, and syntax, while database agents see tables, columns, and query results. This works for software development and basic analytics, but it breaks down for data engineering. In real data environments, agents fail because they lack context: an understanding of how data flows, what it represents, and why it behaves the way it does in production.

This session introduces the data context layer, a missing third layer that bridges code, data, and business semantics. Without it, agents hallucinate impact, suggest unsafe joins, and struggle with root cause analysis.

The presentation will define the data context layer and showcase its use in practice, including:

- End-to-end lineage from sources to reports
- Semantic metadata such as grain, measures, dimensions, and business logic
- Runtime signals including job executions, failures, and performance patterns
- Logical vs. physical modeling distinctions

Attendees will walk away with a greater understanding of:

- Why the code layer (dbt SQL, manifests, Git history) provides structure but misses grain, aggregation semantics, and join safety
- Why the data layer (warehouse tables, execution metrics, failures) shows what happened, but not why
- How the data context layer unifies lineage, semantic metadata, runtime behavior, and business rules

The presentation will also cover architecture patterns for building and maintaining a data context layer, including why property graphs are well-suited for contextual reasoning and how agents can query context safely instead of relying on prompt stuffing.
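
To make the property-graph idea concrete, here is a minimal sketch in Python. It is not the speaker's implementation. Every table name, the `grain` property, and the helper functions `downstream` and `join_fans_out` are hypothetical and exist only for illustration. The sketch stores lineage as edges and grain as a node property, then answers two questions an agent might ask through a narrow query interface rather than a stuffed prompt: what is affected downstream, and can this join multiply rows?

```python
from collections import deque

# Hypothetical context graph: nodes carry properties such as kind and grain.
NODES = {
    "raw.orders": {"kind": "source", "grain": ("order_id",)},
    "raw.order_items": {"kind": "source", "grain": ("order_id", "item_id")},
    "stg.orders": {"kind": "model", "grain": ("order_id",)},
    "fct.order_items": {"kind": "model", "grain": ("order_id", "item_id")},
    "report.revenue": {"kind": "report", "grain": ("order_date",)},
}

# Directed lineage edges (upstream, downstream, edge properties).
EDGES = [
    ("raw.orders", "stg.orders", {"type": "feeds"}),
    ("raw.order_items", "fct.order_items", {"type": "feeds"}),
    ("stg.orders", "fct.order_items", {"type": "feeds"}),
    ("fct.order_items", "report.revenue", {"type": "feeds"}),
]


def downstream(node):
    """Return every node reachable from `node` along lineage edges (BFS)."""
    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for src, dst, _props in EDGES:
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def join_fans_out(right, keys):
    """Return True if joining onto `right` using `keys` can multiply rows.

    Assumption: a table is unique per its declared grain, so the join is
    row-preserving only when the join keys cover the right side's grain.
    """
    return not set(NODES[right]["grain"]).issubset(keys)


if __name__ == "__main__":
    print(sorted(downstream("raw.orders")))
    print(join_fans_out("fct.order_items", {"order_id"}))
    print(join_fans_out("stg.orders", {"order_id"}))
```

In this toy graph, a change to `raw.orders` reaches the staging model, the fact table, and the revenue report, and joining onto `fct.order_items` by `order_id` alone is flagged because its grain also includes `item_id`. A production context layer would likely live in a graph database and also attach runtime signals (job runs, failures) as node or edge properties. The dictionaries here only stand in for that structure.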

Speaker

Yoni Michael