Clean The Data Swamp With Data Governance

Data governance is impossible when domain ownership is unclear and quality rules don’t exist. Start by defining standards in one domain.
Data governance doesn’t collapse because of technology. It fails quietly in meetings where no one admits they don’t know which data sets are trusted. It fails when teams get penalized for breaking SLAs they didn't even know existed. It fails in silence, then reappears as a swamp of duplicated tables and decaying pipelines.
Data governance is broken by design, not behavior
Most CDOs inherit systems stitched together from acquisitions, outsourced builds, and backlog-driven scrambles. Every team thinks they're being pragmatic—copying datasets for quick access, tweaking downstream transformations to match today’s needs. No one plans for entropy. Everyone believes someone else owns data quality.
That belief lingers because ownership is vague. Tools like Collibra or Alation can display metadata. They can map lineage. What they can’t do is assign accountability that sticks. Until someone says, “The finance team owns revenue recognition logic in this domain,” meetings will keep ending in agreement without action.
In a Gartner survey, 87% of organizations said they had high business demand for data governance, yet fewer than 30% had implemented even basic quality metrics. The problem isn’t strategy. It’s that nothing changes in anyone’s actual workload. Naming owners risks conflict, so no one does it, and governance becomes architecture theater: frameworks, forums, and federated models with no impact on reliability.
Amazon’s internal shift in consumer business units offers a contrast. They assigned explicit domain ownership and bound teams to SLAs for their critical data tables. If downstream teams noticed freshness or accuracy issues, they could trace problems directly to the accountable domain. That traceability changed behavior. It also allowed platform engineers to scale tooling with confidence. The culture didn’t shift because of a policy. It shifted because one team’s unreliability became another team’s blocker.
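One way to make that traceability concrete is a registry that binds each critical table to a named owning team and its upstream inputs, then walks the lineage when something goes stale. This is a hypothetical Python sketch, not Amazon’s system: the `TableContract` type, the `REGISTRY` contents, every table and team name, and the `root_cause_owners` helper are invented for illustration. It treats a stale table as the root cause when none of its own inputs are stale, so accountability lands with the domain where the problem started.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableContract:
    """Hypothetical ownership record for one critical table."""
    table: str
    owner: str                       # a named team, not "the data team"
    freshness_sla_hours: int         # the published promise (not used by the tracer)
    upstream: tuple[str, ...] = ()   # direct input tables


# Illustrative registry; every table and team name here is made up.
REGISTRY = {
    c.table: c
    for c in [
        TableContract("finance.revenue_recognition", "finance-data", 24),
        TableContract("sales.opportunities", "sales-ops", 6),
        TableContract(
            "exec.revenue_dashboard", "bi-team", 24,
            upstream=("finance.revenue_recognition", "sales.opportunities"),
        ),
    ]
}


def root_cause_owners(table: str, stale: set[str]) -> list[str]:
    """Return owners of the most-upstream stale tables that feed `table`.

    A stale table counts as a root cause when none of its direct inputs
    are stale, so accountability lands where the problem started.
    """
    seen: set[str] = set()
    owners: set[str] = set()
    stack = [table]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        contract = REGISTRY[name]
        has_stale_input = any(u in stale for u in contract.upstream)
        if name in stale and not has_stale_input:
            owners.add(contract.owner)
        stack.extend(contract.upstream)
    return sorted(owners)


# A stale dashboard with a stale finance input is traced to finance, not BI.
print(root_cause_owners(
    "exec.revenue_dashboard",
    {"exec.revenue_dashboard", "finance.revenue_recognition"},
))  # ['finance-data']
```

In practice the registry would more likely live in a catalog or version-controlled config than in code; the point is that the owner is a lookup, not a meeting.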
Start with one domain and make expectations visible
The first step is small enough that most leaders overlook it: pick a single domain that drives executive decision-making this quarter. Don’t chase alignment across every data product. Make one usable.
Select an area with both high-value outputs and a team willing to co-own definitions. For most firms, marketing attribution or sales pipeline velocity works well. Both are messy, but the business actively uses them, so poor quality can’t hide behind backend abstractions.
Once the domain is selected, define exactly what “quality” means. Data quality and observability tools like Monte Carlo or Soda can provide baselines, but baselines alone aren’t enough. You need rules with business consequences. For example:
- The pipeline must complete by 7 a.m. Eastern every weekday
- No more than 1% of leads can belong to unknown source buckets
- Revenue-stage conversion rates must match Salesforce reports within ±2%
These aren’t hygiene metrics. They’re promises. When they’re missed, someone explains why. When they hold, teams build faster because they no longer double-check the data.
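Rules like these are easiest to enforce when they are written as code that returns a pass or fail. The following is a minimal Python sketch, assuming a timezone-aware completion timestamp, a list of lead source labels, and per-stage conversion rates from both systems. The function names, the `UNKNOWN_BUCKETS` labels, and the stage names are hypothetical, and reading “±2%” as a relative difference is an assumption; if the intent is percentage points, compare absolute differences instead.

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs system tz data or the tzdata package

EASTERN = ZoneInfo("America/New_York")
UNKNOWN_BUCKETS = {"unknown", ""}  # hypothetical labels for unattributed leads


@dataclass
class RuleResult:
    rule: str
    passed: bool
    detail: str


def check_freshness(completed_at: datetime, deadline: time = time(7, 0)) -> RuleResult:
    """Rule 1: the pipeline must complete by 7 a.m. Eastern on weekdays."""
    local = completed_at.astimezone(EASTERN)  # pass a timezone-aware datetime
    if local.weekday() >= 5:  # Saturday=5, Sunday=6: the SLA only covers weekdays
        return RuleResult("freshness", True, "weekend run, no SLA")
    return RuleResult("freshness", local.time() <= deadline, f"completed {local:%H:%M} ET")


def check_unknown_sources(lead_sources: list[str], max_share: float = 0.01) -> RuleResult:
    """Rule 2: no more than 1% of leads may sit in unknown source buckets."""
    unknown = sum(1 for s in lead_sources if s.strip().lower() in UNKNOWN_BUCKETS)
    share = unknown / len(lead_sources) if lead_sources else 0.0
    return RuleResult("unknown_sources", share <= max_share, f"{share:.2%} unknown")


def check_conversion_match(
    ours: dict[str, float], salesforce: dict[str, float], tolerance: float = 0.02
) -> RuleResult:
    """Rule 3: stage conversion rates must match Salesforce within ±2%.

    Assumption: ±2% is read as a relative difference per stage; a stage
    missing from our data counts as a mismatch.
    """
    misses = [
        stage
        for stage, sf_rate in salesforce.items()
        if stage not in ours or abs(ours[stage] - sf_rate) > tolerance * sf_rate
    ]
    return RuleResult("conversion_match", not misses, f"mismatched stages: {misses or 'none'}")


results = [
    # 2024-03-04 was a Monday; 11:45 UTC is 06:45 Eastern (EST), so this passes.
    check_freshness(datetime(2024, 3, 4, 11, 45, tzinfo=timezone.utc)),
    check_unknown_sources(["web"] * 98 + ["unknown"] * 2),  # 2% unknown: fails
    check_conversion_match({"sql_to_won": 0.20}, {"sql_to_won": 0.21}),  # ~4.8% off: fails
]
for r in results:
    print(f"{'PASS' if r.passed else 'FAIL'} {r.rule}: {r.detail}")
```

Each result carries a human-readable detail so that a failure can be explained by the owner rather than reverse-engineered by a consumer.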
LinkedIn’s data mesh experiment surfaced a similar dynamic. By assigning SLOs at the domain level and embedding reliability engineers alongside product analysts, they turned dozens of brittle tables into routinely consumed assets. Trusted outputs triggered investment in better tooling, not the other way around.
Make operational failure observable
Data governance that works is visible. Failure isn’t abstract misalignment—it’s a dashboard turning red. When SLAs break and no one notices, governance dies in perception long before it fails in practice.
To prevent that, hook quality rules into the alerting channels your product and analytics teams already use, such as PagerDuty, Slack, or Jira. Format alerts so they resemble service failures, not passive metrics. Avoid dashboards that require interpretation. A broken SLA should trigger a response.
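A minimal sketch of an incident-style alert follows, assuming a Slack incoming webhook, which accepts a POSTed JSON body with a `text` field. The `SlaBreach` fields, the message layout, the example table and team names, and the `WEBHOOK_URL` placeholder are hypothetical choices for illustration; PagerDuty and Jira have their own APIs, which this sketch does not cover.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class SlaBreach:
    """Hypothetical record of one broken promise, phrased for humans."""
    rule: str      # which rule broke, e.g. "freshness"
    table: str
    owner: str     # accountable team, taken from an ownership registry
    expected: str  # the promise in business terms
    observed: str  # what actually happened
    impact: str    # who is blocked right now


def format_alert(b: SlaBreach) -> str:
    """Render a breach like an incident: what broke, who owns it, who is blocked."""
    return "\n".join([
        f"SLA BREACH [{b.rule}] {b.table}",
        f"Owner: {b.owner}",
        f"Expected: {b.expected}",
        f"Observed: {b.observed}",
        f"Impact: {b.impact}",
        "Next step: owner acknowledges and posts an ETA in this thread",
    ])


def slack_payload(text: str) -> bytes:
    """Body for a Slack incoming webhook: a JSON object with a "text" field."""
    return json.dumps({"text": text}).encode("utf-8")


def post_to_slack(webhook_url: str, text: str) -> int:
    """POST the alert to an incoming webhook and return the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=slack_payload(text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


breach = SlaBreach(
    rule="freshness",
    table="sales.pipeline_daily",
    owner="sales-ops",
    expected="load complete by 07:00 ET on weekdays",
    observed="load completed 07:42 ET",
    impact="Monday pipeline review is running on Friday's numbers",
)
print(format_alert(breach))
# post_to_slack(WEBHOOK_URL, format_alert(breach))  # WEBHOOK_URL is a placeholder
```

The design choice is to lead with owner and impact: whoever reads the message should know who acts and who is blocked without opening a dashboard.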
The observable nature of failure forces better scoping. Teams stop promising what they can’t measure. They start refusing ownership until dependencies are validated. That refusal feels uncomfortable—but it’s better than yes-shaped noise. Over time, operationalized ownership becomes normal. You’ll see fewer “quick fixes” to dashboards, and more fixes to the underlying pipeline.
Snowflake's internal data team uses pipeline observability to prioritize platform improvements. When an ingestion job fails upstream and breaks an executive dashboard, it’s not just an outage—it’s a roadmap input. That feedback loop relies on clarity. It only works when failure is seen, acknowledged, and attributed early.
Governance doesn’t scale until it changes behavior
No organization skips the mess. Maturity models often pretend there’s a linear path from chaos to compliance. Real data organizations loop. They try to centralize, then federate, then centralize again. The pattern only breaks when one domain becomes obviously usable. That anchors belief. Every team starts asking why their data can’t work like that.
So don’t start with policy. Don’t start with structure. Pick a domain. Define the rules. Publish the SLA. Then ask: when this breaks, who cares?
That question reveals whether governance lives in process or practice.
