Key takeaways
- A passing dbt build only proves SQL compiled and structural tests held — it says nothing about whether the numbers match business reality.
- Standard pipeline tests are syntactic (unique, not_null). The failures that reach dashboards are semantic: logical drift, late data, silent source changes.
- Catch logical drift with custom generic dbt tests and Snowflake-native checks at the ingestion boundary, before bad loads reach your models.
- Define each metric once in a semantic layer and serve Power BI from a certified model — local DAX is where a single KPI quietly forks into three.
A passing data pipeline is not the same thing as correct data. A successful dbt build guarantees that your SQL compiled, your tables materialized, and your primary keys stayed unique and non-null. It does not verify that the raw data matches physical business reality, and it does nothing to stop analysts from writing divergent, ad-hoc metric definitions downstream in the BI tool.
The short answer: standard pipeline validation is syntactic, not semantic. It checks shapes and types, not meaning. The discrepancies that reach an executive dashboard — revenue that’s off by a region, an order total that doesn’t match its line items, a number that disagrees with the CRM — slip straight through structural tests. To catch them you need data quality gates that read the numbers, plus a single semantic layer so the metric is defined once and consumed everywhere.
Why does a dbt build pass when business metrics are wrong?
We see the same silent failure mode across the DACH mid-market. You open the monitoring dashboard: the orchestration DAG is green, the Snowflake warehouse reports zero query errors, dbt ran without a single warning. Yet sales ops flags that the weekly revenue figure in Power BI is lower than what the CRM records.
Nothing is broken in the engineering sense. That’s exactly the problem. If your pipeline only runs structural assertions like unique or not_null, there’s a blind spot between database success and analytical accuracy — and it’s where trust goes to die.
Same load, two checks
The schema is valid in both columns — only the semantic gate reads the numbers.
| Order | Σ items | Header total | Syntactic check | Semantic gate |
|---|---|---|---|---|
| ORD-9081 | €120.00 | €120.00 | Pass | Pass |
| ORD-9082 | €45.00 | €45.00 | Pass | Pass |
| ORD-9083 | €310.00 | €15.00 | Pass | Quarantined |
| ↳ assert_sums_match: Itemized sum is €310.00 but the order header records €15.00. | ||||
| ORD-9084 | €85.00 | €85.00 | Pass | Pass |
Every row above has valid types and no nulls, so a schema-only test reports a clean run. But ORD-9083 carries an order header of €15.00 against €310.00 of line items. The pipeline is green; the dashboard is wrong.
Architecture of a data quality gate in dbt and Snowflake
To stop logical discrepancies from corrupting dashboards, put automated guardrails at the boundaries of the warehouse instead of letting raw data flow unimpeded from source to semantic layer. We use three kinds of gate.
Volumetric drift
Track whether the daily ingested record count deviates from its moving historical average. A source that silently halves its load rarely throws an error — it just quietly understates everything downstream.
Relational integrity
Assert that values stay aligned across independent systems — payment-gateway logs against ERP orders, line items against order headers — so a mismatch is caught at ingest, not in a board deck.
Semantic tolerance
Halt downstream model refreshes when a core metric drifts outside a defined historical band. The gate fails loudly so a human looks before the number ships.
Building an advanced semantic test in dbt
Here’s a concrete implementation: a custom generic test that checks relational integrity across two models — validating that the sum of line-item amounts matches the transaction header, within a tolerance.
Create the test at tests/generic/assert_sums_match.sql. A generic test returns the failing rows, so the test fails whenever the query yields anything:
{% test assert_sums_match(model, column_name, compare_model, compare_column, group_by_column, tolerance=0.01) %}
with source_totals as (
select
{{ group_by_column }} as join_key,
sum({{ column_name }}) as primary_sum
from {{ model }}
group by 1
),
compare_totals as (
select
{{ group_by_column }} as join_key,
sum({{ compare_column }}) as reference_sum
from {{ compare_model }}
group by 1
),
discrepancies as (
select
s.join_key,
s.primary_sum,
c.reference_sum,
abs(s.primary_sum - c.reference_sum) as absolute_difference
from source_totals s
join compare_totals c on s.join_key = c.join_key
where abs(s.primary_sum - c.reference_sum) > {{ tolerance }}
)
select * from discrepancies
{% endtest %}
Then reference it from models/schema.yml so it runs on every deployment. Tag it so you can select just the gates in CI:
version: 2
models:
- name: fct_order_items
description: "Granular line items for customer orders."
columns:
- name: line_item_amount_eur
description: "Net amount of the individual line."
tests:
- assert_sums_match:
compare_model: ref('fct_orders')
compare_column: total_order_amount_eur
group_by_column: order_id
tolerance: 0.05
tags: ['dq_gate']
Quarantine at the ingestion layer with Snowflake
Running every semantic check inside dbt means anomalous rows have already landed in your modelled tables. For high-volume sources, push the cheapest gates upstream using Snowflake Streams and Tasks: a stream tracks new rows on the raw landing table, a scheduled task evaluates the integrity rule, and failing batches get routed to a quarantine table instead of flowing on. You spend less warehouse compute reprocessing bad loads, and dbt only ever sees data that already cleared the first gate.
Closing the loop: one semantic layer for Power BI
Even with clean data in Snowflake, metrics still drift the moment analysts compute their own KPIs locally in Power BI with custom DAX. One “revenue” measure quietly becomes three, each subtly different, and nobody can say which is right. Clean warehouse data is necessary but it is not enough — the definition has to be centralized too.
Define the metric once with the dbt Semantic Layer
With the dbt Semantic Layer you declare each measure once, in the code repository, and every consumer reads the same definition:
semantic_models:
- name: revenue_performance
model: ref('fct_orders')
dimensions:
- name: order_date
type: time
type_params:
time_granularity: day
measures:
- name: total_revenue
agg: sum
expr: total_order_amount_eur
metrics:
- name: sales_revenue
label: "Sales Revenue (EUR)"
type: simple
type_params:
measure: total_revenue
Serve Power BI from a certified model, not raw tables
To connect that centralized logic to Power BI without re-implementing it:
- 01
Enable the XMLA endpoint
On a Power BI Premium or Fabric workspace, turn on XMLA read/write so external tools can deploy and manage semantic models programmatically rather than through point-and-click.
- 02
Report from one certified semantic model
Build reports against a single, certified Power BI semantic model instead of importing raw database tables into each report. One definition, many reports — no per-report DAX forks.
- 03
Version definitions in Git with TMDL
Deploy and track the semantic schema as code using Tabular Model Definition Language (TMDL), edited with Tabular Editor. Metric changes then go through peer review and automated testing before they hit production — the same workflow as the rest of your engineering.
Once metric logic lives in Git-managed models, a change to “revenue” is a pull request, not a silent edit in someone’s local report.
Alerting: catch bad data before a stakeholder does
The last piece is proactive alerting. When a semantic gate fails, the notification should reach the on-call data engineer — not wait for an executive to notice a broken chart. Wire the gate selection into CI so a failure triggers the alert path:
# In the CI pipeline
dbt build --select tag:dq_gate || python scripts/alert_data_team.py --status failure
The alert script reports the exact failure so whoever picks it up already knows what broke:
import os
import requests
def send_teams_alert(failed_test_name, affected_rows):
webhook_url = os.environ["TEAMS_WEBHOOK_URL"]
payload = {
"title": "🚨 Data quality gate failure",
"text": (
f"Semantic validation **{failed_test_name}** failed on Snowflake. "
f"Found **{affected_rows}** mismatched records. Downstream refreshes halted."
),
}
requests.post(webhook_url, json=payload, timeout=10)
Restoring trust in the numbers
When a business loses faith in its analytics, the failure mode is predictable: teams revert to manual spreadsheets, data silos reform, and strategic alignment quietly erodes. Closing that gap means moving past syntax validation to automated semantic gates and a single, governed metric layer. It’s the same operational discipline that decides whether an AI pilot ever reaches production — the model, like the green build, is rarely where the project actually fails.
We design and implement cloud data warehouses, pipeline automation, and BI architectures for DACH mid-market companies — with data quality gates and semantic layers built in from the start, not bolted on after the first wrong board deck.
// SOURCES
- Add data tests to your DAG — dbt Labs, 2025
- About MetricFlow and the dbt Semantic Layer — dbt Labs, 2025
- Introduction to Streams — Snowflake, 2025
- Semantic model connectivity with the XMLA endpoint — Microsoft, 2026
- Tabular Model Definition Language (TMDL) overview — Microsoft, 2025
Frequently asked questions
-
Why does a dbt build pass when the business metrics are wrong?
A successful dbt build only guarantees that your SQL compiled, tables materialized, and structural tests (unique, not_null, relationships) held. It does not verify that the values match physical business reality, and it does not stop analysts from writing divergent metric definitions downstream in the BI tool. Those are semantic failures, and standard pipeline tests are syntactic. -
What is a data quality gate?
A data quality gate is an automated check placed at a boundary in your warehouse — typically ingestion or pre-publish — that halts or quarantines data when it violates a logical rule, not just a structural one. Examples: daily volume deviating from its moving average, transaction totals that disagree across two source systems, or a core metric drifting outside historical tolerance. -
How do you write a semantic test in dbt?
Use a custom generic test: a parameterized SQL macro in tests/generic/ that returns rows representing failures. For relational integrity you join two models on a key, compare aggregated values, and return any pair whose absolute difference exceeds a tolerance. You then attach the test to a column in schema.yml so it runs on every build. -
How do you stop metric drift in Power BI?
Define each measure once in a centralized semantic layer (the dbt Semantic Layer, or a single certified Power BI semantic model) and have all reports consume it rather than recomputing KPIs in local DAX. Manage those definitions in Git via TMDL so changes go through review and testing before production.
Was this helpful?