Why does a dbt build pass when the business metrics are wrong?

A successful dbt build only guarantees that your SQL compiled, tables materialized, and structural tests (unique, not_null, relationships) held. It does not verify that the values match physical business reality, and it does not stop analysts from writing divergent metric definitions downstream in the BI tool. Those are semantic failures, and standard pipeline tests are syntactic.

What is a data quality gate?

A data quality gate is an automated check placed at a boundary in your warehouse — typically ingestion or pre-publish — that halts or quarantines data when it violates a logical rule, not just a structural one. Examples: daily volume deviating from its moving average, transaction totals that disagree across two source systems, or a core metric drifting outside historical tolerance.

How do you write a semantic test in dbt?

Use a custom generic test: a parameterized SQL macro in tests/generic/ that returns rows representing failures. For relational integrity you join two models on a key, compare aggregated values, and return any pair whose absolute difference exceeds a tolerance. You then attach the test to a column in schema.yml so it runs on every build.

How do you stop metric drift in Power BI?

Define each measure once in a centralized semantic layer (the dbt Semantic Layer, or a single certified Power BI semantic model) and have all reports consume it rather than recomputing KPIs in local DAX. Manage those definitions in Git via TMDL so changes go through review and testing before production.

Data quality gates: dbt, Snowflake & Power BI

Key takeaways

A passing dbt build only proves SQL compiled and structural tests held — it says nothing about whether the numbers match business reality.
Standard pipeline tests are syntactic (unique, not_null). The failures that reach dashboards are semantic: logical drift, late data, silent source changes.
Catch logical drift with custom generic dbt tests and Snowflake-native checks at the ingestion boundary, before bad loads reach your models.
Define each metric once in a semantic layer and serve Power BI from a certified model — local DAX is where a single KPI quietly forks into three.

A passing data pipeline is not the same thing as correct data. A successful dbt build guarantees that your SQL compiled, your tables materialized, and your primary keys stayed unique and non-null. It does not verify that the raw data matches physical business reality, and it does nothing to stop analysts from writing divergent, ad-hoc metric definitions downstream in the BI tool.

The short answer: standard pipeline validation is syntactic, not semantic. It checks shapes and types, not meaning. The discrepancies that reach an executive dashboard — revenue that’s off by a region, an order total that doesn’t match its line items, a number that disagrees with the CRM — slip straight through structural tests. To catch them you need data quality gates that read the numbers, plus a single semantic layer so the metric is defined once and consumed everywhere.

Why does a dbt build pass when business metrics are wrong?

We see the same silent failure mode across the DACH mid-market. You open the monitoring dashboard: the orchestration DAG is green, the Snowflake warehouse reports zero query errors, dbt ran without a single warning. Yet sales ops flags that the weekly revenue figure in Power BI is lower than what the CRM records.

Nothing is broken in the engineering sense. That’s exactly the problem. If your pipeline only runs structural assertions like unique or not_null, there’s a blind spot between database success and analytical accuracy — and it’s where trust goes to die.

Same load, two checks

The schema is valid in both columns — only the semantic gate reads the numbers.

Order	Σ items	Header total	Syntactic check	Semantic gate
ORD-9081	€120.00	€120.00	Pass	Pass
ORD-9082	€45.00	€45.00	Pass	Pass
ORD-9083	€310.00	€15.00	Pass	Quarantined
↳ assert_sums_match: Itemized sum is €310.00 but the order header records €15.00.
ORD-9084	€85.00	€85.00	Pass	Pass

A schema test passes every row. Only a relational-integrity gate catches ORD-9083 before it reaches the dashboard.

Every row above has valid types and no nulls, so a schema-only test reports a clean run. But ORD-9083 carries an order header of €15.00 against €310.00 of line items. The pipeline is green; the dashboard is wrong.

Architecture of a data quality gate in dbt and Snowflake

To stop logical discrepancies from corrupting dashboards, put automated guardrails at the boundaries of the warehouse instead of letting raw data flow unimpeded from source to semantic layer. We use three kinds of gate.

Volumetric drift

Track whether the daily ingested record count deviates from its moving historical average. A source that silently halves its load rarely throws an error — it just quietly understates everything downstream.

Relational integrity

Assert that values stay aligned across independent systems — payment-gateway logs against ERP orders, line items against order headers — so a mismatch is caught at ingest, not in a board deck.

Semantic tolerance

Halt downstream model refreshes when a core metric drifts outside a defined historical band. The gate fails loudly so a human looks before the number ships.

Building an advanced semantic test in dbt

Here’s a concrete implementation: a custom generic test that checks relational integrity across two models — validating that the sum of line-item amounts matches the transaction header, within a tolerance.

Create the test at tests/generic/assert_sums_match.sql. A generic test returns the failing rows, so the test fails whenever the query yields anything:

{% test assert_sums_match(model, column_name, compare_model, compare_column, group_by_column, tolerance=0.01) %}

with source_totals as (
    select
        {{ group_by_column }} as join_key,
        sum({{ column_name }}) as primary_sum
    from {{ model }}
    group by 1
),

compare_totals as (
    select
        {{ group_by_column }} as join_key,
        sum({{ compare_column }}) as reference_sum
    from {{ compare_model }}
    group by 1
),

discrepancies as (
    select
        s.join_key,
        s.primary_sum,
        c.reference_sum,
        abs(s.primary_sum - c.reference_sum) as absolute_difference
    from source_totals s
    join compare_totals c on s.join_key = c.join_key
    where abs(s.primary_sum - c.reference_sum) > {{ tolerance }}
)

select * from discrepancies

{% endtest %}

Then reference it from models/schema.yml so it runs on every deployment. Tag it so you can select just the gates in CI:

version: 2

models:
  - name: fct_order_items
    description: "Granular line items for customer orders."
    columns:
      - name: line_item_amount_eur
        description: "Net amount of the individual line."
        tests:
          - assert_sums_match:
              compare_model: ref('fct_orders')
              compare_column: total_order_amount_eur
              group_by_column: order_id
              tolerance: 0.05
              tags: ['dq_gate']

Quarantine at the ingestion layer with Snowflake

Running every semantic check inside dbt means anomalous rows have already landed in your modelled tables. For high-volume sources, push the cheapest gates upstream using Snowflake Streams and Tasks: a stream tracks new rows on the raw landing table, a scheduled task evaluates the integrity rule, and failing batches get routed to a quarantine table instead of flowing on. You spend less warehouse compute reprocessing bad loads, and dbt only ever sees data that already cleared the first gate.

Closing the loop: one semantic layer for Power BI

Even with clean data in Snowflake, metrics still drift the moment analysts compute their own KPIs locally in Power BI with custom DAX. One “revenue” measure quietly becomes three, each subtly different, and nobody can say which is right. Clean warehouse data is necessary but it is not enough — the definition has to be centralized too.

Define the metric once with the dbt Semantic Layer

With the dbt Semantic Layer you declare each measure once, in the code repository, and every consumer reads the same definition:

semantic_models:
  - name: revenue_performance
    model: ref('fct_orders')
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: total_revenue
        agg: sum
        expr: total_order_amount_eur

metrics:
  - name: sales_revenue
    label: "Sales Revenue (EUR)"
    type: simple
    type_params:
      measure: total_revenue

Serve Power BI from a certified model, not raw tables

To connect that centralized logic to Power BI without re-implementing it:

01
Enable the XMLA endpoint
On a Power BI Premium or Fabric workspace, turn on XMLA read/write so external tools can deploy and manage semantic models programmatically rather than through point-and-click.
02
Report from one certified semantic model
Build reports against a single, certified Power BI semantic model instead of importing raw database tables into each report. One definition, many reports — no per-report DAX forks.
03
Version definitions in Git with TMDL
Deploy and track the semantic schema as code using Tabular Model Definition Language (TMDL), edited with Tabular Editor. Metric changes then go through peer review and automated testing before they hit production — the same workflow as the rest of your engineering.

Once metric logic lives in Git-managed models, a change to “revenue” is a pull request, not a silent edit in someone’s local report.

Alerting: catch bad data before a stakeholder does

The last piece is proactive alerting. When a semantic gate fails, the notification should reach the on-call data engineer — not wait for an executive to notice a broken chart. Wire the gate selection into CI so a failure triggers the alert path:

# In the CI pipeline
dbt build --select tag:dq_gate || python scripts/alert_data_team.py --status failure

The alert script reports the exact failure so whoever picks it up already knows what broke:

import os
import requests

def send_teams_alert(failed_test_name, affected_rows):
    webhook_url = os.environ["TEAMS_WEBHOOK_URL"]
    payload = {
        "title": "🚨 Data quality gate failure",
        "text": (
            f"Semantic validation **{failed_test_name}** failed on Snowflake. "
            f"Found **{affected_rows}** mismatched records. Downstream refreshes halted."
        ),
    }
    requests.post(webhook_url, json=payload, timeout=10)

Restoring trust in the numbers

When a business loses faith in its analytics, the failure mode is predictable: teams revert to manual spreadsheets, data silos reform, and strategic alignment quietly erodes. Closing that gap means moving past syntax validation to automated semantic gates and a single, governed metric layer. It’s the same operational discipline that decides whether an AI pilot ever reaches production — the model, like the green build, is rarely where the project actually fails.

We design and implement cloud data warehouses, pipeline automation, and BI architectures for DACH mid-market companies — with data quality gates and semantic layers built in from the start, not bolted on after the first wrong board deck.

Explore data engineering & analytics →Audit my pipeline architecture

// SOURCES

Add data tests to your DAG — dbt Labs, 2025
About MetricFlow and the dbt Semantic Layer — dbt Labs, 2025
Introduction to Streams — Snowflake, 2025
Semantic model connectivity with the XMLA endpoint — Microsoft, 2026
Tabular Model Definition Language (TMDL) overview — Microsoft, 2025

Frequently asked questions

Why does a dbt build pass when the business metrics are wrong?
A successful dbt build only guarantees that your SQL compiled, tables materialized, and structural tests (unique, not_null, relationships) held. It does not verify that the values match physical business reality, and it does not stop analysts from writing divergent metric definitions downstream in the BI tool. Those are semantic failures, and standard pipeline tests are syntactic.
What is a data quality gate?
A data quality gate is an automated check placed at a boundary in your warehouse — typically ingestion or pre-publish — that halts or quarantines data when it violates a logical rule, not just a structural one. Examples: daily volume deviating from its moving average, transaction totals that disagree across two source systems, or a core metric drifting outside historical tolerance.
How do you write a semantic test in dbt?
Use a custom generic test: a parameterized SQL macro in tests/generic/ that returns rows representing failures. For relational integrity you join two models on a key, compare aggregated values, and return any pair whose absolute difference exceeds a tolerance. You then attach the test to a column in schema.yml so it runs on every build.
How do you stop metric drift in Power BI?
Define each measure once in a centralized semantic layer (the dbt Semantic Layer, or a single certified Power BI semantic model) and have all reports consume it rather than recomputing KPIs in local DAX. Manage those definitions in Git via TMDL so changes go through review and testing before production.

// share

linkedin email

When the dbt build passes but the dashboards still lie