Quality Issue Escalation

Technical Explanation

When data quality issues are discovered, knowing when and how to escalate is critical. Not every issue needs immediate action, but significant issues affecting business decisions must be communicated to stakeholders promptly.

Escalation Criteria

Factor          Escalate If                 Monitor If
Impact          Affects revenue/KPIs        Minor metric discrepancy
Volume          >1% of records affected     <0.1% of records affected
Duration        Ongoing for >24 hours       Self-correcting
Correctability  Cannot be fixed in query    Can be filtered
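
As a rough sketch of how the four factors above could be applied, the query below scores a few made-up issues. The `issues` rows, their column names, and the rule that any single "Escalate If" condition triggers escalation are all assumptions for illustration; the table itself does not say how to combine factors or how to treat volumes between 0.1% and 1%.

```sql
-- Hypothetical issue records; in practice these would come from your own checks.
WITH issues (issue_name, affects_kpi, affected_rows, total_rows, hours_ongoing, fixable_in_query) AS (
    VALUES
        ('negative order totals', TRUE,  3,   10000, 30, FALSE),
        ('misspelled city name',  FALSE, 1,   10000, 2,  TRUE),
        ('missing emails',        FALSE, 250, 10000, 48, TRUE)
),
scored AS (
    SELECT
        issue_name,
        affected_rows * 100.0 / total_rows AS affected_pct,
        -- Count how many of the four "Escalate If" conditions hold
        (affects_kpi)::int
          + (affected_rows * 100.0 / total_rows > 1)::int
          + (hours_ongoing > 24)::int
          + (NOT fixable_in_query)::int AS escalate_signals
    FROM issues
)
SELECT
    issue_name,
    ROUND(affected_pct, 2) AS affected_pct,
    escalate_signals,
    -- Assumed rule: one signal is enough to escalate
    CASE WHEN escalate_signals > 0 THEN 'Escalate' ELSE 'Monitor' END AS decision
FROM scored
ORDER BY escalate_signals DESC, issue_name;
```

A team might prefer to weight impact more heavily than the other factors; the simple count here is only one possible choice.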

Escalation Framework

Level     Issue                   Response Time  Contact
Critical  Revenue data wrong      1 hour         Data Lead + Stakeholders
High      Key metric inaccurate   4 hours        Analytics Lead
Medium    Minor discrepancies     24 hours       Team
Low       Cosmetic issues         Next sprint    Backlog
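
One way to put the framework to work is to keep it as a lookup and join detected issues against it to get a response deadline. The sketch below assumes PostgreSQL; the `escalation_levels` and `detected_issues` names and the sample rows are hypothetical. "Next sprint" has no fixed duration, so the Low level is given a NULL response window.

```sql
WITH escalation_levels (severity, response_window, contact) AS (
    VALUES
        ('Critical', INTERVAL '1 hour',      'Data Lead + Stakeholders'),
        ('High',     INTERVAL '4 hours',     'Analytics Lead'),
        ('Medium',   INTERVAL '24 hours',    'Team'),
        ('Low',      CAST(NULL AS INTERVAL), 'Backlog')  -- next sprint, no fixed deadline
),
-- Hypothetical issues found by earlier checks
detected_issues (issue_name, severity, detected_at) AS (
    VALUES
        ('Negative order totals',     'Critical', TIMESTAMP '2024-03-15 09:30:00'),
        ('Duplicate customer emails', 'Medium',   TIMESTAMP '2024-03-15 10:00:00'),
        ('City name typo',            'Low',      TIMESTAMP '2024-03-15 11:15:00')
)
SELECT
    d.issue_name,
    d.severity,
    l.contact,
    d.detected_at + l.response_window AS respond_by  -- NULL for Low
FROM detected_issues d
JOIN escalation_levels l ON l.severity = d.severity
ORDER BY respond_by;
```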

Code Examples

Using the CatCafe dataset:

-- Identifying issues that need escalation

-- 1. Critical: Revenue-affecting discrepancies
WITH revenue_issues AS (
    SELECT
        COUNT(*) as affected_orders,
        SUM(total_amount) as affected_revenue,
        COUNT(DISTINCT customer_id) as affected_customers
    FROM orders
    WHERE status = 'completed'
    AND total_amount < 0  -- Negative revenue!
)
SELECT
    affected_orders,
    affected_revenue,
    affected_customers,
    CASE
        WHEN affected_orders > 0 THEN 'ESCALATE: Critical revenue issue'
        ELSE 'No critical issues'
    END as action_required
FROM revenue_issues;

-- 2. High: More than 5% missing data in key fields
-- NULLIF guards against division by zero when the table is empty
SELECT
    'customers' as table_name,
    'email' as field,
    COUNT(*) as total_rows,
    COUNT(*) - COUNT(email) as missing_count,
    (COUNT(*) - COUNT(email)) * 100.0 / NULLIF(COUNT(*), 0) as missing_pct,
    CASE
        WHEN (COUNT(*) - COUNT(email)) * 100.0 / NULLIF(COUNT(*), 0) > 5
            THEN 'ESCALATE: >5% missing'
        ELSE 'Monitor'
    END as severity
FROM customers;

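-- 3. Medium: Duplicate records that need investigation
WITH duplicate_check AS (
    SELECT
        email,
        COUNT(*) as occurrences
    FROM customers
    -- Exclude NULLs: GROUP BY puts all missing emails in one group,
    -- which would otherwise be reported as a duplicate
    WHERE email IS NOT NULL
    GROUP BY email
    HAVING COUNT(*) > 1
)
SELECT
    COUNT(*) as duplicate_emails,
    -- SUM over zero rows is NULL, so report 0 instead
    COALESCE(SUM(occurrences), 0) as affected_rows,
    CASE
        WHEN COUNT(*) > 10 THEN 'ESCALATE: Significant duplicates'
        ELSE 'Monitor: Within acceptable range'
    END as action
FROM duplicate_check;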

-- 4. Severity classification query
WITH issue_assessment AS (
    SELECT
        'Negative orders' as issue_type,
        COUNT(*) as count,
        SUM(total_amount) as financial_impact,
        CASE
            WHEN COUNT(*) > 0 THEN 'CRITICAL'
            ELSE 'None'
        END as severity
    FROM orders WHERE total_amount < 0

    UNION ALL

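    SELECT
        'Missing customer refs' as issue_type,
        COUNT(*) as count,
        0 as financial_impact,
        CASE
            -- Without this branch a clean table would still be reported as 'LOW'
            WHEN COUNT(*) = 0 THEN 'None'
            WHEN COUNT(*) > (SELECT COUNT(*) FROM orders) * 0.01
            THEN 'HIGH'
            ELSE 'LOW'
        END as severity
    FROM orders o
    -- NOT EXISTS avoids the NOT IN trap: one NULL id in customers would hide every orphan.
    -- Orders with a NULL customer_id are counted as missing references too.
    WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id)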

    UNION ALL

    SELECT
        'Future order dates' as issue_type,
        COUNT(*) as count,
        0 as financial_impact,
        CASE
            WHEN COUNT(*) > 0 THEN 'MEDIUM'
            ELSE 'None'
        END as severity
    FROM orders WHERE order_date > CURRENT_DATE
)
SELECT
    issue_type,
    count,
    financial_impact,
    severity,
    CASE
        WHEN severity = 'CRITICAL' THEN 'Immediate escalation'
        WHEN severity = 'HIGH' THEN 'Escalate within 4 hours'
        WHEN severity = 'MEDIUM' THEN 'Escalate within 24 hours'
        ELSE 'Document and monitor'
    END as recommended_action
FROM issue_assessment
WHERE severity != 'None'
ORDER BY
    CASE severity
        WHEN 'CRITICAL' THEN 1
        WHEN 'HIGH' THEN 2
        WHEN 'MEDIUM' THEN 3
    END;

-- 5. Creating an issue log for tracking
WITH issue_summary AS (
    SELECT
        'orders' as table_name,
        COUNT(*) FILTER (WHERE o.total_amount < 0) as negative_amounts,
        COUNT(*) FILTER (WHERE o.order_date > CURRENT_DATE) as future_dates,
        -- Orders with no matching customer row (assumes customers.id is unique,
        -- so the join cannot duplicate orders)
        COUNT(*) FILTER (WHERE c.id IS NULL) as invalid_refs
    FROM orders o
    LEFT JOIN customers c ON c.id = o.customer_id
)
SELECT
    table_name,
    negative_amounts,
    future_dates,
    invalid_refs,
    CURRENT_TIMESTAMP as detected_at,
    CURRENT_USER as detected_by,
    'Open' as status
FROM issue_summary;

The Cat Analogy

Cat shelter issue escalation:

Issue: 3 cats have wrong vaccination dates
  - Volume: 3 out of 100 cats = 3%
  - Impact: Could cause adoption delays
  - → MEDIUM: Log in issue tracker, fix within 24h

Issue: 50 out of 100 cats have wrong vaccination dates
  - Volume: 50% affected!
  - Impact: Shelter might have to close
  - → CRITICAL: Wake the manager at 3am if needed

Issue: "Whiskers" is spelled "Whiskers" in one table and "Whiskrs" in another
  - Volume: 1 cat
  - Impact: Just a spelling fix
  - → LOW: Add to backlog, fix next sprint

Escalation Communication Template

-- When escalating, include this information:

-- Issue Summary
-- What: Negative order amounts detected
-- How many: 3 orders
-- Financial impact: $150 in negative revenue
-- First detected: 2024-03-15 09:30 UTC
-- Status: Ongoing

-- Technical Details
-- Table: orders
-- Field: total_amount
-- Invalid values: -50, -75, -25

-- Business Impact
-- These orders affect Q1 revenue by $150
-- Cannot produce accurate revenue report until fixed

-- Recommended Action
-- 1. Identify root cause in payment processing
-- 2. Reverse the invalid transactions
-- 3. Notify affected customers
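
Most of the fields in the template can be pulled straight from the data. The following is a minimal sketch, assuming PostgreSQL; `sample_orders` and its order ids are hypothetical stand-ins for the `orders` table, seeded with the three invalid amounts from the template above.

```sql
-- Hypothetical stand-in for the orders table
WITH sample_orders (id, total_amount) AS (
    VALUES
        (101, -50.00),
        (102, 120.00),
        (103, -75.00),
        (104, -25.00),
        (105,  60.00)
)
SELECT
    'Negative order amounts detected' AS what,
    COUNT(*) AS how_many,
    SUM(total_amount) AS financial_impact,
    -- List the offending values so the recipient can verify them
    STRING_AGG(total_amount::text, ', ' ORDER BY id) AS invalid_values,
    'orders' AS table_name,
    'total_amount' AS field_name,
    CURRENT_TIMESTAMP AS reported_at
FROM sample_orders
WHERE total_amount < 0;
```

The business impact and recommended actions still have to be written by hand, since they depend on context the data does not carry.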

Common Pitfalls

Over-Escalating Minor Issues

-- WRONG: Escalating every tiny issue
-- 1 misspelled city name = not critical
SELECT 'Minor: 1 spelling error' WHERE 1=0;
-- Your credibility suffers if you escalate everything

-- RIGHT: Use thresholds and classification
-- Only escalate if >5% affected or revenue impact
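
The "RIGHT" approach above can be sketched as a single classification query. The `checks` rows and column names below are invented for illustration, and the rule (any revenue impact, or more than 5% of rows affected) simply restates the comment above rather than a fixed standard.

```sql
-- Hypothetical results from three separate quality checks
WITH checks (check_name, affected_rows, total_rows, revenue_impact) AS (
    VALUES
        ('misspelled city name',  1,   5000, 0),
        ('missing emails',        400, 5000, 0),
        ('negative order totals', 3,   5000, -150)
)
SELECT
    check_name,
    ROUND(affected_rows * 100.0 / total_rows, 2) AS affected_pct,
    revenue_impact,
    CASE
        WHEN revenue_impact <> 0 THEN 'Escalate: revenue impact'
        WHEN affected_rows * 100.0 / total_rows > 5 THEN 'Escalate: >5% affected'
        ELSE 'Log and monitor'
    END AS action
FROM checks
ORDER BY check_name;
```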

Under-Escalating Real Problems

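-- WRONG: "It's probably fine, let's just continue"
-- A $50,000 discrepancy is far above a $10,000 alert threshold and must be reported
SELECT 'ISSUE: Revenue off by $50,000' AS alert
FROM (SELECT 50000 AS revenue_impact) AS issue
WHERE revenue_impact > 10000;
-- Your job is at risk if you hide real problems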

Exercises

Exercise 1

Classify these issues by severity:

  • 2 orders with negative amounts
  • 15% of customer emails missing
  • 1 duplicate customer record
  • Revenue totals off by $10,000

Exercise 2

Write a query that separates issues needing immediate escalation from those that can wait.

Exercise 3

What information should you include when escalating a data quality issue?

Exercise 4

Write a query to create a daily data quality issue summary for stakeholders.

Exercise 5

How would you determine the financial impact of a data quality issue?


Key Takeaways

  • Escalate based on impact, volume, duration, and correctability
  • Use severity levels: Critical, High, Medium, Low
  • Include financial impact in escalation communications
  • Document issues with timestamps and affected counts
  • Over-escalation hurts credibility; under-escalation puts the business at risk
  • Create automated daily/weekly quality summaries