Home/Data Analyst/Data Cleaning & Validation

Business Rule Validation

Business Rule Validation

Technical Explanation

Business rule validation ensures data complies with defined business logic that goes beyond simple format checks. It verifies that relationships, calculations, and constraints reflect real-world business rules.

Examples of Business Rules

Rule Validation
Customers must be 18+ birth_date implies age >= 18
Order total = sum of line items total_amount = SUM(line_items)
Adoption fee >= $0 fee cannot be negative
Cats can't be adopted twice adoption_status logic
Revenue only from completed orders only completed generate revenue

Code Examples

Using the CatCafe dataset:

-- Tables:
-- orders: id, customer_id, order_date, total_amount, status
-- order_items: id, order_id, cat_id, quantity, unit_price
-- cats: id, name, adoption_status, adoption_date, adoption_fee
-- customers: id, name, registration_date, birth_date

-- 1. Validate order totals match line items
WITH line_item_totals AS (
    SELECT
        order_id,
        SUM(quantity * unit_price) as calculated_total
    FROM order_items
    GROUP BY order_id
)
SELECT
    o.id as order_id,
    o.total_amount as stated_total,
    lit.calculated_total,
    o.total_amount - lit.calculated_total as discrepancy,
    CASE
        WHEN o.total_amount = lit.calculated_total THEN 'VALID'
        ELSE 'INVALID: Total mismatch'
    END as status
FROM orders o
JOIN line_item_totals lit ON o.id = lit.order_id
WHERE o.total_amount != lit.calculated_total;

-- 2. Validate customer age (must be 18+)
SELECT
    id,
    name,
    birth_date,
    AGE(birth_date) as age,
    CASE
        WHEN birth_date IS NULL THEN 'UNKNOWN'
        WHEN birth_date > CURRENT_DATE - INTERVAL '18 years' THEN 'INVALID: Under 18'
        ELSE 'VALID'
    END as age_status
FROM customers;

-- 3. Validate adoption rules (cats not already adopted)
SELECT
    c.id,
    c.name,
    c.adoption_status,
    c.adoption_date,
    CASE
        WHEN c.adoption_status = 'adopted'
         AND c.adoption_date IS NULL
            THEN 'INVALID: Status says adopted but no date'
        WHEN c.adoption_status = 'available'
         AND c.adoption_date IS NOT NULL
            THEN 'INVALID: Available but has adoption date'
        WHEN c.adoption_status = 'adopted'
         AND c.adoption_date > CURRENT_DATE
            THEN 'INVALID: Future adoption date'
        ELSE 'VALID'
    END as adoption_rule_status
FROM cats c;

-- 4. Validate referential integrity rules
-- Each order must have at least one line item
SELECT
    o.id as order_id,
    COUNT(oi.id) as line_item_count,
    CASE
        WHEN COUNT(oi.id) = 0
            THEN 'INVALID: Order with no items'
        WHEN COUNT(oi.id) > 0
            THEN 'VALID'
    END as status
FROM orders o
LEFT JOIN order_items oi ON o.id = oi.order_id
GROUP BY o.id
HAVING COUNT(oi.id) = 0;

-- 5. Validate state-based rules (adoption fee based on status)
SELECT
    c.id,
    c.name,
    c.adoption_status,
    c.adoption_fee,
    CASE
        WHEN c.adoption_status = 'available' AND c.adoption_fee > 0
            THEN 'WARNING: Fee should be 0 for available cats'
        WHEN c.adoption_status = 'adopted' AND c.adoption_fee = 0
            THEN 'WARNING: Fee should be > 0 for adopted cats'
        WHEN c.adoption_status = 'adopted' AND c.adoption_fee < 0
            THEN 'INVALID: Negative fee'
        ELSE 'VALID'
    END as fee_rule_status
FROM cats c;

-- 6. Comprehensive business rule validation report
WITH rule_violations AS (
    -- Rule 1: Order total must match line items
    SELECT 'Order total mismatch' as rule_name, COUNT(*) as violations
    FROM (
        SELECT o.id
        FROM orders o
        JOIN (
            SELECT order_id, SUM(quantity * unit_price) as calc_total
            FROM order_items
            GROUP BY order_id
        ) lit ON o.id = lit.order_id
        WHERE o.total_amount != lit.calc_total
    ) violations

    UNION ALL

    -- Rule 2: Customer must be 18+
    SELECT 'Underage customers', COUNT(*)
    FROM customers
    WHERE birth_date > CURRENT_DATE - INTERVAL '18 years'
    AND birth_date IS NOT NULL

    UNION ALL

    -- Rule 3: No negative fees
    SELECT 'Negative adoption fees', COUNT(*)
    FROM cats
    WHERE adoption_fee < 0
)
SELECT
    rule_name,
    violations,
    CASE WHEN violations > 0 THEN 'FAIL' ELSE 'PASS' END as status
FROM rule_violations;

The Cat Analogy

Cat shelter business rules:

Rule: "Only adult cats (2+ years) can be adopted for premium fees"

Validation:
  - Whiskers (3 years, $200 fee) → VALID
  - Luna (1 year, $200 fee) → INVALID: Too young for premium
  - Mochi (5 years, $50 fee) → WARNING: Premium cat should cost more
  - Shadow (age unknown, $100 fee) → UNKNOWN: Can't validate

Rule: "Cats cannot be adopted twice"
  - Whiskers (status: 'adopted', adoption_date: '2024-01-15') → VALID
  - Luna (status: 'adopted', adoption_date: NULL) → INVALID: Missing date
  - Shadow (status: 'available', adoption_date: '2024-02-01') → INVALID: Wrong status

Common Pitfalls

Only Checking Format, Not Logic

-- WRONG: "Email has @ so it's valid"
SELECT *
FROM customers
WHERE email LIKE '%@%';  -- Doesn't check if domain exists!

-- RIGHT: Check business rules
SELECT *
FROM customers
WHERE email LIKE '%@%'
  AND email NOT LIKE '%@%@%'  -- No double @
  AND RIGHT(email, 4) IN ('.com', '.org', '.net', '.edu');  -- Valid TLD

Exercises

Exercise 1

Write a query to validate that each order's total_amount equals the sum of its line items.

Exercise 2

Write a query to validate that all customers with completed orders have valid customer_ids.

Exercise 3

What business rules would you validate for a subscription service?

Exercise 4

Write a query to find cats whose adoption status and fee don't match business rules.

Exercise 5

How would you handle rules that have exceptions (e.g., staff discounts)?


Key Takeaways

  • Business rule validation ensures data complies with domain logic
  • Rules go beyond format: they check relationships and constraints
  • Common rules: referential integrity, calculated fields, state transitions
  • Document business rules in a rules registry
  • Validate at data entry AND during ETL processes
  • Create automated rule validation reports