Home/Data Analyst/SQL Intermediate

CASE WHEN

CASE WHEN

Technical Explanation

CASE WHEN provides conditional logic in SQL queries, allowing you to create custom classifications, transform values, and implement business rules directly in your queries.

Syntax

-- Simple CASE
CASE expression
    WHEN value1 THEN result1
    WHEN value2 THEN result2
    [ELSE default_result]
END

-- Searched CASE (more flexible)
CASE
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    [ELSE default_result]
END

Key Points

  1. Returns the result of the first matching condition
  2. If no condition matches and no ELSE, returns NULL
  3. Can be used in SELECT, WHERE, GROUP BY, HAVING, ORDER BY
  4. WHEN conditions are evaluated in order (top to bottom)

Code Examples

Using the CatCafe dataset:

-- Table: customers
-- id, name, email, city, registration_date, total_spent

-- Simple CASE: Categorize customers by city
SELECT
    name,
    city,
    CASE city
        WHEN 'Brooklyn' THEN 'NYC Metro'
        WHEN 'Manhattan' THEN 'NYC Metro'
        WHEN 'Queens' THEN 'NYC Metro'
        ELSE 'Other Region'
    END as region
FROM customers;

-- Searched CASE: Spending tiers
SELECT
    name,
    total_spent,
    CASE
        WHEN total_spent >= 500 THEN 'VIP'
        WHEN total_spent >= 200 THEN 'Regular'
        WHEN total_spent >= 50 THEN 'Occasional'
        WHEN total_spent > 0 THEN 'New'
        ELSE 'Never Purchased'
    END as customer_tier
FROM customers
ORDER BY total_spent DESC;

-- CASE in aggregation: Counting conditions
SELECT
    COUNT(*) as total_customers,
    SUM(CASE WHEN total_spent >= 200 THEN 1 ELSE 0 END) as vip_count,
    SUM(CASE WHEN total_spent >= 50 AND total_spent < 200 THEN 1 ELSE 0 END) as regular_count,
    AVG(CASE WHEN total_spent > 0 THEN 1.0 ELSE 0 END) as purchase_rate
FROM customers;

-- CASE with GROUP BY: Segmented metrics
SELECT
    CASE
        WHEN total_spent >= 500 THEN 'VIP'
        WHEN total_spent >= 200 THEN 'Regular'
        ELSE 'Other'
    END as tier,
    COUNT(*) as customer_count,
    AVG(total_spent) as avg_spent
FROM customers
GROUP BY
    CASE
        WHEN total_spent >= 500 THEN 'VIP'
        WHEN total_spent >= 200 THEN 'Regular'
        ELSE 'Other'
    END;

-- CASE in ORDER BY: Custom sort order
SELECT
    status,
    name,
    total_spent
FROM customers
ORDER BY
    CASE status
        WHEN 'active' THEN 1
        WHEN 'inactive' THEN 2
        WHEN 'banned' THEN 3
        ELSE 4
    END;

-- CASE for data cleaning: Standardize values
SELECT
    name,
    CASE
        WHEN LOWER(city) IN ('nyc', 'new york city', 'new york, ny') THEN 'New York'
        WHEN LOWER(city) IN ('la', 'los angeles', 'los angeles, ca') THEN 'Los Angeles'
        ELSE city
    END as standardized_city
FROM customers;

The Cat Analogy

Imagine you're a cat behaviorist classifying feline moods:

Simple CASE — Based on one observable trait:

CASE cat_noise
    WHEN 'meow' THEN 'Greeting'
    WHEN 'hiss' THEN 'Warning'
    WHEN 'purr' THEN 'Content'
    ELSE 'Unknown'
END

Searched CASE — Based on multiple factors:

CASE
    WHEN cat.hissing = TRUE AND cat.tail_up = FALSE THEN 'Aggressive'
    WHEN cat.purring = TRUE AND cat.eyes_closed = TRUE THEN 'Sleeping'
    WHEN cat.meowing = TRUE AND time > 6pm THEN 'Hungry'
    ELSE 'Neutral'
END

The cat's mood determines how you interact with it — just like CASE determines what value appears in your result column.


Common Pitfalls

Missing ELSE (Implicit NULL)

-- Without ELSE, unmatched rows return NULL
SELECT
    name,
    CASE
        WHEN total_spent > 100 THEN 'High'
        -- No ELSE!
    END as category
FROM customers;
-- Customers with <= 100 spent get NULL, not 'Low'!

-- Always consider all cases:
CASE
    WHEN total_spent > 100 THEN 'High'
    ELSE 'Low or NULL'  -- Explicit handling
END

Order of Conditions Matters

-- WRONG: More specific condition after general one
CASE
    WHEN total_spent > 0 THEN 'Has spent'      -- Catches everything!
    WHEN total_spent > 100 THEN 'VIP'          -- Never reached!
END

-- RIGHT: More specific conditions first
CASE
    WHEN total_spent > 100 THEN 'VIP'
    WHEN total_spent > 0 THEN 'Has spent'
    ELSE 'No purchases'
END

Using CASE vs COALESCE

-- COALESCE for NULL replacement only
COALESCE(null_column, 'default')  -- If null, use 'default'

-- CASE for multi-condition logic
CASE
    WHEN col IS NULL THEN 'No data'
    WHEN col < 0 THEN 'Negative'
    WHEN col = 0 THEN 'Zero'
    ELSE 'Positive'
END

Exercises

Exercise 1

Write a query that shows each customer's name, city, and a "region" column:

  • 'Northeast' for NY, NJ, CT
  • 'West Coast' for CA, WA, OR
  • 'Other' for everything else

Exercise 2

Using the orders table, write a query showing:

  • Total orders
  • Number of orders over $50
  • Number of orders under $20
  • Percentage of orders over $50

Exercise 3

Write a query to categorize cats by age:

  • 'Kitten' for age < 1
  • 'Young' for 1 <= age < 3
  • 'Adult' for 3 <= age < 8
  • 'Senior' for age >= 8

Exercise 4

Explain why these give different results:

-- A
SELECT SUM(CASE WHEN total_spent > 100 THEN 1 END) FROM customers;
-- B
SELECT SUM(CASE WHEN total_spent > 100 THEN 1 ELSE 0 END) FROM customers;

Exercise 5

Write a query using CASE in ORDER BY to sort customers by: VIP customers first, then by name alphabetically.


Key Takeaways

  • CASE WHEN allows conditional logic directly in SQL
  • WHEN conditions are evaluated top-to-bottom; first match wins
  • Missing ELSE returns NULL for unmatched rows
  • CASE can be used in SELECT, WHERE, GROUP BY, HAVING, ORDER BY
  • Put more specific conditions before general ones
  • Use SUM(CASE WHEN ... THEN 1 ELSE 0 END) to count conditions
  • COALESCE is for NULL replacement only; CASE for complex logic