Home/Data Analyst/SQL Fundamentals

WHERE vs HAVING

WHERE vs HAVING

Technical Explanation

WHERE and HAVING both filter data, but they operate at different stages of query execution and serve different purposes.

The Execution Order

SQL queries don't execute in the order you write them. The logical execution order is:

1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY

This means WHERE filters rows before aggregation, while HAVING filters groups after aggregation.

When to Use Each

Operator Filters Can use Aggregates? Use When
WHERE Individual rows No Filtering raw data before grouping
HAVING Groups (after GROUP BY) Yes Filtering based on aggregate results

Key Differences

  1. WHERE goes first — It evaluates each row's condition before any grouping happens
  2. HAVING evaluates after — It can only see the result of aggregations, not individual rows
  3. Performance — WHERE can use indexes; HAVING typically cannot

Using WHERE for row-level filtering and HAVING for group-level filtering is not just about correctness—it's about performance and readability.


Code Examples

Using the CatCafe dataset:

-- Dataset: customers
-- id, name, email, registration_date, city, total_spent

-- WHERE: Filter rows BEFORE grouping
-- Find customers from "Brooklyn" who spent more than $50
SELECT
    city,
    COUNT(*) as customer_count,
    SUM(total_spent) as total_revenue
FROM customers
WHERE city = 'Brooklyn'          -- Filters individual rows
GROUP BY city;
-- Result: Only Brooklyn customers, aggregated

-- HAVING: Filter groups AFTER grouping
-- Find cities with more than 10 customers
SELECT
    city,
    COUNT(*) as customer_count,
    SUM(total_spent) as total_revenue
FROM customers
GROUP BY city
HAVING COUNT(*) > 10;            -- Filters groups after aggregation
-- Result: Only cities with 10+ customers

-- Combining both: Use WHERE AND HAVING
-- Cities with 10+ customers from customers who registered in 2024
SELECT
    city,
    COUNT(*) as customer_count,
    AVG(total_spent) as avg_spent
FROM customers
WHERE registration_date >= '2024-01-01'  -- Row-level filter
GROUP BY city
HAVING COUNT(*) > 10;                    -- Group-level filter

Common Mistake

-- WRONG: Trying to filter on aggregate in WHERE
SELECT
    city,
    COUNT(*) as customer_count
FROM customers
WHERE COUNT(*) > 10    -- ERROR: aggregate not allowed in WHERE
GROUP BY city;

-- CORRECT: Use HAVING for aggregate filters
SELECT
    city,
    COUNT(*) as customer_count
FROM customers
GROUP BY city
HAVING COUNT(*) > 10;  -- This works

The Cat Analogy

Imagine you're organizing a cat show.

WHERE is like sorting cats as they arrive:

  • "Only let cats with clean fur into the main hall"
  • "Only cats with collars can enter"
  • Each cat is evaluated individually before the show begins

HAVING is like deciding which groups get awards:

  • After cats are grouped by breed, you say "Only groups with more than 5 cats qualify"
  • After grouping by color, you say "Only groups where the average weight is above 5kg get a prize"
  • You're filtering based on properties of the group itself, not individual cats

You can't say "only groups with more than 5 cats" before you know how many cats are in each group—that's why HAVING comes after GROUP BY.


Exercises

Exercise 1

Given this reservations table:

-- id, cat_id, customer_id, reservation_date, duration_hours, status

Write a query that shows average duration per status, but only for reservations made in 2024, and only for statuses with more than 100 reservations.

Exercise 2

Explain why this query is problematic:

SELECT city, COUNT(*) as cnt
FROM customers
WHERE COUNT(*) > 5
GROUP BY city;

Exercise 3

Write a query using both WHERE and HAVING that:

  • Filters catcafe orders to only those with status = 'completed'
  • Groups by customer_id
  • Shows only customers with more than 5 completed orders
  • Includes their total spending

Exercise 4

In the CatCafe scenario, would you use WHERE or HAVING to:

  1. Exclude cancelled reservations?
  2. Find customers whose average order is over $50?
  3. Filter to a specific city before grouping?
  4. Find cities where total revenue exceeds $10,000?

Key Takeaways

  • WHERE filters rows before GROUP BY; HAVING filters groups after GROUP BY
  • WHERE cannot use aggregate functions; HAVING is designed for them
  • For best performance, filter early with WHERE whenever possible
  • Use both together: WHERE for row conditions, HAVING for aggregate conditions