COUNT vs COUNT DISTINCT
COUNT vs COUNT DISTINCT
Technical Explanation
Both COUNT(*) and COUNT(column) count rows, but they handle NULL values differently. COUNT(DISTINCT ...) adds another layer by counting only unique values.
The Three COUNT Variations
| Function | Counts | NULLs Included |
|---|---|---|
| COUNT(*) | Number of rows | Yes (all rows) |
| COUNT(column) | Non-NULL values in column | No (ignores NULL) |
| COUNT(DISTINCT column) | Unique non-NULL values | No |
Key Differences
-- COUNT(*): Counts every row
SELECT COUNT(*) FROM orders; -- Total number of orders
-- COUNT(column): Counts only non-NULL values
SELECT COUNT(cat_id) FROM orders; -- Orders where cat_id is not NULL
-- COUNT(DISTINCT column): Unique non-NULL values
SELECT COUNT(DISTINCT customer_id) FROM orders; -- Unique customers who ordered
Why the Difference Matters
-- If orders has 100 rows:
-- - 10 have cat_id = NULL
-- - 5 have cat_id = NULL
SELECT
COUNT(*), -- 100: all rows
COUNT(cat_id), -- 90: non-NULL cat_ids
COUNT(DISTINCT cat_id) -- 15: unique non-NULL cat_ids
FROM orders;
Code Examples
Using the CatCafe dataset:
-- Table: orders
-- id, customer_id, cat_id, order_date, total_amount, status
-- COUNT(*): Total number of orders
SELECT COUNT(*) as total_orders
FROM orders;
-- Returns: Total rows regardless of NULLs
-- COUNT(DISTINCT customer_id): Unique customers who placed orders
SELECT COUNT(DISTINCT customer_id) as unique_customers
FROM orders;
-- Count of individual customers, not total orders
-- Finding customers who ordered multiple items
SELECT
customer_id,
COUNT(*) as order_count,
COUNT(DISTINCT cat_id) as unique_cats_ordered
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > COUNT(DISTINCT cat_id);
-- Customers who ordered the same cat more than once
-- Orders per cat, showing adoption stats
SELECT
cats.name,
cats.breed,
COUNT(DISTINCT orders.customer_id) as unique_buyers,
COUNT(orders.id) as total_orders
FROM cats
LEFT JOIN orders
ON cats.id = orders.cat_id
GROUP BY cats.id, cats.name, cats.breed
ORDER BY total_orders DESC;
-- Common pattern: Conversion rate calculation
SELECT
COUNT(DISTINCT customer_id) as total_visitors,
COUNT(DISTINCT CASE WHEN status = 'completed' THEN customer_id END) as converters,
COUNT(DISTINCT CASE WHEN status = 'completed' THEN customer_id END) * 100.0
/ NULLIF(COUNT(DISTINCT customer_id), 0) as conversion_rate
FROM orders;
The Cat Analogy
You're counting cats at a cafe:
COUNT(*) = Total cats in the building
- Cats in the play area, cats napping, cats hiding under furniture
- "We have 20 cat encounters today"
COUNT(breed) = Cats with known breeds
- Excludes cats where breed is unknown/not recorded
- "We can identify the breed of 17 cats"
COUNT(DISTINCT breed) = Unique breeds present
- Each breed counted only once
- "We have 5 different breeds: Siamese, Persian, Maine Coon, Tabby, Sphynx"
Example scenario:
- You saw 20 cat entries in your log
- 3 entries had no cat (just people visiting)
- 2 cats had no recorded breed
- The 17 identifiable cats were: Siamese, Persian, Siamese, Maine Coon, Tabby, Siamese, Tabby, Tabby...
COUNT(*) = 20 (total log entries)
COUNT(breed) = 17 (entries with a cat)
COUNT(DISTINCT breed) = 4 (Siamese, Persian, Maine Coon, Tabby)
Common Mistakes
Confusing COUNT(*) and COUNT(column)
-- If total_spent has NULLs for some customers:
SELECT
COUNT(*), -- All customers
COUNT(total_spent) -- Only customers with non-NULL spending
FROM customers;
-- These will differ if there are NULL values
Forgetting DISTINCT When Needed
-- "How many cats has each customer interacted with?"
-- WRONG: Counts total interactions, not unique cats
SELECT
customer_id,
COUNT(cat_id) as cat_count
FROM orders
GROUP BY customer_id;
-- RIGHT: Count unique cats
SELECT
customer_id,
COUNT(DISTINCT cat_id) as unique_cat_count
FROM orders
GROUP BY customer_id;
NULLIF to Avoid Division by Zero
-- Calculate percentage of orders with cats
SELECT
COUNT(*) as total_orders,
COUNT(cat_id) as orders_with_cats,
COUNT(cat_id) * 100.0 / NULLIF(COUNT(*), 0) as pct_with_cats
FROM orders;
-- NULLIF(COUNT(*), 0) returns NULL if COUNT(*) is 0, avoiding division by zero
Exercises
Exercise 1
Given these results, explain the difference:
SELECT
COUNT(*) as a,
COUNT(DISTINCT status) as b,
COUNT(status) as c
FROM orders;
-- Assume: 100 rows, 3 distinct statuses, 5 rows with NULL status
Exercise 2
Write a query to find the number of unique customers who:
- Placed more than 5 orders
- Ordered more than 3 different cats
Exercise 3
Write a query showing "cat popularity" using COUNT(DISTINCT customer_id) per cat, not COUNT(*).
Exercise 4
Explain why these give different results:
SELECT COUNT(DISTINCT city) FROM customers;
SELECT COUNT(DISTINCT customer_id) FROM customers WHERE city IS NOT NULL;
Exercise 5
Write a query to find what percentage of customers have made more than one purchase.
Key Takeaways
- COUNT(*) counts all rows including NULLs
- COUNT(column) counts only non-NULL values in that column
- COUNT(DISTINCT column) counts unique non-NULL values
- Use COUNT(DISTINCT ...) when you need unique counts (customers, products, etc.)
- Use NULLIF to avoid division by zero in percentage calculations
- Be intentional: COUNT(*) vs COUNT(column) can give very different results