Home/Data Analyst/SQL Fundamentals

NULL Handling

NULL Handling

Technical Explanation

NULL represents the absence of a value—it's not zero, not an empty string, not false. NULL means "this value is unknown or not applicable." Understanding NULL behavior is critical because it affects comparisons, aggregations, and JOINs.

The Three-Valued Logic

SQL uses three-valued logic with NULL:

Expression Result
TRUE AND NULL UNKNOWN
FALSE AND NULL FALSE
TRUE OR NULL TRUE
FALSE OR NULL UNKNOWN
NOT NULL UNKNOWN

This means WHERE conditions with NULL often exclude rows you might expect.

NULL Comparison Rules

-- NULL is never equal to anything
NULL = NULL        -- Returns UNKNOWN (not TRUE!)
NULL != NULL      -- Returns UNKNOWN (not TRUE!)
NULL <> NULL      -- Returns UNKNOWN (not TRUE!)

-- Correct ways to check for NULL
column IS NULL
column IS NOT NULL

NULL in Aggregations

Function NULL Behavior
COUNT(column) Ignores NULLs
SUM(column) Ignores NULLs
AVG(column) Ignores NULLs
COUNT(*) Counts all rows including NULL
MIN/MAX Ignores NULLs

NULL in JOINs

-- NULL does NOT match NULL in standard joins
SELECT *
FROM a
JOIN b
    ON a.id = b.id;
-- If a.id = NULL and b.id = NULL, they WON'T match!

Code Examples

Using the CatCafe dataset:

-- Table: customers
-- id, name, email, city, registration_date, phone, total_spent

-- Finding NULLs: Customers without a phone number
SELECT *
FROM customers
WHERE phone IS NULL;

-- Handling NULLs: Replace with default value
SELECT
    name,
    COALESCE(city, 'Unknown') as city,
    COALESCE(phone, 'No phone') as phone
FROM customers;

-- NULLIF: Handle division by zero
SELECT
    SUM(total_spent) / NULLIF(COUNT(*), 0) as avg_spent
FROM customers;
-- Returns NULL instead of error when COUNT(*) = 0

-- CASE WHEN with NULL handling
SELECT
    name,
    total_spent,
    CASE
        WHEN total_spent IS NULL THEN 'No purchases'
        WHEN total_spent < 50 THEN 'Low value'
        WHEN total_spent < 200 THEN 'Medium value'
        ELSE 'High value'
    END as customer_tier
FROM customers;

-- Aggregate with NULL filtering
SELECT
    COUNT(*) as total_customers,
    COUNT(phone) as customers_with_phone,
    COUNT(*) - COUNT(phone) as customers_without_phone,
    AVG(total_spent) as avg_spent_all,
    AVG(total_spent) as avg_spent_with_purchases
FROM customers;

-- NULLs in JOINs: Customers with their latest order
SELECT
    customers.name,
    orders.order_date,
    orders.total_amount
FROM customers
LEFT JOIN orders
    ON customers.id = orders.customer_id
    AND orders.order_date = (
        SELECT MAX(order_date)
        FROM orders o2
        WHERE o2.customer_id = orders.customer_id
    );
-- This ensures we get only the latest order, even if customer has NULL orders

-- Using DISTINCT ON for latest per group (PostgreSQL)
SELECT DISTINCT ON (customers.id)
    customers.name,
    orders.total_amount,
    orders.order_date
FROM customers
LEFT JOIN orders
    ON customers.id = orders.customer_id
ORDER BY customers.id, orders.order_date DESC;

The Cat Analogy

Think of NULL as a cat's mood:

  • NULL is NOT "the cat is calm"
  • NULL is NOT "the cat is angry"
  • NULL is "we don't know what the cat's mood is"

Comparing NULLs:

"Is Whiskers' mood the same as Mittens' mood?"
"We don't know if Whiskers is calm" = "We don't know if Mittens is calm"
→ UNKNOWN (not "yes, they're the same")

NULL in aggregations: If you're tallying up cat snacks:

  • SUM(snacks) ignores bowls with NULL (no snack recorded)
  • COUNT(*) counts all bowls, including empty ones
  • COUNT(snacks) counts only bowls where snacks were recorded

NULL in comparisons:

-- "Show me all unassigned cats"
WHERE assigned_to IS NULL    -- Correct: find cats with no assignment
WHERE assigned_to = NULL     -- Wrong: this never matches anything!

-- "Show me cats with assignments"
WHERE assigned_to IS NOT NULL

Common Pitfalls

Comparing with NULL

-- WRONG: This never returns rows!
SELECT *
FROM cats
WHERE adopted_date = NULL;

-- RIGHT:
SELECT *
FROM cats
WHERE adopted_date IS NULL;

NULL in String Concatenation

-- Concatenation with NULL produces NULL
SELECT 'Name: ' || name || ', Age: ' || age
FROM cats;
-- If age is NULL, entire row becomes NULL!

-- Solution: Use COALESCE
SELECT 'Name: ' || name || ', Age: ' || COALESCE(age::text, 'Unknown')
FROM cats;

NOT IN with NULLs

-- Dangerous: If subquery returns NULL, NOT IN fails
SELECT *
FROM cats
WHERE id NOT IN (SELECT cat_id FROM adoption_fees WHERE cat_id IS NULL);
-- This returns nothing if any cat_id is NULL!

-- Safer: Use NOT EXISTS or exclude NULLs explicitly
SELECT *
FROM cats
WHERE id NOT IN (
    SELECT cat_id
    FROM adoption_fees
    WHERE cat_id IS NOT NULL
);
-- Or better:
SELECT *
FROM cats c
WHERE NOT EXISTS (
    SELECT 1 FROM adoption_fees af WHERE af.cat_id = c.id
);

Exercises

Exercise 1

Write a query to find all customers who:

  • Have never placed an order (no matching records in orders table)
  • Have placed orders but never provided a phone number

Exercise 2

Explain why these return different results:

-- Query A
SELECT COUNT(*) FROM orders WHERE customer_id NOT IN (1, 2, NULL);

-- Query B
SELECT COUNT(*) FROM orders WHERE customer_id NOT IN (1, 2) AND customer_id IS NOT NULL;

Exercise 3

Write a query showing each customer's name, total spent, and a "spending tier" (use CASE WHEN with COALESCE for NULL total_spent).

Exercise 4

Why does this query return no results?

SELECT *
FROM customers
WHERE total_spent > 100
   OR total_spent = NULL;

Exercise 5

Write a query to find the percentage of customers with NULL phone numbers.


Key Takeaways

  • NULL means "absence of value," not zero or empty string
  • Always use IS NULL / IS NOT NULL for NULL comparisons
  • NULL comparisons always return UNKNOWN, never TRUE or FALSE directly
  • COALESCE and NULLIF are essential for NULL handling
  • Aggregates (SUM, AVG, COUNT(col)) ignore NULLs; COUNT(*) counts them
  • Be careful with NOT IN and subqueries that might contain NULLs
  • String concatenation with NULL produces NULL—use COALESCE