Home/Data Analyst/SQL Intermediate

Subquery vs CTE

Subquery vs CTE

Technical Explanation

Both subqueries and CTEs (Common Table Expressions) allow you to use the result of one query within another. The key differences are in readability, reusability, and some database-specific optimizations.

Types of Subqueries

Type Description Example
Scalar Returns single value SELECT MAX(x) FROM...
Table Returns result set FROM (SELECT...)
Correlated References outer query WHERE x = (SELECT...)

Subquery vs CTE Comparison

-- SUBQUERY: Nested directly in WHERE/FROM
SELECT *
FROM customers
WHERE total_spent > (
    SELECT AVG(total_spent)
    FROM customers
);

-- CTE: Named, defined at the top
WITH avg_spent AS (
    SELECT AVG(total_spent) as avg_value
    FROM customers
)
SELECT *
FROM customers, avg_spent
WHERE customers.total_spent > avg_spent.avg_value;

Code Examples

Using the CatCafe dataset:

-- Tables: customers, orders, cats

-- SCALAR SUBQUERY: Single value in WHERE
-- Find customers who spent more than average
SELECT
    name,
    total_spent,
    (SELECT AVG(total_spent) FROM customers) as avg_spent,
    total_spent - (SELECT AVG(total_spent) FROM customers) as vs_avg
FROM customers
WHERE total_spent > (
    SELECT AVG(total_spent)
    FROM customers
);

-- TABLE SUBQUERY: Entire result set as table
-- Join against aggregated data
SELECT
    c.name,
    c.total_spent,
    monthly.order_month,
    monthly.order_count
FROM customers c
JOIN (
    SELECT
        customer_id,
        DATE_TRUNC('month', order_date) as order_month,
        COUNT(*) as order_count
    FROM orders
    WHERE status = 'completed'
    GROUP BY customer_id, DATE_TRUNC('month', order_date)
) monthly ON c.id = monthly.customer_id;

-- CTE equivalent of above
WITH monthly_orders AS (
    SELECT
        customer_id,
        DATE_TRUNC('month', order_date) as order_month,
        COUNT(*) as order_count
    FROM orders
    WHERE status = 'completed'
    GROUP BY customer_id, DATE_TRUNC('month', order_date)
)
SELECT
    c.name,
    c.total_spent,
    monthly.order_month,
    monthly.order_count
FROM customers c
JOIN monthly_orders monthly ON c.id = monthly.customer_id;

-- CORRELATED SUBQUERY: References outer query
-- For each customer, find their latest order date
SELECT
    name,
    total_spent,
    (
        SELECT MAX(order_date)
        FROM orders
        WHERE customer_id = customers.id
    ) as last_order_date
FROM customers;

-- Find customers whose total exceeds their city's average
SELECT
    c1.name,
    c1.city,
    c1.total_spent
FROM customers c1
WHERE c1.total_spent > (
    SELECT AVG(c2.total_spent)
    FROM customers c2
    WHERE c2.city = c1.city
)
ORDER BY c1.city, c1.total_spent DESC;

The Cat Analogy

You're a cat shelter manager analyzing records:

Subquery approach — Everything in one sentence:

"Show me cats that weigh more than the average weight of all cats
 that have been adopted in the last 30 days from shelters within 50 miles."

Long, nested, hard to parse.

CTE approach — Breaking into steps:

STEP 1: Find the average weight of adopted cats in the last 30 days
STEP 2: Get cats within 50 miles
STEP 3: Compare each cat to the average

CTEs are like outlining your thoughts before writing the full query.


When to Use Each

Scenario Best Choice Why
Readability in long queries CTE Named steps are self-documenting
Reusing same subquery multiple times CTE Define once, reference many times
Single use, simple query Subquery Less overhead, same result
Recursive data CTE Recursive CTEs are powerful
Correlated comparisons Either Depends on readability needs

Common Pitfalls

Subquery Referenced Multiple Times

-- INEFFICIENT: Same subquery runs twice
SELECT
    customers.name,
    (SELECT COUNT(*) FROM orders WHERE customer_id = customers.id) as order_count,
    (SELECT SUM(total_amount) FROM orders WHERE customer_id = customers.id) as total_spent
FROM customers;

-- BETTER: CTE or JOIN
WITH customer_stats AS (
    SELECT
        customer_id,
        COUNT(*) as order_count,
        SUM(total_amount) as total_spent
    FROM orders
    GROUP BY customer_id
)
SELECT
    customers.name,
    COALESCE(cs.order_count, 0),
    COALESCE(cs.total_spent, 0)
FROM customers
LEFT JOIN customer_stats cs ON customers.id = cs.customer_id;

Correlated Subquery Performance

-- May run the subquery for EACH row!
SELECT
    name,
    (SELECT MAX(order_date) FROM orders WHERE customer_id = customers.id) as last_order
FROM customers;
-- If 1000 customers, this subquery runs 1000 times!

-- Better: Use window function
SELECT
    name,
    MAX(order_date) OVER (PARTITION BY customer_id) as last_order
FROM customers
LEFT JOIN orders ON customers.id = orders.customer_id;

Exercises

Exercise 1

Rewrite this subquery using a CTE:

SELECT *
FROM orders
WHERE customer_id IN (
    SELECT id FROM customers WHERE city = 'Brooklyn'
)
AND total_amount > (
    SELECT AVG(total_amount) FROM orders
);

Exercise 2

Explain the difference between a scalar subquery and a table subquery.

Exercise 3

Write a query to find the city with the highest average customer spending. Use a CTE.

Exercise 4

Why is this correlated subquery potentially slow?

SELECT
    c.name,
    (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) as order_count
FROM customers c;

Exercise 5

When might a subquery actually be faster than a CTE?


Key Takeaways

  • Subqueries and CTEs can often accomplish the same results
  • CTEs improve readability and allow reuse within the query
  • Scalar subqueries return one value; table subqueries return a result set
  • Correlated subqueries run once per row (can be slow)
  • Use CTEs when you need to reference the same subquery multiple times
  • Use subqueries for simple, single-use cases
  • Some databases optimize CTEs as materialized vs inline (performance varies)