CTE Purpose
CTE Purpose
Technical Explanation
A Common Table Expression (CTE) is a named temporary result set that exists only for the duration of a single query. It improves readability, enables recursion, and allows you to reference the same logic multiple times within a query.
Syntax
WITH cte_name AS (
-- CTE query
SELECT ...
FROM ...
WHERE ...
)
-- Main query using CTE
SELECT *
FROM cte_name;
When to Use CTEs
| Benefit | Use Case |
|---|---|
| Readability | Breaking complex queries into named steps |
| Reusability | Referencing the same subquery multiple times |
| Recursion | Hierarchical data (org charts, trees) |
| Self-reference | Querying the same table multiple times |
CTEs vs Subqueries
-- Subquery approach (nested, harder to read)
SELECT *
FROM (
SELECT *
FROM (
SELECT *
FROM orders
WHERE status = 'completed'
) inner_q
WHERE inner_q.amount > 50
) outer_q;
-- CTE approach (named, readable)
WITH filtered_orders AS (
SELECT *
FROM orders
WHERE status = 'completed'
),
high_value_orders AS (
SELECT *
FROM filtered_orders
WHERE total_amount > 50
)
SELECT *
FROM high_value_orders;
Code Examples
Using the CatCafe dataset:
-- Tables:
-- customers: id, name, city, registration_date, total_spent
-- orders: id, customer_id, cat_id, order_date, total_amount, status
-- Basic CTE: Step-by-step transformation
WITH recent_customers AS (
SELECT *
FROM customers
WHERE registration_date >= '2024-01-01'
),
customer_totals AS (
SELECT
customer_id,
SUM(total_amount) as total_spent,
COUNT(*) as order_count
FROM orders
WHERE customer_id IN (SELECT id FROM recent_customers)
GROUP BY customer_id
)
SELECT
recent_customers.name,
customer_totals.total_spent,
customer_totals.order_count
FROM recent_customers
JOIN customer_totals
ON recent_customers.id = customer_totals.customer_id
ORDER BY customer_totals.total_spent DESC;
-- Multiple CTEs: Separating concerns
WITH
-- Step 1: Get completed orders
completed_orders AS (
SELECT
customer_id,
COUNT(*) as order_count,
SUM(total_amount) as total_spent
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
),
-- Step 2: Find high-value customers
high_value AS (
SELECT customer_id
FROM completed_orders
WHERE total_spent > 200
),
-- Step 3: Get their details
customer_details AS (
SELECT
c.name,
c.city,
c.total_spent as lifetime_spent
FROM customers c
JOIN high_value hv ON c.id = hv.customer_id
)
SELECT * FROM customer_details ORDER BY lifetime_spent DESC;
-- CTE for complex grouping
WITH monthly_sales AS (
SELECT
DATE_TRUNC('month', order_date) as month,
SUM(total_amount) as revenue,
COUNT(DISTINCT customer_id) as unique_customers
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', order_date)
),
averages AS (
SELECT
AVG(revenue) as avg_monthly_revenue,
AVG(unique_customers) as avg_monthly_customers
FROM monthly_sales
)
SELECT
month,
revenue,
unique_customers,
revenue - averages.avg_monthly_revenue as vs_avg_revenue,
unique_customers - averages.avg_monthly_customers as vs_avg_customers
FROM monthly_sales, averages
ORDER BY month DESC;
The Cat Analogy
Think of a CTE like a cat cafe's prep station workflow:
Without CTE (chaotic kitchen):
"First I need flour, but to get flour I need to open the pantry,
but to open the pantry I need the key, but..."
Everything nested, hard to follow.
With CTEs (organized stations):
STATION 1 - Ingredients:
Get flour, sugar, eggs
STATION 2 - Mix batter:
Combine ingredients from Station 1
STATION 3 - Bake:
Take batter from Station 2, apply heat
FINAL: Cat-shaped cookies!
Each station (CTE) has a clear purpose, is named, and feeds into the next step. You can also go back to any station if you need to check your work.
Common Pitfalls
CTE vs Subquery Performance
-- CTEs are often optimized the same as subqueries
-- But some databases/materialize CTEs:
WITH cte AS (
SELECT * FROM large_table -- May be materialized
)
SELECT * FROM cte WHERE x = 1
UNION ALL
SELECT * FROM cte WHERE x = 2;
-- This might scan the table twice in some databases!
-- Alternative: Reference the CTE once
WITH cte AS (
SELECT * FROM large_table
)
SELECT * FROM cte WHERE x = 1
UNION ALL
SELECT * FROM cte WHERE x = 2;
-- Better: only materialized once, referenced twice
Forgetting the Main Query
-- OOPS: CTE defined but not used!
WITH cte AS (
SELECT * FROM customers
)
-- Missing: SELECT * FROM cte;
SELECT * FROM customers; -- CTE was useless here!
-- RIGHT:
WITH cte AS (
SELECT * FROM customers
)
SELECT * FROM cte; -- Actually using it
Exercises
Exercise 1
Rewrite this nested query using CTEs:
SELECT *
FROM (
SELECT *
FROM (
SELECT * FROM customers WHERE city = 'Brooklyn'
) c
WHERE c.total_spent > 100
) result
ORDER BY total_spent DESC;
Exercise 2
Write a CTE-based query that finds:
- Customers with more than 3 orders
- Their average order value
- Only for customers who registered in the last 6 months
Exercise 3
What are the advantages of CTEs over subqueries? When might a subquery be better?
Exercise 4
Write a query using multiple CTEs to compare each month's revenue to the previous month.
Exercise 5
Explain why this CTE doesn't work as intended:
WITH cat_totals AS (
SELECT customer_id, SUM(total_amount) as total
FROM orders
GROUP BY customer_id
)
SELECT * FROM cat_totals WHERE total > 100;
What if some customers have no orders?
Key Takeaways
- CTEs improve readability by naming intermediate steps
- They exist only for the duration of one query
- CTEs can reference earlier CTEs in the same query
- Useful for recursive queries (hierarchical data)
- Some databases materialize CTEs (performance consideration)
- A CTE is not required to be used after definition—don't define unused CTEs
- CTEs can make complex queries self-documenting through naming