PARTITION BY
PARTITION BY
Technical Explanation
PARTITION BY is a clause in window functions that divides the result set into groups (partitions) over which the window function operates independently. It's similar to GROUP BY, but unlike GROUP BY, it doesn't collapse rows—it keeps all rows while computing the window function for each partition.
Syntax
function_name() OVER (
PARTITION BY column1, column2, ...
[ORDER BY ...]
[...]
)
Key Differences: PARTITION BY vs GROUP BY
| Aspect | PARTITION BY | GROUP BY |
|---|---|---|
| Rows in result | All rows preserved | Collapsed to one per group |
| Placement | Only in window functions | Standalone clause |
| Combining | Can mix with regular columns | Cannot mix with ungrouped columns |
Code Examples
Using the CatCafe dataset:
-- Tables:
-- customers: id, name, city, total_spent
-- orders: id, customer_id, cat_id, order_date, total_amount, status
-- PARTITION BY: Per-group calculations while keeping all rows
SELECT
o.id,
o.order_date,
o.total_amount,
c.name as customer_name,
c.city,
-- Total spent by customer
SUM(o.total_amount) OVER (
PARTITION BY o.customer_id
) as customer_total_spent,
-- Total spent by city
SUM(o.total_amount) OVER (
PARTITION BY c.city
) as city_total_spent,
-- Customer's order count
COUNT(*) OVER (
PARTITION BY o.customer_id
) as customer_order_count,
-- Percentage of customer's total
o.total_amount * 100.0 / NULLIF(
SUM(o.total_amount) OVER (PARTITION BY o.customer_id),
0
) as pct_of_customer_total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed';
-- Multiple PARTITION BY columns
SELECT
c.city,
c.name,
o.total_amount,
SUM(o.total_amount) OVER (
PARTITION BY c.city, DATE_TRUNC('month', o.order_date)
) as monthly_city_revenue
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed'
ORDER BY c.city, o.order_date;
-- PARTITION BY with ORDER BY: Running totals per group
SELECT
c.name,
o.order_date,
o.total_amount,
SUM(o.total_amount) OVER (
PARTITION BY c.id
ORDER BY o.order_date
) as running_total_per_customer,
AVG(o.total_amount) OVER (
PARTITION BY c.id
ORDER BY o.order_date
) as running_avg_per_customer
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed'
ORDER BY c.name, o.order_date;
-- Finding outliers: Values far from partition average
SELECT
c.name,
c.city,
o.total_amount,
AVG(o.total_amount) OVER (PARTITION BY c.city) as city_avg,
o.total_amount - AVG(o.total_amount) OVER (PARTITION BY c.city) as vs_city_avg
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed'
ORDER BY c.city, vs_city_avg DESC;
The Cat Analogy
Think of PARTITION BY like a cat show with multiple categories:
GROUP BY = One winner per category
Persian: Whiskers (Best in Show)
Siamese: Luna (Best in Show)
Tabby: Shadow (Best in Show)
One row per group—you lose the individual cats.
PARTITION BY = All cats stay, results per category shown
Whiskers (Persian) | 95 pts | #1 of 5 Persians
Mochi (Persian) | 90 pts | #2 of 5 Persians
Luna (Siamese) | 98 pts | #1 of 3 Siamese
Shadow (Tabby) | 85 pts | #1 of 8 Tabbies
Every cat stays in the competition, but each cat now knows:
- Their score
- How they rank within their breed
- Their breed's total participants
PARTITION BY adds "within your category" context without eliminating other categories.
Common Use Cases
| Use Case | Window Function |
|---|---|
| Per-customer totals | SUM() OVER (PARTITION BY customer_id) |
| Running totals | SUM() OVER (PARTITION BY ... ORDER BY ...) |
| Row numbers per group | ROW_NUMBER() OVER (PARTITION BY ...) |
| Percent of group total | value / SUM(value) OVER (PARTITION BY ...) |
| Compare to group average | value - AVG(value) OVER (PARTITION BY ...) |
Exercises
Exercise 1
Write a query showing each order with:
- The order amount
- The customer's total spent
- The city's total spent
- The order's percentage of city total
Exercise 2
Explain the difference between:
-- A
SELECT city, SUM(total_spent) FROM customers GROUP BY city;
-- B
SELECT name, city, SUM(total_spent) OVER (PARTITION BY city) FROM customers;
Exercise 3
Write a query to show each cat's number of orders and revenue, plus the breed's totals, while keeping all individual cat rows.
Exercise 4
Write a query using PARTITION BY with ORDER BY to show a running total of revenue per customer, ordered by date.
Exercise 5
What does this query return?
SELECT
name,
city,
total_spent,
ROW_NUMBER() OVER (PARTITION BY city ORDER BY total_spent DESC)
FROM customers;
Key Takeaways
- PARTITION BY divides data into groups for window functions
- Unlike GROUP BY, rows are NOT collapsed—each row gets the partition's calculation
- Multiple columns: PARTITION BY col1, col2 creates groups for each unique combination
- PARTITION BY + ORDER BY = running calculations within groups
- PARTITION BY is only used with window functions (not standalone)
- Common patterns: running totals, per-group totals, row numbers per group