Home/Data Analyst/SQL Intermediate

PARTITION BY

PARTITION BY

Technical Explanation

PARTITION BY is a clause in window functions that divides the result set into groups (partitions) over which the window function operates independently. It's similar to GROUP BY, but unlike GROUP BY, it doesn't collapse rows—it keeps all rows while computing the window function for each partition.

Syntax

function_name() OVER (
    PARTITION BY column1, column2, ...
    [ORDER BY ...]
    [...]
)

Key Differences: PARTITION BY vs GROUP BY

Aspect PARTITION BY GROUP BY
Rows in result All rows preserved Collapsed to one per group
Placement Only in window functions Standalone clause
Combining Can mix with regular columns Cannot mix with ungrouped columns

Code Examples

Using the CatCafe dataset:

-- Tables:
-- customers: id, name, city, total_spent
-- orders: id, customer_id, cat_id, order_date, total_amount, status

-- PARTITION BY: Per-group calculations while keeping all rows
SELECT
    o.id,
    o.order_date,
    o.total_amount,
    c.name as customer_name,
    c.city,
    -- Total spent by customer
    SUM(o.total_amount) OVER (
        PARTITION BY o.customer_id
    ) as customer_total_spent,
    -- Total spent by city
    SUM(o.total_amount) OVER (
        PARTITION BY c.city
    ) as city_total_spent,
    -- Customer's order count
    COUNT(*) OVER (
        PARTITION BY o.customer_id
    ) as customer_order_count,
    -- Percentage of customer's total
    o.total_amount * 100.0 / NULLIF(
        SUM(o.total_amount) OVER (PARTITION BY o.customer_id),
        0
    ) as pct_of_customer_total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed';

-- Multiple PARTITION BY columns
SELECT
    c.city,
    c.name,
    o.total_amount,
    SUM(o.total_amount) OVER (
        PARTITION BY c.city, DATE_TRUNC('month', o.order_date)
    ) as monthly_city_revenue
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed'
ORDER BY c.city, o.order_date;

-- PARTITION BY with ORDER BY: Running totals per group
SELECT
    c.name,
    o.order_date,
    o.total_amount,
    SUM(o.total_amount) OVER (
        PARTITION BY c.id
        ORDER BY o.order_date
    ) as running_total_per_customer,
    AVG(o.total_amount) OVER (
        PARTITION BY c.id
        ORDER BY o.order_date
    ) as running_avg_per_customer
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed'
ORDER BY c.name, o.order_date;

-- Finding outliers: Values far from partition average
SELECT
    c.name,
    c.city,
    o.total_amount,
    AVG(o.total_amount) OVER (PARTITION BY c.city) as city_avg,
    o.total_amount - AVG(o.total_amount) OVER (PARTITION BY c.city) as vs_city_avg
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'completed'
ORDER BY c.city, vs_city_avg DESC;

The Cat Analogy

Think of PARTITION BY like a cat show with multiple categories:

GROUP BY = One winner per category

Persian: Whiskers (Best in Show)
Siamese: Luna (Best in Show)
Tabby: Shadow (Best in Show)

One row per group—you lose the individual cats.

PARTITION BY = All cats stay, results per category shown

Whiskers (Persian) | 95 pts | #1 of 5 Persians
Mochi (Persian)    | 90 pts | #2 of 5 Persians
Luna (Siamese)     | 98 pts | #1 of 3 Siamese
Shadow (Tabby)      | 85 pts | #1 of 8 Tabbies

Every cat stays in the competition, but each cat now knows:

  • Their score
  • How they rank within their breed
  • Their breed's total participants

PARTITION BY adds "within your category" context without eliminating other categories.


Common Use Cases

Use Case Window Function
Per-customer totals SUM() OVER (PARTITION BY customer_id)
Running totals SUM() OVER (PARTITION BY ... ORDER BY ...)
Row numbers per group ROW_NUMBER() OVER (PARTITION BY ...)
Percent of group total value / SUM(value) OVER (PARTITION BY ...)
Compare to group average value - AVG(value) OVER (PARTITION BY ...)

Exercises

Exercise 1

Write a query showing each order with:

  • The order amount
  • The customer's total spent
  • The city's total spent
  • The order's percentage of city total

Exercise 2

Explain the difference between:

-- A
SELECT city, SUM(total_spent) FROM customers GROUP BY city;
-- B
SELECT name, city, SUM(total_spent) OVER (PARTITION BY city) FROM customers;

Exercise 3

Write a query to show each cat's number of orders and revenue, plus the breed's totals, while keeping all individual cat rows.

Exercise 4

Write a query using PARTITION BY with ORDER BY to show a running total of revenue per customer, ordered by date.

Exercise 5

What does this query return?

SELECT
    name,
    city,
    total_spent,
    ROW_NUMBER() OVER (PARTITION BY city ORDER BY total_spent DESC)
FROM customers;

Key Takeaways

  • PARTITION BY divides data into groups for window functions
  • Unlike GROUP BY, rows are NOT collapsed—each row gets the partition's calculation
  • Multiple columns: PARTITION BY col1, col2 creates groups for each unique combination
  • PARTITION BY + ORDER BY = running calculations within groups
  • PARTITION BY is only used with window functions (not standalone)
  • Common patterns: running totals, per-group totals, row numbers per group