Window Functions Intro
Window Functions Intro
Technical Explanation
Window functions perform calculations across a set of rows related to the current row, without collapsing those rows like GROUP BY does. They allow you to see both detail and aggregate data in the same row.
Key Concepts
| Concept | Description |
|---|---|
| Window | Set of rows related to current row |
| OVER() | Defines the window (partition, order) |
| PARTITION BY | Groups rows (like GROUP BY) |
| ORDER BY | Orders rows within partition |
Window vs Aggregation
GROUP BY: Rows collapse into groups (1 row per group)
Window: Rows stay as-is, aggregate added to each row
Code Examples
Using the CatCafe dataset:
-- Table: orders
-- id, customer_id, cat_id, order_date, total_amount, status
-- Basic window: Add total revenue to each row
SELECT
id,
customer_id,
order_date,
total_amount,
SUM(total_amount) OVER () as total_revenue
FROM orders
WHERE status = 'completed';
-- Window with PARTITION: Per-customer totals
SELECT
id,
customer_id,
order_date,
total_amount,
SUM(total_amount) OVER (PARTITION BY customer_id) as customer_total,
COUNT(*) OVER (PARTITION BY customer_id) as customer_order_count,
AVG(total_amount) OVER (PARTITION BY customer_id) as customer_avg
FROM orders
WHERE status = 'completed';
-- Running total: Cumulative sum over time
SELECT
id,
customer_id,
order_date,
total_amount,
SUM(total_amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) as running_total
FROM orders
WHERE status = 'completed'
ORDER BY customer_id, order_date;
-- Row numbers: Assign sequence within group
SELECT
id,
customer_id,
order_date,
total_amount,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date DESC
) as order_seq -- 1 = most recent
FROM orders
WHERE status = 'completed';
The Cat Analogy
Think of window functions like a cat show ranking system:
GROUP BY = Eliminating competition
Winner: Whiskers (1 trophy)
Second: Mochi (0.5 trophies)
Third: Luna (0.25 trophies)
You see only the winners, not all contestants.
Window = Keeping everyone, adding ranking
Whiskers | 1st place (of 10) | Trophy count: 1
Mochi | 2nd place (of 10) | Trophy count: 0.5
Luna | 3rd place (of 10) | Trophy count: 0.25
Shadow | 4th place (of 10) | Trophy count: 0.1
...
Every cat stays in the list, but each now has context about their position relative to others.
Common Window Functions
| Function | Purpose |
|---|---|
| ROW_NUMBER() | Sequential number (1, 2, 3) |
| RANK() | Rank with gaps (1, 2, 2, 4) |
| DENSE_RANK() | Rank without gaps (1, 2, 2, 3) |
| SUM() OVER | Running or total sum |
| AVG() OVER | Moving or total average |
| LAG(col) | Previous row's value |
| LEAD(col) | Next row's value |
| FIRST_VALUE(col) | First value in window |
| LAST_VALUE(col) | Last value in window |
Exercises
Exercise 1
Write a query showing each order with:
- The order's total_amount
- The customer's average order value
- The percentage this order represents of the customer's total
Exercise 2
Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK().
Exercise 3
Write a query using LAG() to show each order alongside the previous order's amount for the same customer.
Exercise 4
Using the orders table, write a query that shows the running total of revenue per day (cumulative sum ordered by date).
Exercise 5
What happens if you use SUM() OVER() without PARTITION BY? How is this different from GROUP BY with a regular SUM() in SELECT?
Key Takeaways
- Window functions compute across rows without collapsing them
- OVER() defines the window; PARTITION BY groups; ORDER BY orders
- Unlike GROUP BY, window functions keep all rows in the result
- Common functions: ROW_NUMBER, RANK, SUM, AVG, LAG, LEAD
- Window functions execute after FROM, WHERE, GROUP BY, HAVING
- Use window functions instead of correlated subqueries for better performance