Home/Data Analyst/SQL Intermediate

Window Functions Intro

Window Functions Intro

Technical Explanation

Window functions perform calculations across a set of rows related to the current row, without collapsing those rows like GROUP BY does. They allow you to see both detail and aggregate data in the same row.

Key Concepts

Concept Description
Window Set of rows related to current row
OVER() Defines the window (partition, order)
PARTITION BY Groups rows (like GROUP BY)
ORDER BY Orders rows within partition

Window vs Aggregation

GROUP BY:    Rows collapse into groups (1 row per group)
Window:      Rows stay as-is, aggregate added to each row

Code Examples

Using the CatCafe dataset:

-- Table: orders
-- id, customer_id, cat_id, order_date, total_amount, status

-- Basic window: Add total revenue to each row
SELECT
    id,
    customer_id,
    order_date,
    total_amount,
    SUM(total_amount) OVER () as total_revenue
FROM orders
WHERE status = 'completed';

-- Window with PARTITION: Per-customer totals
SELECT
    id,
    customer_id,
    order_date,
    total_amount,
    SUM(total_amount) OVER (PARTITION BY customer_id) as customer_total,
    COUNT(*) OVER (PARTITION BY customer_id) as customer_order_count,
    AVG(total_amount) OVER (PARTITION BY customer_id) as customer_avg
FROM orders
WHERE status = 'completed';

-- Running total: Cumulative sum over time
SELECT
    id,
    customer_id,
    order_date,
    total_amount,
    SUM(total_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) as running_total
FROM orders
WHERE status = 'completed'
ORDER BY customer_id, order_date;

-- Row numbers: Assign sequence within group
SELECT
    id,
    customer_id,
    order_date,
    total_amount,
    ROW_NUMBER() OVER (
        PARTITION BY customer_id
        ORDER BY order_date DESC
    ) as order_seq  -- 1 = most recent
FROM orders
WHERE status = 'completed';

The Cat Analogy

Think of window functions like a cat show ranking system:

GROUP BY = Eliminating competition

Winner: Whiskers (1 trophy)
Second: Mochi (0.5 trophies)
Third: Luna (0.25 trophies)

You see only the winners, not all contestants.

Window = Keeping everyone, adding ranking

Whiskers   | 1st place (of 10) | Trophy count: 1
Mochi      | 2nd place (of 10) | Trophy count: 0.5
Luna       | 3rd place (of 10) | Trophy count: 0.25
Shadow     | 4th place (of 10) | Trophy count: 0.1
...

Every cat stays in the list, but each now has context about their position relative to others.


Common Window Functions

Function Purpose
ROW_NUMBER() Sequential number (1, 2, 3)
RANK() Rank with gaps (1, 2, 2, 4)
DENSE_RANK() Rank without gaps (1, 2, 2, 3)
SUM() OVER Running or total sum
AVG() OVER Moving or total average
LAG(col) Previous row's value
LEAD(col) Next row's value
FIRST_VALUE(col) First value in window
LAST_VALUE(col) Last value in window

Exercises

Exercise 1

Write a query showing each order with:

  • The order's total_amount
  • The customer's average order value
  • The percentage this order represents of the customer's total

Exercise 2

Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK().

Exercise 3

Write a query using LAG() to show each order alongside the previous order's amount for the same customer.

Exercise 4

Using the orders table, write a query that shows the running total of revenue per day (cumulative sum ordered by date).

Exercise 5

What happens if you use SUM() OVER() without PARTITION BY? How is this different from GROUP BY with a regular SUM() in SELECT?


Key Takeaways

  • Window functions compute across rows without collapsing them
  • OVER() defines the window; PARTITION BY groups; ORDER BY orders
  • Unlike GROUP BY, window functions keep all rows in the result
  • Common functions: ROW_NUMBER, RANK, SUM, AVG, LAG, LEAD
  • Window functions execute after FROM, WHERE, GROUP BY, HAVING
  • Use window functions instead of correlated subqueries for better performance