Cohort Analysis
Cohort Analysis
Technical Explanation
Cohort analysis groups customers by a shared characteristic (typically acquisition date) and tracks their behavior over time. It reveals how different customer groups behave relative to each other, helping identify if you're improving or declining in customer quality.
Cohort Types
| Type | Groups By | Use Case |
|---|---|---|
| Acquisition | Signup date | Compare customer quality over time |
| Behavioral | First purchase date | Compare by when first purchase occurred |
| Size | Cohort size | See if volume affects behavior |
Why Cohort Analysis Matters
Regular metrics: "Churn is 5% this month"
Cohort insight: "Churn is 5% overall, but customers from January have 12% churn
while customers from June have only 2% churn"
Same 5% hides a significant problem in older cohorts!
Code Examples
Using the CatCafe dataset:
-- Table: customers (id, registration_date)
-- Table: orders (id, customer_id, order_date, total_amount)
-- Monthly acquisition cohort table
WITH cohorts AS (
-- Assign cohort month to each customer
SELECT
c.id as customer_id,
DATE_TRUNC('month', c.registration_date) as cohort_month
FROM customers c
),
orders_with_cohorts AS (
-- Track each order with cohort info
SELECT
o.customer_id,
o.order_date,
o.total_amount,
c.cohort_month,
-- Months since acquisition
EXTRACT(
YEAR FROM (o.order_date - c.cohort_month)
) * 12 + EXTRACT(
MONTH FROM (o.order_date - c.cohort_month)
) as months_since_signup
FROM orders o
JOIN cohorts c ON o.customer_id = c.customer_id
WHERE o.status = 'completed'
),
cohort_sizes AS (
SELECT cohort_month, COUNT(DISTINCT customer_id) as cohort_size
FROM cohorts
GROUP BY cohort_month
),
cohort_revenue AS (
SELECT
cohort_month,
months_since_signup,
COUNT(DISTINCT customer_id) as active_customers,
SUM(total_amount) as revenue
FROM orders_with_cohorts
GROUP BY cohort_month, months_since_signup
)
SELECT
cr.cohort_month,
cs.cohort_size,
cr.months_since_signup,
cr.active_customers,
cr.revenue,
cr.active_customers * 100.0 / NULLIF(cs.cohort_size, 0) as retention_rate
FROM cohort_revenue cr
JOIN cohort_sizes cs ON cr.cohort_month = cs.cohort_month
ORDER BY cr.cohort_month, cr.months_since_signup;
-- Cohort retention matrix
WITH cohorts AS (
SELECT
c.id as customer_id,
DATE_TRUNC('month', c.registration_date) as cohort_month
FROM customers c
),
orders_with_cohorts AS (
SELECT
o.customer_id,
c.cohort_month,
-- Month number (0 = acquisition month)
EXTRACT(
YEAR FROM (o.order_date - c.cohort_month)
) * 12 + EXTRACT(
MONTH FROM (o.order_date - c.cohort_month)
) as month_num
FROM orders o
JOIN cohorts c ON o.customer_id = c.customer_id
WHERE o.status = 'completed'
),
cohort_sizes AS (
SELECT cohort_month, COUNT(DISTINCT customer_id) as size
FROM cohorts
GROUP BY cohort_month
),
monthly_retention AS (
SELECT
cohort_month,
month_num,
COUNT(DISTINCT customer_id) as active_customers
FROM orders_with_cohorts
GROUP BY cohort_month, month_num
)
SELECT
mr.cohort_month,
cs.size as cohort_size,
SUM(CASE WHEN month_num = 0 THEN active_customers END) as m0,
SUM(CASE WHEN month_num = 1 THEN active_customers END) as m1,
SUM(CASE WHEN month_num = 2 THEN active_customers END) as m2,
SUM(CASE WHEN month_num = 3 THEN active_customers END) as m3
FROM monthly_retention mr
JOIN cohort_sizes cs ON mr.cohort_month = cs.cohort_month
GROUP BY mr.cohort_month, cs.size
ORDER BY mr.cohort_month;
The Cat Analogy
Think of cohorts like cats from the same litter:
January Litter: Whiskers, Luna, Shadow (3 kittens)
Month 0: All 3 healthy
Month 1: All 3 healthy
Month 2: Luna got sick (2/3 healthy = 67% retention)
Month 3: Whiskers moved to new home (still counts)
Month 4: Shadow adopted (1/3 = 33% retention)
June Litter: Mochi, Bella, Oliver, Max (4 kittens)
Month 0: All 4 healthy
Month 1: All 4 healthy
Month 2: All 4 healthy
Month 3: All 4 healthy
Month 4: 3/4 adopted (75% retention)
Cohort insight: June litter is healthier and more adoptable! Despite overall 50% retention, we know June cats perform better.
Common Pitfalls
Incomplete Cohort Period
-- Don't compare cohorts with different ages!
-- January cohort has 12 months of data
-- June cohort has only 6 months
-- RIGHT: Only compare periods with equal age
SELECT *
FROM cohort_retention
WHERE EXTRACT(YEAR FROM cohort_month) * 12 + EXTRACT(MONTH FROM cohort_month)
>= EXTRACT(YEAR FROM CURRENT_DATE) * 12 + EXTRACT(MONTH FROM CURRENT_DATE) - 6;
Small Cohort Sizes
-- Cohort of 2 customers is meaningless
SELECT *
FROM cohort_retention
WHERE cohort_size < 10; -- Exclude small cohorts
Exercises
Exercise 1
Build a cohort retention table for monthly acquisition cohorts. Show retention for months 0, 1, 2, 3.
Exercise 2
What does it mean if newer cohorts have lower retention than older cohorts?
Exercise 3
Write a query to compare revenue per customer between cohorts.
Exercise 4
Why is it misleading to compare January cohort month 12 with June cohort month 6 directly?
Exercise 5
Design a cohort analysis for customer spending patterns by acquisition source.
Key Takeaways
- Cohort analysis groups customers by acquisition time and tracks behavior
- Reveals if customer quality is improving or declining over time
- Always compare cohorts at the same age (month 3 vs month 3)
- Watch for small cohort sizes (exclude cohorts < 10 customers)
- Cohort retention shows which acquisition periods produce loyal customers
- Revenue cohorts show if newer cohorts are more or less valuable