Home/Data Analyst/Business Metrics

Cohort Analysis

Cohort Analysis

Technical Explanation

Cohort analysis groups customers by a shared characteristic (typically acquisition date) and tracks their behavior over time. It reveals how different customer groups behave relative to each other, helping identify if you're improving or declining in customer quality.

Cohort Types

Type Groups By Use Case
Acquisition Signup date Compare customer quality over time
Behavioral First purchase date Compare by when first purchase occurred
Size Cohort size See if volume affects behavior

Why Cohort Analysis Matters

Regular metrics: "Churn is 5% this month"
Cohort insight: "Churn is 5% overall, but customers from January have 12% churn
                while customers from June have only 2% churn"

Same 5% hides a significant problem in older cohorts!


Code Examples

Using the CatCafe dataset:

-- Table: customers (id, registration_date)
-- Table: orders (id, customer_id, order_date, total_amount)

-- Monthly acquisition cohort table
WITH cohorts AS (
    -- Assign cohort month to each customer
    SELECT
        c.id as customer_id,
        DATE_TRUNC('month', c.registration_date) as cohort_month
    FROM customers c
),
orders_with_cohorts AS (
    -- Track each order with cohort info
    SELECT
        o.customer_id,
        o.order_date,
        o.total_amount,
        c.cohort_month,
        -- Months since acquisition
        EXTRACT(
            YEAR FROM (o.order_date - c.cohort_month)
        ) * 12 + EXTRACT(
            MONTH FROM (o.order_date - c.cohort_month)
        ) as months_since_signup
    FROM orders o
    JOIN cohorts c ON o.customer_id = c.customer_id
    WHERE o.status = 'completed'
),
cohort_sizes AS (
    SELECT cohort_month, COUNT(DISTINCT customer_id) as cohort_size
    FROM cohorts
    GROUP BY cohort_month
),
cohort_revenue AS (
    SELECT
        cohort_month,
        months_since_signup,
        COUNT(DISTINCT customer_id) as active_customers,
        SUM(total_amount) as revenue
    FROM orders_with_cohorts
    GROUP BY cohort_month, months_since_signup
)
SELECT
    cr.cohort_month,
    cs.cohort_size,
    cr.months_since_signup,
    cr.active_customers,
    cr.revenue,
    cr.active_customers * 100.0 / NULLIF(cs.cohort_size, 0) as retention_rate
FROM cohort_revenue cr
JOIN cohort_sizes cs ON cr.cohort_month = cs.cohort_month
ORDER BY cr.cohort_month, cr.months_since_signup;

-- Cohort retention matrix
WITH cohorts AS (
    SELECT
        c.id as customer_id,
        DATE_TRUNC('month', c.registration_date) as cohort_month
    FROM customers c
),
orders_with_cohorts AS (
    SELECT
        o.customer_id,
        c.cohort_month,
        -- Month number (0 = acquisition month)
        EXTRACT(
            YEAR FROM (o.order_date - c.cohort_month)
        ) * 12 + EXTRACT(
            MONTH FROM (o.order_date - c.cohort_month)
        ) as month_num
    FROM orders o
    JOIN cohorts c ON o.customer_id = c.customer_id
    WHERE o.status = 'completed'
),
cohort_sizes AS (
    SELECT cohort_month, COUNT(DISTINCT customer_id) as size
    FROM cohorts
    GROUP BY cohort_month
),
monthly_retention AS (
    SELECT
        cohort_month,
        month_num,
        COUNT(DISTINCT customer_id) as active_customers
    FROM orders_with_cohorts
    GROUP BY cohort_month, month_num
)
SELECT
    mr.cohort_month,
    cs.size as cohort_size,
    SUM(CASE WHEN month_num = 0 THEN active_customers END) as m0,
    SUM(CASE WHEN month_num = 1 THEN active_customers END) as m1,
    SUM(CASE WHEN month_num = 2 THEN active_customers END) as m2,
    SUM(CASE WHEN month_num = 3 THEN active_customers END) as m3
FROM monthly_retention mr
JOIN cohort_sizes cs ON mr.cohort_month = cs.cohort_month
GROUP BY mr.cohort_month, cs.size
ORDER BY mr.cohort_month;

The Cat Analogy

Think of cohorts like cats from the same litter:

January Litter: Whiskers, Luna, Shadow (3 kittens)
  Month 0: All 3 healthy
  Month 1: All 3 healthy
  Month 2: Luna got sick (2/3 healthy = 67% retention)
  Month 3: Whiskers moved to new home (still counts)
  Month 4: Shadow adopted (1/3 = 33% retention)

June Litter: Mochi, Bella, Oliver, Max (4 kittens)
  Month 0: All 4 healthy
  Month 1: All 4 healthy
  Month 2: All 4 healthy
  Month 3: All 4 healthy
  Month 4: 3/4 adopted (75% retention)

Cohort insight: June litter is healthier and more adoptable! Despite overall 50% retention, we know June cats perform better.


Common Pitfalls

Incomplete Cohort Period

-- Don't compare cohorts with different ages!
-- January cohort has 12 months of data
-- June cohort has only 6 months

-- RIGHT: Only compare periods with equal age
SELECT *
FROM cohort_retention
WHERE EXTRACT(YEAR FROM cohort_month) * 12 + EXTRACT(MONTH FROM cohort_month)
      >= EXTRACT(YEAR FROM CURRENT_DATE) * 12 + EXTRACT(MONTH FROM CURRENT_DATE) - 6;

Small Cohort Sizes

-- Cohort of 2 customers is meaningless
SELECT *
FROM cohort_retention
WHERE cohort_size < 10;  -- Exclude small cohorts

Exercises

Exercise 1

Build a cohort retention table for monthly acquisition cohorts. Show retention for months 0, 1, 2, 3.

Exercise 2

What does it mean if newer cohorts have lower retention than older cohorts?

Exercise 3

Write a query to compare revenue per customer between cohorts.

Exercise 4

Why is it misleading to compare January cohort month 12 with June cohort month 6 directly?

Exercise 5

Design a cohort analysis for customer spending patterns by acquisition source.


Key Takeaways

  • Cohort analysis groups customers by acquisition time and tracks behavior
  • Reveals if customer quality is improving or declining over time
  • Always compare cohorts at the same age (month 3 vs month 3)
  • Watch for small cohort sizes (exclude cohorts < 10 customers)
  • Cohort retention shows which acquisition periods produce loyal customers
  • Revenue cohorts show if newer cohorts are more or less valuable