Home/Data Analyst/Business Metrics

Segmentation Importance

Segmentation Importance

Technical Explanation

Segmentation divides customers or data into meaningful groups based on shared characteristics. Aggregated metrics hide important differences; segmentation reveals the truth beneath the surface.

Why Aggregate Metrics Mislead

Overall conversion rate: 3.2%

But:
- Email campaign customers: 8.5% conversion
- Organic customers: 2.1% conversion
- Paid ads customers: 1.2% conversion

Same 3.2% average — but optimizing paid ads vs email campaigns are completely different strategies!

Common Segmentation Dimensions

Dimension Examples
Demographic Age, gender, location, income
Behavioral Purchase frequency, categories, recency
Acquisition Source, campaign, channel
Value Lifetime value, average order value
Lifecycle New, active, lapsed, churned

Code Examples

Using the CatCafe dataset:

-- VANITY: Average metrics across all customers
SELECT
    AVG(total_spent) as avg_spent,  -- Misleading!
    AVG(order_count) as avg_orders
FROM customers;

-- SEGMENTED: Average by customer segment
SELECT
    segment,
    COUNT(*) as customers,
    AVG(total_spent) as avg_spent,
    AVG(order_count) as avg_orders
FROM customers
GROUP BY segment
ORDER BY avg_spent DESC;

-- SEGMENTED: RFM Analysis (Recency, Frequency, Monetary)
WITH rfm AS (
    SELECT
        customer_id,
        MAX(order_date) as last_order_date,
        COUNT(*) as frequency,
        SUM(total_amount) as monetary
    FROM orders
    WHERE status = 'completed'
    GROUP BY customer_id
)
SELECT
    customer_id,
    last_order_date,
    CASE
        WHEN last_order_date >= CURRENT_DATE - INTERVAL '30 days' THEN 'Active'
        WHEN last_order_date >= CURRENT_DATE - INTERVAL '90 days' THEN 'At Risk'
        ELSE 'Churned'
    END as recency_segment,
    CASE
        WHEN frequency >= 10 THEN 'Frequent'
        WHEN frequency >= 3 THEN 'Regular'
        ELSE 'Occasional'
    END as frequency_segment,
    CASE
        WHEN monetary >= 500 THEN 'High Value'
        WHEN monetary >= 100 THEN 'Medium Value'
        ELSE 'Low Value'
    END as monetary_segment
FROM rfm;

-- SEGMENTATION: Cohort + Segment combined
SELECT
    DATE_TRUNC('month', c.registration_date) as cohort_month,
    c.segment,
    COUNT(*) as customers,
    AVG(o.total_amount) as avg_order_value
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.status = 'completed'
GROUP BY DATE_TRUNC('month', c.registration_date), c.segment
ORDER BY cohort_month, c.segment;

-- SEGMENTED: Conversion rates by channel
WITH channel_customers AS (
    SELECT
        customer_id,
        acquisition_channel,
        CASE
            WHEN COUNT(*) >= 5 THEN 'Power User'
            WHEN COUNT(*) >= 2 THEN 'Regular'
            ELSE 'One-time'
        END as user_type
    FROM orders
    WHERE status = 'completed'
    GROUP BY customer_id, acquisition_channel
)
SELECT
    acquisition_channel,
    user_type,
    COUNT(*) as customers,
    COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY acquisition_channel) as pct_of_channel
FROM channel_customers
GROUP BY acquisition_channel, user_type
ORDER BY acquisition_channel, user_type;

The Cat Analogy

Think of segmentation like cat personality types:

Aggregate view: "Cats sleep 12 hours a day"

  • Implies all cats sleep 12 hours
  • Misses important variations

Segmented view:

Kittens (0-1 yr): Sleep 16 hours (growing!)
Adults (1-7 yrs): Sleep 12 hours
Seniors (7+ yrs): Sleep 16 hours (low energy)
Indoor cats: Sleep 14 hours
Outdoor cats: Sleep 8 hours (more activity)

Same average of 12 hours — but completely different realities!

Why it matters:

  • Marketing to kittens: Emphasize "energy and play"
  • Marketing to seniors: Emphasize "comfort and health"
  • Same message to everyone = ineffective messaging

Common Pitfalls

Segmenting by Too Many Dimensions

-- WRONG: 47 segments means no actionable insight
SELECT
    city, age_group, income_bracket, channel, product_category, ...
FROM customers
GROUP BY city, age_group, income_bracket, channel, product_category, ...;
-- Analysis paralysis!

-- RIGHT: Start with 2-3 key dimensions, expand as needed
SELECT
    acquisition_channel,
    CASE WHEN lifetime_value > 200 THEN 'High' ELSE 'Standard' END as value_tier
FROM customers
GROUP BY acquisition_channel, value_tier;

Segmenting Without Statistical Significance

-- WRONG: Acting on 2-person segment
SELECT segment, COUNT(*) as customers
FROM customers
GROUP BY segment
HAVING COUNT(*) < 30;  -- Too small to be significant!

Exercises

Exercise 1

Write a query to segment customers by:

  • Recency: Active (ordered in last 30 days), At Risk (60 days), Churned (90+ days)
  • Frequency: High (5+ orders), Medium (2-4), Low (1) Then show average lifetime value per segment.

Exercise 2

Why might looking at overall average conversion rate be misleading?

Exercise 3

Write a query to find which customer segment has the highest month-over-month growth.

Exercise 4

What dimensions would you use to segment customers for a cat food subscription service?

Exercise 5

Write a query comparing retention rates across customer segments.


Key Takeaways

  • Aggregate metrics hide important differences between groups
  • Segmentation reveals the "why" behind the numbers
  • Common dimensions: demographic, behavioral, acquisition, value, lifecycle
  • Start with 2-3 dimensions, expand only when actionable
  • Ensure segments are large enough for statistical significance
  • Segments should inform different actions, not just be interesting facts