Segmentation Importance
Segmentation Importance
Technical Explanation
Segmentation divides customers or data into meaningful groups based on shared characteristics. Aggregated metrics hide important differences; segmentation reveals the truth beneath the surface.
Why Aggregate Metrics Mislead
Overall conversion rate: 3.2%
But:
- Email campaign customers: 8.5% conversion
- Organic customers: 2.1% conversion
- Paid ads customers: 1.2% conversion
Same 3.2% average — but optimizing paid ads vs email campaigns are completely different strategies!
Common Segmentation Dimensions
| Dimension | Examples |
|---|---|
| Demographic | Age, gender, location, income |
| Behavioral | Purchase frequency, categories, recency |
| Acquisition | Source, campaign, channel |
| Value | Lifetime value, average order value |
| Lifecycle | New, active, lapsed, churned |
Code Examples
Using the CatCafe dataset:
-- VANITY: Average metrics across all customers
SELECT
AVG(total_spent) as avg_spent, -- Misleading!
AVG(order_count) as avg_orders
FROM customers;
-- SEGMENTED: Average by customer segment
SELECT
segment,
COUNT(*) as customers,
AVG(total_spent) as avg_spent,
AVG(order_count) as avg_orders
FROM customers
GROUP BY segment
ORDER BY avg_spent DESC;
-- SEGMENTED: RFM Analysis (Recency, Frequency, Monetary)
WITH rfm AS (
SELECT
customer_id,
MAX(order_date) as last_order_date,
COUNT(*) as frequency,
SUM(total_amount) as monetary
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
)
SELECT
customer_id,
last_order_date,
CASE
WHEN last_order_date >= CURRENT_DATE - INTERVAL '30 days' THEN 'Active'
WHEN last_order_date >= CURRENT_DATE - INTERVAL '90 days' THEN 'At Risk'
ELSE 'Churned'
END as recency_segment,
CASE
WHEN frequency >= 10 THEN 'Frequent'
WHEN frequency >= 3 THEN 'Regular'
ELSE 'Occasional'
END as frequency_segment,
CASE
WHEN monetary >= 500 THEN 'High Value'
WHEN monetary >= 100 THEN 'Medium Value'
ELSE 'Low Value'
END as monetary_segment
FROM rfm;
-- SEGMENTATION: Cohort + Segment combined
SELECT
DATE_TRUNC('month', c.registration_date) as cohort_month,
c.segment,
COUNT(*) as customers,
AVG(o.total_amount) as avg_order_value
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.status = 'completed'
GROUP BY DATE_TRUNC('month', c.registration_date), c.segment
ORDER BY cohort_month, c.segment;
-- SEGMENTED: Conversion rates by channel
WITH channel_customers AS (
SELECT
customer_id,
acquisition_channel,
CASE
WHEN COUNT(*) >= 5 THEN 'Power User'
WHEN COUNT(*) >= 2 THEN 'Regular'
ELSE 'One-time'
END as user_type
FROM orders
WHERE status = 'completed'
GROUP BY customer_id, acquisition_channel
)
SELECT
acquisition_channel,
user_type,
COUNT(*) as customers,
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY acquisition_channel) as pct_of_channel
FROM channel_customers
GROUP BY acquisition_channel, user_type
ORDER BY acquisition_channel, user_type;
The Cat Analogy
Think of segmentation like cat personality types:
Aggregate view: "Cats sleep 12 hours a day"
- Implies all cats sleep 12 hours
- Misses important variations
Segmented view:
Kittens (0-1 yr): Sleep 16 hours (growing!)
Adults (1-7 yrs): Sleep 12 hours
Seniors (7+ yrs): Sleep 16 hours (low energy)
Indoor cats: Sleep 14 hours
Outdoor cats: Sleep 8 hours (more activity)
Same average of 12 hours — but completely different realities!
Why it matters:
- Marketing to kittens: Emphasize "energy and play"
- Marketing to seniors: Emphasize "comfort and health"
- Same message to everyone = ineffective messaging
Common Pitfalls
Segmenting by Too Many Dimensions
-- WRONG: 47 segments means no actionable insight
SELECT
city, age_group, income_bracket, channel, product_category, ...
FROM customers
GROUP BY city, age_group, income_bracket, channel, product_category, ...;
-- Analysis paralysis!
-- RIGHT: Start with 2-3 key dimensions, expand as needed
SELECT
acquisition_channel,
CASE WHEN lifetime_value > 200 THEN 'High' ELSE 'Standard' END as value_tier
FROM customers
GROUP BY acquisition_channel, value_tier;
Segmenting Without Statistical Significance
-- WRONG: Acting on 2-person segment
SELECT segment, COUNT(*) as customers
FROM customers
GROUP BY segment
HAVING COUNT(*) < 30; -- Too small to be significant!
Exercises
Exercise 1
Write a query to segment customers by:
- Recency: Active (ordered in last 30 days), At Risk (60 days), Churned (90+ days)
- Frequency: High (5+ orders), Medium (2-4), Low (1) Then show average lifetime value per segment.
Exercise 2
Why might looking at overall average conversion rate be misleading?
Exercise 3
Write a query to find which customer segment has the highest month-over-month growth.
Exercise 4
What dimensions would you use to segment customers for a cat food subscription service?
Exercise 5
Write a query comparing retention rates across customer segments.
Key Takeaways
- Aggregate metrics hide important differences between groups
- Segmentation reveals the "why" behind the numbers
- Common dimensions: demographic, behavioral, acquisition, value, lifecycle
- Start with 2-3 dimensions, expand only when actionable
- Ensure segments are large enough for statistical significance
- Segments should inform different actions, not just be interesting facts