Home/Data Analyst/SQL Intermediate

One-to-Many JOIN Impact

One-to-Many JOIN Impact

Technical Explanation

A one-to-many (or one-to-N) relationship means one row in table A can be associated with multiple rows in table B. When you JOIN these tables, each matching row in B creates a new row in the result, multiplying the rows from A.

Understanding Cardinality

customers (1) ───────< orders (many)

1 customer → many orders
When JOINed: 1 customer row becomes N rows (one per order)

Why This Matters

Relationship Type JOIN Result
1:1 Rows don't multiply
1:Many A's rows multiply by number of B matches
Many:Many Potentially massive row explosion

Code Examples

Using the CatCafe dataset:

-- Tables:
-- cats: id, name, breed, age
-- orders: id, customer_id, cat_id, order_date, total_amount, status
-- order_items: id, order_id, cat_id, quantity

-- PROBLEM: One customer with 5 orders → 5 rows for that customer
SELECT
    c.name as customer_name,
    c.total_spent,
    o.id as order_id,
    o.total_amount
FROM customers c
JOIN orders o ON c.id = o.customer_id;
-- Customer Alice with 5 orders = 5 rows in result

-- If you just want customer stats, you'll overcount without aggregation
SELECT
    c.name,
    SUM(o.total_amount) as total_revenue  -- Correctly aggregates
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;

-- Without aggregation, you get duplicate rows
SELECT
    c.name,
    o.total_amount  -- Each order is a separate row!
FROM customers c
JOIN orders o ON c.id = o.customer_id;
-- If Alice has 5 orders of $20 each, she appears 5 times with $20 each!

-- Many-to-Many: orders ↔ cats through order_items
-- Each order can have multiple cats, each cat can be in multiple orders
-- This can cause MASSIVE row explosion!

-- Safe approach: Always aggregate before or after JOIN
WITH order_totals AS (
    -- Pre-aggregate to one row per order
    SELECT
        order_id,
        SUM(total_amount) as order_total
    FROM order_items
    GROUP BY order_id
)
SELECT
    c.name,
    COUNT(DISTINCT o.id) as order_count,
    SUM(ot.order_total) as total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_totals ot ON o.id = ot.order_id
GROUP BY c.id, c.name;

The Cat Analogy

You're making a cat cafe seating chart:

One-to-Many: One room, many tables

Room A (the customer room)
    ├── Table 1 (Alice + Whiskers)
    ├── Table 2 (Alice + Luna)      ← Alice appears again!
    └── Table 3 (Bob + Shadow)

If you list "who's at each table":
- Alice at Table 1
- Alice at Table 2  ← Alice counted twice!
- Bob at Table 3

If you just list tables (without counting), Alice appears once per table she occupies. To know how many customers actually came, you need COUNT(DISTINCT customer_name).

Same with JOINs: If you JOIN customers to orders without aggregating, Alice (who has 5 orders) appears 5 times. Her "total spent" appears as 5 separate $50 entries, not one $250 total.


Diagnosing and Fixing

Diagnose: Check for Unexpected Multiplication

-- See how many orders each customer has
SELECT
    customer_id,
    COUNT(*) as order_count,
    COUNT(DISTINCT order_id) as distinct_orders
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > COUNT(DISTINCT order_id);
-- If these differ, you have duplicate-order issues

Fix: Aggregate Before JOIN

-- Instead of JOINing all orders, pre-aggregate
WITH customer_stats AS (
    SELECT
        customer_id,
        COUNT(*) as order_count,
        SUM(total_amount) as total_spent
    FROM orders
    GROUP BY customer_id
)
SELECT
    c.name,
    cs.order_count,
    cs.total_spent
FROM customers c
LEFT JOIN customer_stats cs ON c.id = cs.customer_id;
-- One row per customer, no multiplication

Fix: Use DISTINCT

-- When you need actual rows but want unique customers
SELECT DISTINCT
    c.name,
    c.email
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.status = 'completed';
-- Even if customer has 10 orders, they appear only once

Exercises

Exercise 1

If a customer has 3 orders ($50, $75, $100), and you JOIN customers to orders without GROUP BY, how many rows does that customer appear in, and what's wrong with naively summing their order amounts?

Exercise 2

Write a query to find the total number of unique customers who adopted cats, using JOINs that might cause multiplication.

Exercise 3

What is the difference between these two queries:

-- A
SELECT COUNT(*) FROM customers JOIN orders ON customers.id = orders.customer_id;
-- B
SELECT COUNT(DISTINCT customers.id) FROM customers JOIN orders ON customers.id = orders.customer_id;

Exercise 4

Write a query to get the number of orders per customer WITHOUT causing row multiplication.

Exercise 5

A customer has 2 orders, each with 3 order_items (cats). If you JOIN customers → orders → order_items without aggregation, how many rows result?


Key Takeaways

  • 1:Many JOINs multiply rows from the "one" side
  • Always aggregate after a JOIN on 1:Many relationships
  • Use COUNT(DISTINCT) to count unique entities in JOINed tables
  • Pre-aggregating before JOINing prevents row explosion
  • Be suspicious when row counts seem higher than expected
  • DISTINCT can help when you need unique pairs, not aggregates