SUM vs AVG
SUM vs AVG
Technical Explanation
SUM and AVG are aggregate functions that compute summary statistics over a set of values. The key difference is what they represent mathematically and how they handle the data.
SUM: Total
Adds all non-NULL values in a column.
SUM(column)
AVG: Mean
Divides the sum of non-NULL values by the count of non-NULL values.
AVG(column) = SUM(column) / COUNT(column)
NULL Handling
Both functions ignore NULL values:
-- If values are: 10, 20, NULL, 30
SUM(values) = 60 (10 + 20 + 30)
AVG(values) = 20 (60 / 3, NULL excluded from both sum and count)
Code Examples
Using the CatCafe dataset:
-- Table: orders
-- id, customer_id, cat_id, order_date, total_amount, status
-- Basic SUM: Total revenue
SELECT
SUM(total_amount) as total_revenue
FROM orders
WHERE status = 'completed';
-- Basic AVG: Average order value
SELECT
AVG(total_amount) as avg_order_value
FROM orders
WHERE status = 'completed';
-- SUM vs AVG for the same data
SELECT
COUNT(*) as order_count,
SUM(total_amount) as total_revenue,
AVG(total_amount) as avg_order_value,
SUM(total_amount) / COUNT(*) as manual_avg
FROM orders
WHERE status = 'completed';
-- SUM per group
SELECT
customer_id,
SUM(total_amount) as customer_lifetime_value
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
ORDER BY customer_lifetime_value DESC;
-- AVG per group: Average order value per customer
SELECT
customer_id,
COUNT(*) as order_count,
SUM(total_amount) as total_spent,
AVG(total_amount) as avg_order_value
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
HAVING AVG(total_amount) > 50;
-- Combining SUM and AVG
SELECT
cats.name as cat_name,
COUNT(orders.id) as times_ordered,
SUM(orders.total_amount) as total_revenue,
AVG(orders.total_amount) as avg_revenue_per_order
FROM cats
LEFT JOIN orders
ON cats.id = orders.cat_id
WHERE orders.status = 'completed'
GROUP BY cats.id, cats.name
ORDER BY total_revenue DESC;
The Cat Analogy
You're tracking how much cat treats each cat eats:
SUM = "Total treats eaten by all cats combined"
Whiskers: 50 treats
Mochi: 30 treats
Luna: 45 treats
--------------------
SUM = 125 treats (total treats consumed)
AVG = "Average treats per cat"
SUM / COUNT(cats) = 125 / 3 = ~41.67 treats per cat
When they're different: If Whiskers eats 100 treats, Mochi eats 10, and Luna eats 5:
SUM = 115 treats total
AVG = 38.3 treats per cat
SUM tells you the total impact; AVG tells you the typical value.
Common Pitfalls
Confusing Count in AVG
-- AVG of averages is usually wrong!
SELECT
AVG(avg_order_value_per_customer) -- Wrong: average of averages
FROM (
SELECT
customer_id,
AVG(total_amount) as avg_order_value
FROM orders
GROUP BY customer_id
) customer_avgs;
-- Instead, calculate overall average:
SELECT
SUM(total_amount) / COUNT(DISTINCT customer_id) as true_avg_per_customer
FROM orders;
SUM with Zero Rows
-- If no rows match WHERE, SUM returns NULL, not 0
SELECT SUM(total_amount) FROM orders WHERE status = 'nonexistent';
-- Returns: NULL
-- Handle with COALESCE
SELECT COALESCE(SUM(total_amount), 0) FROM orders WHERE status = 'nonexistent';
-- Returns: 0
SUM/AVG of Flags (Boolean Columns)
-- Counting "completed" orders with SUM of a flag
SELECT
COUNT(*) as total_orders,
SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) as completed_orders,
AVG(CASE WHEN status = 'completed' THEN 1.0 ELSE 0 END) as completion_rate
FROM orders;
-- completion_rate gives you percentage as decimal
Exercises
Exercise 1
Write a query showing:
- Total revenue (SUM)
- Number of orders (COUNT)
- Average order value (AVG)
- Average value per customer (SUM / COUNT DISTINCT)
Exercise 2
Explain why AVG(total_spent) in a grouped query gives different results than a query without GROUP BY.
Exercise 3
Write a query to find the customer with the highest average order value (minimum 3 orders).
Exercise 4
If you have 100 orders totaling $5,000, and 20 of those orders (totaling $1,000) have status = 'refunded', what is the true average order value for completed orders?
Exercise 5
Write a query to show per-cat statistics: total revenue, number of orders, average revenue per order, for cats that have been ordered at least 3 times.
Key Takeaways
- SUM = total of all non-NULL values
- AVG = mean (SUM / COUNT of non-NULLs)
- Both ignore NULL values (don't treat them as zero)
- SUM returns NULL on empty sets; use COALESCE to default to 0
- Be careful not to average averages (usually wrong)
- Use SUM with CASE WHEN to count conditions as 1/0
- For percentages, AVG of a 0/1 flag gives you the rate directly