Home/Data Analyst/SQL Fundamentals

SUM vs AVG

SUM vs AVG

Technical Explanation

SUM and AVG are aggregate functions that compute summary statistics over a set of values. The key difference is what they represent mathematically and how they handle the data.

SUM: Total

Adds all non-NULL values in a column.

SUM(column)

AVG: Mean

Divides the sum of non-NULL values by the count of non-NULL values.

AVG(column) = SUM(column) / COUNT(column)

NULL Handling

Both functions ignore NULL values:

-- If values are: 10, 20, NULL, 30
SUM(values)  = 60   (10 + 20 + 30)
AVG(values)  = 20   (60 / 3, NULL excluded from both sum and count)

Code Examples

Using the CatCafe dataset:

-- Table: orders
-- id, customer_id, cat_id, order_date, total_amount, status

-- Basic SUM: Total revenue
SELECT
    SUM(total_amount) as total_revenue
FROM orders
WHERE status = 'completed';

-- Basic AVG: Average order value
SELECT
    AVG(total_amount) as avg_order_value
FROM orders
WHERE status = 'completed';

-- SUM vs AVG for the same data
SELECT
    COUNT(*) as order_count,
    SUM(total_amount) as total_revenue,
    AVG(total_amount) as avg_order_value,
    SUM(total_amount) / COUNT(*) as manual_avg
FROM orders
WHERE status = 'completed';

-- SUM per group
SELECT
    customer_id,
    SUM(total_amount) as customer_lifetime_value
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
ORDER BY customer_lifetime_value DESC;

-- AVG per group: Average order value per customer
SELECT
    customer_id,
    COUNT(*) as order_count,
    SUM(total_amount) as total_spent,
    AVG(total_amount) as avg_order_value
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
HAVING AVG(total_amount) > 50;

-- Combining SUM and AVG
SELECT
    cats.name as cat_name,
    COUNT(orders.id) as times_ordered,
    SUM(orders.total_amount) as total_revenue,
    AVG(orders.total_amount) as avg_revenue_per_order
FROM cats
LEFT JOIN orders
    ON cats.id = orders.cat_id
WHERE orders.status = 'completed'
GROUP BY cats.id, cats.name
ORDER BY total_revenue DESC;

The Cat Analogy

You're tracking how much cat treats each cat eats:

SUM = "Total treats eaten by all cats combined"

Whiskers: 50 treats
Mochi: 30 treats
Luna: 45 treats
--------------------
SUM = 125 treats (total treats consumed)

AVG = "Average treats per cat"

SUM / COUNT(cats) = 125 / 3 = ~41.67 treats per cat

When they're different: If Whiskers eats 100 treats, Mochi eats 10, and Luna eats 5:

SUM = 115 treats total
AVG = 38.3 treats per cat

SUM tells you the total impact; AVG tells you the typical value.


Common Pitfalls

Confusing Count in AVG

-- AVG of averages is usually wrong!
SELECT
    AVG(avg_order_value_per_customer)  -- Wrong: average of averages
FROM (
    SELECT
        customer_id,
        AVG(total_amount) as avg_order_value
    FROM orders
    GROUP BY customer_id
) customer_avgs;

-- Instead, calculate overall average:
SELECT
    SUM(total_amount) / COUNT(DISTINCT customer_id) as true_avg_per_customer
FROM orders;

SUM with Zero Rows

-- If no rows match WHERE, SUM returns NULL, not 0
SELECT SUM(total_amount) FROM orders WHERE status = 'nonexistent';
-- Returns: NULL

-- Handle with COALESCE
SELECT COALESCE(SUM(total_amount), 0) FROM orders WHERE status = 'nonexistent';
-- Returns: 0

SUM/AVG of Flags (Boolean Columns)

-- Counting "completed" orders with SUM of a flag
SELECT
    COUNT(*) as total_orders,
    SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) as completed_orders,
    AVG(CASE WHEN status = 'completed' THEN 1.0 ELSE 0 END) as completion_rate
FROM orders;
-- completion_rate gives you percentage as decimal

Exercises

Exercise 1

Write a query showing:

  • Total revenue (SUM)
  • Number of orders (COUNT)
  • Average order value (AVG)
  • Average value per customer (SUM / COUNT DISTINCT)

Exercise 2

Explain why AVG(total_spent) in a grouped query gives different results than a query without GROUP BY.

Exercise 3

Write a query to find the customer with the highest average order value (minimum 3 orders).

Exercise 4

If you have 100 orders totaling $5,000, and 20 of those orders (totaling $1,000) have status = 'refunded', what is the true average order value for completed orders?

Exercise 5

Write a query to show per-cat statistics: total revenue, number of orders, average revenue per order, for cats that have been ordered at least 3 times.


Key Takeaways

  • SUM = total of all non-NULL values
  • AVG = mean (SUM / COUNT of non-NULLs)
  • Both ignore NULL values (don't treat them as zero)
  • SUM returns NULL on empty sets; use COALESCE to default to 0
  • Be careful not to average averages (usually wrong)
  • Use SUM with CASE WHEN to count conditions as 1/0
  • For percentages, AVG of a 0/1 flag gives you the rate directly