
LIMIT Purpose

Technical Explanation

LIMIT restricts the number of rows returned by a query. It's essential for pagination, Top-N queries, and sampling data.

Syntax

SELECT columns
FROM table
[WHERE ...]
[ORDER BY ...]
LIMIT n [OFFSET m];

Key Concepts

  1. LIMIT n — Return at most n rows
  2. OFFSET m — Skip the first m rows
  3. ORDER BY + LIMIT — Essential for meaningful Top-N results
  4. WITH TIES — FETCH FIRST n ROWS WITH TIES may return more than n rows when values tie at the cutoff; plain LIMIT never does
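
The "ties" behavior comes from the standard-SQL WITH TIES clause (supported in PostgreSQL 13+ and Oracle 12c+, and it requires an ORDER BY); a sketch against the tutorial's customers table:

-- Top 3 spenders, keeping ties at the cutoff
SELECT name, total_spent
FROM customers
ORDER BY total_spent DESC
FETCH FIRST 3 ROWS WITH TIES;
-- If the customers ranked 3rd and 4th spent the same amount,
-- both come back, so the result can have more than 3 rows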

Database Variations

Database              Syntax
PostgreSQL, MySQL     LIMIT n OFFSET m
SQL Server            OFFSET m ROWS FETCH NEXT n ROWS ONLY  (ORDER BY required)
Oracle (12c+)         OFFSET m ROWS FETCH FIRST n ROWS ONLY
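
The FETCH form reads like this — a sketch of the SQL Server / Oracle equivalent of LIMIT 10 OFFSET 20, against the tutorial's customers table:

-- SQL Server / Oracle 12c+: equivalent of LIMIT 10 OFFSET 20
SELECT *
FROM customers
ORDER BY id               -- SQL Server requires ORDER BY before OFFSET/FETCH
OFFSET 20 ROWS
FETCH NEXT 10 ROWS ONLY;  -- FIRST and NEXT are interchangeable here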

Code Examples

Using the CatCafe dataset:

-- Table: customers
-- id, name, city, total_spent
-- Table: orders
-- id, customer_id, cat_id, order_date, total_amount, status

-- Basic LIMIT: Just get a sample
SELECT *
FROM customers
LIMIT 10;  -- First 10 rows (no guaranteed order)

-- Top-N with ORDER BY: Best customers
SELECT
    name,
    total_spent
FROM customers
ORDER BY total_spent DESC
LIMIT 5;

-- Pagination: Page 2 (rows 11-20)
SELECT *
FROM customers
ORDER BY id
LIMIT 10 OFFSET 10;

-- Computing the offset from a page number
SELECT *
FROM customers
ORDER BY id
LIMIT 10 OFFSET (2 - 1) * 10;  -- Page 2: OFFSET = (page - 1) * page_size

-- Complex example: Top 3 cats per breed by order count
WITH ranked_cats AS (
    SELECT
        cats.id,
        cats.name,
        cats.breed,
        COUNT(orders.id) as order_count,
        ROW_NUMBER() OVER (
            PARTITION BY cats.breed
            ORDER BY COUNT(orders.id) DESC
        ) as rn  -- avoid the alias "rank": it's reserved in MySQL 8+
    FROM cats
    LEFT JOIN orders
        ON cats.id = orders.cat_id
    GROUP BY cats.id, cats.name, cats.breed
)
SELECT *
FROM ranked_cats
WHERE rn <= 3
ORDER BY breed, rn;

-- Random sampling (PostgreSQL; ORDER BY RANDOM() sorts the whole table)
SELECT *
FROM customers
ORDER BY RANDOM()
LIMIT 100;  -- 100 random customers
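
When an approximate sample is enough, PostgreSQL's TABLESAMPLE avoids sorting the whole table; a sketch:

-- Approximate sampling without a full sort (PostgreSQL 9.5+)
SELECT *
FROM customers TABLESAMPLE SYSTEM (1);  -- roughly 1% of the table's pages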

The Cat Analogy

Think of LIMIT like a cat carrier with limited space:

LIMIT 3 = Your carrier holds exactly 3 cats

  • You can only bring 3 cats to the show
  • Which 3 you bring matters (ORDER BY determines which!)

OFFSET 5 = Already have 5 cats in the van

  • You skip them and grab the next ones
  • OFFSET says "start from position N+1"

LIMIT 3 OFFSET 5 = Get cats 6, 7, 8

Cat positions: 1  2  3  4  5 | 6  7  8  9  10
                              ^
                              OFFSET 5 = skip first 5
Cats in carrier:               [6] [7] [8]   → LIMIT 3 = take 3

Without ORDER BY, LIMIT just grabs "whoever fits first" — which cats exactly depends on storage order!
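
The carrier picture maps directly onto a query (assuming the cats table's "positions" are its ids):

-- Cats 6, 7, 8: skip the first 5 positions, take the next 3
SELECT *
FROM cats
ORDER BY id        -- defines the "positions"
LIMIT 3 OFFSET 5;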


Common Pitfalls

LIMIT Without ORDER BY

-- No guaranteed results!
SELECT * FROM customers LIMIT 5;
-- Could return any 5 rows

-- Always ORDER BY when order matters:
SELECT * FROM customers ORDER BY id LIMIT 5;
-- Deterministic: same 5 rows every time

OFFSET for Deep Pagination

-- Problem: OFFSET jumps over rows, then discards them
SELECT *
FROM orders
ORDER BY id
LIMIT 10 OFFSET 100000;  -- Counts and discards first 100k rows!

-- Better for deep pagination: Keyset pagination
SELECT *
FROM orders
WHERE id > :last_seen_id
ORDER BY id
LIMIT 10;
-- Uses index, doesn't skip rows

LIMIT and HAVING Interaction

-- LIMIT is applied AFTER HAVING, so this is valid:
SELECT
    city,
    COUNT(*) as customer_count
FROM customers
GROUP BY city
HAVING COUNT(*) > 10
LIMIT 5;
-- Returns up to 5 cities with more than 10 customers;
-- without ORDER BY, which 5 is arbitrary
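
To make that result deterministic, add an ORDER BY before the LIMIT (a sketch using the same query, with city as a tiebreaker):

-- Deterministic: the 5 qualifying cities with the most customers
SELECT
    city,
    COUNT(*) as customer_count
FROM customers
GROUP BY city
HAVING COUNT(*) > 10
ORDER BY customer_count DESC, city
LIMIT 5;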

Exercises

Exercise 1

Write a query to get page 3 of customers (20 per page), sorted by registration date descending.

Exercise 2

What does this query return? Is it deterministic?

SELECT name FROM customers LIMIT 5;

Exercise 3

Write a query to get the second-highest spending customer (without using subqueries).

Exercise 4

Explain why deep OFFSET pagination is slow. What should you use instead?

Exercise 5

Write a query to get the 3 most recent orders, but only for customers who have spent over $100 total.


Key Takeaways

  • LIMIT restricts rows returned; OFFSET skips rows
  • Always use ORDER BY with LIMIT when order matters
  • LIMIT is applied last in logical query execution (after ORDER BY)
  • OFFSET discards rows (inefficient for large offsets)
  • For deep pagination, use keyset/cursor-based pagination instead
  • Different databases have different syntax (LIMIT vs FETCH FIRST)