Published on

Writing SQL You'll Thank Yourself For

Authors

SQL is the most widely used programming language that most developers never really study. People learn enough to get data out and move on. The result is queries that work in development but behave unexpectedly at scale, or are so cryptic that nobody wants to touch them six months later.

Here are the practices that make a real difference.

Be Explicit About What You're Selecting

-- Don't do this
SELECT * FROM orders JOIN users ON orders.user_id = users.id

-- Do this
SELECT
  orders.id,
  orders.total,
  orders.created_at,
  users.email,
  users.name
FROM orders
JOIN users ON orders.user_id = users.id

SELECT * couples your application to the table schema. When someone adds a column, your query silently returns more data. When you add a second table with a column named id, your SELECT * now returns two ambiguous id columns. Being explicit is self-documenting and prevents surprises.

Use CTEs to Build Up Complexity

Common Table Expressions (CTEs) let you name intermediate results and compose them, rather than nesting subqueries:

-- Hard to read: nested subqueries
SELECT * FROM (
  SELECT user_id, COUNT(*) as order_count FROM (
    SELECT * FROM orders WHERE status = 'completed'
  ) completed_orders
  GROUP BY user_id
) counts WHERE order_count >= 5

-- Easy to read: CTEs
WITH completed_orders AS (
  SELECT user_id, id
  FROM orders
  WHERE status = 'completed'
),
high_value_users AS (
  SELECT user_id, COUNT(*) AS order_count
  FROM completed_orders
  GROUP BY user_id
  HAVING COUNT(*) >= 5
)
SELECT
  users.email,
  high_value_users.order_count
FROM high_value_users
JOIN users ON users.id = high_value_users.user_id

CTEs read like a story. Each step is named and explicit. You can test each CTE in isolation by running it independently.

Understand NULL Before It Bites You

NULL is not a value, it's the absence of a value. Comparisons with NULL always return NULL, not true or false:

SELECT NULL = NULL   -- NULL (not TRUE)
SELECT NULL != NULL  -- NULL (not TRUE)
SELECT NULL IS NULL  -- TRUE

This trips people up in WHERE clauses:

-- This does NOT find rows where discount is NULL
SELECT * FROM orders WHERE discount != 0

-- This does
SELECT * FROM orders WHERE discount != 0 OR discount IS NULL

And in aggregations:

-- COUNT(*) counts all rows including NULLs
-- COUNT(column) counts only non-NULL values
SELECT
  COUNT(*) AS total_orders,
  COUNT(discount) AS orders_with_discount
FROM orders

Use COALESCE(value, default) to handle NULLs explicitly:

SELECT
  order_id,
  COALESCE(discount, 0) AS discount  -- treats NULL as 0
FROM orders

Avoid Functions on Indexed Columns in WHERE

Wrapping a column in a function in a WHERE clause often prevents the index from being used:

-- ❌ Can't use index on created_at, function call makes it a computed value
SELECT * FROM orders
WHERE EXTRACT(YEAR FROM created_at) = 2024

-- ✓ Range condition uses the index
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'

The rule: put transformations on the literal side, not the column side.

Use Window Functions Instead of Self-Joins

Window functions let you compute aggregate values across a set of rows while keeping each row:

-- Get each order and its rank within that customer's orders by date
SELECT
  order_id,
  customer_id,
  created_at,
  total,
  ROW_NUMBER() OVER (
    PARTITION BY customer_id
    ORDER BY created_at DESC
  ) AS order_rank
FROM orders

This returns every order with a order_rank of 1 for each customer's most recent order, 2 for the second most recent, etc. Without window functions, you'd need a self-join or subquery.

Common window functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), SUM() OVER (...), AVG() OVER (...).

Format Your Queries

Nobody wants to debug a 200-character single-line query. A formatting convention I use:

SELECT                        -- keywords capitalized
  o.id,                       -- one column per line
  o.total,
  u.email
FROM orders o                 -- aliases for shorter table names
JOIN users u ON u.id = o.user_id
WHERE
  o.status = 'completed'
  AND o.created_at >= '2024-01-01'
ORDER BY o.created_at DESC
LIMIT 100

Indentation, line breaks, and consistent keyword casing make queries scannable. SQL isn't compiled, it's read by people.

The Mindset Shift

SQL is a declarative language: you describe what you want, not how to get it. The query optimizer figures out the plan. Your job is to write queries that clearly express intent, give the optimizer enough information to use indexes efficiently, and remain readable when you come back to them in three months.

Treat SQL like any other code, it deserves the same care you'd give a function.