Tally Count — DBMS · Engineering Playbook

What it is#

The tally count pattern is the simplest aggregation in SQL: count how many rows fall into each value of a grouping column. The skeleton is three clauses long.

SELECT   col, COUNT(*) AS cnt
FROM     t
GROUP BY col
ORDER BY cnt DESC;

Every other aggregation pattern in this workbook builds on it. The shape — a discrete grouping column, a count, an ordering — is what your eye should pattern-match on the moment an interviewer says “how many users have …” or “top 10 products by …”. It is also the first query most SQL learners write that produces something non-trivial, and the first that exposes the three subtly different counting functions: COUNT(*), COUNT(col), and COUNT(DISTINCT col).

-- Orders per customer
SELECT   customer_id, COUNT(*) AS orders
FROM     orders
GROUP BY customer_id
ORDER BY orders DESC
LIMIT    10;

The query plan is almost always the same: a sort or hash on the grouping column, an aggregate, an order-by, a limit. On a B-tree index over the grouping column, an engine can skip the sort entirely and stream rows in order — see B-tree indexes for why.

When to use it#

Reach for a tally count when:

You have a column with bounded cardinality (status codes, country, plan tier, day-of-week) and want a breakdown per value.
You have a column with high cardinality (user_id, product_id) and want the top-N most common, with ORDER BY cnt DESC LIMIT N.
You want a frequency histogram — counts paired with percentages of the total.
The interviewer asks any of: “how many X per Y”, “top N Y by count of X”, “which Y has the most X”.

If instead you need to bucket a continuous column (age, price, latency), you want the Group Bucket pattern — same skeleton, but with a CASE expression in place of the bare column. If you need to count related rows from another table without joining, see Existence Check.

How it works#

The template has four named parts. Memorise them in this order:

The grouping column in the SELECT and GROUP BY clauses. Same expression on both sides.
The count expression — COUNT(*), COUNT(col), or COUNT(DISTINCT col). They are not interchangeable (see pitfalls below).
The filter, in either WHERE (before grouping) or HAVING (after grouping).
The ordering and limit for top-N style answers.

SELECT   status, COUNT(*) AS cnt
FROM     orders
WHERE    created_at >= DATE '2026-01-01'   -- (3) pre-group filter
GROUP BY status                            -- (1) grouping
HAVING   COUNT(*) > 100                    -- (3) post-group filter
ORDER BY cnt DESC;                         -- (4) ordering

WHERE vs HAVING. WHERE runs before aggregation and filters individual rows. HAVING runs after aggregation and filters whole groups. “Orders placed in 2026” is a WHERE; “statuses with more than 100 orders” is a HAVING. Mixing them up is one of the most common interview slips.

Counts with percentages. A tally is often more useful as a percentage of the total. Use a window function for the total:

SELECT   status,
         COUNT(*) AS cnt,
         ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct
FROM     orders
GROUP BY status
ORDER BY cnt DESC;

The SUM(COUNT(*)) OVER () looks odd but is exactly right: the outer SUM runs after the GROUP BY, summing the per-group counts into a grand total. See Window functions for why the empty OVER () clause computes a single value across the whole result set.

Variants#

COUNT with a filter inside. Most engines support COUNT(*) FILTER (WHERE ...) for conditional counts. Standard SQL, supported in Postgres and SQLite; in MySQL and SQL Server use SUM(CASE WHEN ... THEN 1 ELSE 0 END).

-- Postgres: paid vs free conversions per plan
SELECT   plan,
         COUNT(*) FILTER (WHERE status = 'paid')  AS paid,
         COUNT(*) FILTER (WHERE status = 'free')  AS free
FROM     signups
GROUP BY plan;

-- MySQL / SQL Server: same query, portable form
SELECT   plan,
         SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) AS paid,
         SUM(CASE WHEN status = 'free' THEN 1 ELSE 0 END) AS free
FROM     signups
GROUP BY plan;

COUNT(DISTINCT …). Counts unique non-null values. Useful when each “thing” can have multiple rows.

-- Distinct customers who ordered today
SELECT   DATE(created_at) AS day,
         COUNT(DISTINCT customer_id) AS unique_customers,
         COUNT(*)                    AS total_orders
FROM     orders
WHERE    created_at >= CURRENT_DATE
GROUP BY DATE(created_at);

COUNT(DISTINCT col) on a billion-row table is expensive — it must materialise the distinct set. For approximate counts at warehouse scale, BigQuery has APPROX_COUNT_DISTINCT, Snowflake has HLL, Postgres has the hll extension. All use HyperLogLog and are typically within 1–2% of the true value.

Multi-column grouping. Tally over a pair of columns to get a contingency table.

SELECT   country, plan, COUNT(*) AS cnt
FROM     users
GROUP BY country, plan
ORDER BY country, cnt DESC;

GROUPING SETS, ROLLUP, CUBE. Compute multiple aggregations in one query — including subtotals. Standard since SQL:1999, supported in Postgres, SQL Server, Oracle, BigQuery, Snowflake. MySQL has WITH ROLLUP but not full GROUPING SETS.

SELECT   country, plan, COUNT(*) AS cnt
FROM     users
GROUP BY ROLLUP (country, plan);
-- Rows: every (country, plan) pair, plus (country, NULL) subtotals, plus (NULL, NULL) grand total.

Counts per period. Often a tally is paired with a date truncation.

-- Daily signup count
SELECT   DATE_TRUNC('day', created_at) AS day,
         COUNT(*) AS signups
FROM     users
GROUP BY 1
ORDER BY 1;

DATE_TRUNC is Postgres syntax. MySQL: DATE(created_at) for day, or DATE_FORMAT(created_at, '%Y-%m-01') for month. SQL Server: CAST(created_at AS DATE) or DATEFROMPARTS(YEAR(d), MONTH(d), 1). BigQuery: DATE_TRUNC(created_at, DAY).

If the date sequence has gaps (no signups on a given day), the tally will skip that day — you need the Patch Gaps pattern to fill it.

Common pitfalls#

The three COUNTs are not the same. Memorise this contrast.

COUNT(*) — every row.

Counts rows, ignoring whether columns are NULL. Always equal to the number of rows in the group.

SELECT COUNT(*) FROM t;
-- Includes rows where every column is NULL.

COUNT(col) — non-null values; COUNT(DISTINCT col) — unique non-null values.

COUNT(col) skips rows where col IS NULL. COUNT(DISTINCT col) skips nulls and duplicates.

-- Rows with a phone number recorded
SELECT COUNT(phone) FROM users;
-- Distinct phone numbers
SELECT COUNT(DISTINCT phone) FROM users;

When in doubt, use COUNT(*). It is unambiguous and the engine optimises it specially — it never needs to read column values.

Forgetting a WHERE filter on partitioned tables. A bare COUNT(*) FROM events on a fact table can scan billions of rows. Always include a time filter on event-style tables; the planner can prune partitions or use a B-tree index range scan.

Using HAVING for row-level filters. HAVING amount > 100 works in some engines because of relaxed semantics but is wrong in spirit and slower (filters after aggregation). Push pre-aggregation filters into WHERE.

Ordering by an alias in non-standard ways. Standard SQL allows ORDER BY cnt (the alias from SELECT) and ORDER BY 2 (the column position). Both are portable. ORDER BY COUNT(*) is also legal everywhere. Avoid ORDER BY with an expression that doesn’t appear in SELECT — it works but obscures intent.

Counting joined rows blows up. If you join orders to order_items and then COUNT(*) GROUP BY customer_id, you get items per customer, not orders per customer. Either count DISTINCT orders.id or aggregate before joining.

-- WRONG: counts items per customer
SELECT   o.customer_id, COUNT(*) AS orders
FROM     orders o
JOIN     order_items i ON i.order_id = o.id
GROUP BY o.customer_id;

-- RIGHT: distinct order id
SELECT   o.customer_id, COUNT(DISTINCT o.id) AS orders
FROM     orders o
JOIN     order_items i ON i.order_id = o.id
GROUP BY o.customer_id;

NULL is its own group. GROUP BY status collapses all NULL statuses into a single group with status = NULL. That row shows up at the top or bottom of the ordering depending on engine defaults. Postgres puts NULLs last on ORDER BY ... ASC by default; MySQL puts them first. Add NULLS LAST (Postgres, Oracle) or wrap with COALESCE(status, 'unknown') for portability.

Practice problem#

You have an orders table:

CREATE TABLE orders (
  id          BIGSERIAL PRIMARY KEY,
  user_id     BIGINT  NOT NULL,
  status      TEXT    NOT NULL,
  amount      NUMERIC NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL
);

Write a query that returns the top 10 users by number of orders placed, ordered descending, with ties broken by user_id ascending. Include the count and the percentage of all orders that user accounts for.

Solution

SELECT   user_id,
         COUNT(*) AS orders,
         ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct_of_total
FROM     orders
GROUP BY user_id
ORDER BY orders DESC, user_id ASC
LIMIT    10;

Tally on user_id, percentage from the windowed grand total, double sort key for tie-breaking. The SUM(COUNT(*)) OVER () computes the total order count across the whole result set without a separate scan.

Group Bucket — same shape, but the grouping key is a CASE WHEN expression over a continuous column.
Existence Check — when you want to count rows in t1 that have any matching row in t2, not all of them.
Rolling Totals — running counts over an ordered window.
Patch Gaps — fill missing dates in a daily tally with zeros.
Aggregates, GROUP BY, HAVING — the building block this pattern is built on.
Window functions — for SUM() OVER () and other windowed aggregates used in count-with-percentage.

What it is#

When to use it#

How it works#

Variants#

Common pitfalls#

Practice problem#

Related patterns#