Tally Count
Count occurrences of values in a column. The COUNT(*) GROUP BY template every SQL interview opens with.
What it is#
The tally count pattern is the simplest aggregation in SQL: count how many rows fall into each value of a grouping column. The skeleton is three clauses long.
SELECT col, COUNT(*) AS cntFROM tGROUP BY colORDER BY cnt DESC;Every other aggregation pattern in this workbook builds on it. The shape — a discrete grouping column, a count, an ordering — is what your eye should pattern-match on the moment an interviewer says “how many users have …” or “top 10 products by …”. It is also the first query most SQL learners write that produces something non-trivial, and the first that exposes the three subtly different counting functions: COUNT(*), COUNT(col), and COUNT(DISTINCT col).
-- Orders per customerSELECT customer_id, COUNT(*) AS ordersFROM ordersGROUP BY customer_idORDER BY orders DESCLIMIT 10;The query plan is almost always the same: a sort or hash on the grouping column, an aggregate, an order-by, a limit. On a B-tree index over the grouping column, an engine can skip the sort entirely and stream rows in order — see B-tree indexes for why.
When to use it#
Reach for a tally count when:
- You have a column with bounded cardinality (status codes, country, plan tier, day-of-week) and want a breakdown per value.
- You have a column with high cardinality (user_id, product_id) and want the top-N most common, with
ORDER BY cnt DESC LIMIT N. - You want a frequency histogram — counts paired with percentages of the total.
- The interviewer asks any of: “how many X per Y”, “top N Y by count of X”, “which Y has the most X”.
If instead you need to bucket a continuous column (age, price, latency), you want the Group Bucket pattern — same skeleton, but with a CASE expression in place of the bare column. If you need to count related rows from another table without joining, see Existence Check.
How it works#
The template has four named parts. Memorise them in this order:
- The grouping column in the
SELECTandGROUP BYclauses. Same expression on both sides. - The count expression —
COUNT(*),COUNT(col), orCOUNT(DISTINCT col). They are not interchangeable (see pitfalls below). - The filter, in either
WHERE(before grouping) orHAVING(after grouping). - The ordering and limit for top-N style answers.
SELECT status, COUNT(*) AS cntFROM ordersWHERE created_at >= DATE '2026-01-01' -- (3) pre-group filterGROUP BY status -- (1) groupingHAVING COUNT(*) > 100 -- (3) post-group filterORDER BY cnt DESC; -- (4) orderingWHERE vs HAVING. WHERE runs before aggregation and filters individual rows. HAVING runs after aggregation and filters whole groups. “Orders placed in 2026” is a WHERE; “statuses with more than 100 orders” is a HAVING. Mixing them up is one of the most common interview slips.
Counts with percentages. A tally is often more useful as a percentage of the total. Use a window function for the total:
SELECT status, COUNT(*) AS cnt, ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pctFROM ordersGROUP BY statusORDER BY cnt DESC;The SUM(COUNT(*)) OVER () looks odd but is exactly right: the outer SUM runs after the GROUP BY, summing the per-group counts into a grand total. See Window functions for why the empty OVER () clause computes a single value across the whole result set.
Variants#
COUNT with a filter inside. Most engines support COUNT(*) FILTER (WHERE ...) for conditional counts. Standard SQL, supported in Postgres and SQLite; in MySQL and SQL Server use SUM(CASE WHEN ... THEN 1 ELSE 0 END).
-- Postgres: paid vs free conversions per planSELECT plan, COUNT(*) FILTER (WHERE status = 'paid') AS paid, COUNT(*) FILTER (WHERE status = 'free') AS freeFROM signupsGROUP BY plan;
-- MySQL / SQL Server: same query, portable formSELECT plan, SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) AS paid, SUM(CASE WHEN status = 'free' THEN 1 ELSE 0 END) AS freeFROM signupsGROUP BY plan;COUNT(DISTINCT …). Counts unique non-null values. Useful when each “thing” can have multiple rows.
-- Distinct customers who ordered todaySELECT DATE(created_at) AS day, COUNT(DISTINCT customer_id) AS unique_customers, COUNT(*) AS total_ordersFROM ordersWHERE created_at >= CURRENT_DATEGROUP BY DATE(created_at);COUNT(DISTINCT col) on a billion-row table is expensive — it must materialise the distinct set. For approximate counts at warehouse scale, BigQuery has APPROX_COUNT_DISTINCT, Snowflake has HLL, Postgres has the hll extension. All use HyperLogLog and are typically within 1–2% of the true value.
Multi-column grouping. Tally over a pair of columns to get a contingency table.
SELECT country, plan, COUNT(*) AS cntFROM usersGROUP BY country, planORDER BY country, cnt DESC;GROUPING SETS, ROLLUP, CUBE. Compute multiple aggregations in one query — including subtotals. Standard since SQL:1999, supported in Postgres, SQL Server, Oracle, BigQuery, Snowflake. MySQL has WITH ROLLUP but not full GROUPING SETS.
SELECT country, plan, COUNT(*) AS cntFROM usersGROUP BY ROLLUP (country, plan);-- Rows: every (country, plan) pair, plus (country, NULL) subtotals, plus (NULL, NULL) grand total.Counts per period. Often a tally is paired with a date truncation.
-- Daily signup countSELECT DATE_TRUNC('day', created_at) AS day, COUNT(*) AS signupsFROM usersGROUP BY 1ORDER BY 1;DATE_TRUNC is Postgres syntax. MySQL: DATE(created_at) for day, or DATE_FORMAT(created_at, '%Y-%m-01') for month. SQL Server: CAST(created_at AS DATE) or DATEFROMPARTS(YEAR(d), MONTH(d), 1). BigQuery: DATE_TRUNC(created_at, DAY).
If the date sequence has gaps (no signups on a given day), the tally will skip that day — you need the Patch Gaps pattern to fill it.
Common pitfalls#
The three COUNTs are not the same. Memorise this contrast.
COUNT(*) — every row.
Counts rows, ignoring whether columns are NULL. Always equal to the number of rows in the group.
SELECT COUNT(*) FROM t;-- Includes rows where every column is NULL.COUNT(col) — non-null values; COUNT(DISTINCT col) — unique non-null values.
COUNT(col) skips rows where col IS NULL. COUNT(DISTINCT col) skips nulls and duplicates.
-- Rows with a phone number recordedSELECT COUNT(phone) FROM users;-- Distinct phone numbersSELECT COUNT(DISTINCT phone) FROM users;When in doubt, use COUNT(*). It is unambiguous and the engine optimises it specially — it never needs to read column values.
Forgetting a WHERE filter on partitioned tables. A bare COUNT(*) FROM events on a fact table can scan billions of rows. Always include a time filter on event-style tables; the planner can prune partitions or use a B-tree index range scan.
Using HAVING for row-level filters. HAVING amount > 100 works in some engines because of relaxed semantics but is wrong in spirit and slower (filters after aggregation). Push pre-aggregation filters into WHERE.
Ordering by an alias in non-standard ways. Standard SQL allows ORDER BY cnt (the alias from SELECT) and ORDER BY 2 (the column position). Both are portable. ORDER BY COUNT(*) is also legal everywhere. Avoid ORDER BY with an expression that doesn’t appear in SELECT — it works but obscures intent.
Counting joined rows blows up. If you join orders to order_items and then COUNT(*) GROUP BY customer_id, you get items per customer, not orders per customer. Either count DISTINCT orders.id or aggregate before joining.
-- WRONG: counts items per customerSELECT o.customer_id, COUNT(*) AS ordersFROM orders oJOIN order_items i ON i.order_id = o.idGROUP BY o.customer_id;
-- RIGHT: distinct order idSELECT o.customer_id, COUNT(DISTINCT o.id) AS ordersFROM orders oJOIN order_items i ON i.order_id = o.idGROUP BY o.customer_id;NULL is its own group. GROUP BY status collapses all NULL statuses into a single group with status = NULL. That row shows up at the top or bottom of the ordering depending on engine defaults. Postgres puts NULLs last on ORDER BY ... ASC by default; MySQL puts them first. Add NULLS LAST (Postgres, Oracle) or wrap with COALESCE(status, 'unknown') for portability.
Practice problem#
You have an orders table:
CREATE TABLE orders ( id BIGSERIAL PRIMARY KEY, user_id BIGINT NOT NULL, status TEXT NOT NULL, amount NUMERIC NOT NULL, created_at TIMESTAMPTZ NOT NULL);Write a query that returns the top 10 users by number of orders placed, ordered descending, with ties broken by user_id ascending. Include the count and the percentage of all orders that user accounts for.
Solution
SELECT user_id, COUNT(*) AS orders, ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct_of_totalFROM ordersGROUP BY user_idORDER BY orders DESC, user_id ASCLIMIT 10;Tally on user_id, percentage from the windowed grand total, double sort key for tie-breaking. The SUM(COUNT(*)) OVER () computes the total order count across the whole result set without a separate scan.
Related patterns#
- Group Bucket — same shape, but the grouping key is a
CASE WHENexpression over a continuous column. - Existence Check — when you want to count rows in
t1that have any matching row int2, not all of them. - Rolling Totals — running counts over an ordered window.
- Patch Gaps — fill missing dates in a daily tally with zeros.
- Aggregates, GROUP BY, HAVING — the building block this pattern is built on.
- Window functions — for
SUM() OVER ()and other windowed aggregates used in count-with-percentage.