- SQL Chips and Chunks Newsletter
- Posts
- Week 25 - SQL Chips and Chunks Newsletter
Week 25 - SQL Chips and Chunks Newsletter
SQL Insights: The Power of COUNT and DISTINCT
1. SQL Chip: Counting Your Data with Precision
Understanding the number of records in your tables is one of the first and most critical steps in data analysis. The COUNT()
function is your go-to tool for this. It can give you a simple count of all rows or be used to count only the unique values within a column. The DISTINCT
keyword is the key to this second function, allowing you to filter out duplicates before the count is performed. Using these together gives you a powerful way to summarize your data's volume and uniqueness.
2. SQL Chunk: Analyzing Sales Records
Let's use a robust query to count transactions and unique customers from a sales
table. This shows the difference between counting all records and counting unique ones.
SELECT
product_category,
COUNT(*) AS total_transactions,
COUNT(DISTINCT customer_id) AS unique_customers_who_bought
FROM
sales
WHERE
transaction_date >= '2025-06-01'
GROUP BY
product_category
HAVING
COUNT(DISTINCT customer_id) > 10
ORDER BY
unique_customers_who_bought DESC;
This query first filters for recent sales. It then groups the results by product_category
and uses COUNT(*)
to get the total number of transactions in each category, while using COUNT(DISTINCT customer_id)
to get the number of unique customers who made those purchases. The HAVING
clause then filters to show only categories with more than 10 unique customers.
3. SQL Challenge: Conditional Counting with CASE
What if you needed to count unique customers who made a purchase on a specific day of the week? You can combine COUNT()
with a CASE
statement to perform conditional counting, a more advanced technique.
SELECT
product_category,
COUNT(DISTINCT CASE WHEN DAYOFWEEK(transaction_date) = 2 THEN customer_id ELSE NULL END) AS unique_monday_shoppers,
COUNT(DISTINCT customer_id) AS total_unique_customers
FROM
sales
WHERE
transaction_date BETWEEN '2025-06-01' AND '2025-06-30'
GROUP BY
product_category
ORDER BY
unique_monday_shoppers DESC
LIMIT 10;
This query uses COUNT(DISTINCT)
with a CASE
statement. The CASE
returns a customer_id
only if the transaction was on a Monday (Day of Week = 2), otherwise it returns NULL
. COUNT()
ignores NULL
values, so this effectively gives you a count of only unique Monday shoppers within each product category. (the DAYOFWEEK() function is MySQL-specific)
4. SQL Mistake: Misunderstanding COUNT(*)
vs. COUNT(column)
A common pitfall is to think COUNT(*)
and COUNT(column)
are interchangeable. They are not. COUNT(*)
is a special case that counts all rows, including those where a column might be NULL
. COUNT(column)
, however, will only count rows that have a non-NULL
value in the specified column.
Incorrect Example (may not get the full row count):
SELECT COUNT(email_address) FROM user_profiles;
This query would miss counting users who have a profile but no email address on file.
Correct Example (to count all profiles):
SELECT COUNT(*) FROM user_profiles;
This correctly returns the total number of rows (profiles), regardless of whether the email_address
is populated.
5. SQL in Practice: E-commerce Customer Analysis
In e-commerce, a business analyst often needs to understand customer reach. COUNT()
and DISTINCT
are perfect for this. You might want to analyze your customer base by country to see how many total orders you have versus your unique customer base.
-- Assume a table named 'ecom_orders'
SELECT
country,
COUNT(order_id) AS total_orders,
COUNT(DISTINCT customer_id) AS unique_customers,
COUNT(DISTINCT customer_id) * 100.0 / COUNT(order_id) AS uniqueness_ratio
FROM
ecom_orders
WHERE
order_date >= '2025-01-01'
GROUP BY
country
HAVING
COUNT(order_id) > 50
ORDER BY
unique_customers DESC
LIMIT 10;
This query shows a real-world use case by calculating not only the number of total orders and unique customers per country but also a "uniqueness ratio," which indicates how many orders are placed per unique customer. This can help a business understand if they are getting repeat business or attracting new customers in different regions.
6. SQL Resource:
For hands-on practice with COUNT()
and DISTINCT
, here are some great options:
SQLBolt: This is a fantastic place to start. Their interactive lessons—particularly Lessons 4-6, which cover filtering and aggregation—give you immediate feedback on your queries, making them perfect for reinforcing what we've covered.
W3Schools SQL Tutorial: A classic for a reason, this resource offers quick reference examples you can run in their online editor. It's excellent for reviewing syntax and seeing simple queries in action.