Week 25 - SQL Chips and Chunks Newsletter

SQL Insights: The Power of COUNT and DISTINCT

1. SQL Chip: Counting Your Data with Precision

Understanding the number of records in your tables is one of the first and most critical steps in data analysis. The COUNT() function is your go-to tool for this. It can give you a simple count of all rows or be used to count only the unique values within a column. The DISTINCT keyword is the key to this second function, allowing you to filter out duplicates before the count is performed. Using these together gives you a powerful way to summarize your data's volume and uniqueness.

2. SQL Chunk: Analyzing Sales Records

Let's use a robust query to count transactions and unique customers from a sales table. This shows the difference between counting all records and counting unique ones.

SELECT
    product_category,
    COUNT(*) AS total_transactions,
    COUNT(DISTINCT customer_id) AS unique_customers_who_bought
FROM
    sales
WHERE
    transaction_date >= '2025-06-01'
GROUP BY
    product_category
HAVING
    COUNT(DISTINCT customer_id) > 10
ORDER BY
    unique_customers_who_bought DESC;

This query first filters for recent sales. It then groups the results by product_category and uses COUNT(*) to get the total number of transactions in each category, while using COUNT(DISTINCT customer_id) to get the number of unique customers who made those purchases. The HAVING clause then filters to show only categories with more than 10 unique customers.

3. SQL Challenge: Conditional Counting with CASE

What if you needed to count unique customers who made a purchase on a specific day of the week? You can combine COUNT() with a CASE statement to perform conditional counting, a more advanced technique.

SELECT
    product_category,
    COUNT(DISTINCT CASE WHEN DAYOFWEEK(transaction_date) = 2 THEN customer_id ELSE NULL END) AS unique_monday_shoppers,
    COUNT(DISTINCT customer_id) AS total_unique_customers
FROM
    sales
WHERE
    transaction_date BETWEEN '2025-06-01' AND '2025-06-30'
GROUP BY
    product_category
ORDER BY
    unique_monday_shoppers DESC
LIMIT 10;

This query uses COUNT(DISTINCT) with a CASE statement. The CASE returns a customer_id only if the transaction was on a Monday (Day of Week = 2), otherwise it returns NULL. COUNT() ignores NULL values, so this effectively gives you a count of only unique Monday shoppers within each product category. (the DAYOFWEEK() function is MySQL-specific)

4. SQL Mistake: Misunderstanding COUNT(*) vs. COUNT(column)

A common pitfall is to think COUNT(*) and COUNT(column) are interchangeable. They are not. COUNT(*) is a special case that counts all rows, including those where a column might be NULL. COUNT(column), however, will only count rows that have a non-NULL value in the specified column.

Incorrect Example (may not get the full row count):

SELECT COUNT(email_address) FROM user_profiles;

This query would miss counting users who have a profile but no email address on file.

Correct Example (to count all profiles):

SELECT COUNT(*) FROM user_profiles;

This correctly returns the total number of rows (profiles), regardless of whether the email_address is populated.

5. SQL in Practice: E-commerce Customer Analysis

In e-commerce, a business analyst often needs to understand customer reach. COUNT() and DISTINCT are perfect for this. You might want to analyze your customer base by country to see how many total orders you have versus your unique customer base.

-- Assume a table named 'ecom_orders'
SELECT
    country,
    COUNT(order_id) AS total_orders,
    COUNT(DISTINCT customer_id) AS unique_customers,
    COUNT(DISTINCT customer_id) * 100.0 / COUNT(order_id) AS uniqueness_ratio
FROM
    ecom_orders
WHERE
    order_date >= '2025-01-01'
GROUP BY
    country
HAVING
    COUNT(order_id) > 50
ORDER BY
    unique_customers DESC
LIMIT 10;

This query shows a real-world use case by calculating not only the number of total orders and unique customers per country but also a "uniqueness ratio," which indicates how many orders are placed per unique customer. This can help a business understand if they are getting repeat business or attracting new customers in different regions.

6. SQL Resource:

For hands-on practice with COUNT() and DISTINCT, here are some great options:

  • SQLBolt: This is a fantastic place to start. Their interactive lessons—particularly Lessons 4-6, which cover filtering and aggregation—give you immediate feedback on your queries, making them perfect for reinforcing what we've covered.

  • W3Schools SQL Tutorial: A classic for a reason, this resource offers quick reference examples you can run in their online editor. It's excellent for reviewing syntax and seeing simple queries in action.