Week 9 - SQL Chips and Chunks Newsletter

1. SQL Chip: Understanding the ROW_NUMBER() Function The ROW_NUMBER() function is a useful window function in SQL that assigns a unique sequential integer to each row within a partition. Unlike RANK() and DENSE_RANK(), ROW_NUMBER() does not allow duplicate rankings, ensuring each row gets a distinct value.

Here’s an example that assigns a row number to each employee based on their hire date within their department:

SELECT
department,
name,
hire_date,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY hire_date ASC) AS row_num
FROM employees;

This query assigns a sequential number to each employee in a department, starting from the earliest hire date.

2. SQL Chunk: ROW_NUMBER() with PARTITION BY The PARTITION BY clause is key when using ROW_NUMBER() because it resets the numbering within each partition, rather than numbering the entire dataset.

For example, let’s assign a row number to each sales transaction within each region:

SELECT
region,
sale_date,
sales_amount,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY sale_date DESC) AS row_num
FROM sales;

This query assigns a unique row number for sales within each region, ordering transactions from the latest sale to the earliest.

3. SQL Challenge: Identifying the First Purchase of Each Customer Challenge: Write a query to find the first recorded purchase of each customer using ROW_NUMBER().

Hint: Use PARTITION BY customer_id and ORDER BY purchase_date, and filter using a common table expression (CTE).

WITH RankedPurchases AS (
 SELECT
customer_id,
purchase_date,
sales_amount,
  ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date ASC) AS purchase_rank
 FROM customer_purchases
)
SELECT customer_id, purchase_date, sales_amount
FROM RankedPurchases
WHERE purchase_rank = 1;

This query first assigns a row number to each purchase per customer, then filters for rows where the row number is 1, ensuring we get only the first purchase per customer.

4. Common SQL Mistake: Using ROW_NUMBER() Without ORDER BY A common mistake is forgetting to include an ORDER BY clause in ROW_NUMBER(), which leads to arbitrary numbering.

Incorrect:

SELECT
name,
salary,
 ROW_NUMBER() OVER (PARTITION BY department) AS row_num
FROM employees;

Correct:

SELECT
name,
salary,
 ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;

Without ORDER BY, SQL has no clear logic for assigning row numbers, leading to inconsistent results.

5. SQL in Practice: Removing Duplicate Records ROW_NUMBER() is commonly used to remove duplicate records by selecting only the first occurrence of each entry.

WITH RankedEmployees AS (
 SELECT
employee_id,
name,
department,
salary,
 ROW_NUMBER() OVER (PARTITION BY name, department ORDER BY salary DESC) AS row_num
 FROM employees
)
SELECT * FROM RankedEmployees WHERE row_num = 1;

This query helps eliminate duplicates by selecting only the first occurrence of an employee based on salary within their department.

6. SQL Resource: DB Fiddle - Online SQL Sandbox If you want to test ROW_NUMBER() and other SQL functions without setting up a local database, DB Fiddle is an excellent free tool. It allows users to write and run SQL queries in different database systems, including PostgreSQL, MySQL, and SQL Server.

DB Fiddle is especially useful for experimenting with window functions like ROW_NUMBER(), RANK(), and DENSE_RANK() in a hands-on environment.

Try it out here: DB Fiddle