Week 8 - SQL Chips and Chunks Newsletter

1. SQL Chip: Understanding the LAG() Function LAG() is a powerful SQL window function that retrieves the value from the previous row in a result set. It’s useful for comparing a current row’s data with a previous row’s data within the same partition.

Here’s an example that retrieves the current sales amount and the previous sales amount for each region:

SELECT 
    region, 
    sale_date, 
    sales_amount, 
    LAG(sales_amount) OVER (PARTITION BY region ORDER BY sale_date) AS prev_sales_amount
FROM sales;

In this query, LAG(sales_amount) fetches the sales amount from the previous row, allowing trend analysis and comparisons over time.

2. SQL Chunk: LAG() with PARTITION BY The PARTITION BY clause allows you to segment the data before applying LAG(), ensuring that comparisons are made within each group rather than across the entire dataset.

For example, let’s find the previous salary of an employee within each department:

SELECT 
    department, 
    name, 
    salary, 
    LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS prev_salary
FROM employees;

This query ranks employees within each department based on salary and retrieves the previous employee’s salary within the same department.

3. SQL Challenge: Identifying Sales Drops Challenge: Write a query that identifies when a product’s sales have decreased compared to the previous entry in the sales history.

Hint: Use LAG() to fetch the previous sales amount and compare it to the current sales amount using a subquery or a common table expression (CTE).

WITH SalesHistory AS (
    SELECT 
        product_id, 
        sale_date, 
        sales_amount, 
        LAG(sales_amount) OVER (PARTITION BY product_id ORDER BY sale_date) AS prev_sales_amount
    FROM sales
)
SELECT * 
FROM SalesHistory
WHERE sales_amount < prev_sales_amount;

This query first calculates the previous sales amount using LAG() inside a CTE, then filters out rows where the current sales amount is lower than the previous one, helping detect sales drops.

4. Common SQL Mistake: Forgetting ORDER BY in LAG() A common mistake is using LAG() without specifying an ORDER BY clause, which can result in unpredictable results.

Incorrect:

SELECT 
    name, 
    salary, 
    LAG(salary) OVER (PARTITION BY department) AS prev_salary
FROM employees;

Correct:

SELECT 
    name, 
    salary, 
    LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS prev_salary
FROM employees;

Without ORDER BY, SQL doesn’t know how to determine the previous value, which may lead to incorrect results.

5. SQL in Practice: Tracking Customer Purchase Patterns LAG() is commonly used for customer behavior analysis. For example, tracking the previous purchase date of a customer:

SELECT 
    customer_id, 
    purchase_date, 
    LAG(purchase_date) OVER (PARTITION BY customer_id ORDER BY purchase_date) AS prev_purchase_date,
    DATEDIFF(purchase_date, LAG(purchase_date) OVER (PARTITION BY customer_id ORDER BY purchase_date)) AS days_between_purchases
FROM customer_purchases;

This query calculates the number of days between purchases, which is useful for analyzing customer retention and engagement.

6. SQL Resource: SQLPad - Interactive SQL Editor For those who want to practice SQL window functions like LAG() in a browser-based environment, SQLPad is a great free tool that allows users to run SQL queries interactively without needing to install a database locally. It provides an intuitive interface where users can write, execute, and analyze SQL queries on sample or connected datasets.

SQLPad supports multiple database connections, including PostgreSQL, MySQL, SQL Server, and others, making it a versatile tool for both beginners and experienced SQL users. It also offers visualization capabilities, allowing users to generate insights from their query results.

If you are learning window functions like LAG(), LEAD(), and ROW_NUMBER(), SQLPad is an excellent hands-on resource to test and refine your skills.

Check it out here: SQLPad