Week 30 - SQL Chips and Chunks Newsletter

SQL Activity Analytics: Finding Your Power Users

1. SQL Chip: Understanding Window Functions for Ranking

Window functions allow you to perform calculations across a set of table rows that are somehow related to the current row. Unlike GROUP BY, which collapses rows, window functions retain the individual rows, and the results of the calculation (like a rank or a running total) are returned for each row. They are perfect for comparative analysis, like ranking clients by usage. The key is the OVER() clause, which defines the 'window' or set of rows the function operates on.

2. SQL Chunk: Calculating Client Usage Rank

Imagine we have a table called client_usage_logs that tracks the monthly "Usage Score" (a combined metric of logins, CPU hours, and storage used) for various clients. This query uses the RANK() window function to assign a rank to each client based on their total usage score.

SELECT
    client_id,
    client_name,
    SUM(monthly_usage_score) AS total_usage,
    RANK() OVER (ORDER BY SUM(monthly_usage_score) DESC) AS usage_rank
FROM
    client_usage_logs
GROUP BY
    client_id, client_name
ORDER BY
    usage_rank
LIMIT 10;

3. SQL Challenge: Getting the Top Client for Each Service

What if IBM Cloud offers multiple services (e.g., 'Storage', 'AI', 'Compute'), and you want to find the single highest-usage client per service? This is a classic use case for the ROW_NUMBER() window function combined with partitioning. The PARTITION BY service_name resets the ranking for each unique service, and then the outer query selects only rank 1 for each partition.

SELECT
    client_id,
    client_name,
    service_name,
    total_service_usage
FROM (
    SELECT
        client_id,
        client_name,
        service_name,
        SUM(usage_hours) AS total_service_usage,
        ROW_NUMBER() OVER (PARTITION BY service_name ORDER BY SUM(usage_hours) DESC) AS service_rank
    FROM
        service_logs
    GROUP BY
        client_id, client_name, service_name
) AS ranked_services
WHERE
    service_rank = 1;

4. SQL Mistake: Misusing GROUP BY with Window Functions

A very common mistake when first learning window functions is attempting to use them directly with columns that are aggregated without being in the GROUP BY clause.

Incorrect Query (Illustrative Mistake):

SELECT
    client_id,
    service_name,
    MAX(login_count),
    ROW_NUMBER() OVER (PARTITION BY service_name ORDER BY MAX(login_count) DESC)
FROM
    client_logs
GROUP BY
    client_id; -- **Missing 'service_name' from GROUP BY!**

The GROUP BY clause must include all non-aggregated columns. When combining aggregation (MAX(), SUM()) and window functions, it's often best to perform the aggregation first (like in our Challenge example's inner query) and then apply the window function to those aggregated results. The window function then operates on the grouped data.

5. SQL in Practice: Telecommunications Network Analysis

In a telecommunications company, knowing the most active clients is vital for network planning and targeted sales. A high-activity client might be one running massive data transfers or using a lot of premium features. We can calculate their historical rolling average usage over the last three months to identify those with consistently high activity.

SELECT
    month,
    client_id,
    data_transfer_GB,
    AVG(data_transfer_GB) OVER (
        PARTITION BY client_id
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS three_month_rolling_avg_GB
FROM
    telecom_usage_records
ORDER BY
    client_id, month DESC;

This query shows the current month's usage alongside the 3-month average, instantly highlighting clients whose recent usage is trending up or down, which is valuable for account management.

6. SQL Resource

This week's focus shifts from pure querying to understanding the underlying architecture of your database, which is crucial for advanced topics like backup, recovery, and performance tuning.

Resource Name: MySQL Reference Manual: The MySQL Data Directory (Adjust link to the latest stable version's data directory section)

Hyperlink: https://dev.mysql.com/doc/refman/8.0/en/data-directory.html (The official MySQL 8.0 documentation on this topic)

How it Helps: For beginners focused on data careers, understanding where the data lives on the server is an important leap into database administration (DBA) concepts. The default location on most Linux systems is /var/lib/mysql. This official documentation breaks down the contents of this Data Directory (datadir), explaining the purpose of different files and subdirectories:

  • It clarifies the difference between files storing table definitions (.frm), MyISAM data (.MYD), and InnoDB tablespace files (like ibdata1 or individual .ibd files).

  • It teaches you how to find the exact location using the SQL command SELECT @@datadir;.

  • This advanced knowledge is essential for safely performing physical backups, troubleshooting service failures (by checking error logs), and managing disk space, making you a much more capable data professional.