1. SQL Tip: The Power of the "Pre-Summary"

Think of a FROM subquery (often called a derived table) as a temporary, specialized table you build on the fly to do one specific job before the rest of your query even starts. Instead of trying to calculate complex totals and join raw data all at once, you use a subquery to "pre-summarize" the math. This keeps your main query clean, readable, and—most importantly—accurate. By isolating the logic for things like "first interview date" or "total pass counts" into its own little bubble, you ensure the rest of your data doesn't get messy or duplicated.
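
In skeleton form, the pattern looks like this (the table and column names here are placeholders for illustration, not a real schema):

/* Hypothetical sketch: the general shape of a FROM subquery (derived table) */
SELECT 
    t.some_key, 
    t.some_total
FROM (
    SELECT 
        some_key, 
        SUM(amount) AS some_total    /* the "pre-summary" work happens here */
    FROM raw_events
    GROUP BY some_key
) AS t;    /* most dialects require an alias after the closing parenthesis */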

2. SQL Chunk: Calculating Pass Rates by Stage

To understand where candidates are dropping out of the hiring funnel for roles like AI Data Quality Analyst, we first need to count how many interviews happened and how many resulted in a "pass".
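
If you want to run the examples in this article yourself, here is one hypothetical minimal version of the interviews table they assume; the exact column names and types are an illustration, not a known schema:

/* Hypothetical minimal schema for experimenting with these queries */
CREATE TABLE interviews (
    candidate_id   INT,
    interviewer    VARCHAR(100),
    stage          VARCHAR(50),     /* e.g. 'phone_screen', 'technical', 'onsite' */
    outcome        VARCHAR(10),     /* 'pass' or 'fail' */
    interview_date DATE
);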

SELECT 
    s.stage, 
    s.total_interviews,
    s.passes,
    CAST(s.passes AS decimal) / NULLIF(s.total_interviews, 0) AS pass_rate
FROM (
    /* This subquery summarizes the 'signal' before the main report */
    SELECT 
        stage, 
        COUNT(*) AS total_interviews,
        SUM(CASE WHEN outcome = 'pass' THEN 1 ELSE 0 END) AS passes
    FROM interviews
    GROUP BY stage
) AS s
ORDER BY pass_rate DESC;

3. SQL Challenge: Speed to First Interview

Now, let's take it up a notch. We want to know how long it takes for a candidate to get their first interview after applying. This requires finding the minimum interview date for every candidate first, then joining that back to the main candidate list to compare it against their application date.

SELECT 
    r.role_name,
    /* DATEDIFF(day, start, end) is SQL Server syntax; in MySQL this would be
       DATEDIFF(fi.first_interview_date, c.applied_date) */
    AVG(DATEDIFF(day, c.applied_date, fi.first_interview_date)) AS avg_days_to_start,
    COUNT(DISTINCT c.candidate_id) AS total_candidates
FROM candidates AS c
JOIN roles AS r 
    ON c.role_id = r.role_id
JOIN (
    /* Extracting the single earliest interview for every applicant */
    SELECT 
        candidate_id, 
        MIN(interview_date) AS first_interview_date
    FROM interviews
    GROUP BY candidate_id
) AS fi 
    ON c.candidate_id = fi.candidate_id
GROUP BY r.role_name
ORDER BY avg_days_to_start ASC;

4. SQL Mistake: The Granularity Multiplier

When working with hiring analytics, your results are only as good as your "grain"—the level of detail in each row. A frequent mistake occurs when writing a FROM subquery that isn't as "flat" as you think it is, leading to a "fan-out" effect when joined back to your main table.

The Error: You intend to get "one row per candidate" (e.g., their earliest interview date). However, if your subquery accidentally groups by candidate_id AND stage, it will return multiple rows for the same person if they progressed through several rounds. When you join this back to the candidates table, the candidate’s name and application date will silently duplicate for every matching row found in the subquery. This causes any COUNT or AVG in your final report to inflate, making it look like you have more candidates than you actually do.
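
To make the fan-out concrete, here is a sketch of the buggy pattern, using the same interviews and candidates tables as the other examples:

/* THE BUG: grouping by stage as well yields one row per (candidate, stage),
   so each candidate fans out into multiple joined rows */
SELECT 
    c.full_name,
    bad.first_interview_date
FROM candidates c
JOIN (
    SELECT 
        candidate_id, 
        stage,                                   /* <- the extra, unintended grain */
        MIN(interview_date) AS first_interview_date
    FROM interviews
    GROUP BY candidate_id, stage                 /* one row per candidate PER STAGE */
) AS bad
    ON c.candidate_id = bad.candidate_id;        /* full_name now repeats per stage */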

The Fix: You must enforce the specific grain you want inside the derived table. By grouping strictly by the candidate_id and using an aggregate like MIN(interview_date), you ensure the subquery only passes exactly one row per person back to the main query.

/* THE FIX: Enforcing a single row per candidate key (MySQL Syntax) */
SELECT 
    c.full_name,
    /* MySQL DATEDIFF calculates (end_date - start_date) in days */
    DATEDIFF(first_i.earliest_interview, c.applied_date) AS days_to_first_contact
FROM candidates c
INNER JOIN (
    /* We isolate the logic here to ensure only ONE row per candidate_id */
    SELECT 
        candidate_id, 
        MIN(interview_date) AS earliest_interview
    FROM interviews
    GROUP BY candidate_id 
) AS first_i 
    ON c.candidate_id = first_i.candidate_id;

5. SQL in Practice: Interviewer Load and Consistency

In a high-volume environment like Innodata, managing the workload of interviewers for positions like Language Data Scientist is crucial. We need to identify interviewers who have handled at least 20 interviews and see their pass rate to ensure we aren't being too easy or too harsh.

SELECT 
    summary.interviewer,
    summary.total_load,
    summary.pass_percentage
FROM (
    SELECT 
        interviewer,
        COUNT(*) AS total_load,
        CAST(SUM(CASE WHEN outcome = 'pass' THEN 1 ELSE 0 END) AS decimal) 
            / NULLIF(COUNT(*), 0) AS pass_percentage
    FROM interviews
    GROUP BY interviewer
) AS summary
WHERE summary.total_load >= 20
ORDER BY summary.pass_percentage DESC;
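
As a design note: because the filter only touches aggregated values, this particular report could also be written without a subquery at all, using HAVING. A sketch of the equivalent form:

/* Equivalent variant: HAVING filters the groups directly, no outer query needed */
SELECT 
    interviewer,
    COUNT(*) AS total_load,
    CAST(SUM(CASE WHEN outcome = 'pass' THEN 1 ELSE 0 END) AS decimal) 
        / COUNT(*) AS pass_percentage    /* HAVING guarantees COUNT(*) >= 20, so no NULLIF needed */
FROM interviews
GROUP BY interviewer
HAVING COUNT(*) >= 20
ORDER BY pass_percentage DESC;

The derived-table version in the main example earns its keep when the outer query needs to do more than filter, such as joining the summary back to other tables.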

6. SQL Resource: The Secret Interview Weapon

VerveCopilot: SQL Subquery Interview Guide – This is a strategic playbook for your next career move. In high-stakes data interviews, senior analysts look for candidates who can manipulate data architecture on the fly using FROM subqueries. This guide breaks down how to "modularize" complex business problems—like calculating time-to-hire or pass rates—into small, solvable steps. It also provides the exact language you need to explain your logic clearly to a hiring manager, helping you demonstrate that you’re ready to handle sophisticated data pipelines for companies like Innodata.
