Week 13 - SQL Chips and Chunks Newsletter

1. SQL Chip: Understanding UNION and UNION ALL

Takeaway: Both UNION and UNION ALL combine the results of two or more SELECT statements into a single result set. The key difference is that UNION automatically removes duplicate rows, while UNION ALL keeps all rows, including duplicates. Use UNION ALL when you want all data, and UNION when you need a distinct list.

2. SQL Chunk: Combining Data with UNION ALL

To see this in action, let's imagine we have an Employees table that stores information about both current and former employees, distinguished by whether their end_date is NULL. We want to get a single list of all employees and their status

SELECT employee_id, first_name, last_name, 'Current Employee' AS employment_status
FROM Employees
WHERE end_date IS NULL

UNION ALL

SELECT employee_id, first_name, last_name, 'Former Employee' AS employment_status
FROM Employees
WHERE end_date IS NOT NULL;

3. SQL Challenge: Filtering Combined Results with UNION and a WHERE Clause

Let's say we want to see a distinct list of all employees (both current and former) who have a 'Developer' role. This combines the UNION concept with a WHERE clause for filtering before the union operation.

SELECT employee_id, first_name, last_name, 'Current Employee' AS employment_status
FROM Employees
WHERE end_date IS NULL AND role = 'Developer'

UNION

SELECT employee_id, first_name, last_name, 'Former Employee' AS employment_status
FROM Employees
WHERE end_date IS NOT NULL AND role = 'Developer';

4. SQL Mistake: Misunderstanding UNION vs. UNION ALL Performance

A common mistake beginners make is using UNION when UNION ALL would be more appropriate, especially for very large datasets. Because UNION has to do extra work to identify and remove duplicate rows, it can be significantly slower than UNION ALL when duplicates aren't a concern. Always consider if you truly need to eliminate duplicates before choosing UNION. If not, UNION ALL is the more efficient choice.

5. SQL Practice: Analyzing Social Media Post Engagement

Imagine you're working for a social media company. You have two tables: DailyPosts (for current day's posts) and ArchivedPosts (for older posts). You want to see a combined list of all posts that received more than 100 likes, regardless of whether they are recent or archived, including any duplicate post IDs if they somehow appear in both (though typically they wouldn't; for this example, we'll use UNION ALL to illustrate keeping all entries).

SELECT post_id, post_text, likes, comments
FROM DailyPosts
WHERE likes > 100

UNION ALL

SELECT post_id, post_text, likes, comments
FROM ArchivedPosts
WHERE likes > 100;

6. SQL Resource: SQLBolt

Name: SQLBolt Link: https://sqlbolt.com/ Description: SQLBolt is an excellent, free, and interactive website designed specifically for beginners. It provides a series of well-structured lessons with clear explanations and immediate hands-on exercises right in your browser. You get instant feedback on your queries, making it a fantastic tool for quickly grasping fundamental SQL concepts like SELECT, WHERE, GROUP BY, ORDER BY, and joins without needing to set up any software. It's concise and to the point, perfect for building a solid foundation.