- SQL Chips and Chunks Newsletter
- Posts
- Week 13 - SQL Chips and Chunks Newsletter
Week 13 - SQL Chips and Chunks Newsletter
1. SQL Chip: Understanding UNION
and UNION ALL
Takeaway: Both UNION
and UNION ALL
combine the results of two or more SELECT
statements into a single result set. The key difference is that UNION
automatically removes duplicate rows, while UNION ALL
keeps all rows, including duplicates. Use UNION ALL
when you want all data, and UNION
when you need a distinct list.
2. SQL Chunk: Combining Data with UNION ALL
To see this in action, let's imagine we have an Employees
table that stores information about both current and former employees, distinguished by whether their end_date
is NULL
. We want to get a single list of all employees and their status
SELECT employee_id, first_name, last_name, 'Current Employee' AS employment_status
FROM Employees
WHERE end_date IS NULL
UNION ALL
SELECT employee_id, first_name, last_name, 'Former Employee' AS employment_status
FROM Employees
WHERE end_date IS NOT NULL;
3. SQL Challenge: Filtering Combined Results with UNION
and a WHERE
Clause
Let's say we want to see a distinct list of all employees (both current and former) who have a 'Developer' role. This combines the UNION
concept with a WHERE
clause for filtering before the union operation.
SELECT employee_id, first_name, last_name, 'Current Employee' AS employment_status
FROM Employees
WHERE end_date IS NULL AND role = 'Developer'
UNION
SELECT employee_id, first_name, last_name, 'Former Employee' AS employment_status
FROM Employees
WHERE end_date IS NOT NULL AND role = 'Developer';
4. SQL Mistake: Misunderstanding UNION
vs. UNION ALL
Performance
A common mistake beginners make is using UNION
when UNION ALL
would be more appropriate, especially for very large datasets. Because UNION
has to do extra work to identify and remove duplicate rows, it can be significantly slower than UNION ALL
when duplicates aren't a concern. Always consider if you truly need to eliminate duplicates before choosing UNION
. If not, UNION ALL
is the more efficient choice.
Imagine you're working for a social media company. You have two tables: DailyPosts
(for current day's posts) and ArchivedPosts
(for older posts). You want to see a combined list of all posts that received more than 100 likes, regardless of whether they are recent or archived, including any duplicate post IDs if they somehow appear in both (though typically they wouldn't; for this example, we'll use UNION ALL
to illustrate keeping all entries).
SELECT post_id, post_text, likes, comments
FROM DailyPosts
WHERE likes > 100
UNION ALL
SELECT post_id, post_text, likes, comments
FROM ArchivedPosts
WHERE likes > 100;
6. SQL Resource: SQLBolt
Name: SQLBolt Link: https://sqlbolt.com/ Description: SQLBolt is an excellent, free, and interactive website designed specifically for beginners. It provides a series of well-structured lessons with clear explanations and immediate hands-on exercises right in your browser. You get instant feedback on your queries, making it a fantastic tool for quickly grasping fundamental SQL concepts like SELECT
, WHERE
, GROUP BY
, ORDER BY
, and joins without needing to set up any software. It's concise and to the point, perfect for building a solid foundation.