7  Set Operations

7.1 Learning Objectives

By the end of this chapter, students will be able to:

  • Distinguish between joins, which combine tables horizontally, and set operations, which stack result sets vertically.
  • Write UNION, UNION ALL, INTERSECT, and EXCEPT queries to combine the results of two SELECT statements.
  • Explain why UNION removes duplicate rows while UNION ALL keeps them, and choose appropriately based on the task.
  • Identify the compatibility requirements for set operations: matching column count and compatible data types.
  • Apply set operations to answer questions about overlap, difference, and union across related datasets.

Joins combine tables horizontally, adding columns by matching rows on a key. Set operations work differently: they combine the results of two separate queries vertically, stacking one result set on top of another. The three set operations in SQL are UNION, INTERSECT, and EXCEPT, and each answers a different question about how two result sets relate to each other.

Operation Returns
UNION Rows that appear in either result set (duplicates removed)
UNION ALL Rows that appear in either result set (duplicates kept)
INTERSECT Rows that appear in both result sets
EXCEPT Rows that appear in the first result set but not the second

7.2 Requirements

For two queries to be combined with a set operation, their result sets must be union-compatible:

  1. Both queries must return the same number of columns.
  2. Corresponding columns must have compatible data types (both text, both numeric, and so on).
  3. Column names and data types in the final result are taken from the first query. If the second query uses different column names, those names are silently ignored.

These rules apply to all three operations. Violating any of them produces an error before any rows are returned.

7.3 UNION ALL

UNION ALL stacks two result sets together and returns every row from both, including duplicates. It is the simplest set operation conceptually: the result is everything on the left, followed by everything on the right.

The query below combines all low-income and lower-middle-income countries into a single list.

SELECT short_name, income_group
FROM countries
WHERE income_group = 'Low income'
  AND region != 'Aggregates'

UNION ALL

SELECT short_name, income_group
FROM countries
WHERE income_group = 'Lower middle income'
  AND region != 'Aggregates'

ORDER BY income_group, short_name;
Displaying records 1 - 25
short_name income_group
Afghanistan Low income
Burkina Faso Low income
Burundi Low income
Central African Republic Low income
Chad Low income
Congo, Dem. Rep. Low income
Eritrea Low income
Ethiopia Low income
Gambia, The Low income
Guinea Low income
Guinea-Bissau Low income
Korea, Dem. People’s Rep. Low income
Liberia Low income
Madagascar Low income
Malawi Low income
Mali Low income
Mozambique Low income
Niger Low income
Rwanda Low income
Sierra Leone Low income
Somalia Low income
South Sudan Low income
Sudan Low income
Syrian Arab Republic Low income
Togo Low income

The result has one row per country across both income groups. Because no country can belong to two income groups simultaneously, no duplicates exist here and UNION ALL and UNION would produce identical results. The next section uses an example where that is not the case.

UNION ALL is generally faster than UNION because it does no additional work to identify and remove duplicates. When you are certain the two result sets are disjoint, or when duplicates are acceptable, prefer UNION ALL.

7.4 UNION

UNION works exactly like UNION ALL but adds a deduplication step: any row that appears in both result sets is included only once in the final output.

To see the difference, consider the question: which countries appeared in the top ten by GDP per capita in both 2000 and 2022? UNION ALL gives every entry from both ranked lists, while UNION collapses the countries that appeared in both down to a single row.

(SELECT short_name
 FROM countries AS c
 INNER JOIN indicators AS i ON c.id = i.country_id
 WHERE i.year = 2000
   AND c.region != 'Aggregates'
   AND i.gdp_per_capita_usd IS NOT NULL
 ORDER BY i.gdp_per_capita_usd DESC
 LIMIT 10)

UNION ALL

(SELECT short_name
 FROM countries AS c
 INNER JOIN indicators AS i ON c.id = i.country_id
 WHERE i.year = 2022
   AND c.region != 'Aggregates'
   AND i.gdp_per_capita_usd IS NOT NULL
 ORDER BY i.gdp_per_capita_usd DESC
 LIMIT 10)

ORDER BY short_name;
20 records
short_name
Bermuda
Bermuda
Cayman Islands
Channel Islands
Ireland
Japan
Liechtenstein
Liechtenstein
Luxembourg
Luxembourg
Monaco
Monaco
Norway
Norway
Qatar
San Marino
Singapore
Switzerland
Switzerland
United States

Twenty rows come back, one for each entry in both top-ten lists. Countries that ranked in the top ten in both years appear twice: Monaco, Liechtenstein, Luxembourg, Bermuda, Norway, and Switzerland each have two rows.

Swapping UNION ALL for UNION removes those duplicates:

(SELECT short_name
 FROM countries AS c
 INNER JOIN indicators AS i ON c.id = i.country_id
 WHERE i.year = 2000
   AND c.region != 'Aggregates'
   AND i.gdp_per_capita_usd IS NOT NULL
 ORDER BY i.gdp_per_capita_usd DESC
 LIMIT 10)

UNION

(SELECT short_name
 FROM countries AS c
 INNER JOIN indicators AS i ON c.id = i.country_id
 WHERE i.year = 2022
   AND c.region != 'Aggregates'
   AND i.gdp_per_capita_usd IS NOT NULL
 ORDER BY i.gdp_per_capita_usd DESC
 LIMIT 10)

ORDER BY short_name;
14 records
short_name
Bermuda
Cayman Islands
Channel Islands
Ireland
Japan
Liechtenstein
Luxembourg
Monaco
Norway
Qatar
San Marino
Singapore
Switzerland
United States

Fourteen rows come back. The six countries that ranked in the top ten in both years are each counted once.

When each component query needs its own ORDER BY or LIMIT, wrap it in parentheses as shown above. A bare ORDER BY at the end applies to the combined result. The ordering and limiting applied inside each pair of parentheses controls which rows each component contributes before the two sets are combined.

7.5 INTERSECT

INTERSECT returns only the rows that appear in both result sets. It is the overlap.

The UNION example above showed which countries appeared in either top-ten list. INTERSECT answers the narrower question: which countries appeared in both? The threshold here is raised to the top twenty to produce a more interesting result.

(SELECT short_name
 FROM countries AS c
 INNER JOIN indicators AS i ON c.id = i.country_id
 WHERE i.year = 2000
   AND c.region != 'Aggregates'
   AND i.gdp_per_capita_usd IS NOT NULL
 ORDER BY i.gdp_per_capita_usd DESC
 LIMIT 20)

INTERSECT

(SELECT short_name
 FROM countries AS c
 INNER JOIN indicators AS i ON c.id = i.country_id
 WHERE i.year = 2022
   AND c.region != 'Aggregates'
   AND i.gdp_per_capita_usd IS NOT NULL
 ORDER BY i.gdp_per_capita_usd DESC
 LIMIT 20)

ORDER BY short_name;
14 records
short_name
Bermuda
Channel Islands
Denmark
Iceland
Ireland
Liechtenstein
Luxembourg
Monaco
Netherlands
Norway
Qatar
Sweden
Switzerland
United States

Fourteen countries appear in the top twenty by GDP per capita in both 2000 and 2022, representing a core group of economies that have ranked among the world’s wealthiest for at least two decades.

INTERSECT implicitly deduplicates, just like UNION. PostgreSQL also supports INTERSECT ALL, which preserves duplicates by keeping each row as many times as it appears in both result sets (the minimum count). INTERSECT ALL is rarely needed in practice.

7.6 EXCEPT

EXCEPT returns rows from the first result set that do not appear in the second. The order of the two queries matters: A EXCEPT B and B EXCEPT A return different results.

The indicators table contains data stretching back to 1960, but many countries did not exist as independent nations at the start of that period. The query below identifies countries that have forest area data for 2022 but not for 1990, a gap that reveals which nations were established after 1990 rather than which ones simply have incomplete records.

SELECT c.short_name, c.region
FROM countries AS c
INNER JOIN indicators AS i ON c.id = i.country_id
WHERE i.year = 2022
  AND i.pct_forest_area IS NOT NULL
  AND c.region != 'Aggregates'

EXCEPT

SELECT c.short_name, c.region
FROM countries AS c
INNER JOIN indicators AS i ON c.id = i.country_id
WHERE i.year = 1990
  AND i.pct_forest_area IS NOT NULL
  AND c.region != 'Aggregates'

ORDER BY region, short_name;
Displaying records 1 - 25
short_name region
Marshall Islands East Asia & Pacific
Micronesia, Fed. Sts. East Asia & Pacific
Northern Mariana Islands East Asia & Pacific
Palau East Asia & Pacific
Armenia Europe & Central Asia
Azerbaijan Europe & Central Asia
Belarus Europe & Central Asia
Belgium Europe & Central Asia
Bosnia and Herzegovina Europe & Central Asia
Croatia Europe & Central Asia
Czech Republic Europe & Central Asia
Estonia Europe & Central Asia
Georgia Europe & Central Asia
Kazakhstan Europe & Central Asia
Kyrgyz Republic Europe & Central Asia
Latvia Europe & Central Asia
Lithuania Europe & Central Asia
Luxembourg Europe & Central Asia
Moldova Europe & Central Asia
Montenegro Europe & Central Asia
North Macedonia Europe & Central Asia
Russian Federation Europe & Central Asia
Serbia Europe & Central Asia
Slovak Republic Europe & Central Asia
Slovenia Europe & Central Asia

The result is a list of countries that are present in the 2022 dataset but absent from 1990: mostly states that emerged from the dissolution of the Soviet Union and Yugoslavia, such as Armenia, Estonia, Kazakhstan, and Slovenia, along with a few others like South Sudan (independence: 2011) and Eritrea (independence: 1993). No data existed for them in 1990 because they were not yet sovereign countries.

Reversing the operands narrows the result considerably:

SELECT c.short_name, c.region
FROM countries AS c
INNER JOIN indicators AS i ON c.id = i.country_id
WHERE i.year = 1990
  AND i.pct_forest_area IS NOT NULL
  AND c.region != 'Aggregates'

EXCEPT

SELECT c.short_name, c.region
FROM countries AS c
INNER JOIN indicators AS i ON c.id = i.country_id
WHERE i.year = 2022
  AND i.pct_forest_area IS NOT NULL
  AND c.region != 'Aggregates'

ORDER BY region, short_name;
4 records
short_name region
Channel Islands Europe & Central Asia
Gibraltar Europe & Central Asia
Israel Middle East & North Africa
West Bank and Gaza Middle East & North Africa

Only a handful of countries had 1990 forest area data that is absent from 2022. This asymmetry is typical of EXCEPT: the result depends entirely on which query is listed first.

Like INTERSECT, EXCEPT implicitly deduplicates. PostgreSQL supports EXCEPT ALL to preserve duplicates, but it is seldom used.

7.7 Ordering the Combined Result

A single ORDER BY clause at the end of the entire statement sorts the combined result. It must reference columns by the names established in the first query, since those are the names the combined result carries.

SELECT short_name, income_group
FROM countries
WHERE income_group = 'Low income' AND region != 'Aggregates'

UNION ALL

SELECT short_name, income_group
FROM countries
WHERE income_group = 'Lower middle income' AND region != 'Aggregates'

ORDER BY income_group, short_name;

You cannot place ORDER BY at the end of the first query alone without parentheses: the database will raise a syntax error because ORDER BY is interpreted as applying to the entire statement, not just the first component. If you need each component sorted for its own LIMIT clause (as in the top-ten examples above), wrap that component in parentheses.

7.8 Chaining Set Operations

Multiple set operations can be chained in a single statement:

SELECT short_name FROM countries WHERE income_group = 'Low income'
UNION
SELECT short_name FROM countries WHERE income_group = 'Lower middle income'
EXCEPT
SELECT short_name FROM countries WHERE region = 'South Asia';

When mixing operation types, be aware that INTERSECT binds more tightly than UNION and EXCEPT, similar to how multiplication binds before addition. The expression A UNION B INTERSECT C is evaluated as A UNION (B INTERSECT C), not (A UNION B) INTERSECT C. Use parentheses to make the intended order explicit whenever you chain more than one type of set operation.

7.9 Exercises

7.9.1 Reflection

  1. Explain the difference between UNION and UNION ALL. When would you prefer one over the other?

  2. A colleague combines two queries with UNION and gets back fewer rows than expected. What could explain the lower row count?

  3. EXCEPT returns different results depending on which query is listed first. Why? Give an example using the worldbank data to illustrate.

  4. Why must both queries in a set operation return the same number of columns with compatible data types? What would go wrong if they did not?

7.9.2 Coding

  1. Write a query using UNION ALL that combines all high-income countries and all upper-middle-income countries into a single result set. Include the short_name and income_group columns. Sort alphabetically by short_name.

  2. Rewrite the query from exercise 5 using UNION instead of UNION ALL. Explain whether the row count changes and why.

  3. Using INTERSECT, find the countries that appear in both of the following groups: countries with 2022 population above 50 million, and countries with 2022 GDP per capita above 10,000 USD. Return short_name sorted alphabetically.

  4. Using EXCEPT, find countries that had GDP per capita data in 1990 but have no GDP per capita data in 2022. Return short_name and region, sorted by region then short_name.

  5. Write a query that returns the short_name of every country that ranks in the top 15 by population in 2022 or in the top 15 by GDP per capita in 2022, but not necessarily both. Use UNION to deduplicate the combined list. Sort by short_name.