By the end of this chapter, students will be able to:
Distinguish between joins, which combine tables horizontally, and set operations, which stack result sets vertically.
WriteUNION, UNION ALL, INTERSECT, and EXCEPT queries to combine the results of two SELECT statements.
Explain why UNION removes duplicate rows while UNION ALL keeps them, and choose appropriately based on the task.
Identify the compatibility requirements for set operations: matching column count and compatible data types.
Apply set operations to answer questions about overlap, difference, and union across related datasets.
Joins combine tables horizontally, adding columns by matching rows on a key. Set operations work differently: they combine the results of two separate queries vertically, stacking one result set on top of another. The three set operations in SQL are UNION, INTERSECT, and EXCEPT, and each answers a different question about how two result sets relate to each other.
Operation
Returns
UNION
Rows that appear in either result set (duplicates removed)
UNION ALL
Rows that appear in either result set (duplicates kept)
INTERSECT
Rows that appear in both result sets
EXCEPT
Rows that appear in the first result set but not the second
7.2 Requirements
For two queries to be combined with a set operation, their result sets must be union-compatible:
Both queries must return the same number of columns.
Corresponding columns must have compatible data types (both text, both numeric, and so on).
Column names and data types in the final result are taken from the first query. If the second query uses different column names, those names are silently ignored.
These rules apply to all three operations. Violating any of them produces an error before any rows are returned.
7.3 UNION ALL
UNION ALL stacks two result sets together and returns every row from both, including duplicates. It is the simplest set operation conceptually: the result is everything on the left, followed by everything on the right.
The query below combines all low-income and lower-middle-income countries into a single list.
SELECT short_name, income_groupFROM countriesWHERE income_group ='Low income'AND region !='Aggregates'UNIONALLSELECT short_name, income_groupFROM countriesWHERE income_group ='Lower middle income'AND region !='Aggregates'ORDERBY income_group, short_name;
Displaying records 1 - 25
short_name
income_group
Afghanistan
Low income
Burkina Faso
Low income
Burundi
Low income
Central African Republic
Low income
Chad
Low income
Congo, Dem. Rep.
Low income
Eritrea
Low income
Ethiopia
Low income
Gambia, The
Low income
Guinea
Low income
Guinea-Bissau
Low income
Korea, Dem. People’s Rep.
Low income
Liberia
Low income
Madagascar
Low income
Malawi
Low income
Mali
Low income
Mozambique
Low income
Niger
Low income
Rwanda
Low income
Sierra Leone
Low income
Somalia
Low income
South Sudan
Low income
Sudan
Low income
Syrian Arab Republic
Low income
Togo
Low income
The result has one row per country across both income groups. Because no country can belong to two income groups simultaneously, no duplicates exist here and UNION ALL and UNION would produce identical results. The next section uses an example where that is not the case.
UNION ALL is generally faster than UNION because it does no additional work to identify and remove duplicates. When you are certain the two result sets are disjoint, or when duplicates are acceptable, prefer UNION ALL.
7.4 UNION
UNION works exactly like UNION ALL but adds a deduplication step: any row that appears in both result sets is included only once in the final output.
To see the difference, consider the question: which countries appeared in the top ten by GDP per capita in both 2000 and 2022? UNION ALL gives every entry from both ranked lists, while UNION collapses the countries that appeared in both down to a single row.
(SELECT short_nameFROM countries AS cINNERJOIN indicators AS i ON c.id= i.country_idWHERE i.year=2000AND c.region !='Aggregates'AND i.gdp_per_capita_usd ISNOTNULLORDERBY i.gdp_per_capita_usd DESCLIMIT10)UNIONALL(SELECT short_nameFROM countries AS cINNERJOIN indicators AS i ON c.id= i.country_idWHERE i.year=2022AND c.region !='Aggregates'AND i.gdp_per_capita_usd ISNOTNULLORDERBY i.gdp_per_capita_usd DESCLIMIT10)ORDERBY short_name;
20 records
short_name
Bermuda
Bermuda
Cayman Islands
Channel Islands
Ireland
Japan
Liechtenstein
Liechtenstein
Luxembourg
Luxembourg
Monaco
Monaco
Norway
Norway
Qatar
San Marino
Singapore
Switzerland
Switzerland
United States
Twenty rows come back, one for each entry in both top-ten lists. Countries that ranked in the top ten in both years appear twice: Monaco, Liechtenstein, Luxembourg, Bermuda, Norway, and Switzerland each have two rows.
Swapping UNION ALL for UNION removes those duplicates:
(SELECT short_nameFROM countries AS cINNERJOIN indicators AS i ON c.id= i.country_idWHERE i.year=2000AND c.region !='Aggregates'AND i.gdp_per_capita_usd ISNOTNULLORDERBY i.gdp_per_capita_usd DESCLIMIT10)UNION(SELECT short_nameFROM countries AS cINNERJOIN indicators AS i ON c.id= i.country_idWHERE i.year=2022AND c.region !='Aggregates'AND i.gdp_per_capita_usd ISNOTNULLORDERBY i.gdp_per_capita_usd DESCLIMIT10)ORDERBY short_name;
14 records
short_name
Bermuda
Cayman Islands
Channel Islands
Ireland
Japan
Liechtenstein
Luxembourg
Monaco
Norway
Qatar
San Marino
Singapore
Switzerland
United States
Fourteen rows come back. The six countries that ranked in the top ten in both years are each counted once.
When each component query needs its own ORDER BY or LIMIT, wrap it in parentheses as shown above. A bare ORDER BY at the end applies to the combined result. The ordering and limiting applied inside each pair of parentheses controls which rows each component contributes before the two sets are combined.
7.5 INTERSECT
INTERSECT returns only the rows that appear in both result sets. It is the overlap.
The UNION example above showed which countries appeared in either top-ten list. INTERSECT answers the narrower question: which countries appeared in both? The threshold here is raised to the top twenty to produce a more interesting result.
(SELECT short_nameFROM countries AS cINNERJOIN indicators AS i ON c.id= i.country_idWHERE i.year=2000AND c.region !='Aggregates'AND i.gdp_per_capita_usd ISNOTNULLORDERBY i.gdp_per_capita_usd DESCLIMIT20)INTERSECT(SELECT short_nameFROM countries AS cINNERJOIN indicators AS i ON c.id= i.country_idWHERE i.year=2022AND c.region !='Aggregates'AND i.gdp_per_capita_usd ISNOTNULLORDERBY i.gdp_per_capita_usd DESCLIMIT20)ORDERBY short_name;
14 records
short_name
Bermuda
Channel Islands
Denmark
Iceland
Ireland
Liechtenstein
Luxembourg
Monaco
Netherlands
Norway
Qatar
Sweden
Switzerland
United States
Fourteen countries appear in the top twenty by GDP per capita in both 2000 and 2022, representing a core group of economies that have ranked among the world’s wealthiest for at least two decades.
INTERSECT implicitly deduplicates, just like UNION. PostgreSQL also supports INTERSECT ALL, which preserves duplicates by keeping each row as many times as it appears in both result sets (the minimum count). INTERSECT ALL is rarely needed in practice.
7.6 EXCEPT
EXCEPT returns rows from the first result set that do not appear in the second. The order of the two queries matters: A EXCEPT B and B EXCEPT A return different results.
The indicators table contains data stretching back to 1960, but many countries did not exist as independent nations at the start of that period. The query below identifies countries that have forest area data for 2022 but not for 1990, a gap that reveals which nations were established after 1990 rather than which ones simply have incomplete records.
SELECT c.short_name, c.regionFROM countries AS cINNERJOIN indicators AS i ON c.id= i.country_idWHERE i.year=2022AND i.pct_forest_area ISNOTNULLAND c.region !='Aggregates'EXCEPTSELECT c.short_name, c.regionFROM countries AS cINNERJOIN indicators AS i ON c.id= i.country_idWHERE i.year=1990AND i.pct_forest_area ISNOTNULLAND c.region !='Aggregates'ORDERBY region, short_name;
Displaying records 1 - 25
short_name
region
Marshall Islands
East Asia & Pacific
Micronesia, Fed. Sts.
East Asia & Pacific
Northern Mariana Islands
East Asia & Pacific
Palau
East Asia & Pacific
Armenia
Europe & Central Asia
Azerbaijan
Europe & Central Asia
Belarus
Europe & Central Asia
Belgium
Europe & Central Asia
Bosnia and Herzegovina
Europe & Central Asia
Croatia
Europe & Central Asia
Czech Republic
Europe & Central Asia
Estonia
Europe & Central Asia
Georgia
Europe & Central Asia
Kazakhstan
Europe & Central Asia
Kyrgyz Republic
Europe & Central Asia
Latvia
Europe & Central Asia
Lithuania
Europe & Central Asia
Luxembourg
Europe & Central Asia
Moldova
Europe & Central Asia
Montenegro
Europe & Central Asia
North Macedonia
Europe & Central Asia
Russian Federation
Europe & Central Asia
Serbia
Europe & Central Asia
Slovak Republic
Europe & Central Asia
Slovenia
Europe & Central Asia
The result is a list of countries that are present in the 2022 dataset but absent from 1990: mostly states that emerged from the dissolution of the Soviet Union and Yugoslavia, such as Armenia, Estonia, Kazakhstan, and Slovenia, along with a few others like South Sudan (independence: 2011) and Eritrea (independence: 1993). No data existed for them in 1990 because they were not yet sovereign countries.
Reversing the operands narrows the result considerably:
SELECT c.short_name, c.regionFROM countries AS cINNERJOIN indicators AS i ON c.id= i.country_idWHERE i.year=1990AND i.pct_forest_area ISNOTNULLAND c.region !='Aggregates'EXCEPTSELECT c.short_name, c.regionFROM countries AS cINNERJOIN indicators AS i ON c.id= i.country_idWHERE i.year=2022AND i.pct_forest_area ISNOTNULLAND c.region !='Aggregates'ORDERBY region, short_name;
4 records
short_name
region
Channel Islands
Europe & Central Asia
Gibraltar
Europe & Central Asia
Israel
Middle East & North Africa
West Bank and Gaza
Middle East & North Africa
Only a handful of countries had 1990 forest area data that is absent from 2022. This asymmetry is typical of EXCEPT: the result depends entirely on which query is listed first.
Like INTERSECT, EXCEPT implicitly deduplicates. PostgreSQL supports EXCEPT ALL to preserve duplicates, but it is seldom used.
7.7 Ordering the Combined Result
A single ORDER BY clause at the end of the entire statement sorts the combined result. It must reference columns by the names established in the first query, since those are the names the combined result carries.
SELECT short_name, income_groupFROM countriesWHERE income_group ='Low income'AND region !='Aggregates'UNIONALLSELECT short_name, income_groupFROM countriesWHERE income_group ='Lower middle income'AND region !='Aggregates'ORDERBY income_group, short_name;
You cannot place ORDER BY at the end of the first query alone without parentheses: the database will raise a syntax error because ORDER BY is interpreted as applying to the entire statement, not just the first component. If you need each component sorted for its own LIMIT clause (as in the top-ten examples above), wrap that component in parentheses.
7.8 Chaining Set Operations
Multiple set operations can be chained in a single statement:
SELECT short_name FROM countries WHERE income_group ='Low income'UNIONSELECT short_name FROM countries WHERE income_group ='Lower middle income'EXCEPTSELECT short_name FROM countries WHERE region ='South Asia';
When mixing operation types, be aware that INTERSECT binds more tightly than UNION and EXCEPT, similar to how multiplication binds before addition. The expression A UNION B INTERSECT C is evaluated as A UNION (B INTERSECT C), not (A UNION B) INTERSECT C. Use parentheses to make the intended order explicit whenever you chain more than one type of set operation.
7.9 Exercises
7.9.1 Reflection
Explain the difference between UNION and UNION ALL. When would you prefer one over the other?
A colleague combines two queries with UNION and gets back fewer rows than expected. What could explain the lower row count?
EXCEPT returns different results depending on which query is listed first. Why? Give an example using the worldbank data to illustrate.
Why must both queries in a set operation return the same number of columns with compatible data types? What would go wrong if they did not?
7.9.2 Coding
Write a query using UNION ALL that combines all high-income countries and all upper-middle-income countries into a single result set. Include the short_name and income_group columns. Sort alphabetically by short_name.
Rewrite the query from exercise 5 using UNION instead of UNION ALL. Explain whether the row count changes and why.
Using INTERSECT, find the countries that appear in both of the following groups: countries with 2022 population above 50 million, and countries with 2022 GDP per capita above 10,000 USD. Return short_name sorted alphabetically.
Using EXCEPT, find countries that had GDP per capita data in 1990 but have no GDP per capita data in 2022. Return short_name and region, sorted by region then short_name.
Write a query that returns the short_name of every country that ranks in the top 15 by population in 2022 or in the top 15 by GDP per capita in 2022, but not necessarily both. Use UNION to deduplicate the combined list. Sort by short_name.