In Development: This textbook is currently undergoing development and should not be used as an authoritative source of... well, ANYTHING.

3 SELECT Statements

3.1 Learning Objectives

By the end of this chapter, students will be able to:

Write a SELECT statement that retrieves specific columns from a table using SELECT ... FROM.
Apply column aliases using AS to rename columns in query output.
Construct text concatenation expressions using the || operator to combine column values and literal strings.
Write arithmetic expressions in SELECT to compute new values from existing numeric columns.
Apply DISTINCT to return only unique values from a column or combination of columns.
Use COUNT(*), COUNT(column), and COUNT(DISTINCT column) to count rows and measure data completeness.
Apply WHERE with comparison operators (=, <>, <, >, BETWEEN, IN, LIKE, IS NULL) to filter rows.
Combine filter conditions using AND, OR, and NOT, applying parentheses to control evaluation order.
Explain why WHERE column = NULL always returns zero rows and use IS NULL instead.
Describe the NOT IN / NULL trap and explain why a single NULL in the list causes no rows to be returned.
Sort query results using ORDER BY with ASC and DESC across one or more columns.
Use LIMIT to restrict the number of rows returned and explain its interaction with ORDER BY.
Describe the logical order in which SQL clauses are evaluated and explain why column aliases cannot be referenced in WHERE.

Now that your database environment is up and running, it is time to write some SQL. The most fundamental operation in SQL is the query: a question you ask the database, written in a structured way that the database engine can understand and answer. In this chapter, you will write your first queries against the actors database that ships with the course Docker image.

The actors table contains one row per actor, covering 200 entries drawn from film history. Each row records the actor’s name, birth name (for those who used a stage name), sex, birth and death dates, country of birth, height, Oscar nominations and wins, and primary film genre. It is a rich dataset for learning SQL because it mixes text columns, fixed-length character fields, whole numbers, and dates in a single table.

By the end of this chapter, you will know how to retrieve data from a table, choose which columns to display, compute new values from existing ones, filter rows by any condition you can describe, sort the results, and cap how many rows are returned.

3.2 The SELECT and FROM Clauses

Every query that retrieves data starts with a SELECT statement. The two required pieces are:

SELECT specifies which columns you want in the result.
FROM tells the database which table to look in.

3.2.1 Selecting All Columns

The quickest way to see everything a table contains is to use an asterisk (*) in place of column names. The asterisk is shorthand for “every column, in the order they are defined.”

SELECT *
FROM actors
LIMIT 5;

5 records
actor_id	first_name	last_name	birth_name	sex	birth_date	death_date	birth_country	height_cm	oscar_nominations	oscar_wins	primary_genre	has_honorary_oscar	notable_role	total_box_office_usd
1	Tom	Hanks	NULL	M	1956-07-09	NULL	USA	183	5	2	Drama	FALSE	Forrest Gump	4.8e+09
2	Meryl	Streep	NULL	F	1949-06-22	NULL	USA	168	21	3	Drama	FALSE	The Devil Wears Prada	2.1e+09
3	Cate	Blanchett	NULL	F	1969-05-14	NULL	Australia	174	8	2	Drama	FALSE	Elizabeth	5.1e+09
4	Marlon	Brando	NULL	M	1924-04-03	2004-07-01	USA	175	8	2	Drama	FALSE	The Godfather	1.6e+09
5	Audrey	Hepburn	Audrey Kathleen Ruston	F	1929-05-04	1993-01-20	UK	170	5	1	Romance	TRUE	Breakfast at Tiffany’s	9.2e+08

The LIMIT 5 at the end restricts the output to five rows. Without it, the query would return all 200 rows in the table. This is similar to the head() function in R or Python. You will learn more about LIMIT later in this chapter, but getting into the habit of adding it while exploring an unfamiliar table is good practice.

3.2.2 Selecting Specific Columns

In most real situations, you will not want every column. Listing only the columns you need makes results easier to read and faster to retrieve from large tables.

SELECT first_name, last_name, birth_country, primary_genre
FROM actors;

Displaying records 1 - 25
first_name	last_name	birth_country	primary_genre
Tom	Hanks	USA	Drama
Meryl	Streep	USA	Drama
Cate	Blanchett	Australia	Drama
Marlon	Brando	USA	Drama
Audrey	Hepburn	UK	Romance
Humphrey	Bogart	USA	Drama
Katharine	Hepburn	USA	Drama
Jack	Nicholson	USA	Drama
Denzel	Washington	USA	Drama
Ingrid	Bergman	Sweden	Drama
James	Stewart	USA	Drama
Bette	Davis	USA	Drama
Sidney	Poitier	USA	Drama
Jane	Fonda	USA	Drama
Al	Pacino	USA	Drama
Robert	De Niro	USA	Drama
Sophia	Loren	Italy	Drama
Marcello	Mastroianni	Italy	Drama
Jeanne	Moreau	France	Drama
Jean-Paul	Belmondo	France	Action
Catherine	Deneuve	France	Drama
Isabelle	Huppert	France	Drama
Gérard	Depardieu	France	Drama
Alec	Guinness	UK	Drama
Peter	Sellers	UK	Comedy

Column names are listed after SELECT, separated by commas. The database returns them in exactly the order you specify, regardless of how they are arranged inside the table.

Relying on SELECT * in production code is considered poor practice because it breaks silently when someone adds, removes, or reorders columns in the table. Naming your columns explicitly makes queries more predictable and easier to maintain.

3.3 Shaping Your Output

Beyond choosing which columns to include, SQL lets you rename columns, combine text from multiple columns, and compute new values on the fly. None of these operations modify the data in the table; they only affect what is shown in the result.

3.3.1 Column Aliases

By default, a column in your result inherits the name it has in the table. You can override this with the AS keyword, which assigns an alias to a column. Aliases are useful for giving computed values a meaningful name, shortening long column names, or improving readability.

SELECT
    first_name        AS given_name,
    last_name         AS surname,
    birth_country     AS country,
    primary_genre     AS genre,
    oscar_nominations AS nominations
FROM actors;

Displaying records 1 - 25
given_name	surname	country	genre	nominations
Tom	Hanks	USA	Drama	5
Meryl	Streep	USA	Drama	21
Cate	Blanchett	Australia	Drama	8
Marlon	Brando	USA	Drama	8
Audrey	Hepburn	UK	Romance	5
Humphrey	Bogart	USA	Drama	3
Katharine	Hepburn	USA	Drama	12
Jack	Nicholson	USA	Drama	12
Denzel	Washington	USA	Drama	9
Ingrid	Bergman	Sweden	Drama	7
James	Stewart	USA	Drama	5
Bette	Davis	USA	Drama	10
Sidney	Poitier	USA	Drama	2
Jane	Fonda	USA	Drama	7
Al	Pacino	USA	Drama	9
Robert	De Niro	USA	Drama	8
Sophia	Loren	Italy	Drama	5
Marcello	Mastroianni	Italy	Drama	3
Jeanne	Moreau	France	Drama	0
Jean-Paul	Belmondo	France	Action	0
Catherine	Deneuve	France	Drama	2
Isabelle	Huppert	France	Drama	2
Gérard	Depardieu	France	Drama	1
Alec	Guinness	UK	Drama	5
Peter	Sellers	UK	Comedy	3

The AS keyword is optional in PostgreSQL; you can write the alias directly after the column name without it. Using AS explicitly is recommended, however, because it makes the intent of the query immediately clear to anyone reading it.

3.3.2 Combining Text Columns

PostgreSQL uses the || operator to concatenate (join) strings together. This lets you build a single display value out of multiple columns, or add literal text between column values.

SELECT
    first_name || ' ' || last_name AS full_name,
    birth_country,
    primary_genre
FROM actors;

Displaying records 1 - 25
full_name	birth_country	primary_genre
Tom Hanks	USA	Drama
Meryl Streep	USA	Drama
Cate Blanchett	Australia	Drama
Marlon Brando	USA	Drama
Audrey Hepburn	UK	Romance
Humphrey Bogart	USA	Drama
Katharine Hepburn	USA	Drama
Jack Nicholson	USA	Drama
Denzel Washington	USA	Drama
Ingrid Bergman	Sweden	Drama
James Stewart	USA	Drama
Bette Davis	USA	Drama
Sidney Poitier	USA	Drama
Jane Fonda	USA	Drama
Al Pacino	USA	Drama
Robert De Niro	USA	Drama
Sophia Loren	Italy	Drama
Marcello Mastroianni	Italy	Drama
Jeanne Moreau	France	Drama
Jean-Paul Belmondo	France	Action
Catherine Deneuve	France	Drama
Isabelle Huppert	France	Drama
Gérard Depardieu	France	Drama
Alec Guinness	UK	Drama
Peter Sellers	UK	Comedy

You can add as many literal strings as you like. Here, a comma and space are inserted between the last name and the birth country to produce a compact bio label:

SELECT
    first_name || ' ' || last_name || ', ' || birth_country AS bio_label,
    primary_genre
FROM actors;

Displaying records 1 - 25
bio_label	primary_genre
Tom Hanks, USA	Drama
Meryl Streep, USA	Drama
Cate Blanchett, Australia	Drama
Marlon Brando, USA	Drama
Audrey Hepburn, UK	Romance
Humphrey Bogart, USA	Drama
Katharine Hepburn, USA	Drama
Jack Nicholson, USA	Drama
Denzel Washington, USA	Drama
Ingrid Bergman, Sweden	Drama
James Stewart, USA	Drama
Bette Davis, USA	Drama
Sidney Poitier, USA	Drama
Jane Fonda, USA	Drama
Al Pacino, USA	Drama
Robert De Niro, USA	Drama
Sophia Loren, Italy	Drama
Marcello Mastroianni, Italy	Drama
Jeanne Moreau, France	Drama
Jean-Paul Belmondo, France	Action
Catherine Deneuve, France	Drama
Isabelle Huppert, France	Drama
Gérard Depardieu, France	Drama
Alec Guinness, UK	Drama
Peter Sellers, UK	Comedy

If any value being concatenated is NULL, the entire result of || becomes NULL. The actors table has a birth_name column that is NULL for most actors. If you concatenated birth_name into a label without accounting for that, those rows would produce a NULL result. The COALESCE function, covered in a later chapter, handles this situation cleanly.

3.3.3 Arithmetic Operations

SQL treats numeric columns like variables in an algebraic expression. You can add, subtract, multiply, and divide columns and constants using standard arithmetic operators (+, -, *, /).

As an example, height_cm stores each actor’s height in centimeters. You can convert that to inches by dividing by 2.54:

SELECT
    first_name || ' ' || last_name AS full_name,
    height_cm,
    ROUND(height_cm / 2.54::numeric, 1) AS height_in
FROM actors
ORDER BY height_cm DESC
LIMIT 10;

10 records
full_name	height_cm	height_in
Max von Sydow	196	77.2
Clint Eastwood	193	76.0
Gregory Peck	191	75.2
Charlton Heston	191	75.2
Chris Hemsworth	190	74.8
James Stewart	188	74.0
Idris Elba	188	74.0
Sidney Poitier	188	74.0
Hugh Jackman	188	74.0
Henry Fonda	188	74.0

ROUND(value, 1) rounds the result to one decimal place. The ::numeric cast converts the 2.54 literal to a decimal type before dividing, which ensures a decimal result rather than integer division.

3.4 Exploring Data with DISTINCT and COUNT

Two features of the SELECT clause are especially useful when you first sit down with an unfamiliar database: DISTINCT, which removes duplicate values from your results, and COUNT, which tallies rows. Together they let you quickly understand what a table contains and how its data is distributed before you start filtering or transforming it.

3.4.1 DISTINCT

Adding the DISTINCT keyword immediately after SELECT tells the database to return only unique combinations of the specified columns, removing any duplicate rows from the result.

A common use is to see every unique value that appears in a categorical column. The actors table has 200 rows but only eight distinct primary genres. Without DISTINCT, selecting the primary_genre column would return 200 rows with many repeated values. With it, you get the eight unique values once each:

SELECT DISTINCT primary_genre
FROM actors
ORDER BY primary_genre;

8 records
primary_genre
Action
Comedy
Drama
Horror
Musical
Romance
Sci-Fi
Thriller

DISTINCT applies to the full combination of columns listed, not just the first one. If you write SELECT DISTINCT primary_genre, birth_country, the result contains each unique pairing of those two values rather than unique genres alone.

SELECT DISTINCT is useful for a quick look at what values exist in a column, but it is not a substitute for proper data profiling. For large tables, counting how many rows correspond to each distinct value tells you much more than the list of values alone. GROUP BY, covered in a later chapter, is the right tool for that.

3.4.2 COUNT

COUNT is an aggregate function: rather than returning one row per input row, it collapses results into a single summary value. The most common form, COUNT(*), returns the total number of rows in the result.

SELECT COUNT(*) AS total_actors
FROM actors;

1 records
total_actors
200

Paired with a WHERE clause, COUNT(*) counts only the rows that satisfy the filter:

SELECT COUNT(*) AS drama_actors
FROM actors
WHERE primary_genre = 'Drama';

1 records
drama_actors
135

COUNT(column_name), with a specific column instead of *, counts only rows where that column is not NULL. This makes it a convenient way to measure how complete your data is:

SELECT
    COUNT(*)          AS total_rows,
    COUNT(birth_name) AS rows_with_birth_name,
    COUNT(death_date) AS rows_with_death_date
FROM actors;

1 records
total_rows	rows_with_birth_name	rows_with_death_date
200	70	82

The birth_name column records an actor’s legal name when they perform under a stage name. The result shows that only 70 of the 200 actors have a recorded birth name, meaning 130 either perform under their real name or have no birth name on record. Similarly, death_date is NULL for living actors, so COUNT(death_date) reveals how many actors in the table are deceased.

COUNT(DISTINCT column) combines both ideas, counting how many unique values appear in a column:

SELECT COUNT(DISTINCT birth_country) AS num_countries
FROM actors;

1 records
num_countries
19

COUNT belongs to a family of aggregate functions that summarize groups of rows into a single value. Others include SUM(), AVG(), MIN(), and MAX(). These become significantly more powerful when paired with GROUP BY, which computes a separate summary for each category in a column, such as the number of actors per genre or the average height per birth country. GROUP BY is covered in its own chapter.

3.5 Filtering Rows with WHERE

SELECT and FROM control which columns and tables you work with. The WHERE clause controls which rows appear in the result. Any row that does not satisfy the condition after WHERE is excluded from the output.

SELECT column1, column2
FROM table_name
WHERE condition;

3.5.1 Comparing Text Values

Use a single equals sign (=) to match a text value exactly. Text comparisons in PostgreSQL are case-sensitive by default, so 'Drama' and 'drama' are not the same.

SELECT first_name, last_name, birth_country, oscar_nominations
FROM actors
WHERE primary_genre = 'Drama'
ORDER BY oscar_nominations DESC;

Displaying records 1 - 25
first_name	last_name	birth_country	oscar_nominations
Meryl	Streep	USA	21
Katharine	Hepburn	USA	12
Jack	Nicholson	USA	12
Paul	Newman	USA	10
Laurence	Olivier	UK	10
Bette	Davis	USA	10
Al	Pacino	USA	9
Denzel	Washington	USA	9
Spencer	Tracy	USA	9
Geraldine	Page	USA	8
Peter	O’Toole	Ireland	8
Robert	De Niro	USA	8
Glenn	Close	USA	8
Cate	Blanchett	Australia	8
Marlon	Brando	USA	8
Ingrid	Bergman	Sweden	7
Kate	Winslet	UK	7
Dustin	Hoffman	USA	7
Jane	Fonda	USA	7
Richard	Burton	UK	7
Judi	Dench	UK	7
Daniel	Day-Lewis	UK	6
Sissy	Spacek	USA	6
Amy	Adams	USA	6
Anthony	Hopkins	UK	6

To exclude a value instead of matching it, use <> (or the equivalent !=):

SELECT first_name, last_name, primary_genre
FROM actors
WHERE primary_genre <> 'Drama';

3.5.2 Comparing Numeric Values

The standard comparison operators work on numeric columns: <, >, <=, >=, =, and <>.

SELECT first_name, last_name, oscar_nominations, oscar_wins
FROM actors
WHERE oscar_nominations >= 5
ORDER BY oscar_nominations DESC;

Displaying records 1 - 25
first_name	last_name	oscar_nominations	oscar_wins
Meryl	Streep	21	3
Katharine	Hepburn	12	4
Jack	Nicholson	12	3
Paul	Newman	10	1
Bette	Davis	10	2
Laurence	Olivier	10	1
Spencer	Tracy	9	2
Denzel	Washington	9	2
Al	Pacino	9	1
Peter	O’Toole	8	0
Jack	Lemmon	8	2
Cate	Blanchett	8	2
Glenn	Close	8	0
Robert	De Niro	8	2
Marlon	Brando	8	2
Geraldine	Page	8	1
Dustin	Hoffman	7	2
Ingrid	Bergman	7	3
Jane	Fonda	7	2
Richard	Burton	7	0
Judi	Dench	7	1
Kate	Winslet	7	1
Michael	Caine	6	2
Anthony	Hopkins	6	1
Daniel	Day-Lewis	6	3

Fifteen actors in the table have received five or more Oscar nominations. Meryl Streep leads with 21 nominations, more than any other actor in the dataset.

3.5.3 Comparing Date Values

Date columns can be compared with the same operators as numbers. PostgreSQL accepts date literals in 'YYYY-MM-DD' format.

SELECT first_name, last_name, birth_date, birth_country
FROM actors
WHERE birth_date >= '1980-01-01'
ORDER BY birth_date;

15 records
first_name	last_name	birth_date	birth_country
Rinko	Kikuchi	1981-01-06	Japan
Rami	Malek	1981-05-12	USA
Natalie	Portman	1981-06-09	USA
Priyanka	Chopra	1982-07-18	India
Lupita	Nyong’o	1983-03-01	Mexico
Chris	Hemsworth	1983-08-11	Australia
Keira	Knightley	1985-03-26	UK
Ana	de Armas	1988-04-30	Cuba
Alicia	Vikander	1988-10-03	Sweden
Emma	Stone	1988-11-06	USA
Brie	Larson	1989-10-01	USA
Jennifer	Lawrence	1990-08-15	USA
Saoirse	Ronan	1994-04-12	Ireland
Florence	Pugh	1996-01-03	UK
Timothée	Chalamet	2000-12-27	USA

This query returns actors born in 1980 or later, representing the youngest generation in the dataset.

3.5.4 BETWEEN

BETWEEN is shorthand for a range filter. The following two forms are exactly equivalent:

-- Using comparison operators
WHERE birth_date >= '1950-01-01' AND birth_date <= '1969-12-31'

-- Using BETWEEN
WHERE birth_date BETWEEN '1950-01-01' AND '1969-12-31'

BETWEEN is inclusive on both ends: a row where the column value equals either boundary is included in the result. It works with dates, integers, and strings.

SELECT first_name, last_name, birth_date, oscar_nominations
FROM actors
WHERE birth_date BETWEEN '1950-01-01' AND '1969-12-31'
ORDER BY birth_date;

Displaying records 1 - 25
first_name	last_name	birth_date	oscar_nominations
Geoffrey	Rush	1951-07-06	4
Roberto	Benigni	1952-10-27	3
Isabelle	Huppert	1953-03-16	2
Denzel	Washington	1954-12-28	9
Whoopi	Goldberg	1955-11-13	2
Koji	Yakusho	1956-01-02	0
Mel	Gibson	1956-01-03	2
Tom	Hanks	1956-07-09	5
Christoph	Waltz	1956-10-04	4
Daniel	Day-Lewis	1957-04-29	6
Sharon	Stone	1958-03-10	1
Gary	Oldman	1958-03-21	2
Michelle	Pfeiffer	1958-04-29	3
Kevin	Bacon	1958-07-08	0
Angela	Bassett	1958-08-16	1
Emma	Thompson	1959-04-15	5
Victoria	Abril	1959-07-04	0
Marcia Gay	Harden	1959-08-14	2
Ken	Watanabe	1959-10-21	1
Antonio	Banderas	1960-08-10	0
Colin	Firth	1960-09-10	2
Melissa	Leo	1960-09-14	2
Forest	Whitaker	1961-07-15	2
Jodie	Foster	1962-11-19	4
Ralph	Fiennes	1962-12-22	2

The same syntax applies to integer columns:

SELECT first_name, last_name, oscar_nominations, oscar_wins
FROM actors
WHERE oscar_nominations BETWEEN 3 AND 8
ORDER BY oscar_nominations DESC;

Displaying records 1 - 25
first_name	last_name	oscar_nominations	oscar_wins
Jack	Lemmon	8	2
Robert	De Niro	8	2
Cate	Blanchett	8	2
Marlon	Brando	8	2
Peter	O’Toole	8	0
Geraldine	Page	8	1
Glenn	Close	8	0
Kate	Winslet	7	1
Dustin	Hoffman	7	2
Jane	Fonda	7	2
Judi	Dench	7	1
Ingrid	Bergman	7	3
Richard	Burton	7	0
Thelma	Ritter	6	0
Anthony	Hopkins	6	1
Daniel	Day-Lewis	6	3
Michael	Caine	6	2
Shirley	MacLaine	6	1
Sissy	Spacek	6	1
Amy	Adams	6	0
Simone	Signoret	5	1
Audrey	Hepburn	5	1
James	Stewart	5	1
Sophia	Loren	5	1
Alec	Guinness	5	1

With strings, BETWEEN uses alphabetical (lexicographic) order. The boundary behavior is the same — both ends are inclusive — but the upper boundary catches only that exact string, not everything that begins with it. last_name BETWEEN 'D' AND 'H' would miss Hanks, Hepburn, and every other name that starts with H but sorts after the bare letter H. The correct way to express “last names beginning with D through H” is a half-open range using explicit operators:

SELECT first_name, last_name, primary_genre
FROM actors
WHERE last_name >= 'D' AND last_name < 'I'
ORDER BY last_name;

Displaying records 1 - 25
first_name	last_name	primary_genre
Bette	Davis	Drama
Viola	Davis	Drama
Doris	Day	Comedy
Daniel	Day-Lewis	Drama
Robert	De Niro	Drama
James	Dean	Drama
Alain	Delon	Action
Judi	Dench	Drama
Catherine	Deneuve	Drama
Gérard	Depardieu	Drama
Kirk	Douglas	Drama
Faye	Dunaway	Drama
Clint	Eastwood	Action
Chiwetel	Ejiofor	Drama
Idris	Elba	Action
Michael	Fassbender	Drama
Ralph	Fiennes	Drama
Albert	Finney	Drama
Colin	Firth	Drama
Henry	Fonda	Drama
Jane	Fonda	Drama
Jodie	Foster	Thriller
Jamie	Foxx	Drama
Gael	García Bernal	Drama
Giancarlo	Giannini	Drama

The lower bound >= 'D' is inclusive and works as expected. The upper bound uses < 'I' rather than <= 'H' precisely because 'H' alone would exclude names like Hanks and Hoffman that sort after the single character H.

Prefer explicit >= and <= comparisons over BETWEEN for most situations. The inclusive behavior is easy to forget, which makes off-by-one errors common, and BETWEEN cannot express half-open ranges. The alphabetical edge cases with strings make it especially error-prone there. Explicit operators are unambiguous. BETWEEN is fine when you encounter it in existing code or reach for it with simple integer or date ranges, but it is not the preferred form for new queries.

3.5.5 NULL Values and IS NULL

NULL is a special marker in SQL that represents the absence of a value. It is not the same as zero, an empty string, or a blank space. NULL means that no value was recorded for that column in that row at all.

The death_date column is NULL for actors who are still living. You can find living actors using IS NULL:

SELECT first_name, last_name, birth_date, birth_country
FROM actors
WHERE death_date IS NULL
ORDER BY birth_date;

Displaying records 1 - 25
first_name	last_name	birth_date	birth_country
Eva Marie	Saint	1924-07-04	USA
Gene	Hackman	1930-01-30	USA
Clint	Eastwood	1930-05-31	USA
Harriet	Andersson	1932-02-14	Sweden
Kim	Novak	1933-02-13	USA
Michael	Caine	1933-03-14	UK
Shirley	MacLaine	1934-04-24	USA
Sophia	Loren	1934-09-20	Italy
Brigitte	Bardot	1934-09-28	France
Judi	Dench	1934-12-09	UK
Julie	Andrews	1935-10-01	UK
Warren	Beatty	1937-03-30	USA
Jack	Nicholson	1937-04-22	USA
Dustin	Hoffman	1937-08-08	USA
Jane	Fonda	1937-12-21	USA
Anthony	Hopkins	1937-12-31	UK
Liv	Ullmann	1938-12-16	Norway
Al	Pacino	1940-04-25	USA
Faye	Dunaway	1941-01-14	USA
Barbra	Streisand	1942-04-24	USA
Giancarlo	Giannini	1942-08-01	Italy
Amitabh	Bachchan	1942-10-11	India
Robert	De Niro	1943-08-17	USA
Catherine	Deneuve	1943-10-22	France
Carmen	Maura	1945-09-15	Spain

To find rows where a value is present, use IS NOT NULL. The following query returns only deceased actors by requiring that death_date has been recorded:

SELECT first_name, last_name, birth_date, death_date
FROM actors
WHERE death_date IS NOT NULL
ORDER BY death_date DESC;

Displaying records 1 - 25
first_name	last_name	birth_date	death_date
Alain	Delon	1935-11-08	2024-08-18
Gina	Lollobrigida	1927-07-04	2023-01-16
Sidney	Poitier	1927-02-20	2022-01-06
Jean-Paul	Belmondo	1933-04-09	2021-09-06
Sean	Connery	1930-08-25	2020-10-31
Max	von Sydow	1929-04-10	2020-03-08
Kirk	Douglas	1916-12-09	2020-02-05
Doris	Day	1922-04-03	2019-05-13
Bibi	Andersson	1935-11-11	2019-04-14
Albert	Finney	1936-05-09	2019-02-07
Jeanne	Moreau	1928-01-23	2017-07-31
Roger	Moore	1927-10-14	2017-05-23
Debbie	Reynolds	1932-04-01	2016-12-28
Omar	Sharif	1932-04-10	2015-07-10
Philip	Hoffman	1967-07-23	2014-02-02
Peter	O’Toole	1932-08-02	2013-12-14
Celeste	Holm	1917-04-29	2012-07-15
Ernest	Borgnine	1917-01-24	2012-07-08
Elizabeth	Taylor	1932-02-27	2011-03-23
Tony	Curtis	1925-06-03	2010-09-29
Karl	Malden	1912-03-22	2009-07-01
Paul	Newman	1925-01-26	2008-09-26
Charlton	Heston	1923-10-04	2008-04-05
Heath	Ledger	1979-04-04	2008-01-22
Shelley	Winters	1920-08-18	2006-01-14

You cannot test for NULL using =. Writing WHERE birth_name = NULL always returns zero rows, because NULL is not considered equal to anything, not even another NULL. Always use IS NULL or IS NOT NULL when checking for missing values.

3.5.6 Combining Conditions with AND and OR

A WHERE clause can include multiple conditions joined by AND or OR.

AND requires both conditions to be true for a row to be included.
OR requires at least one condition to be true.

SELECT first_name, last_name, birth_country, oscar_nominations, oscar_wins
FROM actors
WHERE primary_genre = 'Drama'
  AND oscar_wins > 0
ORDER BY oscar_wins DESC, oscar_nominations DESC;

Displaying records 1 - 25
first_name	last_name	birth_country	oscar_nominations	oscar_wins
Katharine	Hepburn	USA	12	4
Meryl	Streep	USA	21	3
Jack	Nicholson	USA	12	3
Ingrid	Bergman	Sweden	7	3
Daniel	Day-Lewis	UK	6	3
Walter	Brennan	USA	4	3
Bette	Davis	USA	10	2
Denzel	Washington	USA	9	2
Spencer	Tracy	USA	9	2
Marlon	Brando	USA	8	2
Cate	Blanchett	Australia	8	2
Robert	De Niro	USA	8	2
Jane	Fonda	USA	7	2
Dustin	Hoffman	USA	7	2
Tom	Hanks	USA	5	2
Jessica	Lange	USA	5	2
Elizabeth	Taylor	UK	5	2
Vivien	Leigh	UK	5	2
Emma	Thompson	UK	5	2
Gene	Hackman	USA	5	2
Christoph	Waltz	Germany	4	2
Shelley	Winters	USA	4	2
Mahershala	Ali	USA	2	2
Hilary	Swank	USA	2	2
Paul	Newman	USA	10	1

This query returns Drama actors who have won at least one Oscar, sorted by wins and then by total nominations to break ties.

When combining AND and OR in the same WHERE clause, use parentheses to make your intentions explicit. SQL evaluates AND before OR, which can produce surprising results if conditions are not grouped clearly.

-- Returns actors in Drama or Comedy who have at least one nomination
WHERE (primary_genre = 'Drama' OR primary_genre = 'Comedy')
  AND oscar_nominations > 0

3.5.7 Matching Patterns with LIKE

The LIKE operator filters text columns using simple patterns. Two wildcard characters are available:

% matches any sequence of zero or more characters.
_ matches exactly one character.

The following example finds all actors whose last names begin with H:

SELECT first_name, last_name, primary_genre, birth_country
FROM actors
WHERE last_name LIKE 'H%'
ORDER BY last_name;

14 records
first_name	last_name	primary_genre	birth_country
Gene	Hackman	Drama	USA
Tom	Hanks	Drama	USA
Marcia Gay	Harden	Drama	USA
Rex	Harrison	Musical	UK
Salma	Hayek	Drama	Mexico
Chris	Hemsworth	Action	Australia
Katharine	Hepburn	Drama	USA
Audrey	Hepburn	Romance	UK
Charlton	Heston	Action	USA
Dustin	Hoffman	Drama	USA
Philip	Hoffman	Drama	USA
Celeste	Holm	Comedy	USA
Anthony	Hopkins	Drama	UK
Isabelle	Huppert	Drama	France

Lowercase 'h%' would return no results here because text matching with LIKE is case-sensitive, and no last names in this table begin with a lower-case h. ILIKE works identically to LIKE but ignores case. Replacing LIKE with ILIKE makes the same pattern work regardless of how the value is capitalized:

SELECT first_name, last_name, primary_genre, birth_country
FROM actors
WHERE last_name ILIKE 'h%'
ORDER BY last_name;

14 records
first_name	last_name	primary_genre	birth_country
Gene	Hackman	Drama	USA
Tom	Hanks	Drama	USA
Marcia Gay	Harden	Drama	USA
Rex	Harrison	Musical	UK
Salma	Hayek	Drama	Mexico
Chris	Hemsworth	Action	Australia
Katharine	Hepburn	Drama	USA
Audrey	Hepburn	Romance	UK
Charlton	Heston	Action	USA
Dustin	Hoffman	Drama	USA
Philip	Hoffman	Drama	USA
Celeste	Holm	Comedy	USA
Anthony	Hopkins	Drama	UK
Isabelle	Huppert	Drama	France

Both queries return the same fourteen rows because the data happens to be consistently capitalized. The difference becomes meaningful when data entry is inconsistent — for example, if primary_genre values were stored as a mix of 'drama', 'Drama', and 'DRAMA', a LIKE filter for 'drama' would miss the capitalized variants while ILIKE would catch all of them.

With % wildcards on both sides of a search term, you can find text that occurs anywhere in a column. The following query finds actors whose birth names contain the word “Schwartz” or any other substring:

SELECT first_name, last_name, birth_name
FROM actors
WHERE birth_name LIKE '%Bernard%';

1 records
first_name	last_name	birth_name
Tony	Curtis	Bernard Schwartz

3.5.8 Matching a List of Values with IN

When you want to match any one of several specific values, IN is more readable than a chain of OR conditions.

SELECT first_name, last_name, birth_country, oscar_nominations
FROM actors
WHERE birth_country IN ('UK', 'Australia', 'Ireland')
ORDER BY birth_country, oscar_nominations DESC;

Displaying records 1 - 25
first_name	last_name	birth_country	oscar_nominations
Cate	Blanchett	Australia	8
Nicole	Kidman	Australia	5
Geoffrey	Rush	Australia	4
Heath	Ledger	Australia	2
Hugh	Jackman	Australia	1
Toni	Collette	Australia	1
Eric	Bana	Australia	0
Chris	Hemsworth	Australia	0
Peter	O’Toole	Ireland	8
Saoirse	Ronan	Ireland	4
Laurence	Olivier	UK	10
Kate	Winslet	UK	7
Judi	Dench	UK	7
Richard	Burton	UK	7
Anthony	Hopkins	UK	6
Michael	Caine	UK	6
Daniel	Day-Lewis	UK	6
Vivien	Leigh	UK	5
Audrey	Hepburn	UK	5
Alec	Guinness	UK	5
Emma	Thompson	UK	5
Elizabeth	Taylor	UK	5
Albert	Finney	UK	5
Rex	Harrison	UK	5
Peter	Sellers	UK	3

3.5.9 Negating Conditions with NOT

The NOT keyword reverses the result of a condition. It can be placed in front of LIKE, IN, BETWEEN, and similar operators to express “everything except what this pattern matches.”

NOT LIKE excludes rows that match a pattern. The query below returns actors whose last names do not begin with a vowel:

SELECT first_name, last_name, primary_genre
FROM actors
WHERE last_name NOT LIKE 'A%'
  AND last_name NOT LIKE 'E%'
  AND last_name NOT LIKE 'I%'
  AND last_name NOT LIKE 'O%'
  AND last_name NOT LIKE 'U%'
ORDER BY last_name;

Displaying records 1 - 25
first_name	last_name	primary_genre
Amitabh	Bachchan	Drama
Kevin	Bacon	Drama
Eric	Bana	Action
Anne	Bancroft	Drama
Antonio	Banderas	Action
Javier	Bardem	Drama
Brigitte	Bardot	Romance
Angela	Bassett	Drama
Warren	Beatty	Drama
Monica	Bellucci	Action
Jean-Paul	Belmondo	Action
Roberto	Benigni	Comedy
Ingrid	Bergman	Drama
Halle	Berry	Action
Juliette	Binoche	Drama
Cate	Blanchett	Drama
Dirk	Bogarde	Drama
Humphrey	Bogart	Drama
Helena	Bonham Carter	Drama
Ernest	Borgnine	Drama
Marlon	Brando	Drama
Walter	Brennan	Drama
Sandra	Bullock	Comedy
Richard	Burton	Drama
Nicolas	Cage	Drama

NOT IN excludes rows whose column value matches any entry in the list:

SELECT first_name, last_name, birth_country, primary_genre
FROM actors
WHERE primary_genre NOT IN ('Drama', 'Action')
ORDER BY primary_genre, last_name;

Displaying records 1 - 25
first_name	last_name	birth_country	primary_genre
Roberto	Benigni	Italy	Comedy
Sandra	Bullock	USA	Comedy
Tony	Curtis	USA	Comedy
Doris	Day	USA	Comedy
Whoopi	Goldberg	USA	Comedy
Cary	Grant	UK	Comedy
Celeste	Holm	USA	Comedy
Diane	Keaton	USA	Comedy
Jack	Lemmon	USA	Comedy
Shirley	MacLaine	USA	Comedy
Walter	Matthau	USA	Comedy
Bette	Midler	USA	Comedy
Marilyn	Monroe	USA	Comedy
Thelma	Ritter	USA	Comedy
Peter	Sellers	UK	Comedy
Alastair	Sim	UK	Comedy
Alberto	Sordi	Italy	Comedy
Mira	Sorvino	USA	Comedy
Emma	Stone	USA	Comedy
Louis	de Funès	France	Comedy
Toni	Collette	Australia	Horror
Ruth	Gordon	USA	Horror
Janet	Leigh	USA	Horror
Julie	Andrews	UK	Musical
Fred	Astaire	USA	Musical

IS NOT NULL, introduced in the previous section, follows the same pattern and is technically a NOT form as well.

NOT can also be written as a prefix before a parenthesized condition:

WHERE NOT (primary_genre = 'Drama')

This is equivalent to WHERE primary_genre <> 'Drama'. The prefix form is more useful when negating a compound condition that would be awkward to rewrite with <> or other operators.

3.5.9.1 NOT IN and NULL Values

NOT IN behaves unexpectedly when NULL values are involved, and understanding this is important for using it safely.

When a NULL appears anywhere in the list passed to NOT IN, the condition returns no rows at all. To see this, try adding a NULL to an otherwise straightforward filter:

SELECT first_name, last_name, primary_genre
FROM actors
WHERE primary_genre NOT IN ('Drama', 'Action', NULL);

0 records
first_name	last_name	primary_genre

This returns zero rows despite the fact that most actors in the table are neither Drama nor Action performers. The reason comes down to how SQL evaluates NULL comparisons. The expression primary_genre NOT IN ('Drama', 'Action', NULL) expands internally to:

primary_genre <> 'Drama'
AND primary_genre <> 'Action'
AND primary_genre <> NULL

The third condition, primary_genre <> NULL, never evaluates to TRUE or FALSE. Any comparison with NULL produces NULL, meaning the result is unknown. When one part of an AND chain is unknown, the entire expression is unknown, and the row is excluded from the output. Because this applies to every row in the table, the query returns nothing.

A single NULL anywhere in the list passed to NOT IN silently causes the entire condition to return no rows. Because no error is raised, this problem can go unnoticed for a long time. When building a list for NOT IN, verify that it contains no NULL values. A later chapter covers how this same issue arises when the list comes from a subquery rather than a literal set of values, and introduces NOT EXISTS as the preferred alternative in those situations. Unlike NOT IN, NOT EXISTS handles NULL values correctly and does not produce this silent failure.

3.6 Sorting Results with ORDER BY

By default, a database returns rows in no particular order. The physical order of data in storage has nothing to do with alphabetical order, nomination count, or any other meaningful sequence. The ORDER BY clause lets you specify the order you want.

SELECT column1, column2
FROM table_name
ORDER BY column1 ASC;

ASC (ascending) is the default and can be omitted. Use DESC (descending) to reverse the order.

SELECT first_name, last_name, oscar_nominations, oscar_wins
FROM actors
ORDER BY oscar_nominations DESC;

Displaying records 1 - 25
first_name	last_name	oscar_nominations	oscar_wins
Meryl	Streep	21	3
Katharine	Hepburn	12	4
Jack	Nicholson	12	3
Laurence	Olivier	10	1
Paul	Newman	10	1
Bette	Davis	10	2
Spencer	Tracy	9	2
Denzel	Washington	9	2
Al	Pacino	9	1
Glenn	Close	8	0
Cate	Blanchett	8	2
Geraldine	Page	8	1
Jack	Lemmon	8	2
Marlon	Brando	8	2
Robert	De Niro	8	2
Peter	O’Toole	8	0
Ingrid	Bergman	7	3
Jane	Fonda	7	2
Dustin	Hoffman	7	2
Judi	Dench	7	1
Richard	Burton	7	0
Kate	Winslet	7	1
Shirley	MacLaine	6	1
Daniel	Day-Lewis	6	3
Sissy	Spacek	6	1

3.6.1 Sorting by Multiple Columns

You can sort by more than one column by listing them after ORDER BY, separated by commas. The database sorts by the first column, then uses the second column to break ties, then the third, and so on.

SELECT first_name, last_name, primary_genre, oscar_nominations
FROM actors
ORDER BY primary_genre ASC, oscar_nominations DESC;

Displaying records 1 - 25
first_name	last_name	primary_genre	oscar_nominations
Clint	Eastwood	Action	4
Russell	Crowe	Action	3
Mel	Gibson	Action	2
Charlton	Heston	Action	1
Lee	Marvin	Action	1
Hugh	Jackman	Action	1
Sean	Connery	Action	1
Steve	McQueen	Action	1
Halle	Berry	Action	1
Telly	Savalas	Action	1
Priyanka	Chopra	Action	0
Chris	Hemsworth	Action	0
Jean	Reno	Action	0
Monica	Bellucci	Action	0
Toshiro	Mifune	Action	0
Idris	Elba	Action	0
Alain	Delon	Action	0
Roger	Moore	Action	0
Eric	Bana	Action	0
Jean-Paul	Belmondo	Action	0
Robert	Mitchum	Action	0
Antonio	Banderas	Action	0
Jack	Lemmon	Comedy	8
Thelma	Ritter	Comedy	6
Shirley	MacLaine	Comedy	6

This query first groups rows by genre alphabetically, then within each genre sorts actors from most nominated to least. The result lets you compare the most decorated actors within each genre at a glance.

3.7 Limiting Results with LIMIT

The LIMIT clause caps the number of rows returned by a query. It is most useful when combined with ORDER BY, because the two together let you answer “top N” questions: the ten most nominated actors, the five tallest actors, and so on.

SELECT first_name, last_name, primary_genre, oscar_nominations
FROM actors
ORDER BY oscar_nominations DESC
LIMIT 10;

10 records
first_name	last_name	primary_genre	oscar_nominations
Meryl	Streep	Drama	21
Jack	Nicholson	Drama	12
Katharine	Hepburn	Drama	12
Laurence	Olivier	Drama	10
Paul	Newman	Drama	10
Bette	Davis	Drama	10
Al	Pacino	Drama	9
Denzel	Washington	Drama	9
Spencer	Tracy	Drama	9
Cate	Blanchett	Drama	8

Using LIMIT without ORDER BY returns an arbitrary subset of rows. Which rows you get depends on how the database happens to store and retrieve data at that moment, which can vary between runs. If you need a specific subset (such as the most nominated, tallest, or earliest-born), always pair LIMIT with an ORDER BY that defines what “first” means.

3.8 Putting It All Together

The clauses covered in this chapter can be combined in a single query. PostgreSQL evaluates them in a fixed logical order, regardless of how you arrange them in the text:

FROM (identify the table)
WHERE (filter rows)
SELECT (compute and choose columns)
ORDER BY (sort the result)
LIMIT (cut the result to the requested number of rows)

The query below uses all of these clauses together to find the ten most Oscar-nominated actors born outside the United States, showing their full name, birth country, nomination total, and wins.

SELECT
    first_name || ' ' || last_name AS full_name,
    birth_country,
    oscar_nominations,
    oscar_wins
FROM actors
WHERE birth_country <> 'USA'
  AND death_date IS NULL
ORDER BY oscar_nominations DESC
LIMIT 10;

10 records
full_name	birth_country	oscar_nominations	oscar_wins
Cate Blanchett	Australia	8	2
Kate Winslet	UK	7	1
Judi Dench	UK	7	1
Daniel Day-Lewis	UK	6	3
Michael Caine	UK	6	2
Anthony Hopkins	UK	6	1
Sophia Loren	Italy	5	1
Emma Thompson	UK	5	2
Nicole Kidman	Australia	5	1
Geoffrey Rush	Australia	4	1

Reading a query like this from top to bottom is natural once you understand what each clause contributes. As you move through subsequent chapters and your queries grow more complex, this structured way of thinking about each clause separately will continue to serve as a reliable foundation.

3.9 SQL as a Declarative Language

Now that you have a sense for how a basic SELECT statement is written, it is important to understand something fundamental about how SQL thinks, because it is different from how R and Python think.

If you have experience with either of those languages, you are accustomed to writing programs that specify a sequence of steps. You tell the computer exactly what to do and in what order: load this data, filter these rows, select these columns, sort the result. This style is called imperative programming, because you are issuing commands.

SQL is declarative. Rather than specifying steps, you describe the result you want and let the database engine figure out how to produce it.

To make the contrast concrete, consider retrieving Drama actors sorted by nomination count. In Python with pandas, you might write:

drama = df[df["primary_genre"] == "Drama"]
result = (
    drama[["first_name", "last_name", "oscar_nominations"]]
    .sort_values("oscar_nominations", ascending=False)
)

In R with dplyr:

result <- actors |>
  filter(primary_genre == "Drama") |>
  select(first_name, last_name, oscar_nominations) |>
  arrange(desc(oscar_nominations))

Both of these describe a sequence of operations. The order of the instructions matches the intended order of execution.

The equivalent SQL looks like this:

SELECT first_name, last_name, oscar_nominations
FROM actors
WHERE primary_genre = 'Drama'
ORDER BY oscar_nominations DESC;

The SQL query does not describe steps. It describes a result: “I want the first_name, last_name, and oscar_nominations columns, from the actors table, restricted to rows where primary_genre is 'Drama', sorted by oscar_nominations descending.” The database engine reads that description and decides for itself how to retrieve the data efficiently. It may use an index, reorder internal operations, or parallelize work across processors. You do not control those decisions, and most of the time you do not need to.

3.9.1 Set-Based Thinking

One practical consequence of SQL’s declarative nature is that it operates on sets of rows rather than individual records. When you write a WHERE clause, you are not writing a loop that examines each row in turn. You are describing a condition that applies to the entire table at once, and the database evaluates it however it finds most efficient.

This can feel unfamiliar at first. In Python or R, reaching for a loop or an apply-style function is instinctive. In SQL, you resist that instinct and instead describe what the complete result set should look like. Queries that try to mimic row-by-row processing are almost always slower and harder to read than queries that embrace the set-based model.

3.9.2 The Logical Order of Execution

There is one aspect of SQL’s declarative model that surprises many people coming from R or Python: the order in which you write a query’s clauses is not the order in which the database evaluates them.

A query is written as:

SELECT ...
FROM ...
WHERE ...
ORDER BY ...
LIMIT ...

But the database processes the clauses in this logical order:

FROM (identify the source table)
WHERE (filter rows)
SELECT (compute and choose columns)
ORDER BY (sort the result)
LIMIT (trim to the requested number of rows)

The most common practical consequence of this is that column aliases defined in SELECT cannot be referenced in a WHERE clause, because WHERE runs before SELECT. The following query fails for exactly that reason:

-- This fails: 'height_in' is not defined when WHERE runs
SELECT ROUND(height_cm / 2.54::numeric, 1) AS height_in
FROM actors
WHERE height_in > 72;

You have to repeat the expression in WHERE instead:

SELECT ROUND(height_cm / 2.54::numeric, 1) AS height_in
FROM actors
WHERE ROUND(height_cm / 2.54::numeric, 1) > 72;

ORDER BY is an exception: most databases, including PostgreSQL, allow ORDER BY to reference aliases defined in SELECT, because ORDER BY is evaluated after SELECT. That is why ORDER BY height_in would work in the query above even though WHERE height_in does not.

Keeping this logical order in mind will make otherwise cryptic error messages much easier to diagnose as your queries grow more complex.

3.10 Exercises

3.10.1 Reflection

SQL is described in this chapter as a “declarative” programming language, while R and Python are “imperative.” In your own words, explain what that distinction means. How does it change the way you approach writing a query compared to writing a data pipeline in R or Python?
A colleague writes the following query hoping to find all actors who are over six feet tall (approximately 183 cm), but PostgreSQL returns an error. Identify the problem and explain how to fix it.
```
SELECT first_name, last_name, ROUND(height_cm / 2.54::numeric, 1) AS height_in
FROM actors
WHERE height_in > 72;
```
Explain the difference between COUNT(*) and COUNT(birth_name) as they apply to the actors table. Under what circumstances would the two expressions return different values?
A classmate wants to retrieve all actors for whom no birth name has been recorded. They write the query below, but it always returns zero rows. Why? What should they write instead?
```
SELECT first_name, last_name
FROM actors
WHERE birth_name = NULL;
```

3.10.2 Coding

For each exercise below, write a SQL query against the actors database.

Write a query that retrieves the first name, last name, birth country, and primary genre for every actor in the table. Sort the results alphabetically by last name.
Write a query that returns the full name (first and last name combined into a single column called full_name), birth country, and Oscar nomination count for all actors born in the United Kingdom, sorted from most to least nominated.
Write a query that shows only the distinct birth countries represented in the actors table, sorted alphabetically. Then write a second query using COUNT(DISTINCT birth_country) to confirm the total number of unique countries.
Write a query that returns the first name, last name, height in centimeters, and height converted to inches (rounded to one decimal place, named height_in) for all actors over 185 cm tall. Sort from tallest to shortest.
Write a query that returns the first name, last name, primary genre, Oscar nominations, and Oscar wins for all actors who meet every one of the following conditions: they are still living (no death date recorded), they have received at least three Oscar nominations, and they were born in the USA. Sort the results by Oscar nominations in descending order.