3  SELECT Statements

3.1 Learning Objectives

By the end of this chapter, students will be able to:

  • Write a SELECT statement that retrieves specific columns from a table using SELECT ... FROM.
  • Apply column aliases using AS to rename columns in query output.
  • Construct text concatenation expressions using the || operator to combine column values and literal strings.
  • Write arithmetic expressions in SELECT to compute new values from existing numeric columns.
  • Apply DISTINCT to return only unique values from a column or combination of columns.
  • Use COUNT(*), COUNT(column), and COUNT(DISTINCT column) to count rows and measure data completeness.
  • Apply WHERE with comparison operators (=, <>, <, >, BETWEEN, IN, LIKE, IS NULL) to filter rows.
  • Combine filter conditions using AND, OR, and NOT, applying parentheses to control evaluation order.
  • Explain why WHERE column = NULL always returns zero rows and use IS NULL instead.
  • Describe the NOT IN / NULL trap and explain why a single NULL in the list causes no rows to be returned.
  • Sort query results using ORDER BY with ASC and DESC across one or more columns.
  • Use LIMIT to restrict the number of rows returned and explain its interaction with ORDER BY.
  • Describe the logical order in which SQL clauses are evaluated and explain why column aliases cannot be referenced in WHERE.

Now that your database environment is up and running, it is time to write some SQL. The most fundamental operation in SQL is the query: a question you ask the database, written in a structured way that the database engine can understand and answer. In this chapter, you will write your first queries against the actors database that ships with the course Docker image.

The actors table contains one row per actor, covering 200 entries drawn from film history. Each row records the actor’s name, birth name (for those who used a stage name), sex, birth and death dates, country of birth, height, Oscar nominations and wins, and primary film genre. It is a rich dataset for learning SQL because it mixes text columns, fixed-length character fields, whole numbers, and dates in a single table.

By the end of this chapter, you will know how to retrieve data from a table, choose which columns to display, compute new values from existing ones, filter rows by any condition you can describe, sort the results, and cap how many rows are returned.

3.2 The SELECT and FROM Clauses

Every query that retrieves data starts with a SELECT statement. The two required pieces are:

  • SELECT specifies which columns you want in the result.
  • FROM tells the database which table to look in.

3.2.1 Selecting All Columns

The quickest way to see everything a table contains is to use an asterisk (*) in place of column names. The asterisk is shorthand for “every column, in the order they are defined.”

SELECT *
FROM actors
LIMIT 5;
5 records
actor_id first_name last_name birth_name sex birth_date death_date birth_country height_cm oscar_nominations oscar_wins primary_genre has_honorary_oscar notable_role total_box_office_usd
1 Tom Hanks NULL M 1956-07-09 NULL USA 183 5 2 Drama FALSE Forrest Gump 4.8e+09
2 Meryl Streep NULL F 1949-06-22 NULL USA 168 21 3 Drama FALSE The Devil Wears Prada 2.1e+09
3 Cate Blanchett NULL F 1969-05-14 NULL Australia 174 8 2 Drama FALSE Elizabeth 5.1e+09
4 Marlon Brando NULL M 1924-04-03 2004-07-01 USA 175 8 2 Drama FALSE The Godfather 1.6e+09
5 Audrey Hepburn Audrey Kathleen Ruston F 1929-05-04 1993-01-20 UK 170 5 1 Romance TRUE Breakfast at Tiffany’s 9.2e+08

The LIMIT 5 at the end restricts the output to five rows. Without it, the query would return all 200 rows in the table. This is similar to the head() function in R or Python. You will learn more about LIMIT later in this chapter, but getting into the habit of adding it while exploring an unfamiliar table is good practice.

3.2.2 Selecting Specific Columns

In most real situations, you will not want every column. Listing only the columns you need makes results easier to read and faster to retrieve from large tables.

SELECT first_name, last_name, birth_country, primary_genre
FROM actors;
Displaying records 1 - 25
first_name last_name birth_country primary_genre
Tom Hanks USA Drama
Meryl Streep USA Drama
Cate Blanchett Australia Drama
Marlon Brando USA Drama
Audrey Hepburn UK Romance
Humphrey Bogart USA Drama
Katharine Hepburn USA Drama
Jack Nicholson USA Drama
Denzel Washington USA Drama
Ingrid Bergman Sweden Drama
James Stewart USA Drama
Bette Davis USA Drama
Sidney Poitier USA Drama
Jane Fonda USA Drama
Al Pacino USA Drama
Robert De Niro USA Drama
Sophia Loren Italy Drama
Marcello Mastroianni Italy Drama
Jeanne Moreau France Drama
Jean-Paul Belmondo France Action
Catherine Deneuve France Drama
Isabelle Huppert France Drama
Gérard Depardieu France Drama
Alec Guinness UK Drama
Peter Sellers UK Comedy

Column names are listed after SELECT, separated by commas. The database returns them in exactly the order you specify, regardless of how they are arranged inside the table.

Relying on SELECT * in production code is considered poor practice because it breaks silently when someone adds, removes, or reorders columns in the table. Naming your columns explicitly makes queries more predictable and easier to maintain.

3.3 Shaping Your Output

Beyond choosing which columns to include, SQL lets you rename columns, combine text from multiple columns, and compute new values on the fly. None of these operations modify the data in the table; they only affect what is shown in the result.

3.3.1 Column Aliases

By default, a column in your result inherits the name it has in the table. You can override this with the AS keyword, which assigns an alias to a column. Aliases are useful for giving computed values a meaningful name, shortening long column names, or improving readability.

SELECT
    first_name        AS given_name,
    last_name         AS surname,
    birth_country     AS country,
    primary_genre     AS genre,
    oscar_nominations AS nominations
FROM actors;
Displaying records 1 - 25
given_name surname country genre nominations
Tom Hanks USA Drama 5
Meryl Streep USA Drama 21
Cate Blanchett Australia Drama 8
Marlon Brando USA Drama 8
Audrey Hepburn UK Romance 5
Humphrey Bogart USA Drama 3
Katharine Hepburn USA Drama 12
Jack Nicholson USA Drama 12
Denzel Washington USA Drama 9
Ingrid Bergman Sweden Drama 7
James Stewart USA Drama 5
Bette Davis USA Drama 10
Sidney Poitier USA Drama 2
Jane Fonda USA Drama 7
Al Pacino USA Drama 9
Robert De Niro USA Drama 8
Sophia Loren Italy Drama 5
Marcello Mastroianni Italy Drama 3
Jeanne Moreau France Drama 0
Jean-Paul Belmondo France Action 0
Catherine Deneuve France Drama 2
Isabelle Huppert France Drama 2
Gérard Depardieu France Drama 1
Alec Guinness UK Drama 5
Peter Sellers UK Comedy 3

The AS keyword is optional in PostgreSQL; you can write the alias directly after the column name without it. Using AS explicitly is recommended, however, because it makes the intent of the query immediately clear to anyone reading it.

3.3.2 Combining Text Columns

PostgreSQL uses the || operator to concatenate (join) strings together. This lets you build a single display value out of multiple columns, or add literal text between column values.

SELECT
    first_name || ' ' || last_name AS full_name,
    birth_country,
    primary_genre
FROM actors;
Displaying records 1 - 25
full_name birth_country primary_genre
Tom Hanks USA Drama
Meryl Streep USA Drama
Cate Blanchett Australia Drama
Marlon Brando USA Drama
Audrey Hepburn UK Romance
Humphrey Bogart USA Drama
Katharine Hepburn USA Drama
Jack Nicholson USA Drama
Denzel Washington USA Drama
Ingrid Bergman Sweden Drama
James Stewart USA Drama
Bette Davis USA Drama
Sidney Poitier USA Drama
Jane Fonda USA Drama
Al Pacino USA Drama
Robert De Niro USA Drama
Sophia Loren Italy Drama
Marcello Mastroianni Italy Drama
Jeanne Moreau France Drama
Jean-Paul Belmondo France Action
Catherine Deneuve France Drama
Isabelle Huppert France Drama
Gérard Depardieu France Drama
Alec Guinness UK Drama
Peter Sellers UK Comedy

You can add as many literal strings as you like. Here, a comma and space are inserted between the last name and the birth country to produce a compact bio label:

SELECT
    first_name || ' ' || last_name || ', ' || birth_country AS bio_label,
    primary_genre
FROM actors;
Displaying records 1 - 25
bio_label primary_genre
Tom Hanks, USA Drama
Meryl Streep, USA Drama
Cate Blanchett, Australia Drama
Marlon Brando, USA Drama
Audrey Hepburn, UK Romance
Humphrey Bogart, USA Drama
Katharine Hepburn, USA Drama
Jack Nicholson, USA Drama
Denzel Washington, USA Drama
Ingrid Bergman, Sweden Drama
James Stewart, USA Drama
Bette Davis, USA Drama
Sidney Poitier, USA Drama
Jane Fonda, USA Drama
Al Pacino, USA Drama
Robert De Niro, USA Drama
Sophia Loren, Italy Drama
Marcello Mastroianni, Italy Drama
Jeanne Moreau, France Drama
Jean-Paul Belmondo, France Action
Catherine Deneuve, France Drama
Isabelle Huppert, France Drama
Gérard Depardieu, France Drama
Alec Guinness, UK Drama
Peter Sellers, UK Comedy

If any value being concatenated is NULL, the entire result of || becomes NULL. The actors table has a birth_name column that is NULL for most actors. If you concatenated birth_name into a label without accounting for that, those rows would produce a NULL result. The COALESCE function, covered in a later chapter, handles this situation cleanly.

3.3.3 Arithmetic Operations

SQL treats numeric columns like variables in an algebraic expression. You can add, subtract, multiply, and divide columns and constants using standard arithmetic operators (+, -, *, /).

As an example, height_cm stores each actor’s height in centimeters. You can convert that to inches by dividing by 2.54:

SELECT
    first_name || ' ' || last_name AS full_name,
    height_cm,
    ROUND(height_cm / 2.54::numeric, 1) AS height_in
FROM actors
ORDER BY height_cm DESC
LIMIT 10;
10 records
full_name height_cm height_in
Max von Sydow 196 77.2
Clint Eastwood 193 76.0
Gregory Peck 191 75.2
Charlton Heston 191 75.2
Chris Hemsworth 190 74.8
James Stewart 188 74.0
Idris Elba 188 74.0
Sidney Poitier 188 74.0
Hugh Jackman 188 74.0
Henry Fonda 188 74.0

ROUND(value, 1) rounds the result to one decimal place. The ::numeric cast converts the 2.54 literal to a decimal type before dividing, which ensures a decimal result rather than integer division.

3.4 Exploring Data with DISTINCT and COUNT

Two features of the SELECT clause are especially useful when you first sit down with an unfamiliar database: DISTINCT, which removes duplicate values from your results, and COUNT, which tallies rows. Together they let you quickly understand what a table contains and how its data is distributed before you start filtering or transforming it.

3.4.1 DISTINCT

Adding the DISTINCT keyword immediately after SELECT tells the database to return only unique combinations of the specified columns, removing any duplicate rows from the result.

A common use is to see every unique value that appears in a categorical column. The actors table has 200 rows but only eight distinct primary genres. Without DISTINCT, selecting the primary_genre column would return 200 rows with many repeated values. With it, you get the eight unique values once each:

SELECT DISTINCT primary_genre
FROM actors
ORDER BY primary_genre;
8 records
primary_genre
Action
Comedy
Drama
Horror
Musical
Romance
Sci-Fi
Thriller

DISTINCT applies to the full combination of columns listed, not just the first one. If you write SELECT DISTINCT primary_genre, birth_country, the result contains each unique pairing of those two values rather than unique genres alone.

SELECT DISTINCT is useful for a quick look at what values exist in a column, but it is not a substitute for proper data profiling. For large tables, counting how many rows correspond to each distinct value tells you much more than the list of values alone. GROUP BY, covered in a later chapter, is the right tool for that.

3.4.2 COUNT

COUNT is an aggregate function: rather than returning one row per input row, it collapses results into a single summary value. The most common form, COUNT(*), returns the total number of rows in the result.

SELECT COUNT(*) AS total_actors
FROM actors;
1 records
total_actors
200

Paired with a WHERE clause, COUNT(*) counts only the rows that satisfy the filter:

SELECT COUNT(*) AS drama_actors
FROM actors
WHERE primary_genre = 'Drama';
1 records
drama_actors
135

COUNT(column_name), with a specific column instead of *, counts only rows where that column is not NULL. This makes it a convenient way to measure how complete your data is:

SELECT
    COUNT(*)          AS total_rows,
    COUNT(birth_name) AS rows_with_birth_name,
    COUNT(death_date) AS rows_with_death_date
FROM actors;
1 records
total_rows rows_with_birth_name rows_with_death_date
200 70 82

The birth_name column records an actor’s legal name when they perform under a stage name. The result shows that only 70 of the 200 actors have a recorded birth name, meaning 130 either perform under their real name or have no birth name on record. Similarly, death_date is NULL for living actors, so COUNT(death_date) reveals how many actors in the table are deceased.

COUNT(DISTINCT column) combines both ideas, counting how many unique values appear in a column:

SELECT COUNT(DISTINCT birth_country) AS num_countries
FROM actors;
1 records
num_countries
19

COUNT belongs to a family of aggregate functions that summarize groups of rows into a single value. Others include SUM(), AVG(), MIN(), and MAX(). These become significantly more powerful when paired with GROUP BY, which computes a separate summary for each category in a column, such as the number of actors per genre or the average height per birth country. GROUP BY is covered in its own chapter.

3.5 Filtering Rows with WHERE

SELECT and FROM control which columns and tables you work with. The WHERE clause controls which rows appear in the result. Any row that does not satisfy the condition after WHERE is excluded from the output.

SELECT column1, column2
FROM table_name
WHERE condition;

3.5.1 Comparing Text Values

Use a single equals sign (=) to match a text value exactly. Text comparisons in PostgreSQL are case-sensitive by default, so 'Drama' and 'drama' are not the same.

SELECT first_name, last_name, birth_country, oscar_nominations
FROM actors
WHERE primary_genre = 'Drama'
ORDER BY oscar_nominations DESC;
Displaying records 1 - 25
first_name last_name birth_country oscar_nominations
Meryl Streep USA 21
Katharine Hepburn USA 12
Jack Nicholson USA 12
Paul Newman USA 10
Laurence Olivier UK 10
Bette Davis USA 10
Al Pacino USA 9
Denzel Washington USA 9
Spencer Tracy USA 9
Geraldine Page USA 8
Peter O’Toole Ireland 8
Robert De Niro USA 8
Glenn Close USA 8
Cate Blanchett Australia 8
Marlon Brando USA 8
Ingrid Bergman Sweden 7
Kate Winslet UK 7
Dustin Hoffman USA 7
Jane Fonda USA 7
Richard Burton UK 7
Judi Dench UK 7
Daniel Day-Lewis UK 6
Sissy Spacek USA 6
Amy Adams USA 6
Anthony Hopkins UK 6

To exclude a value instead of matching it, use <> (or the equivalent !=):

SELECT first_name, last_name, primary_genre
FROM actors
WHERE primary_genre <> 'Drama';

3.5.2 Comparing Numeric Values

The standard comparison operators work on numeric columns: <, >, <=, >=, =, and <>.

SELECT first_name, last_name, oscar_nominations, oscar_wins
FROM actors
WHERE oscar_nominations >= 5
ORDER BY oscar_nominations DESC;
Displaying records 1 - 25
first_name last_name oscar_nominations oscar_wins
Meryl Streep 21 3
Katharine Hepburn 12 4
Jack Nicholson 12 3
Paul Newman 10 1
Bette Davis 10 2
Laurence Olivier 10 1
Spencer Tracy 9 2
Denzel Washington 9 2
Al Pacino 9 1
Peter O’Toole 8 0
Jack Lemmon 8 2
Cate Blanchett 8 2
Glenn Close 8 0
Robert De Niro 8 2
Marlon Brando 8 2
Geraldine Page 8 1
Dustin Hoffman 7 2
Ingrid Bergman 7 3
Jane Fonda 7 2
Richard Burton 7 0
Judi Dench 7 1
Kate Winslet 7 1
Michael Caine 6 2
Anthony Hopkins 6 1
Daniel Day-Lewis 6 3

Fifteen actors in the table have received five or more Oscar nominations. Meryl Streep leads with 21 nominations, more than any other actor in the dataset.

3.5.3 Comparing Date Values

Date columns can be compared with the same operators as numbers. PostgreSQL accepts date literals in 'YYYY-MM-DD' format.

SELECT first_name, last_name, birth_date, birth_country
FROM actors
WHERE birth_date >= '1980-01-01'
ORDER BY birth_date;
15 records
first_name last_name birth_date birth_country
Rinko Kikuchi 1981-01-06 Japan
Rami Malek 1981-05-12 USA
Natalie Portman 1981-06-09 USA
Priyanka Chopra 1982-07-18 India
Lupita Nyong’o 1983-03-01 Mexico
Chris Hemsworth 1983-08-11 Australia
Keira Knightley 1985-03-26 UK
Ana de Armas 1988-04-30 Cuba
Alicia Vikander 1988-10-03 Sweden
Emma Stone 1988-11-06 USA
Brie Larson 1989-10-01 USA
Jennifer Lawrence 1990-08-15 USA
Saoirse Ronan 1994-04-12 Ireland
Florence Pugh 1996-01-03 UK
Timothée Chalamet 2000-12-27 USA

This query returns actors born in 1980 or later, representing the youngest generation in the dataset.

3.5.4 BETWEEN

BETWEEN is shorthand for a range filter. The following two forms are exactly equivalent:

-- Using comparison operators
WHERE birth_date >= '1950-01-01' AND birth_date <= '1969-12-31'

-- Using BETWEEN
WHERE birth_date BETWEEN '1950-01-01' AND '1969-12-31'

BETWEEN is inclusive on both ends: a row where the column value equals either boundary is included in the result. It works with dates, integers, and strings.

SELECT first_name, last_name, birth_date, oscar_nominations
FROM actors
WHERE birth_date BETWEEN '1950-01-01' AND '1969-12-31'
ORDER BY birth_date;
Displaying records 1 - 25
first_name last_name birth_date oscar_nominations
Geoffrey Rush 1951-07-06 4
Roberto Benigni 1952-10-27 3
Isabelle Huppert 1953-03-16 2
Denzel Washington 1954-12-28 9
Whoopi Goldberg 1955-11-13 2
Koji Yakusho 1956-01-02 0
Mel Gibson 1956-01-03 2
Tom Hanks 1956-07-09 5
Christoph Waltz 1956-10-04 4
Daniel Day-Lewis 1957-04-29 6
Sharon Stone 1958-03-10 1
Gary Oldman 1958-03-21 2
Michelle Pfeiffer 1958-04-29 3
Kevin Bacon 1958-07-08 0
Angela Bassett 1958-08-16 1
Emma Thompson 1959-04-15 5
Victoria Abril 1959-07-04 0
Marcia Gay Harden 1959-08-14 2
Ken Watanabe 1959-10-21 1
Antonio Banderas 1960-08-10 0
Colin Firth 1960-09-10 2
Melissa Leo 1960-09-14 2
Forest Whitaker 1961-07-15 2
Jodie Foster 1962-11-19 4
Ralph Fiennes 1962-12-22 2

The same syntax applies to integer columns:

SELECT first_name, last_name, oscar_nominations, oscar_wins
FROM actors
WHERE oscar_nominations BETWEEN 3 AND 8
ORDER BY oscar_nominations DESC;
Displaying records 1 - 25
first_name last_name oscar_nominations oscar_wins
Jack Lemmon 8 2
Robert De Niro 8 2
Cate Blanchett 8 2
Marlon Brando 8 2
Peter O’Toole 8 0
Geraldine Page 8 1
Glenn Close 8 0
Kate Winslet 7 1
Dustin Hoffman 7 2
Jane Fonda 7 2
Judi Dench 7 1
Ingrid Bergman 7 3
Richard Burton 7 0
Thelma Ritter 6 0
Anthony Hopkins 6 1
Daniel Day-Lewis 6 3
Michael Caine 6 2
Shirley MacLaine 6 1
Sissy Spacek 6 1
Amy Adams 6 0
Simone Signoret 5 1
Audrey Hepburn 5 1
James Stewart 5 1
Sophia Loren 5 1
Alec Guinness 5 1

With strings, BETWEEN uses alphabetical (lexicographic) order. The boundary behavior is the same — both ends are inclusive — but the upper boundary catches only that exact string, not everything that begins with it. last_name BETWEEN 'D' AND 'H' would miss Hanks, Hepburn, and every other name that starts with H but sorts after the bare letter H. The correct way to express “last names beginning with D through H” is a half-open range using explicit operators:

SELECT first_name, last_name, primary_genre
FROM actors
WHERE last_name >= 'D' AND last_name < 'I'
ORDER BY last_name;
Displaying records 1 - 25
first_name last_name primary_genre
Bette Davis Drama
Viola Davis Drama
Doris Day Comedy
Daniel Day-Lewis Drama
Robert De Niro Drama
James Dean Drama
Alain Delon Action
Judi Dench Drama
Catherine Deneuve Drama
Gérard Depardieu Drama
Kirk Douglas Drama
Faye Dunaway Drama
Clint Eastwood Action
Chiwetel Ejiofor Drama
Idris Elba Action
Michael Fassbender Drama
Ralph Fiennes Drama
Albert Finney Drama
Colin Firth Drama
Henry Fonda Drama
Jane Fonda Drama
Jodie Foster Thriller
Jamie Foxx Drama
Gael García Bernal Drama
Giancarlo Giannini Drama

The lower bound >= 'D' is inclusive and works as expected. The upper bound uses < 'I' rather than <= 'H' precisely because 'H' alone would exclude names like Hanks and Hoffman that sort after the single character H.

Prefer explicit >= and <= comparisons over BETWEEN for most situations. The inclusive behavior is easy to forget, which makes off-by-one errors common, and BETWEEN cannot express half-open ranges. The alphabetical edge cases with strings make it especially error-prone there. Explicit operators are unambiguous. BETWEEN is fine when you encounter it in existing code or reach for it with simple integer or date ranges, but it is not the preferred form for new queries.

3.5.5 NULL Values and IS NULL

NULL is a special marker in SQL that represents the absence of a value. It is not the same as zero, an empty string, or a blank space. NULL means that no value was recorded for that column in that row at all.

The death_date column is NULL for actors who are still living. You can find living actors using IS NULL:

SELECT first_name, last_name, birth_date, birth_country
FROM actors
WHERE death_date IS NULL
ORDER BY birth_date;
Displaying records 1 - 25
first_name last_name birth_date birth_country
Eva Marie Saint 1924-07-04 USA
Gene Hackman 1930-01-30 USA
Clint Eastwood 1930-05-31 USA
Harriet Andersson 1932-02-14 Sweden
Kim Novak 1933-02-13 USA
Michael Caine 1933-03-14 UK
Shirley MacLaine 1934-04-24 USA
Sophia Loren 1934-09-20 Italy
Brigitte Bardot 1934-09-28 France
Judi Dench 1934-12-09 UK
Julie Andrews 1935-10-01 UK
Warren Beatty 1937-03-30 USA
Jack Nicholson 1937-04-22 USA
Dustin Hoffman 1937-08-08 USA
Jane Fonda 1937-12-21 USA
Anthony Hopkins 1937-12-31 UK
Liv Ullmann 1938-12-16 Norway
Al Pacino 1940-04-25 USA
Faye Dunaway 1941-01-14 USA
Barbra Streisand 1942-04-24 USA
Giancarlo Giannini 1942-08-01 Italy
Amitabh Bachchan 1942-10-11 India
Robert De Niro 1943-08-17 USA
Catherine Deneuve 1943-10-22 France
Carmen Maura 1945-09-15 Spain

To find rows where a value is present, use IS NOT NULL. The following query returns only deceased actors by requiring that death_date has been recorded:

SELECT first_name, last_name, birth_date, death_date
FROM actors
WHERE death_date IS NOT NULL
ORDER BY death_date DESC;
Displaying records 1 - 25
first_name last_name birth_date death_date
Alain Delon 1935-11-08 2024-08-18
Gina Lollobrigida 1927-07-04 2023-01-16
Sidney Poitier 1927-02-20 2022-01-06
Jean-Paul Belmondo 1933-04-09 2021-09-06
Sean Connery 1930-08-25 2020-10-31
Max von Sydow 1929-04-10 2020-03-08
Kirk Douglas 1916-12-09 2020-02-05
Doris Day 1922-04-03 2019-05-13
Bibi Andersson 1935-11-11 2019-04-14
Albert Finney 1936-05-09 2019-02-07
Jeanne Moreau 1928-01-23 2017-07-31
Roger Moore 1927-10-14 2017-05-23
Debbie Reynolds 1932-04-01 2016-12-28
Omar Sharif 1932-04-10 2015-07-10
Philip Hoffman 1967-07-23 2014-02-02
Peter O’Toole 1932-08-02 2013-12-14
Celeste Holm 1917-04-29 2012-07-15
Ernest Borgnine 1917-01-24 2012-07-08
Elizabeth Taylor 1932-02-27 2011-03-23
Tony Curtis 1925-06-03 2010-09-29
Karl Malden 1912-03-22 2009-07-01
Paul Newman 1925-01-26 2008-09-26
Charlton Heston 1923-10-04 2008-04-05
Heath Ledger 1979-04-04 2008-01-22
Shelley Winters 1920-08-18 2006-01-14

You cannot test for NULL using =. Writing WHERE birth_name = NULL always returns zero rows, because NULL is not considered equal to anything, not even another NULL. Always use IS NULL or IS NOT NULL when checking for missing values.

3.5.6 Combining Conditions with AND and OR

A WHERE clause can include multiple conditions joined by AND or OR.

  • AND requires both conditions to be true for a row to be included.
  • OR requires at least one condition to be true.
SELECT first_name, last_name, birth_country, oscar_nominations, oscar_wins
FROM actors
WHERE primary_genre = 'Drama'
  AND oscar_wins > 0
ORDER BY oscar_wins DESC, oscar_nominations DESC;
Displaying records 1 - 25
first_name last_name birth_country oscar_nominations oscar_wins
Katharine Hepburn USA 12 4
Meryl Streep USA 21 3
Jack Nicholson USA 12 3
Ingrid Bergman Sweden 7 3
Daniel Day-Lewis UK 6 3
Walter Brennan USA 4 3
Bette Davis USA 10 2
Denzel Washington USA 9 2
Spencer Tracy USA 9 2
Marlon Brando USA 8 2
Cate Blanchett Australia 8 2
Robert De Niro USA 8 2
Jane Fonda USA 7 2
Dustin Hoffman USA 7 2
Tom Hanks USA 5 2
Jessica Lange USA 5 2
Elizabeth Taylor UK 5 2
Vivien Leigh UK 5 2
Emma Thompson UK 5 2
Gene Hackman USA 5 2
Christoph Waltz Germany 4 2
Shelley Winters USA 4 2
Mahershala Ali USA 2 2
Hilary Swank USA 2 2
Paul Newman USA 10 1

This query returns Drama actors who have won at least one Oscar, sorted by wins and then by total nominations to break ties.

When combining AND and OR in the same WHERE clause, use parentheses to make your intentions explicit. SQL evaluates AND before OR, which can produce surprising results if conditions are not grouped clearly.

-- Returns actors in Drama or Comedy who have at least one nomination
WHERE (primary_genre = 'Drama' OR primary_genre = 'Comedy')
  AND oscar_nominations > 0

3.5.7 Matching Patterns with LIKE

The LIKE operator filters text columns using simple patterns. Two wildcard characters are available:

  • % matches any sequence of zero or more characters.
  • _ matches exactly one character.

The following example finds all actors whose last names begin with H:

SELECT first_name, last_name, primary_genre, birth_country
FROM actors
WHERE last_name LIKE 'H%'
ORDER BY last_name;
14 records
first_name last_name primary_genre birth_country
Gene Hackman Drama USA
Tom Hanks Drama USA
Marcia Gay Harden Drama USA
Rex Harrison Musical UK
Salma Hayek Drama Mexico
Chris Hemsworth Action Australia
Katharine Hepburn Drama USA
Audrey Hepburn Romance UK
Charlton Heston Action USA
Dustin Hoffman Drama USA
Philip Hoffman Drama USA
Celeste Holm Comedy USA
Anthony Hopkins Drama UK
Isabelle Huppert Drama France

Lowercase 'h%' would return no results here because text matching with LIKE is case-sensitive, and no last names in this table begin with a lower-case h. ILIKE works identically to LIKE but ignores case. Replacing LIKE with ILIKE makes the same pattern work regardless of how the value is capitalized:

SELECT first_name, last_name, primary_genre, birth_country
FROM actors
WHERE last_name ILIKE 'h%'
ORDER BY last_name;
14 records
first_name last_name primary_genre birth_country
Gene Hackman Drama USA
Tom Hanks Drama USA
Marcia Gay Harden Drama USA
Rex Harrison Musical UK
Salma Hayek Drama Mexico
Chris Hemsworth Action Australia
Katharine Hepburn Drama USA
Audrey Hepburn Romance UK
Charlton Heston Action USA
Dustin Hoffman Drama USA
Philip Hoffman Drama USA
Celeste Holm Comedy USA
Anthony Hopkins Drama UK
Isabelle Huppert Drama France

Both queries return the same fourteen rows because the data happens to be consistently capitalized. The difference becomes meaningful when data entry is inconsistent — for example, if primary_genre values were stored as a mix of 'drama', 'Drama', and 'DRAMA', a LIKE filter for 'drama' would miss the capitalized variants while ILIKE would catch all of them.

With % wildcards on both sides of a search term, you can find text that occurs anywhere in a column. The following query finds actors whose birth names contain the word “Schwartz” or any other substring:

SELECT first_name, last_name, birth_name
FROM actors
WHERE birth_name LIKE '%Bernard%';
1 records
first_name last_name birth_name
Tony Curtis Bernard Schwartz

3.5.8 Matching a List of Values with IN

When you want to match any one of several specific values, IN is more readable than a chain of OR conditions.

SELECT first_name, last_name, birth_country, oscar_nominations
FROM actors
WHERE birth_country IN ('UK', 'Australia', 'Ireland')
ORDER BY birth_country, oscar_nominations DESC;
Displaying records 1 - 25
first_name last_name birth_country oscar_nominations
Cate Blanchett Australia 8
Nicole Kidman Australia 5
Geoffrey Rush Australia 4
Heath Ledger Australia 2
Hugh Jackman Australia 1
Toni Collette Australia 1
Eric Bana Australia 0
Chris Hemsworth Australia 0
Peter O’Toole Ireland 8
Saoirse Ronan Ireland 4
Laurence Olivier UK 10
Kate Winslet UK 7
Judi Dench UK 7
Richard Burton UK 7
Anthony Hopkins UK 6
Michael Caine UK 6
Daniel Day-Lewis UK 6
Vivien Leigh UK 5
Audrey Hepburn UK 5
Alec Guinness UK 5
Emma Thompson UK 5
Elizabeth Taylor UK 5
Albert Finney UK 5
Rex Harrison UK 5
Peter Sellers UK 3

3.5.9 Negating Conditions with NOT

The NOT keyword reverses the result of a condition. It can be placed in front of LIKE, IN, BETWEEN, and similar operators to express “everything except what this pattern matches.”

NOT LIKE excludes rows that match a pattern. The query below returns actors whose last names do not begin with a vowel:

SELECT first_name, last_name, primary_genre
FROM actors
WHERE last_name NOT LIKE 'A%'
  AND last_name NOT LIKE 'E%'
  AND last_name NOT LIKE 'I%'
  AND last_name NOT LIKE 'O%'
  AND last_name NOT LIKE 'U%'
ORDER BY last_name;
Displaying records 1 - 25
first_name last_name primary_genre
Amitabh Bachchan Drama
Kevin Bacon Drama
Eric Bana Action
Anne Bancroft Drama
Antonio Banderas Action
Javier Bardem Drama
Brigitte Bardot Romance
Angela Bassett Drama
Warren Beatty Drama
Monica Bellucci Action
Jean-Paul Belmondo Action
Roberto Benigni Comedy
Ingrid Bergman Drama
Halle Berry Action
Juliette Binoche Drama
Cate Blanchett Drama
Dirk Bogarde Drama
Humphrey Bogart Drama
Helena Bonham Carter Drama
Ernest Borgnine Drama
Marlon Brando Drama
Walter Brennan Drama
Sandra Bullock Comedy
Richard Burton Drama
Nicolas Cage Drama

NOT IN excludes rows whose column value matches any entry in the list:

SELECT first_name, last_name, birth_country, primary_genre
FROM actors
WHERE primary_genre NOT IN ('Drama', 'Action')
ORDER BY primary_genre, last_name;
Displaying records 1 - 25
first_name last_name birth_country primary_genre
Roberto Benigni Italy Comedy
Sandra Bullock USA Comedy
Tony Curtis USA Comedy
Doris Day USA Comedy
Whoopi Goldberg USA Comedy
Cary Grant UK Comedy
Celeste Holm USA Comedy
Diane Keaton USA Comedy
Jack Lemmon USA Comedy
Shirley MacLaine USA Comedy
Walter Matthau USA Comedy
Bette Midler USA Comedy
Marilyn Monroe USA Comedy
Thelma Ritter USA Comedy
Peter Sellers UK Comedy
Alastair Sim UK Comedy
Alberto Sordi Italy Comedy
Mira Sorvino USA Comedy
Emma Stone USA Comedy
Louis de Funès France Comedy
Toni Collette Australia Horror
Ruth Gordon USA Horror
Janet Leigh USA Horror
Julie Andrews UK Musical
Fred Astaire USA Musical

IS NOT NULL, introduced in the previous section, follows the same pattern and is technically a NOT form as well.

NOT can also be written as a prefix before a parenthesized condition:

WHERE NOT (primary_genre = 'Drama')

This is equivalent to WHERE primary_genre <> 'Drama'. The prefix form is more useful when negating a compound condition that would be awkward to rewrite with <> or other operators.

3.5.9.1 NOT IN and NULL Values

NOT IN behaves unexpectedly when NULL values are involved, and understanding this is important for using it safely.

When a NULL appears anywhere in the list passed to NOT IN, the condition returns no rows at all. To see this, try adding a NULL to an otherwise straightforward filter:

SELECT first_name, last_name, primary_genre
FROM actors
WHERE primary_genre NOT IN ('Drama', 'Action', NULL);
0 records
first_name last_name primary_genre

This returns zero rows despite the fact that most actors in the table are neither Drama nor Action performers. The reason comes down to how SQL evaluates NULL comparisons. The expression primary_genre NOT IN ('Drama', 'Action', NULL) expands internally to:

primary_genre <> 'Drama'
AND primary_genre <> 'Action'
AND primary_genre <> NULL

The third condition, primary_genre <> NULL, never evaluates to TRUE or FALSE. Any comparison with NULL produces NULL, meaning the result is unknown. When one part of an AND chain is unknown, the entire expression is unknown, and the row is excluded from the output. Because this applies to every row in the table, the query returns nothing.

A single NULL anywhere in the list passed to NOT IN silently causes the entire condition to return no rows. Because no error is raised, this problem can go unnoticed for a long time. When building a list for NOT IN, verify that it contains no NULL values. A later chapter covers how this same issue arises when the list comes from a subquery rather than a literal set of values, and introduces NOT EXISTS as the preferred alternative in those situations. Unlike NOT IN, NOT EXISTS handles NULL values correctly and does not produce this silent failure.

3.6 Sorting Results with ORDER BY

By default, a database returns rows in no particular order. The physical order of data in storage has nothing to do with alphabetical order, nomination count, or any other meaningful sequence. The ORDER BY clause lets you specify the order you want.

SELECT column1, column2
FROM table_name
ORDER BY column1 ASC;

ASC (ascending) is the default and can be omitted. Use DESC (descending) to reverse the order.

SELECT first_name, last_name, oscar_nominations, oscar_wins
FROM actors
ORDER BY oscar_nominations DESC;
Displaying records 1 - 25
first_name last_name oscar_nominations oscar_wins
Meryl Streep 21 3
Katharine Hepburn 12 4
Jack Nicholson 12 3
Laurence Olivier 10 1
Paul Newman 10 1
Bette Davis 10 2
Spencer Tracy 9 2
Denzel Washington 9 2
Al Pacino 9 1
Glenn Close 8 0
Cate Blanchett 8 2
Geraldine Page 8 1
Jack Lemmon 8 2
Marlon Brando 8 2
Robert De Niro 8 2
Peter O’Toole 8 0
Ingrid Bergman 7 3
Jane Fonda 7 2
Dustin Hoffman 7 2
Judi Dench 7 1
Richard Burton 7 0
Kate Winslet 7 1
Shirley MacLaine 6 1
Daniel Day-Lewis 6 3
Sissy Spacek 6 1

3.6.1 Sorting by Multiple Columns

You can sort by more than one column by listing them after ORDER BY, separated by commas. The database sorts by the first column, then uses the second column to break ties, then the third, and so on.

SELECT first_name, last_name, primary_genre, oscar_nominations
FROM actors
ORDER BY primary_genre ASC, oscar_nominations DESC;
Displaying records 1 - 25
first_name last_name primary_genre oscar_nominations
Clint Eastwood Action 4
Russell Crowe Action 3
Mel Gibson Action 2
Charlton Heston Action 1
Lee Marvin Action 1
Hugh Jackman Action 1
Sean Connery Action 1
Steve McQueen Action 1
Halle Berry Action 1
Telly Savalas Action 1
Priyanka Chopra Action 0
Chris Hemsworth Action 0
Jean Reno Action 0
Monica Bellucci Action 0
Toshiro Mifune Action 0
Idris Elba Action 0
Alain Delon Action 0
Roger Moore Action 0
Eric Bana Action 0
Jean-Paul Belmondo Action 0
Robert Mitchum Action 0
Antonio Banderas Action 0
Jack Lemmon Comedy 8
Thelma Ritter Comedy 6
Shirley MacLaine Comedy 6

This query first groups rows by genre alphabetically, then within each genre sorts actors from most nominated to least. The result lets you compare the most decorated actors within each genre at a glance.

3.7 Limiting Results with LIMIT

The LIMIT clause caps the number of rows returned by a query. It is most useful when combined with ORDER BY, because the two together let you answer “top N” questions: the ten most nominated actors, the five tallest actors, and so on.

SELECT first_name, last_name, primary_genre, oscar_nominations
FROM actors
ORDER BY oscar_nominations DESC
LIMIT 10;
10 records
first_name last_name primary_genre oscar_nominations
Meryl Streep Drama 21
Jack Nicholson Drama 12
Katharine Hepburn Drama 12
Laurence Olivier Drama 10
Paul Newman Drama 10
Bette Davis Drama 10
Al Pacino Drama 9
Denzel Washington Drama 9
Spencer Tracy Drama 9
Cate Blanchett Drama 8

Using LIMIT without ORDER BY returns an arbitrary subset of rows. Which rows you get depends on how the database happens to store and retrieve data at that moment, which can vary between runs. If you need a specific subset (such as the most nominated, tallest, or earliest-born), always pair LIMIT with an ORDER BY that defines what “first” means.

3.8 Putting It All Together

The clauses covered in this chapter can be combined in a single query. PostgreSQL evaluates them in a fixed logical order, regardless of how you arrange them in the text:

  1. FROM (identify the table)
  2. WHERE (filter rows)
  3. SELECT (compute and choose columns)
  4. ORDER BY (sort the result)
  5. LIMIT (cut the result to the requested number of rows)

The query below uses all of these clauses together to find the ten most Oscar-nominated actors born outside the United States, showing their full name, birth country, nomination total, and wins.

SELECT
    first_name || ' ' || last_name AS full_name,
    birth_country,
    oscar_nominations,
    oscar_wins
FROM actors
WHERE birth_country <> 'USA'
  AND death_date IS NULL
ORDER BY oscar_nominations DESC
LIMIT 10;
10 records
full_name birth_country oscar_nominations oscar_wins
Cate Blanchett Australia 8 2
Kate Winslet UK 7 1
Judi Dench UK 7 1
Daniel Day-Lewis UK 6 3
Michael Caine UK 6 2
Anthony Hopkins UK 6 1
Sophia Loren Italy 5 1
Emma Thompson UK 5 2
Nicole Kidman Australia 5 1
Geoffrey Rush Australia 4 1

Reading a query like this from top to bottom is natural once you understand what each clause contributes. As you move through subsequent chapters and your queries grow more complex, this structured way of thinking about each clause separately will continue to serve as a reliable foundation.

3.9 SQL as a Declarative Language

Now that you have a sense for how a basic SELECT statement is written, it is important to understand something fundamental about how SQL thinks, because it is different from how R and Python think.

If you have experience with either of those languages, you are accustomed to writing programs that specify a sequence of steps. You tell the computer exactly what to do and in what order: load this data, filter these rows, select these columns, sort the result. This style is called imperative programming, because you are issuing commands.

SQL is declarative. Rather than specifying steps, you describe the result you want and let the database engine figure out how to produce it.

To make the contrast concrete, consider retrieving Drama actors sorted by nomination count. In Python with pandas, you might write:

drama = df[df["primary_genre"] == "Drama"]
result = (
    drama[["first_name", "last_name", "oscar_nominations"]]
    .sort_values("oscar_nominations", ascending=False)
)

In R with dplyr:

result <- actors |>
  filter(primary_genre == "Drama") |>
  select(first_name, last_name, oscar_nominations) |>
  arrange(desc(oscar_nominations))

Both of these describe a sequence of operations. The order of the instructions matches the intended order of execution.

The equivalent SQL looks like this:

SELECT first_name, last_name, oscar_nominations
FROM actors
WHERE primary_genre = 'Drama'
ORDER BY oscar_nominations DESC;

The SQL query does not describe steps. It describes a result: “I want the first_name, last_name, and oscar_nominations columns, from the actors table, restricted to rows where primary_genre is 'Drama', sorted by oscar_nominations descending.” The database engine reads that description and decides for itself how to retrieve the data efficiently. It may use an index, reorder internal operations, or parallelize work across processors. You do not control those decisions, and most of the time you do not need to.

3.9.1 Set-Based Thinking

One practical consequence of SQL’s declarative nature is that it operates on sets of rows rather than individual records. When you write a WHERE clause, you are not writing a loop that examines each row in turn. You are describing a condition that applies to the entire table at once, and the database evaluates it however it finds most efficient.

This can feel unfamiliar at first. In Python or R, reaching for a loop or an apply-style function is instinctive. In SQL, you resist that instinct and instead describe what the complete result set should look like. Queries that try to mimic row-by-row processing are almost always slower and harder to read than queries that embrace the set-based model.

3.9.2 The Logical Order of Execution

There is one aspect of SQL’s declarative model that surprises many people coming from R or Python: the order in which you write a query’s clauses is not the order in which the database evaluates them.

A query is written as:

SELECT ...
FROM ...
WHERE ...
ORDER BY ...
LIMIT ...

But the database processes the clauses in this logical order:

  1. FROM (identify the source table)
  2. WHERE (filter rows)
  3. SELECT (compute and choose columns)
  4. ORDER BY (sort the result)
  5. LIMIT (trim to the requested number of rows)

The most common practical consequence of this is that column aliases defined in SELECT cannot be referenced in a WHERE clause, because WHERE runs before SELECT. The following query fails for exactly that reason:

-- This fails: 'height_in' is not defined when WHERE runs
SELECT ROUND(height_cm / 2.54::numeric, 1) AS height_in
FROM actors
WHERE height_in > 72;

You have to repeat the expression in WHERE instead:

SELECT ROUND(height_cm / 2.54::numeric, 1) AS height_in
FROM actors
WHERE ROUND(height_cm / 2.54::numeric, 1) > 72;

ORDER BY is an exception: most databases, including PostgreSQL, allow ORDER BY to reference aliases defined in SELECT, because ORDER BY is evaluated after SELECT. That is why ORDER BY height_in would work in the query above even though WHERE height_in does not.

Keeping this logical order in mind will make otherwise cryptic error messages much easier to diagnose as your queries grow more complex.

3.10 Exercises

3.10.1 Reflection

  1. SQL is described in this chapter as a “declarative” programming language, while R and Python are “imperative.” In your own words, explain what that distinction means. How does it change the way you approach writing a query compared to writing a data pipeline in R or Python?

  2. A colleague writes the following query hoping to find all actors who are over six feet tall (approximately 183 cm), but PostgreSQL returns an error. Identify the problem and explain how to fix it.

    SELECT first_name, last_name, ROUND(height_cm / 2.54::numeric, 1) AS height_in
    FROM actors
    WHERE height_in > 72;
  3. Explain the difference between COUNT(*) and COUNT(birth_name) as they apply to the actors table. Under what circumstances would the two expressions return different values?

  4. A classmate wants to retrieve all actors for whom no birth name has been recorded. They write the query below, but it always returns zero rows. Why? What should they write instead?

    SELECT first_name, last_name
    FROM actors
    WHERE birth_name = NULL;

3.10.2 Coding

For each exercise below, write a SQL query against the actors database.

  1. Write a query that retrieves the first name, last name, birth country, and primary genre for every actor in the table. Sort the results alphabetically by last name.

  2. Write a query that returns the full name (first and last name combined into a single column called full_name), birth country, and Oscar nomination count for all actors born in the United Kingdom, sorted from most to least nominated.

  3. Write a query that shows only the distinct birth countries represented in the actors table, sorted alphabetically. Then write a second query using COUNT(DISTINCT birth_country) to confirm the total number of unique countries.

  4. Write a query that returns the first name, last name, height in centimeters, and height converted to inches (rounded to one decimal place, named height_in) for all actors over 185 cm tall. Sort from tallest to shortest.

  5. Write a query that returns the first name, last name, primary genre, Oscar nominations, and Oscar wins for all actors who meet every one of the following conditions: they are still living (no death date recorded), they have received at least three Oscar nominations, and they were born in the USA. Sort the results by Oscar nominations in descending order.