16  Creating Tables

16.1 Learning Objectives

By the end of this chapter, students will be able to:

  • Write CREATE TABLE statements with appropriate column names, data types, and inline constraints.
  • Apply NOT NULL, UNIQUE, DEFAULT, and CHECK constraints when defining columns.
  • Use ALTER TABLE to add, drop, or modify columns and constraints on an existing table.
  • Write DROP TABLE and TRUNCATE TABLE and explain the difference in behavior and reversibility.
  • Use CREATE TABLE AS SELECT to persist query results as a new permanent table.
  • Define an auto-incrementing primary key column using GENERATED ALWAYS AS IDENTITY.
  • Distinguish between a temporary table and a permanent table and identify appropriate use cases for each.

16.2 From Reading to Writing

Every query in this book so far has been a SELECT statement: reading data from tables that already exist. Creating and modifying tables is the domain of Data Definition Language (DDL), a subset of SQL distinct from the SELECT, INSERT, UPDATE, and DELETE statements that make up Data Manipulation Language (DML).

This chapter introduces CREATE TABLE, the fundamental DDL statement, in the context of building a university database from scratch. By the end, you will know how to create a database, define a table with appropriate constraints, prevent duplicate or invalid data at the schema level, and modify a table’s structure after the fact. The next chapter extends this foundation to multi-table schemas with foreign keys.

16.3 Creating a Database

In PostgreSQL, a database is the top-level container: an isolated environment with its own tables, schemas, users, and settings. A running server can host multiple databases simultaneously. The worldbank database you have been querying throughout this book is one such database on your local Docker container. Here, you will create a second one.

CREATE DATABASE university;

CREATE DATABASE must be run while connected to a different database — you cannot create a database while you are connected to it. The standard approach is to connect to the built-in postgres maintenance database first:

psql -U postgres -p 5433 -d postgres
postgres=# CREATE DATABASE university;
postgres=# \c university
university=#

Once connected to university, every CREATE TABLE statement you run will place its tables there, completely separate from worldbank.

16.4 Schemas

Within a database, a schema is a named namespace that groups related objects — tables, views, functions — together. Think of a database as a building and schemas as the floors inside it.

Every PostgreSQL database starts with a schema called public. When you create a table without specifying a schema, it lands in public by default:

-- These two are equivalent
CREATE TABLE students (...);
CREATE TABLE public.students (...);

You can create additional schemas to separate concerns — for example, a registrar schema for enrollment data and an finance schema for billing tables. Tables outside the default schema are referenced with a schema-qualified name: schema_name.table_name. The search_path setting controls which schemas PostgreSQL searches when you use an unqualified name.

For the exercises in this chapter, the public schema of the university database is sufficient. Schemas become more important when a single database serves multiple applications or teams — a pattern covered in the chapter on privileges.

16.5 Basic Syntax

CREATE TABLE table_name (
    column_name  data_type  [column_constraints],
    column_name  data_type  [column_constraints],
    ...
    [table_constraints]
);

Each column is defined by its name, its data type (covered in the data types chapter), and any constraints that apply to it. Constraints tell the database what values are and are not acceptable, enforcing business rules at the storage level regardless of which application or user touches the data.

16.6 Column Constraints

16.6.1 NOT NULL

NOT NULL prevents a column from ever holding a NULL value. It is the most common constraint after the primary key.

CREATE TABLE students (
    student_id   SERIAL  PRIMARY KEY,
    first_name   TEXT    NOT NULL,
    last_name    TEXT    NOT NULL
);

Any INSERT or UPDATE that leaves first_name or last_name as NULL will be rejected immediately by the database.

16.6.2 DEFAULT

A DEFAULT value is substituted automatically when an INSERT statement omits the column entirely.

CREATE TABLE students (
    student_id      SERIAL   PRIMARY KEY,
    first_name      TEXT     NOT NULL,
    last_name       TEXT     NOT NULL,
    enrolled_on     DATE     NOT NULL DEFAULT CURRENT_DATE,
    is_active       BOOLEAN  NOT NULL DEFAULT TRUE
);

If a new row is inserted without specifying enrolled_on, the database records the current date. A default is not the same as NOT NULL: a column can have a default and still accept an explicitly supplied NULL, unless NOT NULL is also present.

16.6.3 UNIQUE

UNIQUE ensures no two rows share the same value in that column. A student’s email address is a natural candidate — each address should appear at most once.

CREATE TABLE students (
    student_id   SERIAL  PRIMARY KEY,
    first_name   TEXT    NOT NULL,
    last_name    TEXT    NOT NULL,
    email        TEXT    NOT NULL UNIQUE,
    enrolled_on  DATE    NOT NULL DEFAULT CURRENT_DATE,
    is_active    BOOLEAN NOT NULL DEFAULT TRUE
);

PRIMARY KEY already implies both uniqueness and non-nullability, so you do not add UNIQUE or NOT NULL to a primary key column separately.

PostgreSQL permits multiple NULL values in a UNIQUE column because NULL is not considered equal to NULL — no uniqueness violation occurs. If a non-key column must be both unique and present in every row, declare both UNIQUE and NOT NULL.

16.6.4 CHECK

A CHECK constraint specifies a boolean condition that every row must satisfy. GPA values, for example, should fall within a defined range:

CREATE TABLE students (
    student_id   SERIAL        PRIMARY KEY,
    first_name   TEXT          NOT NULL,
    last_name    TEXT          NOT NULL,
    email        TEXT          NOT NULL UNIQUE,
    enrolled_on  DATE          NOT NULL DEFAULT CURRENT_DATE,
    is_active    BOOLEAN       NOT NULL DEFAULT TRUE,
    gpa          NUMERIC(3, 2) CHECK (gpa IS NULL OR (gpa >= 0.0 AND gpa <= 4.0))
);

CHECK constraints are evaluated on every INSERT and UPDATE. The gpa IS NULL clause is necessary because a CHECK constraint only fails when it evaluates to FALSE — a NULL result is treated as passing. Allowing NULL here is intentional: a newly enrolled student may not yet have a GPA on record.

16.7 Primary Keys

The keys and integrity chapter introduced primary keys as a design concept. In SQL, the PRIMARY KEY constraint is how you enforce that concept.

16.7.1 Single-Column Primary Key

The simplest form declares PRIMARY KEY inline with the column definition:

CREATE TABLE students (
    student_id  SERIAL  PRIMARY KEY,
    ...
);

PRIMARY KEY implies both NOT NULL and UNIQUE, so neither needs to be written separately. SERIAL is a PostgreSQL convenience type that creates an auto-incrementing integer column backed by a sequence — each new row automatically receives the next integer if no value is supplied. This is the standard way to create a surrogate primary key.

SERIAL is PostgreSQL-specific. The SQL standard equivalent, supported in PostgreSQL 10 and later, is:

student_id  INTEGER  GENERATED ALWAYS AS IDENTITY  PRIMARY KEY

Both accomplish the same thing. SERIAL is still common in documentation and existing codebases.

16.7.2 Composite Primary Keys

When no single column uniquely identifies a row, two or more columns can serve as the primary key together. A composite key is written as a table-level constraint after the column definitions:

CREATE TABLE enrollments (
    student_id  INTEGER  NOT NULL,
    course_id   INTEGER  NOT NULL,
    semester    TEXT     NOT NULL,
    PRIMARY KEY (student_id, course_id, semester)
);

Inline PRIMARY KEY syntax cannot express a composite key. The full multi-table schema, including this enrollments table and the foreign keys that connect it to students and courses, is built out in the next chapter.

16.8 Putting It Together: The Students Table

Here is the complete students table definition, bringing all the constraints above into a single statement:

CREATE TABLE students (
    student_id   SERIAL        PRIMARY KEY,
    first_name   TEXT          NOT NULL,
    last_name    TEXT          NOT NULL,
    email        TEXT          NOT NULL UNIQUE,
    enrolled_on  DATE          NOT NULL DEFAULT CURRENT_DATE,
    is_active    BOOLEAN       NOT NULL DEFAULT TRUE,
    gpa          NUMERIC(3, 2) CHECK (gpa IS NULL OR (gpa >= 0.0 AND gpa <= 4.0))
)

Insert a few rows to verify the table works:

INSERT INTO students (first_name, last_name, email, enrolled_on, gpa)
VALUES
    ('Maria',  'Santos',  '[email protected]',  '2023-08-28', 3.85),
    ('James',  'Okafor',  '[email protected]',  '2023-08-28', 3.41),
    ('Priya',  'Nair',    '[email protected]',    '2024-01-15', NULL),
    ('Connor', 'Walsh',   '[email protected]',   '2024-01-15', 2.97)
SELECT * FROM students
4 records
student_id first_name last_name email enrolled_on is_active gpa
1 Maria Santos [email protected] 2023-08-28 TRUE 3.85
2 James Okafor [email protected] 2023-08-28 TRUE 3.41
3 Priya Nair [email protected] 2024-01-15 TRUE NULL
4 Connor Walsh [email protected] 2024-01-15 TRUE 2.97

Priya’s GPA is NULL — she enrolled mid-year and has not yet completed any graded coursework. The CHECK constraint permits this because it was written to allow NULLs explicitly.

16.9 CREATE TABLE IF NOT EXISTS

Running CREATE TABLE against a table that already exists produces an error. The IF NOT EXISTS modifier suppresses that error, making the statement safe to run in a script that may execute more than once:

CREATE TABLE IF NOT EXISTS students (
    student_id   SERIAL        PRIMARY KEY,
    first_name   TEXT          NOT NULL,
    last_name    TEXT          NOT NULL,
    email        TEXT          NOT NULL UNIQUE,
    enrolled_on  DATE          NOT NULL DEFAULT CURRENT_DATE,
    is_active    BOOLEAN       NOT NULL DEFAULT TRUE,
    gpa          NUMERIC(3, 2) CHECK (gpa IS NULL OR (gpa >= 0.0 AND gpa <= 4.0))
);

If students already exists, this statement does nothing and raises no error. This is the pattern used in setup scripts and database migrations.

16.10 Removing Tables

DROP TABLE removes a table and all of its data permanently:

DROP TABLE students;

DROP TABLE IF EXISTS is the complement to CREATE TABLE IF NOT EXISTS — it does nothing if the table does not exist rather than raising an error:

DROP TABLE IF EXISTS students;

When a table has foreign keys pointing to it from other tables, dropping it requires either dropping the dependent tables first or using CASCADE to remove the referencing foreign key constraints automatically:

DROP TABLE students CASCADE;

CASCADE removes the constraint definitions in the dependent tables; it does not drop those tables themselves. Use it carefully — in a complex schema it can remove constraints you did not intend to touch.

16.11 Modifying Existing Tables

After a table is created, ALTER TABLE can change its structure without dropping and recreating it.

Add a column:

ALTER TABLE students
ADD COLUMN major TEXT;

New columns are added with NULL in every existing row unless a DEFAULT is specified. If the new column is NOT NULL, you must supply a DEFAULT so the database has a value to back-fill:

ALTER TABLE students
ADD COLUMN major TEXT NOT NULL DEFAULT 'Undeclared';

Change a column’s data type:

ALTER TABLE students
ALTER COLUMN major TYPE VARCHAR(100);

The cast must be valid for all existing values. Changing TEXT to VARCHAR(100) succeeds; changing TEXT to INTEGER would fail unless every value can be parsed as a number.

Add a constraint to an existing column:

ALTER TABLE students
ADD CONSTRAINT students_email_lower
    CHECK (email = lower(email));

Adding a CHECK constraint to a populated table causes PostgreSQL to validate every existing row. If any row fails the check, the statement is rejected.

Remove a column:

ALTER TABLE students
DROP COLUMN major;

ALTER TABLE modifies the live table immediately. On large tables, some operations — particularly adding a NOT NULL constraint without a default, or building a new index — can lock the table and block reads and writes for seconds or longer. In a learning environment the tables are small enough that this does not matter, but it is worth knowing before working on production systems.

16.12 Exercises

1. Connect to your university database and run \dt in psql to list its tables. Confirm that students exists. Then query information_schema.columns to display the column names, data types, and nullability for the students table.


2. Insert a row into students that violates the gpa CHECK constraint (for example, a GPA of 5.0). Observe the error message. Then insert a row with a NULL GPA and confirm it succeeds.


3. Insert two students with the same email address and observe the error. Explain why this constraint belongs in the database rather than only in the application that writes to it.


4. Add a phone column of type TEXT to the students table, allowing NULL. Then alter the column to add a CHECK constraint that requires the value to be at least 10 characters long (using char_length(phone) >= 10), or NULL. Test the constraint with a valid and invalid value.


5. Write a CREATE TABLE IF NOT EXISTS statement for a courses table with the following rules:

  • A natural primary key using a course code such as 'CS101' (not a surrogate integer).
  • A title that is required.
  • A credit hours column that must be a positive integer.
  • A department name that defaults to 'General'.

Run the statement twice and confirm the second execution produces no error.


6. Drop the students table, then recreate it using CREATE TABLE IF NOT EXISTS. Verify that the data is gone after the drop and that the table structure is restored after the recreate.