PostgreSQL

07 Feb 2023 - New York

Contents
  1. Comparing joins and subqueries
  2. Comparing transient tables and queries
  3. Analyzing query optimizer
  4. Table partitioning
  5. Window functions
  6. Fetching with offsets and frames
  7. General SQL heuristics
    1. Indexes
    2. Query Lifecycle
    3. Query Planning Order of Operations

Comparing joins and subqueries

  • In SELECT and WHERE clauses: Equivalent once the query optimizer gets done with them
  • In FROM clause: JOIN recommended over subquerying

Comparing transient tables and queries

  • CTEs: auxiliary statement for use in a larger query, exist for a single query
  • Temporary tables: Dropped at the end of a session
  • Views: A stored query
  • Materialized Views: Stored results of a query view, can be configured to refresh periodically

Analyzing query optimizer

  • EXPLAIN ANALYZE: Executes the query being EXPLAIN-ed, providing actual time instead of cost estimates. Careful with mutating queries.

Table partitioning

Partitioning: splitting what is logically one large table into smaller physical pieces.

  • Range Partitioning
  • List Partitioning
  • Hash Partitioning

Window functions

SELECT year, ROW_NUMBER() OVER (ORDER BY year DESC) AS seq_num
FROM -- ...
-- year | seq_num
-- 1999 | 1
-- 1999 | 2
-- 1998 | 3
-- 1997 | 4

SELECT year, ROW_NUMBER() OVER (PARTITION BY year DESC) AS seq_num
FROM -- ...
-- year | seq_num
-- 1999 | 1
-- 1999 | 2
-- 1998 | 1
-- 1997 | 1

SELECT gender, year, champion,
    LAG(champion) OVER (PARTITION BY gender ORDER BY year DESC)
    AS last_champion
FROM -- ...
-- gender | year | champion | last_champion
-- male   | 1990 |  CZE     |  null
-- male   | 1990 |  NOR     |  CZE
-- female | 1990 |  NOR     |  null
-- female | 1990 |  GYE     |  NOR

SELECT SUM(medals) OVER (PARTITION BY country)
SELECT COUNT(medals) OVER (PARTITION BY country)
SELECT AVG(medals) OVER (PARTITION BY country)

Fetching with offsets and frames

  • LAG(column, n), LEAD(column, n)
  • FIRST_VALUE(column), LAST_VALUE(column)
  • RANGE BETWEEN [start] AND [finish]
    • RANGE ignores duplicates in the ORDER BY subclause of OVER
  • ROWS BETWEEN [start] AND [finish]
    • n PRECEDING, CURRENT ROW, n FOLLOWING
    • UNBOUNDED PRECEDING
    • UNBOUNDED FOLLOWING

Example

SELECT
    year,
    medals,
    MAX(medals) OVER (
        ORDER BY year ASC
    ) AS max_medals,
    MAX(medals) OVER (
        ORDER BY year ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS max_medals_thus_far

General SQL heuristics

Indexes

Apply to:

  • Large tables
  • Frequently used filtering conditions
  • Primary key, foreign keys

Don’t apply to:

  • Small tables
  • Columns with a high number of nulls
  • Frequently updated tables

Query Lifecycle

  1. Parser
  2. Planner & Optimizer
  3. Executor

Query Planning Order of Operations

ClauseDelimits
1FROMtable
2WHERErows
3GROUP BYcolumns
4SUM, etc [aggs]rows
5SELECTcolumns
6DISTINCTrows
7ORDER BY-
8LIMITrows