What is Trino?


Trino is an open source distributed SQL query engine built for speed and flexibility. Instead of storing data itself, Trino coordinates workers that read from external systems, so compute scales independently of storage and queries finish quickly even against very large datasets.

How it works

The cluster is composed of a single coordinator that plans queries and many workers that process data in parallel. Connectors expose catalogs of tables from disparate sources. During planning, the coordinator breaks a query into stages that workers execute across the cluster. Even expensive statements like a CROSS JOIN are split so each worker handles a slice of the intermediate data.

flowchart TB
    subgraph Cluster
        C[Coordinator]
        W1[Worker 1]
        W2[Worker 2]
        W3[Worker 3]
    end
    C --> W1
    C --> W2
    C --> W3

Adding or removing workers lets you adjust capacity without moving the data itself.
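
One way to see this in action is to ask the coordinator for a query's distributed plan. A minimal sketch, assuming the bundled TPC-H connector is mounted as a tpch catalog (a common test setup); each Fragment in the output corresponds to a stage handed to the workers:

    -- Show the distributed plan; each Fragment is a stage
    -- the coordinator schedules across the workers.
    EXPLAIN (TYPE DISTRIBUTED)
    SELECT orderstatus, count(*) AS order_count
    FROM tpch.tiny.orders
    GROUP BY orderstatus;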

Accessing open table formats

Connectors allow Trino to read from data lakes and warehouses alike: each catalog a query references is backed by a connector that talks to one external system (see the sketch after this list). Some of the most commonly used data sources include:

  • Amazon S3 for object storage
  • Snowflake for cloud warehouses
  • Kafka for streaming data
  • PostgreSQL for relational workloads
  • SQL Server for enterprise databases
  • Oracle for large transactional systems

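Each entry above is reached through a catalog: a named configuration, typically a properties file under etc/catalog/, that binds one connector to one external system. As a small sketch, catalogs and their contents can be explored with plain SQL; the postgresql catalog and public schema below are illustrative:

    -- List every catalog mounted on the cluster.
    SHOW CATALOGS;

    -- Drill into the catalog backed by the PostgreSQL connector.
    SHOW SCHEMAS FROM postgresql;
    SHOW TABLES FROM postgresql.public;
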
Connectors like these, and many others, let a single query span multiple systems. When a query references an Iceberg or Delta Lake table, workers fetch Parquet files directly from object storage while the coordinator ensures the correct snapshot and partitions are used.
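
As a sketch of such a federated query, assume an orders table behind the illustrative postgresql catalog above and a clickstream table in an Iceberg catalog on object storage; all names are hypothetical:

    -- Join relational rows from PostgreSQL with an Iceberg table
    -- whose Parquet files are read straight from object storage.
    SELECT o.order_id, count(e.event_id) AS page_views
    FROM postgresql.public.orders AS o
    JOIN iceberg.web.events AS e ON o.customer_id = e.customer_id
    GROUP BY o.order_id;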

flowchart LR
    Query --> C2[Coordinator]
    C2 -->|Plan| Workers2[Workers]
    Workers2 -->|Iceberg / Delta| Storage[(Object Storage)]
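
The snapshot handling mentioned above can also be made explicit through Iceberg time travel, supported by the Iceberg connector in current Trino releases. A hedged sketch against the same hypothetical iceberg.web.events table; the snapshot ID is a placeholder:

    -- Read the table as of a specific Iceberg snapshot ID.
    SELECT count(*)
    FROM iceberg.web.events FOR VERSION AS OF 4732178312847;

    -- Or as of a point in time; the coordinator resolves the matching snapshot.
    SELECT count(*)
    FROM iceberg.web.events FOR TIMESTAMP AS OF TIMESTAMP '2024-01-01 00:00:00 UTC';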

This approach keeps data in open formats so other engines can share it.

Curious to dig deeper? Read the overview in the Trino documentation.

From the depths, The Nudibranches crew.