Glossary

A quick reference for common terms used in modern data platforms.

Lakehouse

A lakehouse combines the low‑cost, flexible storage of a data lake with the transactional guarantees of a data warehouse. Data is kept as files in open formats (for example, Parquet), and a table layer on top adds transactional features such as atomic commits, so different analytics engines can read and write the same shared tables. Because the underlying storage is inexpensive, costs stay low, and batch and streaming workloads can run against the same tables side by side.
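
To make "transactional features on top of files" concrete, here is a minimal sketch, assuming a hypothetical TransactionLog class that is only loosely modeled on log‑based table formats and is not any real format's layout or API. Each commit is a numbered JSON entry listing data files added and removed; a reader replays the entries to find the live files for a given version, and a writer that read an older version is rejected. The data files themselves are never written here; only the metadata is modeled.

```python
import json
import os
import tempfile


class TransactionLog:
    """Toy table metadata: an ordered log of JSON commits (hypothetical design)."""

    def __init__(self, root):
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        # Commit files are named by zero-padded version number, e.g. 00000000000000000003.json
        return sorted(int(name.split(".")[0]) for name in os.listdir(self.log_dir))

    def commit(self, add=(), remove=(), expected_version=None):
        """Append one commit; reject it if the table changed since the caller read it."""
        versions = self._versions()
        current = versions[-1] if versions else -1
        if expected_version is not None and expected_version != current:
            raise RuntimeError("conflict: table changed since it was read")
        new_version = current + 1
        path = os.path.join(self.log_dir, f"{new_version:020d}.json")
        # Exclusive create ('x') raises FileExistsError if a concurrent writer
        # already claimed this version number.
        with open(path, "x") as f:
            json.dump({"add": list(add), "remove": list(remove)}, f)
        return new_version

    def snapshot(self, version=None):
        """Replay commits up to `version` (default: latest) to get the live data files."""
        live = set()
        for v in self._versions():
            if version is not None and v > version:
                break
            with open(os.path.join(self.log_dir, f"{v:020d}.json")) as f:
                entry = json.load(f)
            live.difference_update(entry["remove"])
            live.update(entry["add"])
        return live


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        log = TransactionLog(root)
        v0 = log.commit(add=["part-0.parquet"])
        log.commit(add=["part-1.parquet"], remove=["part-0.parquet"], expected_version=v0)
        print(sorted(log.snapshot()))    # ['part-1.parquet']
        print(sorted(log.snapshot(v0)))  # ['part-0.parquet'] (reading an older version)
```

Because readers only see files listed in completed commits, a compaction that rewrites part-0 as part-1 appears to every engine as a single step, and older versions stay readable until their files are deleted.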

Trino

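Trino is a high‑performance, distributed SQL query engine. A single coordinator parses and plans each query, then schedules its tasks on workers, which execute them in parallel. Data sources are exposed as catalogs, each backed by a connector: built‑in connectors for systems such as Kafka, PostgreSQL, SQL Server, Oracle, and Snowflake, together with the Hive, Iceberg, and Delta Lake connectors for data in object storage such as Amazon S3, let one query join data from lakes and databases alike. Joins are distributed across the workers, although a cross join still produces every pair of input rows and can be expensive on large tables.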
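
The coordinator/worker split can be illustrated with a small sketch of a query like SUM(amount) GROUP BY region. The helper names (make_splits, partial_sum, run_query) are hypothetical and a thread pool stands in for worker nodes; this is not Trino's API. The "coordinator" cuts the input into splits, "workers" aggregate each split in parallel, and the partial results are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def make_splits(rows, split_size):
    """Coordinator step: cut the input into independent splits (hypothetical helper)."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]


def partial_sum(split):
    """Worker step: aggregate one split locally (SUM(amount) GROUP BY region)."""
    totals = Counter()
    for region, amount in split:
        totals[region] += amount
    return totals


def run_query(rows, workers=4, split_size=2):
    splits = make_splits(rows, split_size)
    # Threads stand in for worker nodes processing splits in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, splits))
    # Final step: merge the partial aggregates into one result.
    result = Counter()
    for p in partials:
        result.update(p)  # Counter.update adds counts rather than replacing them
    return dict(result)


if __name__ == "__main__":
    rows = [("eu", 10), ("us", 5), ("eu", 7), ("apac", 3), ("us", 1)]
    print(run_query(rows))  # {'eu': 17, 'us': 6, 'apac': 3}
```

The sketch merges all partial results in one place for brevity; Trino itself typically repartitions partial aggregates by the grouping key so that the final aggregation also runs in parallel across workers.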

Kubernetes

Kubernetes is an open‑source platform for deploying, scaling, and managing containerized applications. Beyond running containers, it handles service discovery, rollout strategies such as rolling updates, and scheduling of workloads onto nodes based on their resource requests. Many data teams use Kubernetes to run and scale Spark jobs, host Trino clusters, and manage the services that power a lakehouse architecture.
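
Resource scheduling can be pictured as "filter, then score": discard nodes that cannot fit a pod's CPU and memory requests, then rank the rest. The sketch below is a simplified greedy version with a single scoring rule (most free CPU wins); the Node and Pod classes and the schedule function are hypothetical and do not reflect the real kube‑scheduler's plugins or API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_m: int   # free CPU in millicores
    free_mem_mi: int  # free memory in MiB


@dataclass
class Pod:
    name: str
    cpu_m: int   # requested CPU in millicores
    mem_mi: int  # requested memory in MiB


def schedule(pods, nodes):
    """Greedily place each pod on a feasible node (hypothetical, simplified scheduler)."""
    placements = {}
    for pod in pods:
        # Filter: keep nodes whose free resources cover the pod's requests.
        feasible = [n for n in nodes
                    if n.free_cpu_m >= pod.cpu_m and n.free_mem_mi >= pod.mem_mi]
        if not feasible:
            placements[pod.name] = None  # in a real cluster the pod would stay Pending
            continue
        # Score: prefer the node with the most free CPU to spread load.
        best = max(feasible, key=lambda n: n.free_cpu_m)
        best.free_cpu_m -= pod.cpu_m
        best.free_mem_mi -= pod.mem_mi
        placements[pod.name] = best.name
    return placements


if __name__ == "__main__":
    nodes = [Node("node-a", 4000, 8192), Node("node-b", 2000, 16384)]
    pods = [Pod("trino-worker-0", 3000, 4096),
            Pod("trino-worker-1", 1500, 8192),
            Pod("spark-exec-0", 2000, 2048)]
    print(schedule(pods, nodes))
    # {'trino-worker-0': 'node-a', 'trino-worker-1': 'node-b', 'spark-exec-0': None}
```

The last pod stays unplaced because neither node has 2000 millicores left, which is the situation where a cluster autoscaler or a smaller resource request would be needed.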