Thinklytics

AEO Primer · 4 min read · May 2026

What is a Lakehouse? The Data Architecture, Defined

By Thinklytics Partners, Practitioner Notes

A lakehouse is a data architecture that combines the low-cost storage and unstructured-data support of a data lake with the ACID transactions and SQL ergonomics of a data warehouse, typically built on open table formats (Delta Lake, Iceberg, Hudi).

Topics covered

  • lakehouse
  • Databricks lakehouse
  • Microsoft Fabric Lakehouse
  • Delta Lake
  • Apache Iceberg
  • open table format
  • ACID transactions

Frequently asked questions

What is a lakehouse in one sentence?

A lakehouse is a data architecture that combines the low-cost storage and unstructured-data support of a data lake with the ACID transactions, SQL ergonomics, and BI tool support of a data warehouse, typically built on open table formats (Delta Lake, Apache Iceberg, Apache Hudi).

Who coined the term lakehouse?

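Databricks popularized the term starting in 2020 and formalized it in the paper 'Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics' (CIDR 2021). The building blocks are older (Hudi was open-sourced by Uber in 2017, and Iceberg began at Netflix around the same time), but the unified lakehouse pitch was Databricks-led.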

How is a lakehouse different from a data lake?

A data lake is just storage (typically Parquet or CSV files on object storage). A lakehouse adds a table format layer on top (Delta, Iceberg, Hudi) that provides ACID transactions, schema enforcement, time travel, and BI tool compatibility, while keeping the open file storage.
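To make the "table format layer" idea concrete, here is a minimal Python sketch of a commit log sitting on top of plain data files. It is a toy under stated assumptions, not how Delta, Iceberg, or Hudi are actually implemented: the `ToyTable` class, the `_log` directory, and the JSON file layout are all hypothetical names chosen for illustration. It shows three of the properties named above: all-or-nothing appends, schema enforcement, and time travel by replaying the log up to a chosen version.

```python
import json
import os
import tempfile


class ToyTable:
    """Hypothetical, minimal table-format layer over plain data files."""

    def __init__(self, root, schema):
        self.root = root
        self.schema = schema  # column name -> Python type
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        # Committed versions are the numbered entries in the log directory.
        return sorted(int(name.split(".")[0]) for name in os.listdir(self.log_dir))

    def append(self, rows):
        # Schema enforcement: reject the whole batch before writing anything.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise ValueError(f"{col!r} must be {typ.__name__}")
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        # Step 1: write the data file. Readers ignore it until the log names it.
        data_name = f"part-{version:05d}.json"
        with open(os.path.join(self.root, data_name), "w") as f:
            json.dump(rows, f)
        # Step 2: commit. Mode "x" fails if another writer took this version.
        with open(os.path.join(self.log_dir, f"{version:05d}.json"), "x") as f:
            json.dump({"add": data_name}, f)
        return version

    def read(self, version=None):
        # Time travel: replay the log only up to the requested version.
        rows = []
        for v in self._versions():
            if version is not None and v > version:
                break
            with open(os.path.join(self.log_dir, f"{v:05d}.json")) as f:
                entry = json.load(f)
            with open(os.path.join(self.root, entry["add"])) as f:
                rows.extend(json.load(f))
        return rows


table = ToyTable(tempfile.mkdtemp(), {"id": int, "name": str})
v0 = table.append([{"id": 1, "name": "a"}])
v1 = table.append([{"id": 2, "name": "b"}])
latest = table.read()             # both rows
as_of_v0 = table.read(version=0)  # only the first row
```

The data files stay ordinary files that any tool could open. The log is what turns them into a table with a well-defined current state and history.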

How is a lakehouse different from a data warehouse?

A traditional data warehouse stores data in a proprietary format that only its own compute engine can read (e.g., Snowflake's micro-partitioned format, Redshift's column store). A lakehouse keeps the data in open files and table formats that multiple engines can query (Spark, Trino, Databricks SQL, Snowflake external tables, Fabric). That decoupling of storage from any single engine is the core value.

Delta Lake vs Apache Iceberg vs Apache Hudi: which one?

Delta is the default for Databricks and Microsoft Fabric. Iceberg is the default for Snowflake's open-table support, AWS Athena, and Trino-heavy stacks. Hudi appears most often at Uber, in AWS Glue pipelines, and in some streaming use cases. The 2025-2026 industry direction is toward Iceberg interop across all major platforms, with Delta and Hudi remaining strong in their native ecosystems.

Can I run SQL on a lakehouse?

Yes. Databricks SQL, Snowflake (against external Iceberg tables), Microsoft Fabric (Lakehouse SQL endpoint), Trino, Presto, AWS Athena, and Google BigQuery (against external tables) all run SQL against lakehouse tables. The lakehouse architecture explicitly supports both SQL and Spark / PySpark workloads on the same underlying files.
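The "same underlying files, different engines" point can be sketched with the Python standard library alone. This is an analogy under loud assumptions, not a lakehouse: a CSV file stands in for Parquet on object storage, a plain Python loop stands in for a Spark-style job, and an in-memory SQLite database stands in for a SQL endpoint (it copies the rows in, whereas a real engine would query the files in place). The file name and function names are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile

# Shared "storage": one open-format file that no single engine owns.
path = os.path.join(tempfile.mkdtemp(), "orders.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["region", "amount"])
    writer.writerows([["east", 10], ["west", 5], ["east", 7]])


def load_rows(p):
    """Read the shared file into (region, amount) tuples."""
    with open(p, newline="") as f:
        return [(r["region"], int(r["amount"])) for r in csv.DictReader(f)]


def totals_programmatic(p):
    """Engine 1: plain Python aggregation (stand-in for a Spark-style job)."""
    totals = {}
    for region, amount in load_rows(p):
        totals[region] = totals.get(region, 0) + amount
    return totals


def totals_sql(p):
    """Engine 2: SQL over the same file (stand-in for a SQL endpoint)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", load_rows(p))
    query = "SELECT region, SUM(amount) FROM orders GROUP BY region"
    result = dict(conn.execute(query))
    conn.close()
    return result


by_code = totals_programmatic(path)
by_sql = totals_sql(path)
```

Both functions read the same file and arrive at the same per-region totals, which is the property the lakehouse pitch relies on: the workload picks the engine, and the data does not move.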

Does a lakehouse replace a warehouse?

For many new builds, yes. For organizations with mature Snowflake or Redshift footprints, the lakehouse usually emerges as a peer (often hosting unstructured data and ML feature engineering) rather than a replacement. The cross-vendor convergence in 2026 (Snowflake supports Iceberg, Databricks supports SQL warehousing) makes the distinction less sharp.

How does Thinklytics work on lakehouse architectures?

We scope lakehouse engagements with attention to table format choice (Delta vs Iceberg), workload allocation (Spark vs SQL), and the broader analytical architecture. See [Microsoft Fabric data engineering](/insights/microsoft-fabric-data-engineering-2026) and [Snowflake vs Databricks AI workloads](/insights/snowflake-vs-databricks-ai-workloads-2026).
