
Connect Onehouse

dbt supports connecting to Onehouse SQL using the Apache Spark Connector with the Thrift method.

note

Connect to a Onehouse SQL Cluster with the dbt-spark adapter.

Requirements

  • Ensure your Onehouse SQL endpoint is accessible via external DNS/IP, and whitelist the dbt IP addresses.

What works

  • All dbt commands, including dbt clean, dbt compile, dbt debug, dbt seed, and dbt run.
  • dbt materializations: table and incremental
  • Apache Hudi table types Merge on Read (MoR) and Copy on Write (CoW); MoR is recommended for mutable workloads.

Limitations

  • Views are not supported.
  • dbt seed has row/record limits.
  • dbt seed only supports Copy on Write tables; see the seed configuration sketch below.
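Because seeds must be Copy on Write, you can pin seed defaults separately from your model defaults in dbt_project.yml. A minimal sketch, assuming the adapter honors these configs for seeds; your_project and the location are placeholders:

    seeds:
      your_project:
        +file_format: hudi
        # placeholder: point at your seed storage location in DFS
        +location_root: <your_seed_location_dfs>
        +tblproperties:
          # seeds only support Copy on Write tables
          hoodie.table.type: cow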

dbt connection

Fill in the following fields when creating an Apache Spark warehouse connection using the Thrift connection method:

| Field | Description | Examples |
| --- | --- | --- |
| Method | The method for connecting to Spark | Thrift |
| Hostname | The hostname of your Onehouse SQL Cluster endpoint | yourProject.sparkHost.com |
| Port | The port to connect to Spark on | 10000 |
| Cluster | Onehouse does not use this field | |
| Connection Timeout | Number of seconds after which to time out a connection | 10 |
| Connection Retries | Number of times to attempt connecting to the cluster before failing | 0 |
| User | Optional. Not enabled by default. | dbt_cloud_user |
| Auth | Optional; supply if using Kerberos. Not enabled by default. | KERBEROS |
| Kerberos Service Name | Optional; supply if using Kerberos. Not enabled by default. | hive |
| Organization | Onehouse does not use this field | |
[Image: Onehouse configuration]
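If you connect with dbt Core rather than the dbt Cloud UI, the same fields map onto a dbt-spark profile. A minimal profiles.yml sketch using the example values above; the profile, target, and schema names are placeholders:

    onehouse:
      target: dev
      outputs:
        dev:
          type: spark
          method: thrift
          host: yourProject.sparkHost.com   # your Onehouse SQL Cluster endpoint
          port: 10000
          schema: your_database             # placeholder: schema for dbt to build into
          connect_timeout: 10
          connect_retries: 0
          # user: dbt_cloud_user            # optional; not enabled by default
          # auth: KERBEROS                  # optional; only if using Kerberos
          # kerberos_service_name: hive     # optional; only if using Kerberos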

dbt project

We recommend setting default configurations in dbt_project.yml to ensure the adapter executes Onehouse-compatible SQL.

| Field | Description | Required | Default | Recommended |
| --- | --- | --- | --- | --- |
| materialized | Materialization the project/directory will default to | Yes | view (without input) | table |
| file_format | Table format the project will default to | Yes | N/A | hudi |
| location_root | Location of the database in DFS | Yes | N/A | <your_database_location_dfs> |
| hoodie.table.type | Merge on Read or Copy on Write | No | cow | mor |

dbt_project.yml template

    +materialized: table | incremental
    +file_format: hudi
    +location_root: <storage_uri>
    +tblproperties:
      hoodie.table.type: mor | cow

A dbt_project.yml example for the jaffle shop project would be:

    models:
      jaffle_shop:
        +file_format: hudi
        +location_root: s3://lakehouse/demolake/dbt_ecomm/
        +tblproperties:
          hoodie.table.type: mor
        staging:
          +materialized: incremental
        marts:
          +materialized: table