# Databricks adapter behavior changes
The following are the current behavior change flags that are specific to `dbt-databricks`:
| Flag | dbt-databricks: Intro | dbt-databricks: Maturity |
|---|---|---|
| `use_info_schema_for_columns` | 1.9.0 | TBD |
| `use_user_folder_for_python` | 1.9.0 | TBD |
| `use_materialization_v2` | 1.10.0 | TBD |
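
These flags are set at the project level. Here's a minimal sketch of enabling one of them, assuming the standard `flags:` block in `dbt_project.yml`:

```yaml
# dbt_project.yml (sketch) -- enable a dbt-databricks behavior change flag.
# Only the relevant block is shown; flag names come from the table above.
flags:
  use_info_schema_for_columns: true
```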
## Use information schema for columns
The `use_info_schema_for_columns` flag is `False` by default. Setting this flag to `True` uses `information_schema` rather than `describe extended` to get column metadata for Unity Catalog tables. This setting helps you avoid issues where `describe extended` truncates information when the type is a complex struct. However, this setting is not yet the default behavior because of a performance impact: due to a Databricks metadata limitation, dbt needs to run `REPAIR TABLE {{relation}} SYNC METADATA` before querying to ensure the `information_schema` is complete.
Please note that there is no equivalent option for views at this time, which means dbt still needs to use `describe extended` for views.

This flag may become default behavior in the future, depending on how `information_schema` changes.
If your complex type comes from processing JSON using `from_json`, you have an alternative: use `parse_json` to create the column as the `variant` type. Depending on how you intend to query or further process the data, the `variant` type might be a reasonable alternative in terms of performance, while not suffering from the issue of type truncation in metadata queries.
## Use user's folder for Python model notebooks
The `use_user_folder_for_python` flag is `False` by default and results in writing uploaded Python model notebooks to `/Shared/dbt_python_models/{{schema}}/`. Setting this flag to `True` writes notebooks to `/Users/{{current user}}/{{catalog}}/{{schema}}/` instead.

Writing to the `Shared` folder is deprecated by Databricks, as it does not align with governance best practices.

We plan to switch the default of this flag to `True` in v1.11.0.
## Use restructured materializations
The `use_materialization_v2` flag is `False` by default and guards significant rewrites of the core materializations in `dbt-databricks` while they are still in an experimental stage.

When set to `True`, `dbt-databricks` uses the updated logic for all model types (views, tables, incremental models, and seeds). It also enables additional, optional config options for more fine-tuned control:
- `view_update_via_alter`: When enabled, this config attempts to update the view in place using `alter view`, instead of replacing it with `create or replace`.
- `use_safer_relation_operations`: When enabled (and if `view_update_via_alter` isn't set), this config makes dbt model updates safer by staging relations and using rename operations to ensure the live version of the table or view is not disrupted by failures.
These configs aren't required to receive the core benefits of this flag — like better performance and column/constraint functionality — but they are gated behind the flag because they introduce more significant changes to how materializations behave.
We plan to switch the default of this flag to `True` in v1.11.0, depending on user feedback.
### Changes to the Seed materialization
The seed materialization should have the smallest difference between the old and new implementations, as the primary change is simply removing calls to methods that Databricks does not support, such as transaction operations.
### Changes to the View materialization
With the `use_materialization_v2` flag set to `True`, there are two model configuration options that customize how we handle the view materialization when we detect an existing relation at the target location.
`view_update_via_alter`: Updates the view in place using `alter view`, instead of replacing it with `create or replace`. This allows continuity of history for the view, keeps the metadata, and helps with Unity Catalog compatibility. Here's an example of how to configure this:
```yaml
version: 2
models:
  - name: market_summary
    config:
      materialized: view
      view_update_via_alter: true
    columns:
      - name: country
        tests:
          - unique
          - not_null
...
```
Note that a view's description can't be updated in place this way; as such, we must replace the view whenever you change its description.
`use_safer_relation_operations`: When enabled (and if `view_update_via_alter` isn't set), this config makes dbt model updates safer by creating a new relation in a staging location, swapping it with the existing relation, and deleting the old relation afterward. The following example shows how to configure this:
```yaml
version: 2
models:
  - name: market_summary
    config:
      materialized: view
      use_safer_relation_operations: true
    columns:
      - name: country
        tests:
          - unique
          - not_null
...
```
While this approach is equivalent to the default dbt view materialization, it creates additional Unity Catalog objects compared to the alternatives. Since this config does not use an atomic `create or replace` for any materialization, the history of the object in Unity Catalog may not behave as you expect. Consider carefully before using this model config broadly.
### Changes to the Table materialization
As with views, these materialization changes could increase costs: more temporary objects are used, consistent with other dbt adapters' materializations. We consider these changes experimental in part because we do not have enough data quantifying the price impact of this change. The benefits, though, are improved performance and safety, along with unblocking features that cannot be delivered with the existing materialization.
When `use_materialization_v2` is set to `True`, all materialization paths are updated. The key change is that table creation is separated from inserting rows into the table. This separation greatly improves performance for setting table comments, since adding comments at create time is faster than using separate `alter table` statements. It also resolves compatibility issues in Databricks, where creating and inserting in one step prevents setting comments.

Additionally, this change makes it possible to support other column features, like column-level masks, that aren't compatible with inserting data during creation. While these features aren't included in version 1.10.0, they can now be added in future releases.
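
For example, with comment persistence enabled, a model's description and column descriptions are now applied as part of table creation rather than through follow-up `alter table` statements. A minimal sketch, assuming the standard dbt `persist_docs` config:

```yaml
# schema.yml (sketch) -- descriptions persisted as table and column comments.
# With use_materialization_v2, these comments are set when the table is
# created, instead of through separate `alter table` statements.
version: 2
models:
  - name: market_summary
    description: "Aggregated market data"   # becomes the table comment
    config:
      materialized: table
      persist_docs:
        relation: true
        columns: true
    columns:
      - name: country
        description: "ISO country code"     # becomes a column comment
```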
#### Constraints
For several feature releases now, `dbt-databricks` has supported both dbt's constraints implementation and our own earlier alternative, `persist_constraints`. With the `use_materialization_v2` flag, we're beginning to deprecate `persist_constraints` and shifting fully to dbt's native constraint support.
One new enhancement is support for the `expression` field on primary and foreign keys, which lets you pass additional Databricks options, such as using `RELY` to tell the Databricks optimizer that it may exploit the constraint to rewrite queries.
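
For example, a primary key constraint that passes `RELY` through the `expression` field might look like this (a sketch assuming the standard dbt model-level constraint spec, which requires an enforced contract before constraints are applied):

```yaml
# schema.yml (sketch) -- primary key constraint with the Databricks RELY
# option supplied via the `expression` field.
version: 2
models:
  - name: market_summary
    config:
      materialized: table
      contract:
        enforced: true   # dbt applies constraints only with an enforced contract
    constraints:
      - type: primary_key
        columns: [country]
        expression: RELY
    columns:
      - name: country
        data_type: string
```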
Separating `create` and `insert` also changes how constraints behave. Previously, we would create a table with data and then apply constraints. If the new data violated a constraint, the run would fail, but by then it had already replaced the valid table from the previous run.
As with views, you can select between performance and safety with the `use_safer_relation_operations` config, but regardless of the setting, the new materialization approach ensures constraint violations don't make it into the target table.
#### `use_safer_relation_operations`
When using this model configuration with tables, we first create a staging table. After successfully inserting data into the staging table, we rename it to replace the target materialization. Since Databricks doesn't support rollbacks, this is a safer approach: if something fails before the rename, the original table stays intact. That gives you time to troubleshoot without worrying that exposures or work streams relying on that table are broken in the meantime.
If this config is set to `false` (the default), the target table will still never contain constraint-violating data, but it might end up empty if the insert fails due to a constraint violation. The key difference is whether we replace the target directly or use a staging-and-rename approach.
As with views, there is a cost to using additional temporary objects, in the form of creating more Unity Catalog objects with their own history. Consider carefully whether you need this behavior.
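
Here's a sketch of applying this config to a table model, mirroring the earlier view example:

```yaml
# schema.yml (sketch) -- stage-and-rename updates for a table model.
version: 2
models:
  - name: market_summary
    config:
      materialized: table
      use_safer_relation_operations: true
    columns:
      - name: country
        tests:
          - unique
          - not_null
```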
### Changes to the Incremental materialization
All the changes described in the Table materialization section also apply to incremental materializations.
We've also added a new config: `incremental_apply_config_changes`.
This config lets you control whether dbt should apply changes to things like `tags`, `tblproperties`, and comments during incremental runs. Many users wanted the capability to configure table metadata in Databricks, such as AI-generated comments, without dbt overwriting it. Previously, dbt-databricks always applied detected changes during incremental runs.
With the V2 materialization, you can now set `incremental_apply_config_changes` to `False` to stop that behavior. (It defaults to `True` to match the previous behavior.)
The following example shows how to configure this:
```yaml
version: 2
models:
  - name: incremental_market_updates
    config:
      materialized: incremental
      incremental_apply_config_changes: false
...
```