Three dbt Macros I Use Every Day

Noah Kennedy
5 min readJul 20, 2023

--

Last month my team here at Tempus built out a new data mart using our normal tech stack of SQL + AirFlow + dbt in GCP, and I realized that a lot of macros I had relied heavily on in previous data marts hadn’t yet been implemented. In this post, I’ll outline two important macros that accelerate my development process and a Jinja framework I use regularly to make dbt vastly more powerful.

Photo by Gildardo RH on Unsplash

Introduction — What are Macros?

To start, here is dbt’s definition of a macro: pieces of Jinja code that can be reused multiple times.

In dbt, macros are analogous to functions. The recommended way of using macros is by defining them in the dbt/macros folder of your dbt repository. Macros allow you to reuse common components, kind of similar to using a CTE, while also enabling functionality that isn’t otherwise possible.

Here’s a super simple example macro that adds two columns together —

Example macro which takes two input columns and sums them together.

And the use case of how I would reference this macro —

This SQL model creates a table with one value: 3.

A few important points:

  • Your macro file name doesn’t matter, and it doesn’t need to match your macro. In fact, you can define more than one macro within a file!
  • Notice how macros in dbt don’t need to return anything — that’s because a macro operates similarly to a CTE: instead of doing a calculation and then returning a value, macros are compiled into SQL at compile time.
  • When you’re defining a macro, variables need to be used within curly braces {{here}}, otherwise, dbt thinks you’re just referring to SQL rather than using a variable.

Macro #1 — Date Comparison

Some of the data I work with is Time Series data, which has certain nuances and complexities. I use dbt macros to simplify these complexities and avoid writing duplicate code. The most important macro I find myself using over and over again is a Date Comparison macro that allows me to pass in two date-time objects and compare them without worrying about partial dates. It also allows you to compare a DATE versus a DATETIME versus a TIMESTAMP object without using CAST()or worrying about type mismatches.

Below is the macro —

Compare dates with different precisions and pass back TRUE or FALSE.

This macro has two parts: date cleaning and date comparison. I found that the date cleaning helped strip out the TIME portion of dates (for Medical data, our lowest relevant grain is often DAY — no less — however, you could change this to handle time components as well).

Some of the more important points of the code —

  • Jinja has some weird syntax — Line 3 is basically saying “IF the length of the dbt is greater than 10 then continue into the nested IF loop”.
  • All Jinja is denoted by curly braces, and anything not within curly braces is just plain SQL (Line 4, for example, is a normal BQ function).
  • {{}} is used for evaluating variables, versus {% %} which is used for ‘control structures’ such as defining variables, loops, or IF/ELSE blocks
  • {%- -%} is the same as {% %} except it trims out excess whitespace.
  • If you use the above function and then take a look at the compiled code, you’ll see that all it’s doing is passing in the CASE/WHEN logic directly into the SQL script you’re calling it from. Just like a CTE!

Macro #2 — Custom Schema Names

dbt has many pre-defined macros that are defined under the hood. If you’ve ever used a dbt Config block or a post-hook, both of these rely on code that looks pretty similar to any other kind of macro! Just like other languages, we can overload those function definitions with our own definitions.

I use one macro in every single data mart because it allows me to organize my BigQuery datasets much easier: generate_schema_name.

Normally, you can use a config block setting called schema to change the name of the schema, but it will always output your default schema name concatenated as a prefix. My teams have never wanted that, so we did a function overload on the generate_schema_name.sql macro that dbt wrote. It looks like this —

Before implementing this macro overload, my schema names looked something like this —

Default Schema evaluates to data_mart
Schema=’example_data_mart’ evaluates to data_mart_example_data_mart

After we implement it, the schema will evaluate to whatever we set it to in the Config block. If we don’t put anything in the config block, the schema will evaluate the project default.

On past teams, we’ve gone really in the weeds with overriding the generate_schema_function to perform operations such as creating custom schema naming patterns for our test failures and model-dev environments. The important part of this is the ability to override existing dbt functions to have more control over your data pipelines.

Macro #3 — Executing SQL code outside a model

One of the most powerful dbt tricks is the ability to write and executive a query, then use the output of that query to inform another part of your code. When I was designing complex systems in dbt, I would regularly pair a jinja execute block that queried the systems INFORMATION_SCHEMA with a for loop that ran through every table in the information schema.

Below is a macro that returns TRUE if a table exists and has more than a certain number of rows —

The more important parts of the macro are explained below:

  • First, we use the built-in adapter.get_relation() to let us know if the table exists (so that when we try and query it, it doesn’t break at runtime).
  • If the table does in fact exist, we enter the else statement and “set” our query from lines 14–17.
  • On line 19, we gather the results by running our query into the built-in run_query() function.
  • Line 21 is syntax — we need it so that our code doesn’t break if the query didn’t run for some reason.
  • Line 22 is a simple log statement used for debugging.
  • Line 28 returns TRUE or FALSE, contrary to what I said above! macros can return values, they’re just usually only used for interpolating SQL code.
  • Note that on line 28 we also don’t wrap the threshold variable in curly braces — that’s because we’re already inside a curly brace section.

In a future post, I’ll explain how to go from the above to querying your INFORMATION_SCHEMA and looping through the results.

Conclusion

When used correctly, dbt has many extremely powerful use cases that allow you to do a whole lot more than just write SQL. These use cases have the potential to save time, remove redundant code, and even unlock new potentials. For instance, the technique outlined in the macro #3 section was used by my last team to ‘proofread’ model outputs before we overwrote production data, so we didn’t accidentally drop rows.

Hopefully, the above macros are helpful! If you have any macro that you use reguarly and you’d like to share, drop them in the comments below. Happy coding!

--

--

Noah Kennedy
Noah Kennedy

Written by Noah Kennedy

DE @ Meta. Previously - Tempus AI. All opinions are my own. Easily excited, mainly by topics like dbt, endurance sports, pour-overs, and biotech trends.

Responses (1)