Lesson 2: How to Design for Efficiency
Once you get your hands on a dbt project, it can feel like there are infinite ways to go. And to be honest...there are. But with great power of choice comes great responsibility, which means you need
This lesson is going to be all about actionable tips.
But buckle up because we're going to get pretty specific.
If any concepts, terms or features mentioned here are unclear, take that as a sign that it's an area for you to explore more outside of this guide.
These are things I've noticed that clearly separate solid vs struggling projects.
I'm going to show you 3 ways you can design your project for maximum efficiency by:
Using Directories
Staying DRY
Separating Dev environments
Some of these you may already know.
But even if just one sparks a new idea then it's worth the read.
#1 - Use Directories
Out of the box, your skeleton dbt project will be full of directories.
But in this section we're focused on your models/ directory.
Keeping this area organized is the single most important part of your project.
Here's why:
Set reusable configurations
The main file that all projects have is called [dbt_project.yml](https://docs.getdbt.com/reference/dbt_project.yml).
And within it, you can set configurations for entire directories and sub-directories of models.
Things like how models are materialized (ex. table or view) or if they should be deployed to specific schemas in your database.
That means you don't need to recreate or copy the same configuration for each new file.
It doesn't matter if you have 1 model or 100, it still applies.
This allows you to continuously add models to a directory without worrying about the configurations.
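To make this concrete, here's a rough sketch of what directory-level configs could look like. The project name (my_project) and the directory names (staging, marts, reporting) are assumptions for illustration, so swap in whatever your project actually uses.

```yaml
# dbt_project.yml (excerpt)
# Assumption: a hypothetical project called my_project with
# models/staging and models/marts/reporting directories.
name: my_project
version: "1.0.0"
profile: my_project

models:
  my_project:
    staging:
      # Every model under models/staging is built as a view
      +materialized: view
    marts:
      # Every model under models/marts is built as a table
      +materialized: table
      reporting:
        # Models under models/marts/reporting also get a custom schema
        +schema: reporting
```

Any new file dropped into models/marts/reporting would pick up both the table materialization and the custom schema without any config of its own. Keep in mind that, by default, dbt appends a custom schema to your target schema (ex. dbt_mkahan_reporting) rather than replacing it.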
Run together
By grouping models into directories, you can also take advantage of various operators.
Most dbt commands allow the option to select resources based on a file or directory path.
Therefore, grouping by directory means you can quickly identify and run all models in it.
No need to select each individually or hope you didn't miss one.
For example, the command dbt run -s marts.reporting.* would run all models in the models/marts/reporting path.
Stay organized
A phrase I heard in a dbt training once was:
“new lines are cheap, brain power is expensive”
Essentially this means you should write code in a way that's easy to read vs trying to be cute.
But this applies also to your entire project design and how you organize with directories.
Directories allow you to logically group concepts together in a way that's easy to understand.
Any new developer could join the team and immediately contribute without much thought.
This might seem obvious.
But you'd be surprised how many projects get off track by not doing this.
<aside> 💡 Takeaway: Always break up your models into different directories & sub-directories so you can simplify configs, maximize functionality & stay organized.
</aside>
#2 - Stay DRY
DRY stands for Don't Repeat Yourself and is a popular phrase in the software (and now data) development world.
A big problem data teams historically face is having the same logic being duplicated all over the place.
(WET code - Write Everything Twice)
If you've worked on any data team in the last 10 years, I'm sure you've seen this happen.
This makes changes unnecessarily complicated/time consuming and leaves the potential for conflicting logic.
But if you are truly focused on DRY code, this duplicating of logic is unacceptable.
You want to be "modular", which in the development world means having a single source for a given piece of logic.
Being modular also has the side effect of forcing you to more deeply think through your changes.
A common approach is to create a "staging" layer in your project that sits on top of your source data.
This acts as the base of all future dbt-created models.
The goal is to handle all light transformations (renames, formatting, etc.) at the very start, one time, so you don't have to repeat it later.
Here's a helpful link explaining this approach in more detail.
<aside> 💡 Takeaway: When writing dbt code, you'll be tempted to just copy/paste some logic you saw somewhere else and be done with it.
But this "shortcut" decision is the fastest way for your project to become a mess.
Don't do it!
</aside>
There's much more on the idea of DRY code that's outside the scope of this lesson.
So here's another link on the dbt site to dig more into it if you'd like.
#3 - Separate Dev Environments
The last tip to help you and your team build faster projects is to establish separate (custom) development schemas for each developer.
A common convention is to name the custom schema dbt_[firstInitialLastName].
For example, for me it would be dbt_mkahan.
The benefit of this is that it gives each developer a safe space to deploy and test their changes outside of production.
It also avoids accidental conflicts or overriding changes by two developers working on the same models.
Instead, each developer can:
Create their own branch (using git) to make changes
Deploy & test in their own safe space (dev schema)
Submit their code for review when ready
Safely merge to production once approved
And any conflicts would be called out through your version control platform (ex. GitHub) prior to merging.
Technically speaking, to make this happen you need to update your credentials file, aka the profiles.yml.
Each developer should create a dev target with a unique schema value (ex. dbt_mkahan).
For more, here's a link on using custom schemas.
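As a minimal sketch, a dev target in profiles.yml might look something like this. It assumes a Postgres warehouse, and the profile name, host, database and user are all hypothetical placeholders. Other adapters use different connection fields, so check the setup page for yours.

```yaml
# ~/.dbt/profiles.yml (excerpt)
# Assumption: Postgres adapter; every value below is a placeholder.
my_project:
  target: dev            # the target dbt uses unless you pass --target
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: mkahan
      # Read the password from an environment variable instead of hardcoding it
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: dbt_mkahan # unique per developer
      threads: 4
```

The only line that needs to differ between developers is schema. Everything one developer builds lands in their own schema, so nobody overwrites anyone else's work.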
<aside> 💡 Takeaway: Get out of each other's way by using separate dev schemas and you'll speed up the entire process.
</aside>
Hopefully your wheels are starting to turn with the possibilities for using dbt.
If you remember to use directories, stay DRY and have separate schemas, the speed of development won't be a problem.
And just like the project design, your SQL models should also follow a consistent format.
This not only makes your development process more efficient but makes it easier for other developers to read and debug your code.
I break down the 4 step approach I follow for every model in a separate YouTube video that I've added below if you'd like to check it out.
Learn a simple 4-step process for creating dbt models [4mins]
Let’s now move on to Lesson 3: Master the Art of Automation where you'll learn how to be more dynamic in your day-to-day operations.