
Data Engineering With Dagster - Part Three: Definitions and Code Locations
📦 Definitions & Code Locations – Your Project’s Backbone
So far, we’ve written asset functions, linked them together, and watched Dagster build clean execution graphs. But now we shift focus from individual whats to the big-picture how.
How does Dagster know which assets exist?
Where do they live?
How are they grouped and isolated?
Welcome to the world of Definitions and Code Locations.
🧠 What Are Definitions in Dagster?
Every time you define an asset, a resource, a schedule, a sensor, or anything Dagster-related, you’re contributing to your project’s Definitions.
Dagster needs a central object — a `Definitions()` container — that it can scan and load. When you run `dagster dev`, Dagster goes:
“Cool, let me find the Definitions object for this code location and boot up all registered components.”
You can think of the `Definitions()` object as a registry of everything your Dagster project knows how to do.
🍪 Example: Baking With Definitions
Remember our cookie asset graph?
flour ─▶ cookie_dough ─▶ cookies
Let’s say you’ve got those assets defined like this:
```python
from dagster import asset

@asset
def flour(): ...

@asset
def cookie_dough(flour): ...

@asset
def cookies(cookie_dough): ...
```
In your `definitions.py`, you’d register them like so:
```python
from dagster import Definitions

defs = Definitions(
    assets=[
        flour,
        cookie_dough,
        cookies,
    ]
)
```
This is the minimal working unit of a Dagster project. With just this file and these assets, `dagster dev` will find your graph, and the UI will light up.
🧪 Real-World Breakdown: definitions.py
Let’s look at a typical setup with assets spread across modules like `trips.py` and `metrics.py`.
```python
import dagster as dg

from .assets import metrics, trips

trip_assets = dg.load_assets_from_modules([trips])
metric_assets = dg.load_assets_from_modules([metrics])

defs = dg.Definitions(
    assets=[*trip_assets, *metric_assets],
)
```
🔍 Line-by-Line Breakdown
- `import dagster as dg` → Classic import, shorthand for the Dagster library
- `from .assets import metrics, trips` → Pull in our asset modules (Python modules that house your actual logic)
- `dg.load_assets_from_modules(...)` → This tells Dagster: “Yo, scan these modules and register all assets inside them.”
- `defs = dg.Definitions(...)` → Combine everything into a centralized object Dagster can index.
💡 This Definitions object is where you’ll also later register resources, schedules, and sensors — not just assets.
🧱 What Are Code Locations?
A code location is:
- A Python module (with a `Definitions()` object)
- A Python environment capable of running that module
It’s like a virtual kitchen where your Dagster project lives and operates.
🍪 Kitchen Analogy, Again
Let’s go back to our cookies.
- Your assets (flour, dough, cookies) are ingredients and recipes.
- Your `Definitions()` object is your recipe book.
- Your code location is the kitchen — the physical space where those recipes can actually be executed.
Need to scale up production? Add more kitchens (code locations), each with their own recipe book.
🧠 Mini Excursus: What Is a Deployment?
In software, a deployment is how you make your code live and operational in a real-world environment. That includes:
- Python + libraries
- Dagster UI
- Orchestrator services
- Storage backends
- Your own asset code
In Dagster, a deployment includes one or more code locations.
🧩 Why Code Locations Matter as You Scale
At small scale, one Definitions object is enough. But when teams, pipelines, and assets grow, things get messy fast.
Traditionally, people tried multiple Dagster deployments to fix this. But that leads to:
- 🔧 Extra infrastructure
- 🔐 Complicated access control
- 💥 Version collisions (e.g., different teams using different PyTorch versions)
- 🤯 UI chaos (thousands of assets mixed together)
🚀 Dagster’s Solution: Isolate With Code Locations
Each code location can:
- Run a different Python version or dependency set
- Serve a different team (e.g., Marketing vs ML)
- Contain just a slice of the full asset graph
But here’s the magic: it’s all still one deployment.
Dagster keeps the “single pane of glass” approach alive, while still isolating user code.
🧁 From Home Baker to Full Bakery
Let’s say you start with a home kitchen. As things grow, you split out:
- A test kitchen for R&D
- A packaging station
- A decorating area
Each one is separate (no chocolate splatter in the vanilla zone), but they all belong to the same bakery. This is how code locations work in Dagster.
A fire in one kitchen doesn’t burn down your whole deployment.
🔍 Examples of How to Use Code Locations
| Purpose | Code Location Example |
|---|---|
| Team isolation | `marketing_assets`, `ml_team` |
| Language/version split | `python39_legacy`, `py311_modern` |
| Function split | `etl_jobs`, `reporting`, `ml_pipeline` |
| Compliance boundary | `hipaa_assets`, `gdpr_sensitive` |
Dagster will keep all these locations active within one deployment, and allow assets across locations to depend on each other.
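In an open-source Dagster deployment, this multi-location setup is typically wired up in a `workspace.yaml`. A minimal sketch — the module names and interpreter path below are hypothetical:

```yaml
# workspace.yaml — one deployment, several isolated code locations
load_from:
  - python_module:
      module_name: marketing_assets
  - python_module:
      module_name: ml_team
      # pointing at a separate interpreter gives this location
      # its own Python version and dependency set
      executable_path: /opt/venvs/ml/bin/python
```

Each entry becomes its own code location in the UI, all under a single deployment.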
🔐 Security Benefit
Since each location runs in its own process, with its own Python interpreter and packages, it creates a sandboxed boundary around user code.
This is great for:
- Keeping experiments from crashing prod
- Managing heavy dependencies (like GPU libs)
- Running untrusted code safely
🖥️ Code Locations in the Dagster UI
Once your project uses code locations, they become visible and manageable right from the Dagster UI.
Navigate to the “Deployments” tab in the top bar — here you’ll see all registered code locations along with:
- ✅ Their current status (running, failed, etc.)
- 🕒 Last update timestamp
- 🔁 A handy Reload button
💡 Each code location is named after the Python module it loads. So if your module is called `dagster_university`, that’ll be its name in the UI — even if it’s just a folder on your system.
If a code location fails to load (for example, due to a Python error in your assets), you’ll see a “View Error” option to help you troubleshoot.
When you make updates to your project — maybe you added a new asset, tweaked a resource, or modified a job — these won’t show up until you tell Dagster to reload the code location.
There are two ways to do this:
- Deployments > Code Locations → click Reload
- Global Asset Lineage → click Reload Definitions
Reloading tells Dagster:
“Hey, grab the latest version of this module and update what you know about its assets, resources, and jobs.”
And no — loading assets into a Definitions object does not materialize them. It just registers them. Think of it as indexing, not executing.
🧠 Mini Excursus: Folder, Script, or Module?
Let’s quickly demystify how Python organizes your code — because this plays directly into how Dagster loads your Definitions.
| Term | What it means | Example |
|---|---|---|
| Script | A single `.py` file you can run | `process_data.py` |
| Module | Any `.py` file or importable object | `import metrics` |
| Package | A folder with an `__init__.py` file | `from analytics import utils` |
📁 A folder becomes a package only if it contains an `__init__.py` file. This file tells Python:
“Treat this folder as an importable thing.”
If you leave it out, Dagster (and Python) will not recognize the folder as a valid module — and that can lead to mysterious import errors or failed code locations.
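To see this in action without touching your real project, here is a self-contained sketch that builds an `assets/` package in a temporary directory (the module contents are placeholders). The `__init__.py` re-exports the submodules, so a plain `import assets` exposes them — exactly the behavior Dagster relies on when loading your modules:

```python
import pathlib
import sys
import tempfile

# Recreate the assets/ package layout in a temp dir
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "assets"
pkg.mkdir()
(pkg / "__init__.py").write_text("from . import trips, metrics\n")
(pkg / "trips.py").write_text("NAME = 'trips'\n")
(pkg / "metrics.py").write_text("NAME = 'metrics'\n")

sys.path.insert(0, str(root))
import assets  # __init__.py runs here and re-exports the submodules

print(assets.trips.NAME, assets.metrics.NAME)  # → trips metrics
```

Delete the `__init__.py` and `assets.trips` is no longer reachable via a bare `import assets` — the kind of silent breakage that shows up as a failed code location.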
💡 Best practice: Organize your project as a Python package with clearly defined modules — especially when working with multiple code locations.
```
my_project/
│
├── assets/
│   ├── __init__.py
│   ├── trips.py
│   └── metrics.py
│
├── definitions.py
└── dagster.yaml
```
Then load your assets with:

```python
from .assets import trips, metrics
```

Clean. Reusable. Dagster-friendly.
✅ Quick Check: How Well Do You Know This?
Let’s test your understanding of Definitions and Code Locations:
1. In Dagster, a `Definitions` object contains…
   → ✔️ All your project’s assets, jobs, schedules, sensors, and resources.
2. If you change your code and want to see updates in the UI, you must…
   → 🔁 Reload the code location. Just refreshing the browser won’t do anything — especially if Dagster’s backend is already running.
3. True or False? “Including an asset in a `Definitions` object causes it to run automatically.”
   → ❌ False. Assets must be materialized explicitly. Definitions just register them.
🎯 Wrapping Up
Definitions and Code Locations are the structural backbone of any scalable Dagster setup.
- `Definitions()` tells Dagster what your project can do.
- Code locations tell Dagster where those components live, and how to run them.
By understanding these concepts, you move from just building pipelines — to designing full, reliable, and modular data platforms.
Whether you’re a solo builder or working across teams, this setup lets you grow confidently without losing control.
In Part 4, we’ll look into resources, sensors, and schedules — the moving parts that bring automation and data-awareness into your asset graph.
Until then: stay modular. Stay sharp. And remember to reload your code locations. 😉