
Data Engineering With Dagster - Part Three: Definitions and Code Locations
📦 Definitions & Code Locations – Your Project’s Backbone
So far, we’ve written asset functions, linked them together, and watched Dagster build clean execution graphs. But now we shift focus from individual whats to the big-picture how.
How does Dagster know which assets exist?
Where do they live?
How are they grouped and isolated?
Welcome to the world of Definitions and Code Locations.
🧠 What Are Definitions in Dagster?
Every time you define an asset, a resource, a schedule, a sensor, or anything Dagster-related, you’re contributing to your project’s Definitions.
Dagster needs a central object — a `Definitions()` container — that it can scan and load. When you run `dagster dev`, Dagster goes:
“Cool, let me find the Definitions object for this code location and boot up all registered components.”
You can think of the `Definitions()` object as a registry of everything your Dagster project knows how to do.
🍪 Example: Baking With Definitions
Remember our cookie asset graph?
flour ─▶ cookie_dough ─▶ cookies
Let’s say you’ve got those assets defined like this:
```python
from dagster import asset

@asset
def flour(): ...

@asset
def cookie_dough(flour): ...

@asset
def cookies(cookie_dough): ...
```
In your `definitions.py`, you’d register them like so:
```python
from dagster import Definitions

defs = Definitions(
    assets=[
        flour,
        cookie_dough,
        cookies,
    ]
)
```
This is the minimal working unit of a Dagster project. With just this file and these assets, `dagster dev` will find your graph, and the UI will light up.
🧪 Real-World Breakdown: definitions.py
Let’s look at a typical setup with assets spread across modules like `trips.py` and `metrics.py`.
```python
import dagster as dg

from .assets import metrics, trips

trip_assets = dg.load_assets_from_modules([trips])
metric_assets = dg.load_assets_from_modules([metrics])

defs = dg.Definitions(
    assets=[*trip_assets, *metric_assets],
)
```
🔍 Line-by-Line Breakdown
- `import dagster as dg` → Classic import, shorthand for the Dagster library
- `from .assets import metrics, trips` → Pull in our asset modules (Python modules that house your actual logic)
- `dg.load_assets_from_modules(...)` → This tells Dagster: “Yo, scan these modules and register all assets inside them.”
- `defs = dg.Definitions(...)` → Combine everything into a centralized object Dagster can index.
💡 This Definitions object is where you’ll also later register resources, schedules, and sensors — not just assets.
🧱 What Are Code Locations?
A code location is:
- A Python module (with a `Definitions()` object)
- A Python environment capable of running that module
It’s like a virtual kitchen where your Dagster project lives and operates.
🍪 Kitchen Analogy, Again
Let’s go back to our cookies.
- Your assets (flour, dough, cookies) are ingredients and recipes.
- Your `Definitions()` object is your recipe book.
- Your code location is the kitchen — the physical space where those recipes can actually be executed.
Need to scale up production? Add more kitchens (code locations), each with their own recipe book.
🧠 Mini Excursus: What Is a Deployment?
In software, a deployment is how you make your code live and operational in a real-world environment. That includes:
- Python + libraries
- Dagster UI
- Orchestrator services
- Storage backends
- Your own asset code
In Dagster, a deployment includes one or more code locations.
🧩 Why Code Locations Matter as You Scale
At small scale, one Definitions object is enough. But when teams, pipelines, and assets grow, things get messy fast.
Traditionally, people tried multiple Dagster deployments to fix this. But that leads to:
- 🔧 Extra infrastructure
- 🔐 Complicated access control
- 💥 Version collisions (e.g., different teams using different PyTorch versions)
- 🤯 UI chaos (thousands of assets mixed together)
🚀 Dagster’s Solution: Isolate With Code Locations
Each code location can:
- Run a different Python version or dependency set
- Serve a different team (e.g., Marketing vs ML)
- Contain just a slice of the full asset graph
But here’s the magic: it’s all still one deployment.
Dagster keeps the “single pane of glass” approach alive, while still isolating user code.
🧁 From Home Baker to Full Bakery
Let’s say you start with a home kitchen. As things grow, you split out:
- A test kitchen for R&D
- A packaging station
- A decorating area
Each one is separate (no chocolate splatter in the vanilla zone), but they all belong to the same bakery. This is how code locations work in Dagster.
A fire in one kitchen doesn’t burn down your whole deployment.
🔍 Examples of How to Use Code Locations
| Purpose | Code Location Example |
|---|---|
| Team isolation | `marketing_assets`, `ml_team` |
| Language/version split | `python39_legacy`, `py311_modern` |
| Function split | `etl_jobs`, `reporting`, `ml_pipeline` |
| Compliance boundary | `hipaa_assets`, `gdpr_sensitive` |
Dagster will keep all these locations active within one deployment, and allow assets across locations to depend on each other.
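In an open-source Dagster deployment, this multi-location setup is typically wired up in a `workspace.yaml`. A minimal sketch — the module names and interpreter path below are hypothetical:

```yaml
# workspace.yaml — one deployment, several isolated code locations
load_from:
  - python_module:
      module_name: marketing_assets
  - python_module:
      module_name: ml_team
      # pointing at a separate interpreter gives this location
      # its own Python version and dependency set
      executable_path: /opt/venvs/ml/bin/python
```

Each entry becomes its own code location in the UI, all under a single deployment.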
🔐 Security Benefit
Since each location runs in its own process, with its own Python interpreter and packages, it creates a sandboxed boundary around user code.
This is great for:
- Keeping experiments from crashing prod
- Managing heavy dependencies (like GPU libs)
- Running untrusted code safely
🖥️ Code Locations in the Dagster UI
Once your project uses code locations, they become visible and manageable right from the Dagster UI.
Navigate to the “Deployments” tab in the top bar — here you’ll see all registered code locations along with:
- ✅ Their current status (running, failed, etc.)
- 🕒 Last update timestamp
- 🔁 A handy Reload button
💡 Each code location is named after the Python module it loads. So if your module is called `dagster_university`, that’ll be its name in the UI — even if it’s just a folder on your system.
If a code location fails to load (for example, due to a Python error in your assets), you’ll see a “View Error” option to help you troubleshoot.
When you make updates to your project — maybe you added a new asset, tweaked a resource, or modified a job — these won’t show up until you tell Dagster to reload the code location.
There are two ways to do this:
- Deployments > Code Locations → click Reload
- Global Asset Lineage → click Reload Definitions
Reloading tells Dagster:
“Hey, grab the latest version of this module and update what you know about its assets, resources, and jobs.”
And no — loading assets into a Definitions object does not materialize them. It just registers them. Think of it as indexing, not executing.
🧠 Mini Excursus: Folder, Script, or Module?
Let’s quickly demystify how Python organizes your code — because this plays directly into how Dagster loads your Definitions.
| Term | What it means | Example |
|---|---|---|
| Script | A single `.py` file you can run | `process_data.py` |
| Module | Any `.py` file or importable object | `import metrics` |
| Package | A folder with an `__init__.py` file | `from analytics import utils` |
📁 A folder becomes a package only if it contains an `__init__.py` file. This file tells Python:
“Treat this folder as an importable thing.”
If you leave it out, Dagster (and Python) will not recognize the folder as a valid module — and that can lead to mysterious import errors or failed code locations.
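To see this in action without touching your real project, here is a self-contained sketch that builds an `assets/` package in a temporary directory (the module contents are placeholders). The `__init__.py` re-exports the submodules, so a plain `import assets` exposes them — exactly the behavior Dagster relies on when loading your modules:

```python
import pathlib
import sys
import tempfile

# Recreate the assets/ package layout in a temp dir
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "assets"
pkg.mkdir()
(pkg / "__init__.py").write_text("from . import trips, metrics\n")
(pkg / "trips.py").write_text("NAME = 'trips'\n")
(pkg / "metrics.py").write_text("NAME = 'metrics'\n")

sys.path.insert(0, str(root))
import assets  # __init__.py runs here and re-exports the submodules

print(assets.trips.NAME, assets.metrics.NAME)  # → trips metrics
```

Delete the `__init__.py` and `assets.trips` is no longer reachable via a bare `import assets` — the kind of silent breakage that shows up as a failed code location.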
💡 Best practice: Organize your project as a Python package with clearly defined modules — especially when working with multiple code locations.
```
my_project/
│
├── assets/
│   ├── __init__.py
│   ├── trips.py
│   └── metrics.py
│
├── definitions.py
└── dagster.yaml
```
Then load your assets with:

```python
from .assets import trips, metrics
```

Clean. Reusable. Dagster-friendly.
✅ Quick Check: How Well Do You Know This?
Let’s test your understanding of Definitions and Code Locations:
1. In Dagster, a `Definitions` object contains…
   → ✔️ All your project’s assets, jobs, schedules, sensors, and resources.
2. If you change your code and want to see updates in the UI, you must…
   → 🔁 Reload the code location. Just refreshing the browser won’t do anything — especially if Dagster’s backend is already running.
3. True or False? “Including an asset in a `Definitions` object causes it to run automatically.”
   → ❌ False. Assets must be materialized explicitly. Definitions just register them.
🎯 Wrapping Up
Definitions and Code Locations are the structural backbone of any scalable Dagster setup.
- `Definitions()` tells Dagster what your project can do.
- Code locations tell Dagster where those components live, and how to run them.
By understanding these concepts, you move from just building pipelines — to designing full, reliable, and modular data platforms.
Whether you’re a solo builder or working across teams, this setup lets you grow confidently without losing control.
In Part 4, we’ll look into resources, sensors, and schedules — the moving parts that bring automation and data-awareness into your asset graph.
Until then: stay modular. Stay sharp. And remember to reload your code locations. 😉