Niklas Heringer - Cybersecurity Blog

Data Engineering With Dagster - Part Three: Definitions and Code Locations

📦 Definitions & Code Locations – Your Project’s Backbone

So far, we’ve written asset functions, linked them together, and watched Dagster build clean execution graphs. But now we shift focus from individual whats to the big-picture how.

How does Dagster know which assets exist?
Where do they live?
How are they grouped and isolated?

Welcome to the world of Definitions and Code Locations.


🧠 What Are Definitions in Dagster?

Every time you define an asset, a resource, a schedule, a sensor, or anything Dagster-related, you’re contributing to your project’s Definitions.

Dagster needs a central object — a Definitions() container — that it can scan and load. When you run dagster dev, Dagster goes:

“Cool, let me find the Definitions object for this code location and boot up all registered components.”

You can think of the Definitions() object as a registry of everything your Dagster project knows how to do.
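A toy model makes the "registry" idea concrete. The sketch below is plain Python, not Dagster's implementation. A hypothetical ToyDefinitions class stores asset functions and lets you look them up by name. Notice that registering a function does not call it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyDefinitions:
    """Hypothetical stand-in for a registry like dagster.Definitions."""

    assets: List[Callable] = field(default_factory=list)

    def asset_names(self) -> List[str]:
        # Index the registered functions by name; nothing is executed here.
        return [fn.__name__ for fn in self.assets]

    def get(self, name: str) -> Callable:
        # Look up a registered asset function by its name.
        for fn in self.assets:
            if fn.__name__ == name:
                return fn
        raise KeyError(f"no asset named {name!r}")


calls = []  # records which asset functions actually ran


def flour():
    calls.append("flour")
    return "flour"


defs = ToyDefinitions(assets=[flour])
print(defs.asset_names())  # ['flour']
print(calls)               # [] -> registering did not run the function
```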


🍪 Example: Baking With Definitions

Remember our cookie asset graph?

flour ─▶ cookie_dough ─▶ cookies

Let’s say you’ve got those assets defined like this:

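from dagster import asset

@asset
def flour():
    return "flour"

@asset
def cookie_dough(flour):
    # The parameter name "flour" tells Dagster this asset depends on flour.
    return f"dough made from {flour}"

@asset
def cookies(cookie_dough):
    return f"cookies baked from {cookie_dough}"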

In your definitions.py, you’d register them like so:

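from dagster import Definitions

# flour, cookie_dough and cookies must be in scope here:
# define them in this file or import them from the module where they live.

defs = Definitions(
    assets=[
        flour,
        cookie_dough,
        cookies,
    ]
)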

This is the minimal working unit of a Dagster project. With just this file and these assets, dagster dev will find your graph, and the UI will light up.
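Dagster infers the edges of this graph from parameter names: cookie_dough takes a parameter called flour, so it depends on the flour asset. Here is a stdlib-only sketch of that idea (not Dagster's internals). It derives the graph with inspect, orders it with graphlib.TopologicalSorter (Python 3.9+), and then "materializes" each step in dependency order:

```python
import inspect
from graphlib import TopologicalSorter


def flour():
    return "flour"


def cookie_dough(flour):
    return f"dough made from {flour}"


def cookies(cookie_dough):
    return f"cookies baked from {cookie_dough}"


assets = {fn.__name__: fn for fn in (flour, cookie_dough, cookies)}

# Each parameter name is treated as the name of an upstream asset.
graph = {
    name: set(inspect.signature(fn).parameters)
    for name, fn in assets.items()
}

order = list(TopologicalSorter(graph).static_order())
print(order)  # ['flour', 'cookie_dough', 'cookies']

# "Materialize" in dependency order, feeding upstream results downstream.
results = {}
for name in order:
    fn = assets[name]
    kwargs = {dep: results[dep] for dep in inspect.signature(fn).parameters}
    results[name] = fn(**kwargs)

print(results["cookies"])  # cookies baked from dough made from flour
```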


🧪 Real-World Breakdown: definitions.py

Let’s look at a typical setup with assets spread across modules like trips.py and metrics.py.

import dagster as dg

from .assets import metrics, trips

trip_assets = dg.load_assets_from_modules([trips])
metric_assets = dg.load_assets_from_modules([metrics])

defs = dg.Definitions(
    assets=[*trip_assets, *metric_assets],
)

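🔍 Line-by-Line Breakdown

  1. import dagster as dg imports Dagster under the short alias dg.
  2. from .assets import metrics, trips imports the two asset modules from the assets package next to this file. The leading dot makes it a relative import.
  3. dg.load_assets_from_modules([trips]) (and the same call for metrics) collects every asset defined in that module, so you don't have to list each asset by hand.
  4. dg.Definitions(assets=[*trip_assets, *metric_assets]) unpacks both lists into one and registers all of them with Dagster.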

💡 This Definitions object is where you’ll also later register resources, schedules, and sensors — not just assets.
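Conceptually, load_assets_from_modules scans the modules you hand it and collects the asset definitions it finds. The sketch below mimics that "scan and collect" step in plain Python. toy_asset, toy_load_assets_from_modules and the asset names trips_raw and trips_by_day are all hypothetical stand-ins, and the two modules are built in memory instead of read from disk:

```python
import types
from typing import Callable, Iterable, List


def toy_asset(fn: Callable) -> Callable:
    # Hypothetical marker decorator: tags a function as an asset.
    fn._is_toy_asset = True
    return fn


def toy_load_assets_from_modules(modules: Iterable[types.ModuleType]) -> List[Callable]:
    """Collect every function tagged by toy_asset in the given modules."""
    found = []
    for module in modules:
        for obj in vars(module).values():
            if callable(obj) and getattr(obj, "_is_toy_asset", False):
                found.append(obj)
    return found


# In-memory modules standing in for assets/trips.py and assets/metrics.py.
trips = types.ModuleType("trips")
metrics = types.ModuleType("metrics")


@toy_asset
def trips_raw():
    return "raw trips"


@toy_asset
def trips_by_day():
    return "daily trip counts"


def helper():
    # Not tagged, so the loader ignores it.
    return "not an asset"


trips.trips_raw = trips_raw
trips.helper = helper
metrics.trips_by_day = trips_by_day

trip_assets = toy_load_assets_from_modules([trips])
metric_assets = toy_load_assets_from_modules([metrics])
all_assets = [*trip_assets, *metric_assets]
print([fn.__name__ for fn in all_assets])  # ['trips_raw', 'trips_by_day']
```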


🧱 What Are Code Locations?

A code location is:

  1. A Python module (with a Definitions() object)
  2. A Python environment capable of running that module

It’s like a virtual kitchen where your Dagster project lives and operates.

🍪 Kitchen Analogy, Again

Let’s go back to our cookies.

Need to scale up production? Add more kitchens (code locations), each with their own recipe book.


🧠 Mini Excursus: What Is a Deployment?

In software, a deployment is how you make your code live and operational in a real-world environment. That includes:

In Dagster, a deployment includes one or more code locations.


🧩 Why Code Locations Matter as You Scale

At small scale, one Definitions object is enough. But when teams, pipelines, and assets grow, things get messy fast.

Traditionally, people tried multiple Dagster deployments to fix this. But that leads to:

🚀 Dagster’s Solution: Isolate With Code Locations

Each code location can:

But here’s the magic: it’s all still one deployment.
Dagster keeps the “single pane of glass” approach alive, while still isolating user code.


🧁 From Home Baker to Full Bakery

Let’s say you start with a home kitchen. As things grow, you split out:

Each one is separate (no chocolate splatter in the vanilla zone), but they all belong to the same bakery. This is how code locations work in Dagster.

A fire in one kitchen doesn’t burn down your whole deployment.
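A rough way to see why separate processes contain failures: the sketch below runs two hypothetical "kitchens" as separate Python interpreter processes with subprocess and records how each one ended. Dagster's own machinery for serving code locations is more involved; this only illustrates the process boundary.

```python
import subprocess
import sys

# Hypothetical "kitchens": each snippet stands in for one code location's module.
locations = {
    "vanilla_kitchen": "print('vanilla cookies ready')",
    "chocolate_kitchen": "raise RuntimeError('oven on fire')",
}

status = {}
for name, code in locations.items():
    # Each location runs in its own interpreter process. Here it is the same
    # executable, but it could just as well be a different virtualenv.
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    status[name] = "ok" if proc.returncode == 0 else "failed"

print(status)  # {'vanilla_kitchen': 'ok', 'chocolate_kitchen': 'failed'}
```

The failing kitchen raises an exception, yet the parent loop keeps going and the vanilla kitchen still reports success.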


🔍 Examples of How to Use Code Locations

Purpose                 | Example code location names
Team isolation          | marketing_assets, ml_team
Language/version split  | python39_legacy, py311_modern
Function split          | etl_jobs, reporting, ml_pipeline
Compliance boundary     | hipaa_assets, gdpr_sensitive

Dagster will keep all these locations active within one deployment, and allow assets across locations to depend on each other.
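To picture the "one deployment, many locations" idea, here is a toy model in plain Python. Each hypothetical location contributes a mapping of asset names to upstream asset names, and merging them yields one graph in which weekly_report (in reporting) depends on clean_orders (in etl_jobs). The asset names are made up, and graphlib needs Python 3.9+.

```python
from graphlib import TopologicalSorter

# Hypothetical code locations: asset name -> set of upstream asset names.
etl_jobs = {
    "raw_orders": set(),
    "clean_orders": {"raw_orders"},
}
reporting = {
    # Depends on an asset that lives in a different location.
    "weekly_report": {"clean_orders"},
}

locations = {"etl_jobs": etl_jobs, "reporting": reporting}

# One deployment-wide view: merge every location's assets into a single graph.
global_graph = {}
owner = {}
for location_name, assets in locations.items():
    for asset_name, upstream in assets.items():
        global_graph[asset_name] = set(upstream)
        owner[asset_name] = location_name

order = list(TopologicalSorter(global_graph).static_order())
for asset_name in order:
    print(f"{asset_name:15} (from {owner[asset_name]})")
```

In real Dagster, the downstream asset refers to the upstream one by its asset key rather than through a shared Python dict; the merge here only mirrors the single-graph view you get in the UI.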


🔐 Security Benefit

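Since each location runs in its own process, with its own Python interpreter and packages, it creates an isolation boundary around user code: a crash, a conflicting dependency, or a broken import in one location stays inside that location's process.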

This is great for:


Code Locations in the Dagster UI

Once your project uses code locations, they become visible and manageable right from the Dagster UI.

Navigate to the “Deployments” tab in the top bar — here you’ll see all registered code locations along with:

💡 Each code location is named after the Python module it loads. So if your module is called dagster_university, that’ll be its name in the UI — even if it’s just a folder on your system.

If a code location fails to load (for example, due to a Python error in your assets), you’ll see a “View Error” option to help you troubleshoot.

When you make updates to your project — maybe you added a new asset, tweaked a resource, or modified a job — these won’t show up until you tell Dagster to reload the code location.

There are two ways to do this:

  1. Deployments > Code Locations → click Reload
  2. Global Asset Lineage → click Reload Definitions

Reloading tells Dagster:

“Hey, grab the latest version of this module and update what you know about its assets, resources, and jobs.”

And no — loading assets into a Definitions object does not materialize them. It just registers them. Think of it as indexing, not executing.
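Reloading is easiest to picture with a plain-Python analogy (not Dagster's reload mechanism): a running process keeps using the version of a module it already imported until something explicitly reloads it. The module name bakery_defs is hypothetical, and the file is written to a temporary directory:

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # no .pyc cache, so reload re-reads the source

workdir = Path(tempfile.mkdtemp())
module_file = workdir / "bakery_defs.py"  # hypothetical module name
module_file.write_text("ASSETS = ['flour', 'cookie_dough']\n")

sys.path.insert(0, str(workdir))
importlib.invalidate_caches()          # let the import system see the new file
sys.modules.pop("bakery_defs", None)   # start from a clean slate

bakery_defs = importlib.import_module("bakery_defs")
first = list(bakery_defs.ASSETS)

# Add an asset on disk: the already-imported module does not change.
module_file.write_text("ASSETS = ['flour', 'cookie_dough', 'cookies']\n")
stale = list(bakery_defs.ASSETS)

# An explicit reload re-executes the module and picks up the edit.
bakery_defs = importlib.reload(bakery_defs)
fresh = list(bakery_defs.ASSETS)

print(first)  # ['flour', 'cookie_dough']
print(stale)  # ['flour', 'cookie_dough']
print(fresh)  # ['flour', 'cookie_dough', 'cookies']
```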


🧠 Mini Excursus: Folder, Script, or Module?

Let’s quickly demystify how Python organizes your code — because this plays directly into how Dagster loads your Definitions.

Term     | What it means                      | Example
Script   | A single .py file you can run      | process_data.py
Module   | Any .py file or importable object  | import metrics
Package  | A folder with an __init__.py file  | from analytics import utils

📁 A folder becomes a regular package (an importable module) when it contains an __init__.py file.

This file tells Python:

“Treat this folder as an importable thing.”

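If you leave it out, Python 3.3+ can still import the folder as an implicit namespace package, but that is a different kind of package: no __init__.py runs, and packaging tools such as setuptools' find_packages() skip the folder. That mismatch can lead to mysterious import errors or failed code locations.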
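You can see the difference with importlib. The sketch creates two hypothetical folders in a temporary directory, one with an __init__.py and one without, and asks Python how it would import each (Python 3.7+ reports origin None for namespace packages):

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# A folder WITH __init__.py -> regular package.
(root / "with_init").mkdir()
(root / "with_init" / "__init__.py").write_text("")

# A folder WITHOUT __init__.py -> implicit namespace package (Python 3.3+).
(root / "without_init").mkdir()

sys.path.insert(0, str(root))
importlib.invalidate_caches()

regular = importlib.util.find_spec("with_init")
namespace = importlib.util.find_spec("without_init")

print(regular.origin)    # path ending in with_init/__init__.py
print(namespace.origin)  # None: no __init__.py backs this package
print(list(namespace.submodule_search_locations))
```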

💡 Best practice: Organize your project as a Python package with clearly defined modules — especially when working with multiple code locations.

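my_project/
├── __init__.py
├── assets/
│   ├── __init__.py
│   ├── trips.py
│   └── metrics.py
├── definitions.py
└── dagster.yaml

Then, inside definitions.py, load your assets with a relative import, just like in the definitions.py example earlier:

from .assets import metrics, trips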

Clean. Reusable. Dagster-friendly.
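The relative import from .assets import metrics, trips (as used in the definitions.py example earlier) only works when definitions.py is imported as part of its package. The sketch below rebuilds a package version of the layout above in a temporary directory (with an __init__.py in my_project/ as well, placeholder file contents, and no dagster.yaml) and runs definitions.py both ways with subprocess:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "my_project"
(pkg / "assets").mkdir(parents=True)

# Mirror the project layout; file contents are placeholders.
(pkg / "__init__.py").write_text("")
(pkg / "assets" / "__init__.py").write_text("")
(pkg / "assets" / "trips.py").write_text("TRIPS = 'trip assets'\n")
(pkg / "assets" / "metrics.py").write_text("METRICS = 'metric assets'\n")
(pkg / "definitions.py").write_text(
    "from .assets import metrics, trips\n"
    "print(trips.TRIPS, '+', metrics.METRICS)\n"
)

env = {**os.environ, "PYTHONPATH": str(root)}

# 1) Running the file directly: no parent package, so the relative import fails.
as_script = subprocess.run(
    [sys.executable, str(pkg / "definitions.py")],
    cwd=root, env=env, capture_output=True, text=True,
)

# 2) Running it as a module inside the package: the relative import resolves.
as_module = subprocess.run(
    [sys.executable, "-m", "my_project.definitions"],
    cwd=root, env=env, capture_output=True, text=True,
)

print(as_script.returncode, as_script.stderr.strip().splitlines()[-1])
print(as_module.returncode, as_module.stdout.strip())
```

For a package-based layout like this, Dagster can likewise be pointed at the module path, e.g. dagster dev -m my_project.definitions.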


✅ Quick Check: How Well Do You Know This?

Let’s test your understanding of Definitions and Code Locations:

  1. In Dagster, a Definitions object contains…
    → ✔️ All your project’s assets, jobs, schedules, sensors, and resources.

  2. If you change your code and want to see updates in the UI, you must…
    → 🔁 Reload the code location.
    Just refreshing the browser won't help: the running Dagster process keeps serving the definitions it loaded before your change.

  3. True or False?
    “Including an asset in a Definitions object causes it to run automatically.”
    → ❌ False.
    Assets must be materialized explicitly. Definitions just register them.


🎯 Wrapping Up

Definitions and Code Locations are the structural backbone of any scalable Dagster setup.

By understanding these concepts, you move from just building pipelines — to designing full, reliable, and modular data platforms.

Whether you’re a solo builder or working across teams, this setup lets you grow confidently without losing control.

In Part 4, we’ll look into resources, sensors, and schedules — the moving parts that bring automation and data-awareness into your asset graph.

Until then: stay modular. Stay sharp. And remember to reload your code locations. 😉