Data Engineering With Dagster Part Five – Automating With Schedules
🎬 Finally Automating: Dagster Schedules
Let’s be honest — if you’re still manually triggering pipeline runs in 2025, your orchestrator is more like a glorified button clicker.
In this post, we’ll hook Dagster up with schedules to automate jobs and run selected assets on our terms — daily, weekly, monthly, you name it.
🧠 What Is a Schedule?
Think: “Run this job every Monday at midnight.” That’s what a schedule is — it defines when a Dagster job should execute.
You’ve probably seen similar setups using cron in Linux or crontab files to schedule shell scripts.
Dagster just gives it all a Pythonic polish and tucks it neatly into your orchestration layer.
🧩 Anatomy of a Dagster Schedule
To build a schedule, you need:
- A Job (what to run)
- A Cron expression (when to run it)
Other optional components:
- Tags
- Run configuration
- Execution time evaluation
But for now, we’re keeping it lean and practical: Job + Cron = Schedule.
⚙️ Step 1: Define Your Job
When working with a big asset graph, you don’t always want to materialize everything in one go. Jobs let you slice up your graph and selectively execute parts of it.
Here’s how to isolate an asset using AssetSelection and define a custom job:
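# jobs.py
import dagster as dg
# Selection that matches only the weekly asset
trips_by_week = dg.AssetSelection.assets("trips_by_week")
# Job that materializes every asset except trips_by_week
trip_update_job = dg.define_asset_job(
    name="trip_update_job",
    selection=dg.AssetSelection.all() - trips_by_week,
)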
💡 We exclude trips_by_week from this job because it has its own separate schedule. Think of this as the “everything else” job.
🧠 Mini Excursus: What Are Dagster Jobs, Really?
A job in Dagster is a reusable, named way to trigger a set of asset materializations.
Why care?
- You can create different jobs for different subsets of assets.
- Run one job in a K8s pod and another in-process.
- Schedule them differently.
- Add custom configs or tags.
Best practice: keep job definitions in a dedicated module, like jobs.py.
🎯 Step 2: Add a Second Job for Our Weekly Asset
Let’s create a job that only runs the trips_by_week asset:
# jobs.py
# Job that materializes only the trips_by_week selection defined above
weekly_update_job = dg.define_asset_job(
    name="weekly_update_job",
    selection=trips_by_week,
)
Simple and scoped. Clean separation of logic.
⏱️ Step 3: Cron Expressions 101
Cron is that weird 5-part string format that looks like keyboard spam but secretly controls most of automation:
15 5 * * 1-5
That means: Every weekday (Mon–Fri) at 5:15AM
Dagster uses the same syntax. Example:
# schedules.py
import dagster as dg
from dagster_essentials.jobs import trip_update_job
trip_update_schedule = dg.ScheduleDefinition(
    job=trip_update_job,
    cron_schedule="0 0 5 * *",  # Every 5th of the month at midnight
)
🧠 Mini Excursus: Understanding Cron Syntax
Field | Allowed values | Example | Notes |
---|---|---|---|
Minute | 0–59 | 0 | Top of the hour |
Hour | 0–23 | 0 | Midnight |
Day of month | 1–31 | 5 | 5th of the month |
Month | 1–12 | * | Every month |
Weekday | 0–6 (Sun–Sat) | 1 | Every Monday |
Want help building cron expressions? Use Crontab Guru — it’s simple, accurate, and shows examples live.
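If you’d rather poke at the five fields in code, here is a small standard-library sketch of a cron matcher. It is not how Dagster evaluates schedules internally; it is a simplified helper written for this post (the names `expand_field` and `cron_matches` are made up) that supports `*`, single values, ranges, lists, and `/step`, and does not handle names like `MON` or `7` for Sunday.

```python
from datetime import datetime

# Allowed (min, max) per field, in cron order:
# minute, hour, day of month, month, weekday.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field (*, a, a-b, a,b,c, and /step) into a set of ints."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step_size = int(step) if step else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            first, last = base.split("-")
            start, end = int(first), int(last)
        else:
            start = int(base)
            end = hi if step else start
        values.update(range(start, end + 1, step_size))
    return values


def cron_matches(expr: str, moment: datetime) -> bool:
    """Return True if `moment` falls on a tick of the 5-field cron expression."""
    raw = expr.split()
    minute, hour, dom, month, dow = (
        expand_field(field, lo, hi) for field, (lo, hi) in zip(raw, FIELD_RANGES)
    )
    cron_weekday = moment.isoweekday() % 7  # cron counts 0=Sunday ... 6=Saturday
    dom_ok = moment.day in dom
    dow_ok = cron_weekday in dow
    # Classic cron rule: if both day fields are restricted, either one may match.
    if raw[2] != "*" and raw[4] != "*":
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return moment.minute in minute and moment.hour in hour and moment.month in month and day_ok


# 6 January 2025 is a Monday, 4 January 2025 is a Saturday.
print(cron_matches("15 5 * * 1-5", datetime(2025, 1, 6, 5, 15)))
print(cron_matches("15 5 * * 1-5", datetime(2025, 1, 4, 5, 15)))
```

The weekday conversion is the part that usually trips people up: Python’s `isoweekday()` counts Monday as 1 and Sunday as 7, while cron counts Sunday as 0.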
📆 Step 4: Schedule the Weekly Job
# schedules.py
from dagster_essentials.jobs import weekly_update_job
weekly_update_schedule = dg.ScheduleDefinition(
    job=weekly_update_job,
    cron_schedule="0 0 * * 1",  # Every Monday at midnight
)
📦 Step 5: Plug Jobs into Definitions
Head to definitions.py and tell Dagster about your jobs and schedules:
# definitions.py
import dagster as dg
from dagster_essentials.jobs import trip_update_job, weekly_update_job
from dagster_essentials.schedules import trip_update_schedule, weekly_update_schedule
# trip_assets, metric_assets, and database_resource are assumed to be defined
# or imported earlier in this file, as in the previous parts of this series.
all_jobs = [trip_update_job, weekly_update_job]
all_schedules = [trip_update_schedule, weekly_update_schedule]
defs = dg.Definitions(
    assets=[*trip_assets, *metric_assets],
    resources={"database": database_resource},
    jobs=all_jobs,
    schedules=all_schedules,
)
Boom. Now it’s all hooked in.
🧪 Testing It in the UI
Spin up the Dagster UI (dagster dev) and check:
- Jobs: Overview > Jobs
- Schedules: Overview > Schedules
You’ll see:
Field | Description |
---|---|
Job Name | trip_update_job / weekly_update_job |
Schedules Linked | One or more |
Last Run | Timestamp of most recent execution |
Enabled? | Yes/No toggle (click to toggle state) |
You can test schedules using the “Test Schedule” button — simulate ticks and preview runs before they’re real.
🧠 Mini Excursus: dagster-daemon
When you run dagster dev, you also spawn dagster-daemon under the hood.
This background process handles:
- Running schedules
- Polling sensors
- Launching run retries and backfills
Without the daemon, schedules won’t tick. So always make sure it’s up when testing automation.
🧠 Knowledge Check
- Do schedules materialize everything in your pipeline?
  Nope. Schedules are attached to jobs, and jobs define which assets get run.
- Can I have multiple jobs using the same asset?
  Absolutely. This lets you run the same asset on different timelines or in different environments.
- Is cron still relevant in 2025?
  Weirdly, yes, and it’s still the best shorthand we’ve got for “when stuff runs.”
✅ Wrapping Up
This part was all about turning orchestration into automation. By defining clear jobs and pairing them with cron-driven schedules, we let Dagster take the wheel.
No more manual runs.
No more “oops I forgot to click the button.”
Just clean, hands-off data engineering.
Next up? Probably partitions and backfills, but let yourself be surprised.