Data Engineering With Dagster Part Five – Automating With Schedules

Posted on Apr 3, 2025 4 mins

Dagster Orchestration Data-Pipelines Scheduling

🎬 Finally Automating: Dagster Schedules

Let’s be honest — if you’re still manually triggering pipeline runs in 2025, your orchestrator is more like a glorified button clicker.

In this post, we’ll hook Dagster up with schedules to automate jobs and run selected assets on our terms — daily, weekly, monthly, you name it.

🧠 What Is a Schedule?

Think: “Run this job every Monday at midnight.” That’s what a schedule is — it defines when a Dagster job should execute.

You’ve probably seen similar setups using cron in Linux or crontab files to schedule shell scripts.

Dagster just gives it all a Pythonic polish and tucks it neatly into your orchestration layer.

🧩 Anatomy of a Dagster Schedule

To build a schedule, you need:

A Job (what to run)
A Cron expression (when to run it)

Other optional components:

Tags
Run configuration
Execution time evaluation

But for now, we’re keeping it lean and practical: Job + Cron = Schedule.

⚙️ Step 1: Define Your Job

When working with a big asset graph, you don’t always want to materialize everything in one go. Jobs let you slice up your graph and selectively execute parts of it.

Here’s how to isolate an asset using AssetSelection and define a custom job:

# jobs.py
import dagster as dg

trips_by_week = dg.AssetSelection.assets("trips_by_week")

trip_update_job = dg.define_asset_job(
    name="trip_update_job",
    selection=dg.AssetSelection.all() - trips_by_week
)

💡 We exclude trips_by_week from this job because it has its own separate schedule. Think of this as the “everything else” job.

🧠 Mini Excursus: What Are Dagster Jobs, Really?

A job in Dagster is a reusable, named way to trigger a set of asset materializations.

Why care?

You can create different jobs for different subsets of assets.
Run one job in a K8s pod and another in-process.
Schedule them differently.
Add custom configs or tags.

Best practice: keep job definitions in a dedicated module, like jobs.py.

🎯 Step 2: Add a Second Job for Our Weekly Asset

Let’s create a job that only runs the trips_by_week asset:

# jobs.py
weekly_update_job = dg.define_asset_job(
    name="weekly_update_job",
    selection=trips_by_week
)

Simple and scoped. Clean separation of logic.

⏱️ Step 3: Cron Expressions 101

Cron is that weird 5-part string format that looks like keyboard spam but secretly controls most of automation:

15 5 * * 1-5

That means: every weekday (Mon–Fri) at 5:15 AM.

Dagster uses the same syntax. Example:

# schedules.py
import dagster as dg

from dagster_essentials.jobs import trip_update_job

trip_update_schedule = dg.ScheduleDefinition(
    job=trip_update_job,
    cron_schedule="0 0 5 * *",  # Every 5th of the month at midnight
)

🧠 Mini Excursus: Understanding Cron Syntax

Field	Allowed values	Example	Meaning of example
Minute	0–59	`0`	Top of the hour
Hour	0–23	`0`	Midnight
Day	1–31	`5`	5th of the month
Month	1–12	`*`	Every month
Weekday	0–6 (Sun–Sat)	`1`	Every Monday

Want help building cron expressions? Use Crontab Guru — it’s simple, accurate, and shows examples live.
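To make the five fields concrete, here is a standard-library-only sketch of a cron matcher. It is a teaching toy, not how Dagster evaluates schedules: `cron_matches` and `_expand` are hypothetical helpers, and they only understand `*`, single numbers, ranges like `1-5`, and comma lists (no step values or month/day names).

```python
from datetime import datetime


def _expand(field: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field ('*', '5', '1-5', '1,3,5') into the set of values it allows."""
    if field == "*":
        return set(range(lo, hi + 1))
    values: set[int] = set()
    for part in field.split(","):
        if "-" in part:
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))
        else:
            values.add(int(part))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` falls on a tick of the 5-field cron expression `expr`."""
    minute, hour, day, month, weekday = expr.split()
    # Cron counts weekdays from 0 = Sunday; isoweekday() counts from 1 = Monday to 7 = Sunday.
    cron_weekday = when.isoweekday() % 7
    time_ok = (
        when.minute in _expand(minute, 0, 59)
        and when.hour in _expand(hour, 0, 23)
        and when.month in _expand(month, 1, 12)
    )
    day_ok = when.day in _expand(day, 1, 31)
    weekday_ok = cron_weekday in _expand(weekday, 0, 6)
    # Classic cron rule: when both day fields are restricted, matching either one is enough.
    if day != "*" and weekday != "*":
        return time_ok and (day_ok or weekday_ok)
    return time_ok and day_ok and weekday_ok


# April 7, 2025 is a Monday, so the weekday expression from above matches at 5:15.
print(cron_matches("15 5 * * 1-5", datetime(2025, 4, 7, 5, 15)))
```

For real schedules, keep using the cron string directly in `ScheduleDefinition`; the point here is only to show what each field contributes.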

📆 Step 4: Schedule the Weekly Job

# schedules.py
from dagster_essentials.jobs import weekly_update_job

weekly_update_schedule = dg.ScheduleDefinition(
    job=weekly_update_job,
    cron_schedule="0 0 * * 1",  # Every Monday at midnight
)

📦 Step 5: Plug Jobs into Definitions

Head to definitions.py and tell Dagster about your jobs and schedules:

# definitions.py
import dagster as dg

from dagster_essentials.jobs import trip_update_job, weekly_update_job
from dagster_essentials.schedules import trip_update_schedule, weekly_update_schedule

# trip_assets, metric_assets, and database_resource are assumed to be defined
# or imported above this point, as in the earlier parts of this series.
all_jobs = [trip_update_job, weekly_update_job]
all_schedules = [trip_update_schedule, weekly_update_schedule]

defs = dg.Definitions(
    assets=[*trip_assets, *metric_assets],
    resources={"database": database_resource},
    jobs=all_jobs,
    schedules=all_schedules,
)

Boom. Now it’s all hooked in.

🧪 Testing It in the UI

Spin up the Dagster UI (dagster dev) and check:

Jobs: Overview > Jobs
Schedules: Overview > Schedules

You’ll see:

Field	Description
Job Name	trip_update_job / weekly_update_job
Schedules Linked	One or more
Last Run	Timestamp of most recent execution
Enabled?	Yes/No toggle (click to toggle state)

You can test schedules using the “Test Schedule” button — simulate ticks and preview runs before they’re real.

🧠 Mini Excursus: dagster-daemon

When you run dagster dev, you also spawn dagster-daemon under the hood.

This background process handles:

Running schedules
Polling sensors
Launching queued runs and run retries

Without the daemon, schedules won’t tick. So always make sure it’s up when testing automation.

🧠 Knowledge Check

Do schedules materialize everything in your pipeline?
Nope — schedules are attached to jobs, and jobs define which assets get run.
Can I have multiple jobs using the same asset?
Absolutely. This lets you run the same asset on different timelines or in different environments.
Is cron still relevant in 2025?
Weirdly, yes — and it’s still the best shorthand we’ve got for “when stuff runs.”

✅ Wrapping Up

This part was all about turning orchestration into automation. By defining clear jobs and pairing them with cron-driven schedules, we let Dagster take the wheel.

No more manual runs.
No more “oops I forgot to click the button.”
Just clean, hands-off data engineering.

Next up? Probably partitions and backfills, but stay tuned and be surprised.