- Introduction
  - Welcome! 0 hr 1 min
- The New Paradigm
  - The use cases 0 hr 3 min
  - What is a Dataset? 0 hr 1 min
  - Data-aware scheduled DAG in action! 0 hr 3 min
  - Datasets to replace all?
- Move further
  - Practice: Waiting for multiple Datasets
  - How does Airflow know Dataset updates? 0 hr 1 min
  - Conditional Dataset Scheduling 0 hr 3 min
  - Monitor your Datasets 0 hr 4 min
  - Dataset limitations
- Finishing up...
  - Quiz!
  - Summary
  - How was it?
Airflow: Datasets
Learn how to take advantage of datasets to create data-aware scheduled DAGs.
Welcome! We're so glad you're here 😍
Want to run your DAG as soon as a dataset produced by another team is ready, without leaving your DAG waiting for it and consuming resources in the meantime?
Good news: Datasets let you trigger a DAG automatically whenever the source dataset is updated or made available.
In addition to scheduling DAGs based on time, you can now schedule them based on a task updating a dataset.
This means you can split your data pipeline into smaller, loosely coupled DAGs that datasets orchestrate together.
🎯Objectives
At the end of this course, you'll be able to:
- Define what datasets are and their limitations
- Implement datasets in your DAGs
- Favor smaller, more self-contained DAGs over monolithic ones
👥 Audience
Who should take this course:
- Data Engineers
- Data Analysts
- Software Engineers
Set aside 17 minutes to complete the course.
💻 Setup Requirements
You need to have the following:
- Docker and Docker Compose installed on your computer (cf: get Docker)
- The Astro CLI
- Access to a web browser