Airflow: Datasets

Learn how to take advantage of datasets to create data-aware scheduled DAGs.

About this course

Welcome! We're so glad you're here 😍

Want to run your DAG on a dataset that another team produces, without having it sit idle waiting for that data and consuming resources in the meantime?

The good news is that Datasets let you trigger a DAG automatically when the source dataset is updated or becomes available.

So, in addition to scheduling DAGs based on time, you can also schedule them based on a task updating a dataset.

This means you can use datasets to divide your data pipeline into smaller, loosely coupled DAGs that can be orchestrated together.
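To make the idea concrete, here is a minimal sketch in plain Python of how dataset-driven triggering behaves. It is a toy model, not Airflow's API: the `ToyScheduler` and `ConsumerDag` classes, the DAG ids, and the bucket paths are all hypothetical names invented for illustration. In Airflow 2.4 and later the corresponding pieces are the `Dataset` class from `airflow.datasets`, the `outlets` argument on a task, and a list of datasets passed as a DAG's `schedule`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Toy stand-in for a dataset: just a URI-like name (hypothetical)."""
    uri: str


@dataclass
class ConsumerDag:
    """Toy DAG that runs once every dataset it waits on has been updated."""
    dag_id: str
    schedule: list
    pending: set = field(default_factory=set)
    runs: int = 0

    def __post_init__(self):
        # Start out waiting on every dataset in the schedule.
        self.pending = set(self.schedule)

    def notify(self, dataset):
        # Mark one dataset as updated; run when nothing is left to wait for.
        self.pending.discard(dataset)
        if not self.pending:
            self.runs += 1
            self.pending = set(self.schedule)


class ToyScheduler:
    """Routes dataset updates from producer tasks to consumer DAGs."""

    def __init__(self):
        self.consumers = []

    def register(self, dag):
        self.consumers.append(dag)

    def task_succeeded(self, outlets):
        # A producer task finished and declared which datasets it updated.
        for dataset in outlets:
            for dag in self.consumers:
                if dataset in dag.schedule:
                    dag.notify(dataset)


# Hypothetical datasets owned by another team's producer DAGs.
orders = Dataset("s3://example-bucket/orders.csv")
customers = Dataset("s3://example-bucket/customers.csv")

scheduler = ToyScheduler()
report = ConsumerDag("build_report", schedule=[orders])
join = ConsumerDag("join_tables", schedule=[orders, customers])
scheduler.register(report)
scheduler.register(join)

scheduler.task_succeeded(outlets=[orders])
print(report.runs, join.runs)  # 1 0

scheduler.task_succeeded(outlets=[customers])
print(report.runs, join.runs)  # 1 1
```

The consumer DAGs never poll or wait: they do nothing until a producer task reports an update, and a DAG that depends on several datasets only runs once all of them have been updated.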

🎯 Objectives

At the end of this course, you'll be able to:

  • Define what datasets are and what their limitations are
  • Implement datasets in your DAGs
  • Favor smaller, more self-contained DAGs over big DAGs

👥 Audience

Who should take this course:

  • Data Engineers
  • Data Analysts
  • Software Engineers

Set aside 17 minutes to complete the course.

💻 Setup Requirements

You need to have the following:

Curriculum: 0 hr 16 min

  • Introduction
  • Welcome! 0 hr 1 min
  • The New Paradigm
  • The use cases 0 hr 3 min
  • What is a Dataset? 0 hr 1 min
  • Data-aware scheduled DAG in action! 0 hr 3 min
  • Datasets to replace all?
  • Move further
  • Practice: Waiting for multiple Datasets
  • How does Airflow know Dataset updates? 0 hr 1 min
  • Conditional Dataset Scheduling 0 hr 3 min
  • Monitor your Datasets 0 hr 4 min
  • Dataset limitations
  • Finishing up...
  • Quiz!
  • Summary
  • How was it?
