Testing assets

Creating testable and verifiable data pipelines is one of the focuses of Dagster. We believe ensuring data quality is critical for managing the complexity of data systems. Here, we'll cover how to write unit tests for individual assets, as well as for graphs of assets materialized together.


Testing the cereal asset definitions

Let's go back to the assets we defined in the prior section, and ensure that they work as expected by writing some unit tests.

We'll start by writing a test for the nabisco_cereals asset definition, which filters the larger list of cereals down to those that were manufactured by Nabisco. To run the function that derives an asset from its upstream dependencies, we can invoke it directly, as if it were a regular Python function:

def test_nabisco_cereals():
    cereals = [
        {"name": "cereal1", "mfr": "N"},
        {"name": "cereal2", "mfr": "K"},
    ]
    result = nabisco_cereals(cereals)
    assert len(result) == 1
    assert result == [{"name": "cereal1", "mfr": "N"}]
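
Because the asset can be called like a plain function, edge cases are cheap to cover in the same way. The following is a minimal sketch, not the tutorial's actual code: nabisco_cereals here is a hypothetical stand-in written without the @asset decorator so the snippet runs without Dagster installed, and its body (a filter on "mfr" == "N") is an assumption based on the description above. The test names are also illustrative.

```python
# Hypothetical stand-in for the tutorial's asset body. The @asset decorator is
# omitted so this snippet runs on its own; the filter on mfr == "N" is an
# assumption based on the asset's description, not the original definition.
def nabisco_cereals(cereals):
    return [row for row in cereals if row["mfr"] == "N"]


def test_nabisco_cereals_empty_input():
    # With no upstream rows, the asset should produce an empty list.
    assert nabisco_cereals([]) == []


def test_nabisco_cereals_no_matches():
    # Rows from other manufacturers are all filtered out.
    cereals = [{"name": "cereal2", "mfr": "K"}]
    assert nabisco_cereals(cereals) == []


def test_nabisco_cereals_preserves_order():
    # Matching rows keep the order they had in the upstream list.
    cereals = [
        {"name": "cereal1", "mfr": "N"},
        {"name": "cereal2", "mfr": "K"},
        {"name": "cereal3", "mfr": "N"},
    ]
    result = nabisco_cereals(cereals)
    assert [row["name"] for row in result] == ["cereal1", "cereal3"]
```

In a real project, the tests would import the decorated asset from the module that defines it rather than redefining it locally.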

We'll also write a test for all the assets together. To do that, we can put them in a list and pass it to the materialize function. materialize returns an ExecuteInProcessResult, whose methods let us inspect, in detail, the success or failure of execution, the values produced by the computation, and other events associated with execution.

from dagster import materialize


def test_cereal_assets():
    assets = [
        nabisco_cereals,
        cereals,
        cereal_protein_fractions,
        highest_protein_nabisco_cereal,
    ]

    result = materialize(assets)
    assert result.success
    assert result.output_for_node("highest_protein_nabisco_cereal") == "100% Bran"

Now you can use pytest, or your test runner of choice, to run the unit tests.

pytest test_complex_asset_graph.py

Dagster is designed to make testing easy in a domain where it has historically been very difficult. To learn more about testing in Dagster, see the Testing page.



Conclusion

🎉 Congratulations! Having come this far, you now have a working, tested set of software-defined assets.

What if you want to do more?

  • Automating asset materialization - This tutorial covered how to manually materialize assets. Dagster can also kick off materializations automatically, either on fixed schedules or when a sensor triggers them.
  • Partitioning assets - This tutorial covered assets whose entire contents get re-computed and overwritten with every materialization. When assets are large, it’s common to partition them, so that each run only materializes a single partition.
  • Customizing asset storage - The assets in this tutorial were materialized as pickle files on the local filesystem. IO managers let you customize how and where assets are stored - e.g. as tables in Snowflake or Parquet files in S3.
  • Non-asset jobs - This tutorial assumed that your only goal was producing assets. You can also build and run arbitrary jobs, on top of the same op and graph abstractions that underlie assets.