Testing ops and jobs#

You can find the code for this example on Github

Data applications are notoriously difficult to test and are therefore often un- or under-tested.

Creating testable and verifiable ops and jobs is one of the focuses of Dagster. We believe ensuring data quality is critical for managing the complexity of data systems. Here, we'll show how to write unit tests for Dagster jobs and ops.


Testing the job (and its ops)#

Let's go back to the diamond job we wrote in the prior section, and ensure that it's working as expected by writing some unit tests.

We'll start by writing a test for the test_get_total_size op, which takes a dictionary of file sizes as input and returns the sum of the file sizes. To run an op, we can invoke it directly, as if it's a regular Python function:

def test_get_total_size():
    file_sizes = {"file1": 400, "file2": 50}
    result = get_total_size(file_sizes)
    assert result == 450

We'll also write a test for the entire job. The JobDefinition.execute_in_process method synchronously executes a job and returns a ExecuteInProcessResult, whose methods let us investigate, in detail, the success or failure of execution, the outputs produced by ops, and (as we'll see later) other events associated with execution.

def test_diamond():
    res = diamond.execute_in_process()
    assert res.success
    assert res.output_for_node("get_total_size") > 0

Now we can use pytest, or another test runner of choice, to run these unit tests.

pytest test_complex_job.py

Obviously, in production we'll often execute jobs in a parallel, streaming way that doesn't admit this kind of API, which is intended to enable local tests like this.

Dagster is written to make testing easy in a domain where it has historically been very difficult. You can learn more about Testing in Dagster by reading the Testing page.



Conclusion#

πŸŽ‰ Congratulations! Having reached this far, you now have a working, testable, and maintainable op-based job.