Welcome to Dagster! In this guide, you'll build a simple data pipeline in Dagster that downloads the top 10 HackerNews stories. In three quick steps, you'll have functional code and begin exploring Dagster's user interface.
Create a file named hello-dagster.py that contains the following code:
import pandas as pd
import requests
from dagster import MetadataValue, Output, asset, repository


@asset
def hackernews_top_story_ids():
    """
    Get top stories from the HackerNews top stories endpoint.
    API Docs: https://github.com/HackerNews/API#new-top-and-best-stories
    """
    top_story_ids = requests.get(
        "https://hacker-news.firebaseio.com/v0/topstories.json"
    ).json()
    return top_story_ids[:10]


# asset dependencies can be inferred from parameter names
@asset
def hackernews_top_stories(hackernews_top_story_ids):
    """Get items based on story ids from the HackerNews items endpoint"""
    results = []
    for item_id in hackernews_top_story_ids:
        item = requests.get(
            f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
        ).json()
        results.append(item)

    df = pd.DataFrame(results)

    # recorded metadata can be customized
    metadata = {
        "num_records": len(df),
        "preview": MetadataValue.md(df[["title", "by", "url"]].to_markdown()),
    }

    return Output(value=df, metadata=metadata)
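To get a feel for how a dependency can be inferred from a parameter name, here is a minimal sketch in plain Python. It is not Dagster's implementation: `upstream_ids`, `sorted_ids`, and `run_in_dependency_order` are hypothetical names made up for this illustration, and the upstream function returns a fixed list rather than calling the HackerNews API.

```python
import inspect


def upstream_ids():
    """Stand-in for an upstream asset; returns a fixed list instead of calling an API."""
    return [3, 1, 2]


def sorted_ids(upstream_ids):
    """Stand-in for a downstream asset; its parameter is named after the upstream function."""
    return sorted(upstream_ids)


def run_in_dependency_order(functions):
    """Call each function once, feeding it the results of the functions its parameters name."""
    by_name = {fn.__name__: fn for fn in functions}
    results = {}

    def compute(name):
        if name not in results:
            params = inspect.signature(by_name[name]).parameters
            # Each parameter name is treated as the name of an upstream function.
            args = {param: compute(param) for param in params}
            results[name] = by_name[name](**args)
        return results[name]

    for name in by_name:
        compute(name)
    return results


# The order of the input list does not matter; dependencies are resolved by name.
results = run_in_dependency_order([sorted_ids, upstream_ids])
print(results["sorted_ids"])
```

The sketch has no cycle detection or error handling for unknown parameter names. It only shows the idea that matching a parameter name to a function name is enough to decide what runs first.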
Click Materialize All to run the pipeline and create your assets. Materializing an asset runs the asset function and saves the result. This pipeline uses the Dagster defaults to save the result to a pickle file on disk.
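As a rough picture of what "saving the result to a pickle file on disk" means, the following sketch stores a value with Python's standard `pickle` module and reads it back. It is an illustration only, not Dagster's storage code: the helper names `store_result` and `load_result`, the temporary directory, and the placeholder IDs are all assumptions.

```python
import pickle
import tempfile
from pathlib import Path


def store_result(directory, asset_name, value):
    """Pickle `value` to a file named after the asset and return the file path."""
    path = Path(directory) / asset_name
    with open(path, "wb") as f:
        pickle.dump(value, f)
    return path


def load_result(directory, asset_name):
    """Read back the value that was pickled for the given asset name."""
    path = Path(directory) / asset_name
    with open(path, "rb") as f:
        return pickle.load(f)


# Placeholder IDs stand in for the output of an asset function.
with tempfile.TemporaryDirectory() as storage_dir:
    store_result(storage_dir, "hackernews_top_story_ids", [101, 102, 103])
    loaded = load_result(storage_dir, "hackernews_top_story_ids")

print(loaded)
```

A downstream step can then load the stored value instead of recomputing it, which is the role the saved result plays when one asset depends on another.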
That's it! You now have two materialized Dagster assets: hackernews_top_story_ids and hackernews_top_stories.
But wait - there's more. Because the hackernews_top_stories asset specified metadata, you can view the metadata right in Dagit:
Click the asset.
In the sidebar that displays, click the Show Markdown link in the Materialization in Last Run section. This opens a preview of the pipeline result, allowing you to view the top 10 HackerNews stories.
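The preview is a Markdown table built from the title, by, and url columns. The sketch below shows the general shape of such a table using only the standard library. The `records_to_markdown` helper and the sample records are hypothetical, and the exact layout produced by pandas' `to_markdown()` (which also includes the index column) will differ.

```python
def records_to_markdown(records, columns):
    """Render a list of dicts as a simple Markdown table with the given columns."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    rows = [
        "| " + " | ".join(str(record.get(col, "")) for col in columns) + " |"
        for record in records
    ]
    return "\n".join([header, divider, *rows])


# Made-up sample records; real items come from the HackerNews items endpoint.
sample_records = [
    {"title": "Example story A", "by": "user_a", "url": "https://example.com/a", "score": 1},
    {"title": "Example story B", "by": "user_b", "url": "https://example.com/b", "score": 2},
]

# Only the selected columns appear in the table; "score" is left out.
preview = records_to_markdown(sample_records, ["title", "by", "url"])
print(preview)
```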