Write a task

Using the project structure from the previous tutorial, write your first task.

The task task_create_random_data is defined in src/my_project/task_data_preparation.py and generates a data set stored in bld/data.pkl.

The task_ prefix on both the module name and the function name matters: pytask uses it to discover tasks automatically.

my_project
│
├───.pytask
│
├───bld
│   └────data.pkl
│
├───src
│   └───my_project
│       ├────__init__.py
│       ├────config.py
│       └────task_data_preparation.py
│
└───pyproject.toml
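
The task modules in this tutorial import BLD from config.py, which this page does not show. As a minimal sketch, assuming config.py sits in src/my_project and bld sits at the project root as in the tree above, the paths could be derived as follows. The helper name project_paths and the example path are hypothetical and used only for illustration.

```python
from pathlib import Path


def project_paths(config_file: Path) -> tuple[Path, Path]:
    """Return (SRC, BLD) for a config.py located in src/my_project."""
    src = config_file.parent          # .../my_project/src/my_project
    bld = src.parent.parent / "bld"   # .../my_project/bld
    return src, bld


# In a real config.py you would pass Path(__file__).resolve() instead.
# A fixed example path keeps this sketch runnable anywhere.
SRC, BLD = project_paths(Path("/home/user/my_project/src/my_project/config.py"))
print(BLD / "data.pkl")
```

Deriving BLD from the location of config.py rather than from the working directory keeps the paths valid no matter where pytask is started.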

Generally, a task is a function whose name starts with task_. Tasks produce outputs, and the most common output is a file, which is the focus of these tutorials.

pytask needs to know the products of a task to run a workflow correctly. The following interfaces are different ways to specify them, ordered from most to least recommended.

Important

You cannot mix different interfaces for the same task. Choose only one.

Annotated

The task accepts the argument path, which points to the file where the data set will be stored. The path is passed to the task via the default value, BLD / "data.pkl". To indicate that this file is a product, we add some metadata to the argument.

The type hint Annotated[Path, Product] uses Annotated syntax. The first entry specifies the argument type (Path), and the second entry (Product) marks this argument as a product.

# Content of task_data_preparation.py.
from pathlib import Path
from typing import Annotated

import numpy as np
import pandas as pd
from my_project.config import BLD

from pytask import Product


def task_create_random_data(path: Annotated[Path, Product] = BLD / "data.pkl") -> None:
    rng = np.random.default_rng(0)
    beta = 2

    x = rng.normal(loc=5, scale=10, size=1_000)
    epsilon = rng.standard_normal(1_000)

    y = beta * x + epsilon

    df = pd.DataFrame({"x": x, "y": y})
    df.to_pickle(path)

Tip

If you want to refresh your knowledge about type hints, read this guide.
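
To make the two parts of an Annotated hint concrete, the sketch below inspects one at runtime with the standard library only. ProductMarker is a hypothetical stand-in for pytask's Product so that the example runs without pytask installed. It shows how such metadata can be read in general, not how pytask reads it internally.

```python
from pathlib import Path
from typing import Annotated, get_args, get_type_hints


class ProductMarker:
    """Hypothetical stand-in for pytask's Product marker."""


def task_example(path: Annotated[Path, ProductMarker] = Path("bld/data.pkl")) -> None:
    ...


# include_extras=True keeps the Annotated metadata instead of stripping it.
hints = get_type_hints(task_example, include_extras=True)

# The first entry is the argument type, the remaining entries are metadata.
arg_type, *metadata = get_args(hints["path"])
print(arg_type)
print(metadata)
```

Type checkers see only the first entry (Path), so the marker adds information for tools without changing how the function is typed.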

produces

Tasks can use produces as an argument name. Every value, or in this case path, passed to this argument is automatically treated as a task product. Here, the path is given by the default value of the argument.

# Content of task_data_preparation.py.
from pathlib import Path

import numpy as np
import pandas as pd
from my_project.config import BLD


def task_create_random_data(produces: Path = BLD / "data.pkl") -> None:
    rng = np.random.default_rng(0)
    beta = 2

    x = rng.normal(loc=5, scale=10, size=1_000)
    epsilon = rng.standard_normal(1_000)

    y = beta * x + epsilon

    df = pd.DataFrame({"x": x, "y": y})
    df.to_pickle(produces)
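
Because the product path is an ordinary default argument, it can be read from the signature and overridden in a direct call, for example in a unit test. The sketch below assumes a stand-in task, task_write_greeting, that writes a text file instead of a pickled DataFrame so that it needs no third-party packages.

```python
import inspect
import tempfile
from pathlib import Path


def task_write_greeting(produces: Path = Path("bld") / "greeting.txt") -> None:
    # Stand-in for a real task: write a small file to the product path.
    produces.write_text("hello")


# The default value of the argument is the declared product path.
default = inspect.signature(task_write_greeting).parameters["produces"].default
print(default)

# Calling the function directly with another path leaves bld/ untouched.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "greeting.txt"
    task_write_greeting(produces=target)
    content = target.read_text()

print(content)
```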

Now, execute pytask to collect and run the tasks in the current directory and its subdirectories.

$ pytask
────────────────────────── Start pytask session ─────────────────────────
Platform: win32 -- Python 3.12.0, pytask 0.5.3, pluggy 1.3.0
Root: C:\Users\pytask-dev\git\my_project
Collected 1 task.

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Task                                              ┃ Outcome ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ task_data_preparation.py::task_create_random_data │ .       │
└───────────────────────────────────────────────────┴─────────┘

─────────────────────────────────────────────────────────────────────────
╭─────────── Summary ────────────╮
│  1  Collected tasks            │
│  1  Succeeded        (100.0%)  │
╰────────────────────────────────╯
─────────────────────── Succeeded in 0.06 seconds ───────────────────────

Customize task names

Use the @task decorator to mark a function as a task regardless of its function name. You can optionally pass a new name for the task. Otherwise, pytask uses the function name.

from pytask import task

# The id will be ".../task_data_preparation.py::create_random_data".

@task
def create_random_data(): ...

# The id will be ".../task_data_preparation.py::create_data".

@task(name="create_data")
def create_random_data(): ...
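
The naming rule can be illustrated with a toy decorator that works both bare and with a name argument. This is not pytask's implementation: toy_task and REGISTRY are hypothetical names, and the sketch only shows the fallback from an explicit name to the function name.

```python
from typing import Callable, Optional

# Maps task names to functions; stands in for a real task collection.
REGISTRY: dict[str, Callable] = {}


def toy_task(func: Optional[Callable] = None, *, name: Optional[str] = None):
    """Register a function, usable as @toy_task or @toy_task(name=...)."""

    def register(f: Callable) -> Callable:
        # Use the explicit name if given, otherwise the function name.
        REGISTRY[name or f.__name__] = f
        return f

    # Bare use passes the function directly; use with arguments returns the decorator.
    return register(func) if func is not None else register


@toy_task
def create_random_data(): ...


@toy_task(name="create_data")
def create_more_data(): ...


print(sorted(REGISTRY))  # ['create_data', 'create_random_data']
```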

Customize task module names

Use the configuration value task_files if you prefer a different naming scheme for the task modules. task_*.py is the default. You can specify one or multiple patterns to collect tasks from other files.
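
To get a feel for how such patterns select files, the sketch below applies glob-style matching from the standard library to a few file names. It assumes the patterns behave like shell globs on file names. The second pattern and the file list are made up for illustration, and the matching shown here is not taken from pytask's source.

```python
from fnmatch import fnmatchcase

# Hypothetical setting: the default pattern plus one custom naming scheme.
task_files = ["task_*.py", "*_task.py"]

candidates = ["task_data_preparation.py", "config.py", "plot_task.py", "__init__.py"]

# Keep every file name that matches at least one pattern.
collected = [
    name
    for name in candidates
    if any(fnmatchcase(name, pattern) for pattern in task_files)
]
print(collected)  # ['task_data_preparation.py', 'plot_task.py']
```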