Portability¶
This guide explains what you need to do to move a pytask project between machines and why the lockfile is central to that process.
See also
The lockfile format and behavior are documented in the reference guide.
How to port a project¶
Use this checklist when you move a project to another machine or environment.
Update state once on the source machine.
Run a normal build so
pytask.lockis up to date:$ pytask build
If you already have a recent lockfile and up-to-date outputs, you can skip this step.
Ship the right files.
Commit
pytask.lockto your repository and move it with the project. In practice, you should move:the project files tracked in version control (source, configuration, data inputs and
pytask.lock)the build artifacts you want to reuse (often in
bld/if you follow the tutorial layout)the
.pytaskfolder in case you are using the data catalog and it manages some of the files
Files outside the project
If you have files outside the project root (the folder with the
pyproject.tomlfile), you need to make sure that the same relative layout exists on the target machine.Run pytask on the target machine.
When states match, tasks are skipped. When they differ, tasks run and the lockfile is updated.
What makes a project portable¶
There are two things that must stay stable across machines:
First, task and node IDs must be stable. An ID is the unique identifier that ties a task
or node to an entry in pytask.lock. pytask builds these IDs from project-relative
paths anchored at the project root, so most users do not need to do anything. If you
implement custom nodes, make sure their IDs remain project-relative and stable across
machines.
Second, state values must be portable. The lockfile stores opaque state strings from
PNode.state() and PTask.state(), and pytask uses them to decide whether a task is up
to date. Content hashes are portable; timestamps or absolute paths are not. This mostly
matters when you define custom nodes or custom hash functions.
Tips for stable state values¶
Prefer file content hashes over timestamps for custom nodes.
For
PythonNodevalues that are not natively stable, provide a custom hash function.Avoid machine-specific paths or timestamps in custom
state()implementations.
See also
For custom nodes, see Writing custom nodes. For hashing guidance, see Hashing inputs of tasks.
Cleaning up the lockfile¶
pytask.lock is updated incrementally. Entries are only replaced when the corresponding
tasks run. If tasks are removed or renamed, their old entries remain as stale data and
are ignored.
To clean up stale entries without deleting the file, run:
$ pytask build --clean-lockfile
This rewrites the lockfile after a successful build with only the currently collected tasks and their current state values.