Basic principles

Provide – execute – collect

DataLad-remake (re)computes in three stages:

  • Provisioning: creates a temporary, partial copy of the dataset (worktree) to provide an isolated environment. Uses git-worktree (unless on Windows, where it clones the dataset instead). Checks out the given commit, if specified. Gets all input files in the worktree. Installs subdatasets as needed.

  • Execution: runs the computation in the provisioned worktree. Before running, gets and unlocks outputs which are already available; installs subdatasets as needed.

  • Collection: copies the output files from the provisioned worktree into the dataset (main worktree). Ensures that subdatasets which may receive outputs are installed before copying, and saves recursively afterwards.
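
The three stages above can be sketched in plain Python. This is a toy model, not DataLad-remake's implementation: it uses an ordinary temporary directory in place of git-worktree, copies files instead of running datalad get, and does not save anything. The names provision, execute, collect, remake_sketch, and to_upper are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def provision(dataset: Path, inputs: list[str]) -> Path:
    """Create an isolated temporary worktree holding only the declared inputs."""
    worktree = Path(tempfile.mkdtemp(prefix="remake-sketch-"))
    for rel in inputs:
        target = worktree / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(dataset / rel, target)
    return worktree


def execute(worktree: Path, compute) -> None:
    """Run the computation; it sees the worktree, not the dataset."""
    compute(worktree)


def collect(worktree: Path, dataset: Path, outputs: list[str]) -> None:
    """Copy the declared outputs from the worktree back into the dataset."""
    for rel in outputs:
        target = dataset / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(worktree / rel, target)


def remake_sketch(dataset: Path, inputs: list[str], outputs: list[str], compute) -> None:
    """Provide, execute, collect; always remove the temporary worktree."""
    worktree = provision(dataset, inputs)
    try:
        execute(worktree, compute)
        collect(worktree, dataset, outputs)
    finally:
        shutil.rmtree(worktree)


def to_upper(worktree: Path) -> None:
    """Hypothetical computation: upper-case one input file."""
    text = (worktree / "in/a.txt").read_text()
    out = worktree / "out/a.txt"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text.upper())


# Usage: a throwaway "dataset" with a single input file.
demo_dataset = Path(tempfile.mkdtemp(prefix="dataset-sketch-"))
(demo_dataset / "in").mkdir()
(demo_dataset / "in/a.txt").write_text("hello")
remake_sketch(demo_dataset, ["in/a.txt"], ["out/a.txt"], to_upper)
result = (demo_dataset / "out/a.txt").read_text()
```

In this sketch, a file that is not listed in inputs never reaches the worktree, which loosely mirrors the requirement that annexed files be declared as inputs.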

Note: when a file is recomputed during datalad get, DataLad-remake provisions the worktree from the commit originally recorded by datalad make. The recomputation therefore uses the same versions of the input files as the original computation, even if those files have changed in the meantime.

Recomputing during datalad get assumes that the computation is bit-for-bit reproducible, and it fails with an error if the checksum of the recomputed file differs.
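
The reproducibility check can be pictured as a plain checksum comparison. The following is a minimal sketch, assuming SHA-256 as the hash; the actual check is done against the file's annex key, whose backend may differ. The function names sha256_hex and verify_recomputed are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_recomputed(data: bytes, expected: str) -> None:
    """Raise ValueError if recomputed content does not match the recorded checksum."""
    actual = sha256_hex(data)
    if actual != expected:
        raise ValueError(
            f"recomputed content has checksum {actual}, expected {expected}"
        )


# Usage: the checksum recorded when the file was first computed.
recorded = sha256_hex(b"result\n")

# A bit-for-bit identical recomputation passes silently.
verify_recomputed(b"result\n", recorded)

# Any difference, such as an embedded timestamp, is rejected.
try:
    verify_recomputed(b"result (run at 12:00)\n", recorded)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Computations that embed timestamps, random seeds, or host-specific paths in their outputs would fail such a check even when the results are scientifically equivalent.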

Because provisioning involves git worktree or clone, all Git-tracked files are automatically available in the provisioned worktree. Annexed files are provisioned only if they are declared as inputs.

Storing compute instructions

DataLad-remake stores compute instructions (command template, data dependencies, etc.) in text files committed to the dataset. These files are in the same branch as the (re)computed files. By default, DataLad-remake expects commits that add compute instructions to be signed. For more details, see Files and Trusted execution.

DataLad-remake also uses git annex addurl to associate datalad-remake:// URLs with computed files in order to link them to the compute instructions. These URLs are handled by the remake special remote. They contain the label of the compute template, the Git commit that added the compute specification, and the name (hash) of the compute specification file.
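
To illustrate how three such fields can travel in a single URL, here is a sketch that packs them into a query string and reads them back. The layout and the parameter names label, commit, and spec are assumptions made for this example only; they are not the actual datalad-remake:// URL format, and build_url and parse_url are hypothetical helpers.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SCHEME = "datalad-remake"


def build_url(label: str, commit: str, spec_name: str) -> str:
    """Pack the three fields into a URL (hypothetical layout)."""
    query = urlencode({"label": label, "commit": commit, "spec": spec_name})
    return f"{SCHEME}:///?{query}"


def parse_url(url: str) -> dict[str, str]:
    """Recover the fields from a URL produced by build_url."""
    parts = urlsplit(url)
    if parts.scheme != SCHEME:
        raise ValueError(f"not a {SCHEME} URL: {url}")
    # parse_qs returns a list per key; each key occurs once here.
    return {key: values[0] for key, values in parse_qs(parts.query).items()}


# Usage with made-up values.
url = build_url("one-to-many", "0123abcd", "9f8e7d6c")
fields = parse_url(url)
```

A special remote that receives such a URL would have what it needs to locate the specification file in the given commit and select the template by its label.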

Prospective execution

By default, datalad make runs a computation, stores the compute specification, and associates it with the output files. However, it is also possible to skip computation, and only register compute instructions for future use by datalad get (for example, for files already computed via different means). This behavior can be chosen using the --prospective-execution flag.
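
The difference between the default and the prospective mode can be modelled with a toy registry. This sketch does not call DataLad; make_sketch, get_sketch, and the dictionary-based registry are hypothetical stand-ins for datalad make, datalad get, and the stored compute instructions.

```python
def make_sketch(output, spec, compute, registry, store, prospective=False):
    """Register compute instructions; run the computation unless prospective."""
    registry[output] = spec  # instructions are recorded in both modes
    if not prospective:
        store[output] = compute(spec)


def get_sketch(output, compute, registry, store):
    """Return the output, computing it on demand from the registered instructions."""
    if output not in store:
        store[output] = compute(registry[output])
    return store[output]


def double(spec):
    """Hypothetical computation driven by the stored specification."""
    return spec["value"] * 2


# Usage: register without computing, then compute on first retrieval.
registry, store = {}, {}
make_sketch("out.txt", {"value": 21}, double, registry, store, prospective=True)
computed_at_registration = "out.txt" in store
value = get_sketch("out.txt", double, registry, store)
```

In prospective mode nothing is computed at registration time; the first retrieval triggers the computation.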