Delineation from related solutions

DataLad (re)run

DataLad provides run and rerun commands which are similar to make in that they also (re)execute arbitrary commands and record their impact on a dataset. However, there are key differences:

  • While make can be used to compute a file for the first time, there is no “remake” command. Instead, recomputation is done by the remake special remote during get, and therefore should behave no differently from the file downloads typically performed by get.

  • The remake special remote operates in a temporary worktree, set to the commit recorded by datalad make. rerun operates in the dataset’s main worktree and by default executes commands at HEAD (starting point can be specified with rerun --onto).

  • The goal of the remake special remote is to recompute the contents of an annexed file, and it will produce an error if the file cannot be reproduced. rerun can be used to verify computational reproducibility, but also to re-run the same code with different inputs, so it creates a new commit if the outputs differ.

  • The specification of data dependencies and compute instructions is different: make stores them in committed files, whereas run records them in commit messages.
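
The difference in how the two approaches treat a changed result can be illustrated with a toy model. This is a minimal sketch, not DataLad code: the function names remake_style_get and rerun_style are hypothetical, and a SHA-256 digest merely stands in for the recorded identity of an annexed file.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content checksum, standing in for the recorded identity of a file."""
    return hashlib.sha256(data).hexdigest()


def remake_style_get(recorded_digest: str, compute) -> bytes:
    """Hypothetical model of recomputation during get.

    The recomputed content must match what was recorded; otherwise
    the retrieval fails, just as a corrupted download would.
    """
    content = compute()
    if digest(content) != recorded_digest:
        raise RuntimeError("recomputed content does not match the recorded file")
    return content


def rerun_style(recorded_digest: str, compute) -> dict:
    """Hypothetical model of re-execution.

    A differing result is not an error; it is reported as a change
    that would be saved in a new commit.
    """
    new_digest = digest(compute())
    return {"digest": new_digest, "new_commit": new_digest != recorded_digest}


recorded = digest(b"result: 42\n")


def same_result() -> bytes:
    return b"result: 42\n"


def changed_result() -> bytes:
    return b"result: 43\n"
```

In this model, remake_style_get(recorded, changed_result) raises an error, while rerun_style(recorded, changed_result) reports that a new commit is needed.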

Git-annex compute special remote

Git-annex provides a built-in compute special remote (see also: computing annexed files). This is a parallel development to DataLad-remake, and as such there are key differences in both implementation and behavior:

  • Specification of compute instructions and file dependencies is different. Git-annex expects a compute program to communicate inputs and outputs using standard input / output. DataLad-remake expects a configuration file with command parameterization (compute template) and a list of input and output file patterns.

  • The storage of compute instructions is different: git-annex uses its VURL backend for annex keys and stores additional information in the git-annex branch (unlike DataLad-remake, it does not commit additional files to the same branch as the computed files).

  • The trust model is different: while DataLad-remake relies on GPG-signed commits, git-annex compute relies on a list of allowed compute programs.

  • By default, git-annex does not assume that the computed file needs to be bit-by-bit reproducible (it has the --reproducible option to enforce computational reproducibility).

  • Git-annex does not operate on subdatasets (submodules); all inputs need to be gettable from the given Git repository.
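
The idea of a compute template combined with input and output file patterns can be sketched as follows. This is an illustration under stated assumptions only: the dictionary layout, the field names (command, parameters, inputs, outputs), and the example command are hypothetical and do not reproduce the actual DataLad-remake configuration format.

```python
from fnmatch import fnmatch

# Hypothetical specification; field names and values are illustrative only.
spec = {
    "command": "convert {source} {target}",
    "parameters": {"source": "raw/img.png", "target": "derived/img.jpg"},
    "inputs": ["raw/*.png"],
    "outputs": ["derived/*.jpg"],
}


def render_command(spec: dict) -> str:
    """Fill the placeholders of the command template with parameter values."""
    return spec["command"].format(**spec["parameters"])


def select(patterns: list, files: list) -> list:
    """Return the files that match at least one of the given patterns."""
    return [f for f in files if any(fnmatch(f, p) for p in patterns)]


tracked_files = ["raw/img.png", "raw/notes.txt", "derived/img.jpg"]
command_line = render_command(spec)
input_files = select(spec["inputs"], tracked_files)
output_files = select(spec["outputs"], tracked_files)
```

Here the command is derived entirely from declared parameters, and the patterns determine which files count as inputs and outputs, in contrast to a compute program that reports its inputs and outputs over standard input and output at run time.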

©2024, DataLad team.