Delineation from related solutions
==================================

DataLad (re)run
---------------

DataLad provides `run `_ and `rerun `_ commands, which are similar to ``make`` in that they also (re)execute arbitrary commands and record their impact on a dataset. However, there are key differences:

- While ``make`` can be used to compute a file for the first time, there is no "remake" command. Instead, recomputation is done by the remake special remote during ``get``, and should therefore behave no differently from the file downloads typically performed by ``get``.
- The remake special remote operates in a temporary worktree, checked out at the commit recorded by ``datalad make``. ``rerun`` operates in the dataset's main worktree and by default executes commands at ``HEAD`` (a different starting point can be specified with ``rerun --onto``).
- The goal of the remake special remote is to recompute the contents of an annexed file, and it produces an error if the file cannot be reproduced. ``rerun`` can be used to verify computational reproducibility, but also to re-run the same code with different inputs, so it creates a new commit if the outputs differ.
- Data dependencies and compute instructions are specified differently: ``make`` records them in committed files, whereas ``run`` records them in commit messages.

Git-annex compute special remote
--------------------------------

Git-annex provides a built-in `compute special remote `_ (see also: `computing annexed files `_). This is a parallel development to DataLad-remake, and as such there are key differences in both implementation and behavior:

- The specification of compute instructions and file dependencies is different. Git-annex expects a compute program to communicate its inputs and outputs over standard input and output. DataLad-remake expects a configuration file with a command parameterization (compute template) and a list of input and output file patterns.
- The storage of compute instructions is different: git-annex uses its VURL backend for annex keys and stores additional information in the git-annex branch (unlike DataLad-remake, it does not commit additional files to the same branch as the computed files).
- The trust model is different: while DataLad-remake relies on GPG-signed commits, git-annex compute relies on a list of allowed compute programs.
- By default, git-annex does not assume that the computed file needs to be bit-by-bit reproducible (it has a ``--reproducible`` option to enforce computational reproducibility).
- Git-annex does not operate on subdatasets (submodules); all inputs need to be gettable from the given Git repository.
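
To illustrate what strict, bit-by-bit reproducibility means in practice, the following Python sketch checks recomputed content against a git-annex ``SHA256E`` key, whose name embeds the content size and SHA-256 digest. The helpers ``parse_sha256e_key`` and ``verify_recomputed`` are hypothetical and are not code from DataLad-remake or git-annex; the sketch assumes keys of the simple form ``SHA256E-s<size>--<digest>.<ext>`` and interprets only the size field. A check like this rejects any recomputation whose bytes differ, which is the behavior the remake special remote aims for when it reports that a file cannot be reproduced.

.. code-block:: python

   import hashlib


   def parse_sha256e_key(key):
       """Return (size, hex digest) for a key like 'SHA256E-s6--<digest>.txt'.

       Hypothetical helper: only the 's' (size) field is interpreted.
       """
       fields, name = key.split("--", 1)
       backend, *extra = fields.split("-")
       if backend != "SHA256E":
           raise ValueError(f"unsupported backend in key: {key}")
       sizes = [f[1:] for f in extra if f.startswith("s")]
       if not sizes:
           raise ValueError(f"no size field in key: {key}")
       # A SHA-256 hex digest is 64 characters; the rest is the file extension.
       return int(sizes[0]), name[:64]


   def verify_recomputed(key, content):
       """Raise if recomputed bytes do not match the size and digest in key."""
       size, digest = parse_sha256e_key(key)
       if len(content) != size or hashlib.sha256(content).hexdigest() != digest:
           raise RuntimeError(f"recomputed content does not match {key}")


   # Usage: build a key for known content, then verify two recomputations.
   original = b"hello\n"
   key = f"SHA256E-s{len(original)}--{hashlib.sha256(original).hexdigest()}.txt"
   verify_recomputed(key, b"hello\n")  # identical bytes: passes silently
   try:
       verify_recomputed(key, b"hello!\n")  # different bytes: rejected
   except RuntimeError as err:
       print(err)

A lenient, non-reproducible mode would skip the digest comparison and accept whatever the compute step produces, which is closer to git-annex's default behavior without ``--reproducible``.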