Structure of Unpacked Experiments

While reprounzip is designed to let users reproduce an experiment without having to master the tool used to run it (e.g., Vagrant or Docker), in some situations it can be useful to go behind the scenes and interact with the unpacked experiments directly.

This page describes in more detail how the unpackers operate.

Note

Future versions of unpackers might work in a different way. No attempt is made to make unpacked experiments compatible across different versions of reprounzip. Bundles will always be compatible though.

Common Files across Unpackers

The unpacked directory contains the original configuration file as config.yml; the VisTrails integration, for example, relies on it.

A file named .reprounzip also marks the directory as an unpacked experiment. This is a Python pickle file containing a dictionary with various types of information:

  • unpacker maps to the unpacker’s name.
  • input_files is used by the uploader/downloader machinery to keep the state of the input files inside the experiment, as they may be replaced by the user or overwritten by runs.
  • Other information specific to the unpacker, as described next.
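If you want to inspect this file, it can be loaded with Python's standard pickle module. The sketch below assumes only what is described above (a dictionary with unpacker and input_files keys); the helper read_unpacker_info is hypothetical. Unpickling can execute arbitrary code, so only load .reprounzip files from unpacked directories you trust.

```python
import os
import pickle


def read_unpacker_info(directory):
    """Load the dictionary stored in <directory>/.reprounzip (hypothetical helper).

    Warning: unpickling can run arbitrary code, so only call this on
    unpacked experiments you created or otherwise trust.
    """
    with open(os.path.join(directory, ".reprounzip"), "rb") as fp:
        info = pickle.load(fp)
    if not isinstance(info, dict):
        raise ValueError(".reprounzip does not contain a dictionary")
    # Known keys: "unpacker" (the unpacker's name) and "input_files"
    # (state of the input files); any other keys are unpacker-specific.
    return info
```

For example, read_unpacker_info("unpacked-directory")["unpacker"] would return the name of the unpacker that created the directory.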

The directory Unpacker

The experiment directory contains:

  • The original configuration file config.yml.
  • The pickle file .reprounzip.
  • The tarball inputs.tar.gz, which contains the original files that were identified as input files. This tarball is used to restore files with upload :<input-id> (see Managing Input and Output Files).
  • A directory called root, which contains all the bundled files in their original path, with symbolic links to absolute paths rewritten to prepend the path to root.
unpacked-directory/
    .reprounzip
    config.yml
    inputs.tar.gz
    root/
        ...
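The symbolic link rewriting mentioned above can be pictured with the following sketch. It is an illustration, not reprounzip's actual code: the helper rewrite_absolute_symlinks is hypothetical, and it simply walks root and re-points every link with an absolute target so that the link resolves inside root.

```python
import os


def rewrite_absolute_symlinks(root):
    """Re-point absolute symlinks under root so they resolve inside root.

    A link to /usr/lib/libfoo.so.1 becomes a link to <root>/usr/lib/libfoo.so.1;
    relative links, and links already pointing inside root, are left untouched.
    Returns the list of rewritten link paths.
    """
    root = os.path.abspath(root)
    rewritten = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not follow symlinks by default, but links to
        # directories still show up in dirnames, so check both lists.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            if os.path.isabs(target) and not target.startswith(root + os.sep):
                # Prepend the path to root to the original absolute target.
                new_target = os.path.join(root, target.lstrip("/"))
                os.remove(path)
                os.symlink(new_target, path)
                rewritten.append(path)
    return rewritten
```

Skipping targets that already start with root makes the sketch safe to run more than once.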

When executing the run command, the unpacker sets LD_LIBRARY_PATH and PATH to point inside root and, optionally, sets DISPLAY and XAUTHORITY to the host's values.
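As an illustration, such an environment could be assembled as follows. This is a sketch under assumptions: the helper build_run_env and the particular bin and lib subdirectories of root are hypothetical choices, not the list reprounzip actually uses, and the function returns only the variables discussed here.

```python
import os


def build_run_env(root, forward_display=True, base_env=None):
    """Return PATH/LD_LIBRARY_PATH pointing into root, plus optional X11 vars."""
    root = os.path.abspath(root)
    base_env = dict(os.environ if base_env is None else base_env)
    # Assumed directory lists; the real unpacker may use different ones.
    bin_dirs = ["usr/local/bin", "usr/bin", "bin", "usr/sbin", "sbin"]
    lib_dirs = ["usr/local/lib", "usr/lib", "lib", "usr/lib64", "lib64"]
    env = {
        "PATH": os.pathsep.join(os.path.join(root, d) for d in bin_dirs),
        "LD_LIBRARY_PATH": os.pathsep.join(os.path.join(root, d) for d in lib_dirs),
    }
    if forward_display:
        # Reuse the host's X11 settings so graphical programs can open windows.
        for var in ("DISPLAY", "XAUTHORITY"):
            if var in base_env:
                env[var] = base_env[var]
    return env
```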

The chroot Unpacker

The experiment directory contains:

  • The original configuration file config.yml.
  • The pickle file .reprounzip, which stores whether magic directories are mounted, as explained below.
  • The tarball inputs.tar.gz, which contains the original files that were identified as input files. This tarball is used to restore files with upload :<input-id> (see Managing Input and Output Files).
  • A directory called root, which contains all the bundled files in their original path, with symbolic links left unchanged and file ownership restored.
unpacked-directory/
    .reprounzip
    config.yml
    inputs.tar.gz
    root/
        dev/
        dev/pts/
        proc/
        ...

If a file is listed in the configuration file but wasn't packed (i.e., pack_files was set to false for a software package), the file is copied from the host; if it does not exist on the host, a warning is shown during unpacking.
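The fallback for files that weren't packed could look like the following sketch. It assumes the list of affected paths has already been read from config.yml (not shown); the function copy_missing_from_host is hypothetical, and its host_root parameter, which defaults to /, exists only so the sketch can be tried against a scratch directory.

```python
import os
import shutil
import warnings


def copy_missing_from_host(paths, root, host_root="/"):
    """Copy each absolute path in paths from the host into root.

    Paths absent on the host produce a warning instead of an error.
    Returns the list of paths that could not be found.
    """
    missing = []
    for path in paths:
        src = os.path.join(host_root, path.lstrip("/"))
        dest = os.path.join(root, path.lstrip("/"))
        if not os.path.exists(src):
            warnings.warn("File %s was not packed and is missing on the host" % path)
            missing.append(path)
            continue
        if os.path.isdir(src):
            os.makedirs(dest, exist_ok=True)
            continue
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # copy2 also preserves timestamps and permission bits.
        shutil.copy2(src, dest)
    return missing
```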

Unless --dont-bind-magic-dirs is specified when unpacking, the special directories /dev, /dev/pts, and /proc are mounted from the host with mount -o bind. Also, if either /bin/sh or /usr/bin/env was not packed, a static build of busybox is downloaded and put under /bin/busybox, and the missing binaries are created as symbolic links pointing to busybox.
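Both steps can be sketched in Python as follows. This is an illustration, not reprounzip's code: the helpers bind_mount_commands and link_busybox are hypothetical, the first only builds the mount command lines (running them requires root), and the second assumes busybox itself has already been placed at /bin/busybox inside root.

```python
import os

MAGIC_DIRS = ["/dev", "/dev/pts", "/proc"]


def bind_mount_commands(root):
    """Return the `mount -o bind` invocations for the magic directories.

    /dev comes before /dev/pts so the parent is mounted first.
    """
    root = os.path.abspath(root)
    return [["mount", "-o", "bind", d, os.path.join(root, d.lstrip("/"))]
            for d in MAGIC_DIRS]


def link_busybox(root, binaries=("/bin/sh", "/usr/bin/env")):
    """Symlink each missing binary to /bin/busybox; return those linked.

    The link target is absolute, so it resolves to busybox inside the chroot.
    """
    linked = []
    for binary in binaries:
        path = os.path.join(root, binary.lstrip("/"))
        # lexists: a packed binary may itself be a symlink whose target
        # only exists inside the chroot.
        if os.path.lexists(path):
            continue
        os.makedirs(os.path.dirname(path), exist_ok=True)
        os.symlink("/bin/busybox", path)
        linked.append(binary)
    return linked
```

On a real system, each command returned by bind_mount_commands could be passed to subprocess.check_call; the bind mounts would then have to be unmounted again, in reverse order, before the directory can be safely removed.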

Should you require a shell inside the experiment environment, you can use:

chroot root/ /bin/sh

The vagrant Unpacker

The experiment directory contains:

  • The original configuration file config.yml.
  • The pickle file .reprounzip, which stores whether a chroot is used, as explained below.
  • The tarball data.tgz, which is part of the .rpz file and used to populate the virtual machine (VM) when it gets created.
  • The setup script setup.sh.
  • The file rpz-files.list, which contains the list of files to unpack. This list is passed to tar -T while unpacking.
  • A Vagrantfile, which is used to build the VM.
unpacked-directory/
    .reprounzip
    config.yml
    data.tgz
    busybox
    Vagrantfile
    setup.sh
    rpz-files.list
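The selective extraction driven by rpz-files.list can be approximated with Python's tarfile module, as in the sketch below. The helper extract_listed_files is hypothetical and assumes one archive member name per line; unlike tar -T, it extracts only the entry for a listed directory, not the directory's contents.

```python
import tarfile


def extract_listed_files(archive, list_file, dest):
    """Extract only the members named in list_file (one per line) into dest."""
    with open(list_file, encoding="utf-8") as fp:
        names = [line.rstrip("\n") for line in fp if line.strip()]
    with tarfile.open(archive, "r:gz") as tar:
        # getmember raises KeyError for a name that is not in the archive.
        members = [tar.getmember(name) for name in names]
        # Newer Pythons support extraction filters; "tar" strips leading
        # slashes, refuses paths escaping dest, and clears setuid/setgid bits.
        kwargs = {"filter": "tar"} if hasattr(tarfile, "tar_filter") else {}
        tar.extractall(dest, members=members, **kwargs)
    return names
```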

Once vagrant up has been run by the setup/start command, a .vagrant subdirectory is created; its content is managed by Vagrant (and appears to vary between platforms).

Note that Vagrant drives VirtualBox or similar virtualization software to run the VM, and that software maintains state outside of the experiment directory. If you need to reconfigure or otherwise interact with the VM, you should do it from the virtualization software itself (e.g., VirtualBox). The VM is named after the experiment directory, with an additional suffix.

There are two modes for the virtual machine, controlled through command-line flags:

  • The default mode, --use-chroot, creates a chroot environment inside the VM at /experimentroot. This allows ReproZip to unpack very different file system hierarchies without breaking the base system of the VM (in particular, ssh needs to keep working for the VM to be usable). In this mode, software packages that were not packed (i.e.: pack_files was set to false) are installed in the VM and their required files are copied to the /experimentroot hierarchy. The software packages that were packed are simply copied over without any interaction with the VM’s system.
  • If --dont-use-chroot is used, no chroot environment is created. Files from software packages are never copied from the .rpz file; instead, they get installed from the package manager. Other files are simply unpacked in the VM system, possibly overwriting existing files. As long as reprounzip-vagrant manages to find a VM image with the same operating system as the original one, reproduction is expected to work reliably.

In the --use-chroot mode, a static build of busybox is downloaded and put under /experimentroot/busybox, and if /bin/sh wasn’t packed, it is created as a symbolic link pointing to busybox.

Files are uploaded to and downloaded from the environment via the shared directory /vagrant, which is the experiment directory mounted in the VM by Vagrant.

Should you require a shell inside the experiment environment, you can use:

vagrant ssh

When accessing the experiment environment, keep in mind whether --use-chroot is in effect: if it is, the experiment's files are located under /experimentroot.

The docker Unpacker

The experiment directory contains:

  • The original configuration file config.yml.
  • The pickle file .reprounzip, which stores the names of the images built by the unpacker, as explained below.
  • The tarball data.tgz, which is part of the .rpz file and used to populate the Docker container.
  • The file rpz-files.list, which contains the list of files to unpack. This list is passed to tar -T while unpacking.
  • A Dockerfile, which is used to build the original image.
unpacked-directory/
    .reprounzip
    config.yml
    data.tgz
    busybox
    rpzsudo
    Dockerfile
    rpz-files.list

Static builds of busybox and rpzsudo are always downloaded and put into the Docker image as /busybox and /rpzsudo, respectively.

Note that the docker command connects to a Docker daemon over a socket, and state is changed on that daemon. The daemon is not necessarily local: docker-machine, for example, allows reprounzip-docker to be used on non-Linux machines, and the daemon might run in a virtual machine, on another host, or in the cloud. The docker unpacker passes its environment variables on when calling Docker, notably DOCKER_HOST, so these can be set accordingly before running the unpacker.
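The environment handling can be sketched as follows: the caller's environment is inherited unchanged, and DOCKER_HOST is overridden only when a daemon address is given. The helper docker_invocation is hypothetical, and tcp://192.0.2.10:2376 below is a placeholder address.

```python
import os


def docker_invocation(args, docker_host=None, base_env=None):
    """Build the argv and environment for a docker call (hypothetical helper).

    The environment is inherited as-is; DOCKER_HOST is set only if given,
    so an existing DOCKER_HOST from the caller is otherwise preserved.
    """
    env = dict(os.environ if base_env is None else base_env)
    if docker_host is not None:
        env["DOCKER_HOST"] = docker_host
    return ["docker"] + list(args), env
```

For example, argv, env = docker_invocation(["images"], docker_host="tcp://192.0.2.10:2376") followed by subprocess.run(argv, env=env, check=True) would list the images known to that remote daemon.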

Images and containers built by the unpacker are given random names with the prefixes reprounzip_image_ and reprounzip_run_, respectively; they are cleaned up when the destroy command is invoked. reprounzip-docker keeps track of two images in the .reprounzip pickle file: the initial image, built by setup/build by calling docker build, and the current image, which starts out identical to the initial image and reflects the effects of subsequent run and upload calls. After each run invocation, the container is committed to a new current image so that state is kept. Running the reset command returns to the initial image without having to rebuild.

The --detach option starts a container without waiting for it: reprounzip-docker leaves the container running and returns immediately. This lets you, for instance, start a service on a remote machine, but note that because the container is not committed to a new image, the side effects of running it won't affect later executions in the same unpacked directory.
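The image bookkeeping described in the last two paragraphs can be modelled as a small state machine. This is a hypothetical sketch, not reprounzip-docker's code: the class ImageTracker, the 16-hex-digit random suffix, and the exact docker run options are assumptions; only the name prefixes and the commit, reset, and --detach behaviour come from the text. Commands are recorded rather than executed so the logic can be inspected.

```python
import secrets


def random_name(prefix):
    """Random lowercase name (Docker image names must be lowercase)."""
    return prefix + secrets.token_hex(8)


class ImageTracker:
    """Tracks the initial and current images, recording docker commands."""

    def __init__(self, initial_image):
        self.initial_image = initial_image
        self.current_image = initial_image
        self.commands = []

    def run(self, cmdline, detach=False):
        container = random_name("reprounzip_run_")
        self.commands.append(
            ["docker", "run", "--name", container]
            + (["-d"] if detach else [])
            + [self.current_image] + list(cmdline))
        if detach:
            # Detached containers keep running and are never committed,
            # so their side effects don't reach later runs.
            return container
        # Commit the finished container so the next run sees its changes.
        new_image = random_name("reprounzip_image_")
        self.commands.append(["docker", "commit", container, new_image])
        self.current_image = new_image
        return container

    def reset(self):
        # Return to the image built by setup/build, without rebuilding.
        self.current_image = self.initial_image
```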

Uploading files to the environment is done by running a simple Dockerfile that builds a new image. Downloading files is done via the docker cp command.
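The two mechanisms might look like the sketch below. The helpers upload_dockerfile and download_command are hypothetical, and the COPY-based Dockerfile is an assumption about what such a simple Dockerfile contains, not the exact file reprounzip-docker generates. Since docker cp operates on containers rather than images, downloading from the current image assumes a container exists for it (for example, one made with docker create).

```python
import json
import posixpath


def upload_dockerfile(current_image, uploads):
    """Return a Dockerfile that copies local files into a new image.

    uploads maps file names in the build context to absolute paths in the image.
    """
    lines = ["FROM %s" % current_image]
    for local_name, target in uploads.items():
        if not posixpath.isabs(target):
            raise ValueError("target must be an absolute path: %r" % target)
        # JSON-array form of COPY tolerates spaces in file names.
        lines.append("COPY %s" % json.dumps([local_name, target]))
    return "\n".join(lines) + "\n"


def download_command(container, path, local_dest):
    """Return a `docker cp` invocation copying path out of a container."""
    return ["docker", "cp", "%s:%s" % (container, path), local_dest]
```

The generated Dockerfile would be built with docker build from a context directory containing the uploaded files, producing the new current image.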