Structure of Unpacked Experiments
While reprounzip is designed to let users reproduce an experiment without having to master the tool used to run it (e.g., Vagrant or Docker), in some situations it can be useful to go behind the scenes and interact with the unpacked experiment directly.
This page describes in more detail how the unpackers operate.
Note
Future versions of the unpackers might work differently. No attempt is made to keep unpacked experiments compatible across versions of reprounzip. Bundles, however, will always remain compatible.
Common Files across Unpackers
The unpacked directory contains the original configuration file as config.yml. The VisTrails integration, for instance, relies on it.
A file named .reprounzip also marks the directory as an unpacked experiment. This is a Python pickle file containing a dictionary with various types of information:
- unpacker maps to the unpacker's name.
- input_files is used by the uploader/downloader machinery to keep track of the state of the input files inside the experiment, since they may be replaced by the user or overwritten by runs.
- Other information specific to the unpacker, as described next.
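When debugging an unpacked experiment, it can be handy to peek at this file. The following is a minimal Python sketch that assumes only the two keys named above (unpacker and input_files). The helpers read_unpacked_state and describe_state are hypothetical, and the layout of input_files and of the unpacker-specific entries is not specified here.

```python
import pickle
from pathlib import Path


def read_unpacked_state(experiment_dir):
    """Load the .reprounzip pickle of an unpacked experiment.

    Only use this on directories you created yourself: unpickling
    untrusted data can execute arbitrary code.
    """
    marker = Path(experiment_dir) / ".reprounzip"
    if not marker.is_file():
        raise ValueError(f"{experiment_dir} is not an unpacked experiment")
    with marker.open("rb") as fp:
        return pickle.load(fp)


def describe_state(state):
    """Summarize the common keys and list any unpacker-specific ones."""
    common = ("unpacker", "input_files")
    lines = [f"unpacker: {state.get('unpacker')}"]
    lines.append(f"tracks input files: {'input_files' in state}")
    for key in sorted(k for k in state if k not in common):
        lines.append(f"unpacker-specific key: {key}")
    return lines
```

Because the file is a plain pickle, any Python interpreter can read it. It should never be loaded from an untrusted source.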
The directory Unpacker
The experiment directory contains:
- The original configuration file, config.yml.
- The pickle file, .reprounzip.
- The tarball inputs.tar.gz, which contains the original files that were identified as input files. This tarball is used to restore files with upload :<input-id> (see Managing Input and Output Files).
- A directory called root, which contains all the bundled files at their original paths, with symbolic links to absolute paths rewritten to prepend the path to root.
unpacked-directory/
    .reprounzip
    config.yml
    inputs.tar.gz
    root/
        ...
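The symbolic-link rewriting can be illustrated with a short Python sketch. It assumes two things: "prepending the path to root" means joining root's absolute path with the original absolute target, and relative targets are left untouched because they already resolve inside the tree. The helpers rewrite_link_target and rewrite_links are hypothetical and are not reprounzip's own code.

```python
import os
from pathlib import PurePosixPath


def rewrite_link_target(target, root):
    """Return the target a symlink should point to once relocated under root."""
    target_path = PurePosixPath(target)
    if target_path.is_absolute():
        # "/usr/lib/libfoo.so.1" -> "<root>/usr/lib/libfoo.so.1"
        return str(PurePosixPath(root) / target_path.relative_to("/"))
    return target  # relative targets already resolve inside root


def rewrite_links(root):
    """Walk root and rewrite every absolute symlink in place (hypothetical helper)."""
    root = os.path.abspath(root)
    # os.walk does not follow symlinks by default, so linked dirs show up in dirnames
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            new_target = rewrite_link_target(target, root)
            if new_target != target:
                os.remove(path)  # removes the link itself, not what it points to
                os.symlink(new_target, path)
```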
When executing the run command, the unpacker sets LD_LIBRARY_PATH and PATH to point inside root and, optionally, sets DISPLAY and XAUTHORITY to the host's values.
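As a rough illustration, the environment for such a run could be assembled as follows. This is a sketch only: the exact directories placed in PATH and LD_LIBRARY_PATH are not specified above, so orig_path and lib_dirs are hypothetical inputs, and build_run_env is a hypothetical helper.

```python
import os


def build_run_env(root, orig_path, lib_dirs, host_env=None, pass_x11=False):
    """Sketch of an environment for running a command out of root.

    orig_path: the PATH recorded for the original run (hypothetical input).
    lib_dirs: library directories to search inside root (hypothetical input).
    """
    root = os.path.abspath(root)

    def under_root(directory):
        # "/usr/bin" -> "<root>/usr/bin"
        return os.path.join(root, directory.lstrip("/"))

    env = {
        "PATH": ":".join(under_root(d) for d in orig_path.split(":") if d),
        "LD_LIBRARY_PATH": ":".join(under_root(d) for d in lib_dirs),
    }
    if pass_x11 and host_env:
        # optionally forward the host's X11 display settings
        for var in ("DISPLAY", "XAUTHORITY"):
            if var in host_env:
                env[var] = host_env[var]
    return env
```

The resulting dictionary could, for example, be passed as env= to subprocess.run when launching the experiment's command.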
The chroot Unpacker
The experiment directory contains:
- The original configuration file, config.yml.
- The pickle file, .reprounzip, which stores whether the magic directories are mounted, as explained below.
- The tarball inputs.tar.gz, which contains the original files that were identified as input files. This tarball is used to restore files with upload :<input-id> (see Managing Input and Output Files).
- A directory called root, which contains all the bundled files at their original paths, with symbolic links left as-is (not rewritten) and file ownership restored.
unpacked-directory/
    .reprounzip
    config.yml
    inputs.tar.gz
    root/
        dev/
        dev/pts/
        proc/
        ...
If a file is listed in the configuration file but was not packed (i.e., pack_files was set to false for a software package), that file is copied from the host; if it does not exist on the host, a warning is shown during unpacking.
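The fallback logic can be sketched in Python as follows. The input formats are assumptions: listed_files is a list of absolute paths taken from config.yml, and packed_files is the subset actually present in the bundle. The host_root parameter is a hypothetical addition so that the sketch can be exercised without touching the real host.

```python
import os
import shutil
import warnings


def fill_unpacked_files(listed_files, packed_files, root, host_root="/"):
    """Copy listed-but-unpacked files from the host into root.

    Returns the paths that were missing on the host (a warning is
    emitted for each one).
    """
    missing = []
    for path in listed_files:
        if path in packed_files:
            continue  # already extracted from the bundle
        rel = path.lstrip("/")
        src = os.path.join(host_root, rel)
        dest = os.path.join(root, rel)
        if os.path.isfile(src):
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(src, dest)  # keeps timestamps and mode bits
        else:
            warnings.warn(f"{path} was not packed and is missing on the host")
            missing.append(path)
    return missing
```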
Unless --dont-bind-magic-dirs is specified when unpacking, the special directories /dev, /dev/pts, and /proc are mounted with mount -o bind from the host.
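The mounts amount to one mount -o bind per directory. A plain bind mount is not recursive, so a nested mount point such as /dev/pts needs its own bind. The Python sketch below only builds the command lines; bind_mount_commands is a hypothetical helper, and actually running the commands (e.g., with subprocess.run(cmd, check=True)) requires root privileges.

```python
import os

# the special directories named above, parents before children
MAGIC_DIRS = ("/dev", "/dev/pts", "/proc")


def bind_mount_commands(root, bind_magic_dirs=True):
    """Return the mount commands that would bind the host's magic dirs into root."""
    if not bind_magic_dirs:  # mirrors --dont-bind-magic-dirs
        return []
    root = os.path.abspath(root)
    return [
        ["mount", "-o", "bind", d, os.path.join(root, d.lstrip("/"))]
        for d in MAGIC_DIRS
    ]
```

Binding /dev before /dev/pts matters: otherwise the later /dev bind would hide the already-mounted /dev/pts.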
Also, unless both /bin/sh and /usr/bin/env were packed, a static build of busybox is downloaded to /bin/busybox, and each missing binary is created as a symbolic link pointing to busybox.
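Busybox picks the applet to run from the name it was invoked under, so a symlink named env behaves like env. A minimal Python sketch of the linking step follows. It assumes busybox has already been placed at /bin/busybox inside root (downloading it is out of scope), and link_missing_to_busybox is a hypothetical helper.

```python
import os

REQUIRED = ("/bin/sh", "/usr/bin/env")


def link_missing_to_busybox(root, busybox="/bin/busybox"):
    """Create symlinks to busybox for required binaries absent from root.

    Link targets are absolute paths as seen from inside the chroot.
    Returns the paths that were linked.
    """
    created = []
    for path in REQUIRED:
        dest = os.path.join(root, path.lstrip("/"))
        if os.path.lexists(dest):
            continue  # packed (or already linked): leave it alone
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        os.symlink(busybox, dest)
        created.append(path)
    return created
```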
Should you require a shell inside the experiment environment, you can use:
chroot root/ /bin/sh
The vagrant Unpacker
The experiment directory contains:
- The original configuration file, config.yml.
- The pickle file, .reprounzip, which stores whether a chroot is used, as explained below.
- The tarball data.tgz, which is part of the .rpz file and is used to populate the virtual machine (VM) when it is created.
- The setup script, setup.sh.
- The file rpz-files.list, which contains the list of files to unpack. This list is passed to tar -T during unpacking.
- A Vagrantfile, which is used to build the VM.
unpacked-directory/
    .reprounzip
    config.yml
    data.tgz
    busybox
    Vagrantfile
    setup.sh
    rpz-files.list
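The role of rpz-files.list can be approximated in Python with the standard tarfile module, in the spirit of tar -xzf data.tgz -T rpz-files.list. This sketch assumes the list holds one member name per line, written exactly as stored in the archive; extract_listed is a hypothetical helper, not the unpacker's actual code.

```python
import tarfile


def extract_listed(archive, list_file, dest):
    """Extract only the members named in list_file.

    Returns the requested names that were not found in the archive.
    """
    with open(list_file, encoding="utf-8") as fp:
        wanted = {line.rstrip("\n") for line in fp if line.strip()}
    with tarfile.open(archive, "r:gz") as tar:
        members = [m for m in tar.getmembers() if m.name in wanted]
        # the "tar" extraction filter (Python 3.12+, backported to some
        # earlier patch releases) approximates GNU tar's default safety rules
        kwargs = {"filter": "tar"} if hasattr(tarfile, "tar_filter") else {}
        tar.extractall(dest, members=members, **kwargs)
    found = {m.name for m in members}
    return sorted(wanted - found)
```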
Once vagrant up has been run by the setup/start command, a .vagrant subdirectory is created; its contents are managed by Vagrant (and appear to vary across platforms).
Note that Vagrant drives VirtualBox or similar virtualization software to run the VM, and that software maintains state outside the experiment directory. If you need to reconfigure or otherwise interact with the VM, do so from the virtualization software (e.g., VirtualBox). The VM is named after the experiment directory, with an additional suffix.
There are two modes for the virtual machine, controlled through command-line flags:
- The default mode, --use-chroot, creates a chroot environment inside the VM at /experimentroot. This allows ReproZip to unpack very different file system hierarchies without breaking the base system of the VM (in particular, ssh needs to keep working for the VM to be usable). In this mode, software packages that were not packed (i.e., pack_files was set to false) are installed in the VM, and their required files are copied to the /experimentroot hierarchy. Software packages that were packed are simply copied over, without any interaction with the VM's system.
- If --dont-use-chroot is used, no chroot environment is created. Files from software packages are never copied from the .rpz file; instead, they are installed from the package manager. Other files are simply unpacked into the VM system, possibly overwriting existing files. As long as reprounzip-vagrant manages to find a VM image with the same operating system as the original one, reproduction is expected to work reliably.
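The per-package decision in the two modes can be summarized with a small Python sketch. The input is a hypothetical list of (name, packed) pairs, where packed mirrors pack_files in config.yml. The action strings are illustrative labels, not messages printed by reprounzip-vagrant.

```python
def plan_packages(packages, use_chroot=True):
    """Map each (name, packed) pair to the handling described for its mode."""
    plan = []
    for name, packed in packages:
        if not use_chroot:
            # --dont-use-chroot: package files always come from the package manager
            plan.append((name, "install with package manager"))
        elif packed:
            # --use-chroot, packed: copied as-is, the VM's system is not involved
            plan.append((name, "copy from .rpz into /experimentroot"))
        else:
            # --use-chroot, not packed: installed in the VM, then files copied over
            plan.append((name, "install in VM, copy files into /experimentroot"))
    return plan
```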
In --use-chroot mode, a static build of busybox is downloaded to /experimentroot/busybox, and if /bin/sh was not packed, it is created as a symbolic link pointing to busybox.
Files are uploaded to and downloaded from the environment through the shared directory /vagrant, which is the experiment directory mounted in the VM by Vagrant.
Should you require a shell inside the experiment environment, you can use:
vagrant ssh
When accessing the experiment environment, keep in mind whether --use-chroot is in effect: if it is, the experiment's files are located under /experimentroot.
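The path arithmetic involved in moving files in and out of the VM can be sketched in Python. A file inside the experiment directory on the host appears under /vagrant in the VM. An experiment file's location inside the VM depends on whether the chroot is used. The helpers vm_shared_path and experiment_path are hypothetical.

```python
import os
import posixpath


def vm_shared_path(experiment_dir, host_file):
    """Map a host file inside the experiment directory to its /vagrant path."""
    rel = os.path.relpath(os.path.abspath(host_file), os.path.abspath(experiment_dir))
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError("file is not inside the experiment directory")
    return posixpath.join("/vagrant", *rel.split(os.sep))


def experiment_path(original_path, use_chroot=True):
    """Where an experiment file lives in the VM, depending on the chroot mode."""
    if use_chroot:
        return posixpath.join("/experimentroot", original_path.lstrip("/"))
    return original_path
```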
The docker Unpacker
The experiment directory contains:
- The original configuration file, config.yml.
- The pickle file, .reprounzip, which stores the names of the images built by the unpacker, as explained below.
- The tarball data.tgz, which is part of the .rpz file and is used to populate the Docker container.
- The file rpz-files.list, which contains the list of files to unpack. This list is passed to tar -T during unpacking.
- A Dockerfile, which is used to build the original image.
unpacked-directory/
    .reprounzip
    config.yml
    data.tgz
    busybox
    rpzsudo
    Dockerfile
    rpz-files.list
Static builds of busybox and rpzsudo are always downloaded and put into the Docker image as /busybox and /rpzsudo, respectively.
Note that the docker command connects to a Docker daemon over a socket, and that state is changed on the daemon's side. The daemon is not necessarily local: it might run in a virtual machine, on another host, or in the cloud. In particular, docker-machine can be used, which allows reprounzip-docker to run on non-Linux machines. The docker unpacker preserves the environment variables when calling Docker, notably DOCKER_HOST, so these can be set appropriately before running the unpacker.
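For example, to target a remote daemon from a Python script, you could launch the unpacker with DOCKER_HOST set in its environment. The sketch below assumes the usual reprounzip docker setup <pack> <directory> form. The daemon address used later is a placeholder, and both helper names are hypothetical. Other variables already in the environment, such as the TLS settings printed by docker-machine env, are carried over unchanged.

```python
import os
import subprocess


def reprounzip_docker_env(docker_host, base_env=None):
    """Copy the environment and point Docker at docker_host."""
    env = dict(os.environ if base_env is None else base_env)
    env["DOCKER_HOST"] = docker_host
    return env


def setup_on_remote_daemon(pack, target, docker_host):
    # Invokes the real CLI; needs reprounzip-docker installed and a reachable daemon.
    cmd = ["reprounzip", "docker", "setup", pack, target]
    return subprocess.run(cmd, env=reprounzip_docker_env(docker_host), check=True)
```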
Images and containers created by the unpacker are given random names with the prefixes reprounzip_image_ and reprounzip_run_, respectively; they are removed when the destroy command is invoked. reprounzip-docker keeps track of two images in the .reprounzip pickle file: the initial image, i.e., the one built by setup/build via docker build, and the current image, which starts out identical to the initial image and then reflects the effects of subsequent run and upload calls. After each run invocation, the container is committed to a new current image so that state is preserved. Running the reset command returns to the initial image without having to rebuild.
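The bookkeeping between the initial and current image can be modeled with a small Python class. It tracks names only and makes no Docker calls. The random-suffix format and the state() layout are assumptions, not the actual contents of the .reprounzip file.

```python
import secrets


def random_name(prefix):
    """Random lowercase name with the given prefix (suffix format assumed)."""
    return prefix + secrets.token_hex(6)


class ImageTracker:
    """Model of the initial/current image bookkeeping described above."""

    def __init__(self):
        # setup/build: docker build creates the initial image
        self.initial_image = random_name("reprounzip_image_")
        self.current_image = self.initial_image

    def run(self):
        # each run uses a new container, which is then committed to a new current image
        container = random_name("reprounzip_run_")
        self.current_image = random_name("reprounzip_image_")
        return container

    def upload(self):
        # uploading builds a new image on top of the current one
        self.current_image = random_name("reprounzip_image_")

    def reset(self):
        # back to the initial image, no rebuild needed
        self.current_image = self.initial_image

    def state(self):
        return {"initial_image": self.initial_image, "current_image": self.current_image}
```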
The --detach option starts a container and returns immediately: reprounzip-docker leaves the container running and does not wait for it. This lets you start a service on a remote machine, for example. Note, however, that because such a container is not committed to a new image, the side effects of running it do not carry over to later executions in the same unpacked directory.
Uploading files to the environment is done by running a simple Dockerfile that builds a new image. Downloading files is done via the docker cp command.
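One plausible shape for both operations is sketched below in Python. The upload Dockerfile layers the new files onto the current image. Because docker cp works on containers rather than images, a download first creates a throwaway container from the image; the /busybox true command is given only because docker create needs one when the image defines none. The helper names and the temporary container name are hypothetical.

```python
def upload_dockerfile(current_image, uploads):
    """Hypothetical Dockerfile layering uploaded files onto the current image.

    uploads: list of (file in build context, absolute destination) pairs.
    """
    lines = [f"FROM {current_image}"]
    for src, dest in uploads:
        lines.append(f"COPY {src} {dest}")
    return "\n".join(lines) + "\n"


def download_commands(image, path, dest, container="reprounzip_download_tmp"):
    """Commands to copy path out of image via a temporary container."""
    return [
        # the container is created but never started
        ["docker", "create", "--name", container, image, "/busybox", "true"],
        ["docker", "cp", f"{container}:{path}", dest],
        ["docker", "rm", container],
    ]
```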