EMiL container

Tuesday, June 07th, 2016 | Author:

One of the projects we are currently working on is the DFG-funded EMiL project (Emulation of Multimedia Objects in Libraries). Its main goal is to provide our EaaS framework for libraries to use in their reading rooms. While this project has also spawned our USB live systems, its latest outcome is a Docker container that allows (comparatively) easy access to emulation for everyone who wants to run born-digital objects.

The container relies on three auxiliary data sources:

  1. An image archive (it contains disk images of operating systems)
  2. An object archive (e.g. CD-ROMs, floppies, etc. – well, you want to access them, duh)
  3. A directory of environment descriptions (they are merely meta-metadata to track environments from the image archive)

Image Archive

The image archive is one of the core components of our EaaS software and has been so for several years. It contains the disk images that the emulators require in order to boot up a virtual machine, along with all the metadata the emulator needs to do so. There has already been another blog post about the image archive and how to set it up for use in a container, so I’ll refer you to that.

For the impatient, there’s an image archive with free images available for download. Just run these commands:

curl -O http://bw-fla.uni-freiburg.de/image-archive.tgz
tar xf image-archive.tgz
cd image-archive/nbd-export/ 
ln -s ../images/base/doom.raw 
ln -s ../images/base/hatari_tOS206us.img 
ln -s ../images/base/qemu-i386-DOS_6.20_CDROM.raw

Object Archive

The object archive recently had its own article in this blog and contains the objects you want to access. The prepared docker image (wait for it!) has a preconfigured file-backend for the actual data. The file structure is pretty simple:

object-archive/
 |- object1/
 |   \ iso/
 |      |- disk1.iso
 |      \  disk2.iso
 |- object2/
 |   \ floppy/
 |      |- disk1.raw
 |     ...
...

As you can see, each object is represented by its own subdirectory. The directory’s name is also its id that is to be used later to access it. Inside this directory, there is either an iso/ subdirectory, or a floppy/ subdirectory, or both. The actual images (either CD-ROM images or floppy images) are located within their respective directories according to their media type. And that’s basically it, just copy your objects into a directory structure like above and you have your object archive ready.
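The layout described above can be produced with a few lines of shell. This is only a minimal sketch: add_object is a hypothetical helper written for illustration and is not part of the EMiL tooling, and the paths in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch: copy a media image into a file-based object archive.
# add_object is a hypothetical helper, not part of EaaS/EMiL.
add_object() {
    archive_dir=$1   # root of the object archive, e.g. ./object-archive
    object_id=$2     # directory name, which doubles as the object id
    media_type=$3    # "iso" or "floppy"
    image=$4         # path to the CD-ROM or floppy image to add

    # Only the two media subdirectories described in the text are accepted.
    case "$media_type" in
        iso|floppy) ;;
        *) echo "unsupported media type: $media_type" >&2; return 1 ;;
    esac

    mkdir -p "$archive_dir/$object_id/$media_type" &&
    cp "$image" "$archive_dir/$object_id/$media_type/"
}
```

A call such as `add_object ./object-archive object1 iso ./disk1.iso` would then place the image at object-archive/object1/iso/disk1.iso.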

For those using Rosetta, Marcus Bitzl (Bayerische Staatsbibliothek) has implemented a Rosetta-Object-Archive adapter (https://github.com/emil-emulation/emil-rosetta), which allows objects to be retrieved directly from the Rosetta repository.

Environment Descriptions

Within our EMiL project, it became clear that the partnering institutions have little interest in the technical metadata of our EaaS framework itself. This is especially true for our Emulation Environments, which contain all the metadata necessary for re-enacting a virtual computer system. Information such as memory or which hardware bus a drive is connected to is, at best, only remotely interesting if all you want is to run a single multimedia object. Consequently, the institutions want a more abstract environment description that specifies the software environment, i.e. the operating system and installed software such as a PDF viewer, a word processor and so on. This metadata is stored separately from our technical emulation environments in the environment descriptions.

An environment description may look like this:

{ 
        "envId":"4404",
        "title":"Windows 98 (SE)",
        "description":"Windows 98 (SE)",
        "os": "Microsoft Windows 98 Second Edition",
        "version": "01052016",
        "emulator": "VirtualBox"
}

The envId field refers to our classical emulation environments with information like the hard disk image, available drives for removable media, CPU type and so on. The other fields in the environment descriptions are purely logical metadata. They are extensible and can identify, for example, the operating system, installed software and other information that the institution deems necessary or useful for managing its software environments. Curators or users can then select an appropriate environment according to their requirements, for instance a Windows platform with MS Office installed because the documentation for the multimedia object is provided as a .doc file.

In its current implementation, the environment descriptions are simple JSON files with one description per file, all stored in a single directory.
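As a rough sketch of what such a directory could look like in practice, the following shell snippet writes the example description from above into a directory and lists what it finds there. The file name 4404.json and both helper functions are assumptions made for illustration; the text does not prescribe a naming scheme. Note that sed is used here only as a crude field extractor and is not a real JSON parser.

```shell
#!/bin/sh
# Sketch: one JSON description per file, all in a single directory.
# write_example_env and list_envs are hypothetical helpers; the file
# name 4404.json is an assumption, not an EMiL requirement.
write_example_env() {
    env_dir=$1
    mkdir -p "$env_dir"
    cat > "$env_dir/4404.json" <<'EOF'
{
        "envId":"4404",
        "title":"Windows 98 (SE)",
        "description":"Windows 98 (SE)",
        "os": "Microsoft Windows 98 Second Edition",
        "version": "01052016",
        "emulator": "VirtualBox"
}
EOF
}

# Print "envId<TAB>title" for every description file in the directory.
list_envs() {
    for f in "$1"/*.json; do
        [ -f "$f" ] || continue
        id=$(sed -n 's/.*"envId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$f")
        title=$(sed -n 's/.*"title"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$f")
        printf '%s\t%s\n' "$id" "$title"
    done
}
```

Running `write_example_env ./environments` followed by `list_envs ./environments` gives a quick overview of which envId values are described.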

The container

Finally we come to the actual container. You should have three directories by now, image-archive/, object-archive/ and environments/.

The EMiL container is available from Docker Hub and can be pulled via

docker pull eaas/bwfla:emil

The accompanying run script can be downloaded from the EMiL GitHub repository and is started like this:

./run-emil.sh --public-ip-port 132.230.8.226:8080 --archive-dir ./image-archive/ --environments-dir ./environments/ --objects ./object-archive/

The IP address should be the IP address of the machine you start the container on. Besides port 8080, which is used to communicate with the JBoss application server, the container also opens port 1080 to provide access to our new UI. We have developed two shiny new JavaScript-based UIs. The admin UI, where you can test all the available operating system images, is available from your browser at http://ip:1080/emil-admin-ui. To actually access a digital object, use http://ip:1080/emil-ui/. This URL will provide you with a list of available objects from the object archive and auto-detect a suitable environment (Note: this feature is not available with our publicly available, limited image archive!).

Category: Uncategorized

Unifying Access to Virtual Disk Images – The EaaS Way

Tuesday, March 01st, 2016 | Author:

With the multitude of emulators EaaS currently supports – what is the best/optimal/ideal/future-proof file format for disk images? Until now, we had a simple answer to that: the best option is the format your desired emulator supports. If you want to use Qemu, you have a very wide range of choices available. Since most emulators support RAW disk images, should that be the preferred choice in general? Unfortunately, if you want to use VirtualBox, you cannot use RAW images; your image needs to be in VDI or VMDK format. You can still use RAW images for archiving purposes and convert them on demand from RAW to VDI when the environment is started, for instance using qemu-img. This, however, implies copying (migrating) the RAW file, and in the case of a typical Windows XP image this means copying about 20GB of data. This is not only inefficient in terms of disk and bandwidth use (most of the time, the environment will not need all of the 20GB), it also impairs the user experience, as we have to wait for the image migration to finish before starting the environment.

A further set of problematic images are forensically packaged images in EnCase (EWF) or AFF format. For the preservation community, these image formats are of interest because they embed fixity information as part of the disk image itself. In order to start such an image with VirtualBox, we first have to extract the RAW image and convert it to VDI in a second step.

As emulation is what we are good at, it would of course be nice to just emulate all the different image file formats for our emulators and keep a simple storage file format such as RAW or EWF.

xmount is a FUSE driver that emulates different disk image formats (VDI/VHD, VMDK and, of course, RAW) without the need for a complete copy. Hence, one can FUSE-mount a RAW image file as VDI using xmount. The resulting VDI file, however, is only virtual. All conversions required to turn the RAW file into a VDI are made on-the-fly. Unfortunately, the input formats supported by xmount are limited as it only supports EWF, AFF and RAW images as input. Additionally, all data written by the emulator to the virtual VDI file are stored separately in a proprietary block-diff file which was initially designed to be discarded once the image is unmounted again. Nevertheless, xmount is a good start but not yet sufficient for the purposes of EaaS.

The EaaS image archive exports images read-only. As a first step, a writeable layer is required to make a disk image usable with an emulator. For this purpose, a new empty qcow2 disk image is created that links to a backing file from our image archive. Any data written by the emulator is stored in the qcow2 image. Modified data is then read back from the qcow2 image, while unmodified data is read directly from the image archive.

The following example creates an empty writeback image in the qcow2 format:
qemu-img create -f qcow2 -o backing_file=http://archive/object.raw writeback.cow

To support this workflow, xmount required some extensions. To support qcow2 and all its features, for instance using an HTTP URL as backing file, we have added the Qemu block driver to xmount as an input library. This greatly extends the range of input formats xmount supports: all image file types supported by Qemu can now be mounted directly and exported as RAW, VDI/VHD or VMDK.

Because we want changes to be stored back into our qcow2 image to create derivatives of an environment (e.g. by installing additional software on a bare operating system image), we also added a write-through option to xmount. This causes write operations not to end up in xmount’s own proprietary file but to go back through Qemu’s block driver. In the case of a qcow2 image as input, this means changes are written to this same image, preserving the link between the modified environment and its backing file.

The following command:

mkdir mountDir
xmount --in qemu writeback.cow --out vdi --cache writethrough --inopts qemuwritable=true mountDir

first creates a target directory (a mountpoint) and then mounts the writeback.cow image to mountDir. In this directory a file called object.raw.vdi can be found, which VirtualBox can use as a virtual disk. Unmodified data is read from the file http://archive/object.raw, while any data written is stored in the writeback image.
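Put together, the two steps (creating the writeback image and mounting it as VDI) can be wrapped in a small script. The sketch below is only an illustration: run, start_overlay and stop_overlay are hypothetical helper names, the xmount options are the ones quoted in this post and require our extended xmount build, and unmounting via fusermount -u is the usual way to release a FUSE mount rather than something specific to xmount. Recent Qemu releases additionally expect the backing format to be stated explicitly when an image is created, so the qemu-img line may need adjusting on newer systems. Setting DRY_RUN=1 only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the writeback workflow; all function names are hypothetical.
# With DRY_RUN=1 the commands are printed instead of being executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a qcow2 overlay on top of a read-only backing file and
# expose it as a virtual VDI file below the given mountpoint.
start_overlay() {
    backing=$1      # e.g. http://archive/object.raw
    overlay=$2      # e.g. writeback.cow
    mountpoint=$3   # e.g. mountDir

    run qemu-img create -f qcow2 -o backing_file="$backing" "$overlay" &&
    run mkdir -p "$mountpoint" &&
    run xmount --in qemu "$overlay" --out vdi --cache writethrough \
        --inopts qemuwritable=true "$mountpoint"
}

# Release the FUSE mount again once the emulator has shut down.
stop_overlay() {
    run fusermount -u "$1"
}
```

For example, `DRY_RUN=1; start_overlay http://archive/object.raw writeback.cow mountDir` shows the three commands that would be executed without touching the system.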

Source code at Github: https://github.com/eaas-framework/xmount

There are also pre-built debs for Ubuntu 14.04 available. First add our repository to your sources list:

echo "deb http://packages.bw-fla.uni-freiburg.de/ trusty bwfla" > /etc/apt/sources.list.d/bwfla.list

Then prioritize the installation of our packages:

echo -e "Package: *\nPin: release c=bwfla\nPin-Priority: 1001" > /etc/apt/preferences.d/pin-bwfla.pref

Finally, update the package list and install xmount and its dependencies:

apt-get update
apt-get install qemu-utils qemu-block-extra libxmount-input-qemu xmount libcurl3 libcurl3-gnutls

Category: R&D