EMiL container

Tuesday, June 07th, 2016 | Author:

One of the projects we are currently working on is the DFG-funded EMiL project (Emulation of Multimedia Objects in Libraries). Its main goal is to provide our EaaS framework for libraries to use in their reading rooms. While this project has also spawned our USB live systems, its latest outcome is a Docker container that allows (comparatively) easy access to emulation for everyone who wants to run born-digital objects.

The container relies on three auxiliary data sources:

  1. An image archive (it contains disk images of operating systems)
  2. An object archive (e.g. CD-ROMs, floppies, etc. – well, you want to access them, duh)
  3. A directory of environment descriptions (they are merely meta-metadata to track environments from the image archive)

Image Archive

The image archive is one of the core components of our EaaS software and has been so for several years. It contains the disk images that are required by the emulators in order to boot up a virtual machine and all metadata necessary for the emulator to do so. Another blog post already covers the image archive and how to set it up for use in a container, so I’ll refer to that.

For the impatient, there’s an image archive with free images available for download. Just run these commands:

curl -O http://bw-fla.uni-freiburg.de/image-archive.tgz
tar xf image-archive.tgz
cd image-archive/nbd-export/ 
ln -s ../images/base/doom.raw 
ln -s ../images/base/hatari_tOS206us.img 
ln -s ../images/base/qemu-i386-DOS_6.20_CDROM.raw
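If you prefer a script over typing the ln commands by hand, the export step can be sketched in Python. This is only a sketch: the function name export_images is made up for illustration, and linking every file found in images/base (rather than the three images named above) is an assumption about what you want to export.

```python
import os
from pathlib import Path


def export_images(archive_dir):
    """Link every file in images/base into nbd-export using relative symlinks."""
    archive = Path(archive_dir)
    base = archive / "images" / "base"
    export = archive / "nbd-export"
    export.mkdir(parents=True, exist_ok=True)

    linked = []
    for image in sorted(base.iterdir()):
        if not image.is_file():
            continue
        link = export / image.name
        if link.is_symlink() or link.exists():
            continue  # already exported, leave it alone
        # Same relative target as `ln -s ../images/base/<name>` run inside nbd-export/
        os.symlink(os.path.join("..", "images", "base", image.name), link)
        linked.append(image.name)
    return linked
```

Calling export_images("image-archive") returns the names of the newly linked images; running it a second time links nothing new.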

Object Archive

The object archive recently had its own article in this blog and contains the objects you want to access. The prepared docker image (wait for it!) has a preconfigured file-backend for the actual data. The file structure is pretty simple:

object-archive/
 |- object1/
 |   \ iso/
 |      |- disk1.iso
 |      \  disk2.iso
 |- object2/
 |   \ floppy/
 |      |- disk1.raw
 |     ...
...

As you can see, each object is represented by its own subdirectory. The directory’s name is also its id that is to be used later to access it. Inside this directory, there is either an iso/ subdirectory, or a floppy/ subdirectory, or both. The actual images (either CD-ROM images or floppy images) are located within their respective directories according to their media type. And that’s basically it, just copy your objects into a directory structure like above and you have your object archive ready.
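To double-check that a directory follows this layout, a few lines of Python can list what the archive would offer. This is a rough sketch and not part of the EMiL code; list_objects is a hypothetical helper name, and it only looks at the iso/ and floppy/ subdirectories described above.

```python
from pathlib import Path

MEDIA_DIRS = ("iso", "floppy")  # the two media subdirectories named above


def list_objects(archive_dir):
    """Map each object id (directory name) to its images, grouped by media type."""
    objects = {}
    for obj_dir in sorted(Path(archive_dir).iterdir()):
        if not obj_dir.is_dir():
            continue
        media = {}
        for kind in MEDIA_DIRS:
            sub = obj_dir / kind
            if sub.is_dir():
                media[kind] = sorted(p.name for p in sub.iterdir() if p.is_file())
        objects[obj_dir.name] = media
    return objects
```

An object that shows up with an empty dictionary has neither an iso/ nor a floppy/ subdirectory and is probably misplaced.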

For those using Rosetta, Marcus Bitzl (Bayerische Staatsbibliothek) has implemented a Rosetta-Object-Archive adapter (https://github.com/emil-emulation/emil-rosetta), which allows objects to be retrieved directly from the Rosetta repository.

Environment Descriptions

Within our EMiL project, it became clear that the partnering institutions are only remotely interested in the technical metadata of our EaaS framework itself. This is especially true for our Emulation Environments that contain all the metadata necessary for re-enacting a virtual computer system. Information like memory, which hardware bus a drive is connected to, etc. is, at best, only remotely interesting if all you want is to run a single multimedia object. Consequently, the institutions want a more abstract environment description that specifies the software environment, i.e. the operating system and installed software such as a PDF viewer, text processors and so on. This metadata is stored separately from our technical emulation environments in the environment descriptions.

An environment description may look like this:

{ 
        "envId":"4404",
        "title":"Windows 98 (SE)",
        "description":"Windows 98 (SE)",
        "os": "Microsoft Windows 98 Second Edition",
        "version": "01052016",
        "emulator": "VirtualBox"
}

The envId field refers to our classical emulation environments with information like the harddisk image, available drives for removable media, CPU type and so on. The other fields in the environment descriptions are purely logical metadata. They are extensible and can identify e.g. the operating system, installed software and other information that the institution deems necessary or useful for managing their software environments. Curators or users can then select an appropriate environment according to their requirements, for instance select a Windows platform with MS Office installed, because the documentation for the multimedia object is provided as a .doc file.

In the current implementation, the environment descriptions are simple JSON files with one description per file, all stored in a single directory.
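Because the descriptions are plain JSON files in one directory, picking an environment by its logical metadata takes very little code. The following is a sketch only: the helper names are made up, the .json file extension is an assumption, and the real selection logic in the UI may look quite different.

```python
import json
from pathlib import Path


def load_environments(env_dir):
    """Read every *.json file in env_dir, one environment description per file."""
    envs = []
    for path in sorted(Path(env_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            envs.append(json.load(fh))
    return envs


def find_by_os(envs, needle):
    """Return descriptions whose "os" field contains needle (case-insensitive)."""
    needle = needle.lower()
    return [e for e in envs if needle in e.get("os", "").lower()]
```

For instance, find_by_os(load_environments("environments"), "windows 98") would return the description shown above, whose envId then points to the technical emulation environment.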

The container

Finally we come to the actual container. You should have three directories by now, image-archive/, object-archive/ and environments/.

The EMiL container is available from the docker.io hub and can be pulled via

docker pull eaas/bwfla:emil

The accompanying run script can be downloaded from the EMiL GitHub repository and is started like this:

./run-emil.sh --public-ip-port 132.230.8.226:8080 --archive-dir ./image-archive/ --environments-dir ./environments/ --objects ./object-archive/

The IP address should be the IP address of the machine you start the container on. Besides port 8080, which is used to communicate with the JBoss application server, the container also opens port 1080 to provide access to our new UI. We have developed two shiny new JavaScript-based UIs. The admin UI, where you can test all the available operating system images, is available from your browser at http://ip:1080/emil-admin-ui. To actually access a digital object, use http://ip:1080/emil-ui/. This URL will provide you with a list of available objects from the object-archive and auto-detect a suitable environment (Note: this feature is not available with our publicly available, limited image archive!).

Category: Uncategorized

EaaS Ubuntu Packages (14.04 LTS) BETA

Tuesday, June 07th, 2016 | Author:

Installing EaaS has always been a difficult task, since many different software components need to be installed and configured. To ease this process, we have created Ubuntu packages (currently 14.04 LTS only, 16.04 is on the way).

Preparations
To retrieve our packages, you need to add our repository to your APT sources list. Create a file /etc/apt/sources.list.d/bwfla.list containing the following line:

deb http://packages.bw-fla.uni-freiburg.de/ trusty bwfla

Furthermore, you need to prioritize our packages, as we provide patched versions of some system packages. Add a file /etc/apt/preferences.d/pin-bwfla.pref with the following content:

Package: *
Pin: release c=bwfla
Pin-Priority: 1001

Finally, update your package database (apt-get update).

Installation

The EaaS system consists of different components:

  1. eaas-common (dependency of all packages, will be installed automatically)
  2. eaas-emulators
  3. eaas-gateway
  4. eaas-image-archive
  5. eaas-fits
  6. eaas-object-archive
  7. eaas-software-archive
  8. eaas-server
  9. eaas-workflows

All components can either be installed on a single machine, or deployed individually.

Note: If you install or reconfigure an EaaS package while the application server is already running, make sure to restart the server to enable new features or settings (service bwfla restart).

EaaS Common
The eaas-common module installs common software dependencies (e.g. xmount, shell scripts) and prepares the system for running EaaS. This package is a dependency of all other EaaS modules and does not need to be installed manually.

Say “yes” if you want to use the installation wizard to set up your EaaS instance, or say “no” if you want to configure your instance manually.

Set the IP address and port number the EaaS server should listen on.

EaaS Emulators
The eaas-emulators package prepares the system to run emulators and act as an EmuComp service. This package also installs all supported emulators.

To make the system scalable, the EmuComp service needs to be accessible for any client using emulators. Please provide the IP/port or FQDN under which a client can reach this service. Usually, this is the same IP/port as entered for eaas-common. Sometimes, however, the service is masked / proxied / etc. and is reachable through a different FQDN / port.

EaaS Gateway
The eaas-gateway takes requests from clients and dispatches these requests to EmuComp instances.

Add a list of configured EmuComp instances separated by spaces, e.g. “1.2.3.4:8080,4 2.3.4.5:8181,8”, meaning that the instance running at 1.2.3.4 port 8080 provides 4 CPU cores and the instance running at 2.3.4.5 port 8181 provides 8 CPU cores.
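The list format is easy to get wrong by a comma or a colon, so it can help to validate it before pasting it into the wizard. Here is a small parser sketch; the EmuComp tuple and the parse_emucomps function are illustrative names and not part of the EaaS packages.

```python
from typing import List, NamedTuple


class EmuComp(NamedTuple):
    host: str
    port: int
    cores: int


def parse_emucomps(value: str) -> List[EmuComp]:
    """Parse a space-separated list of HOST:PORT,CORES entries."""
    instances = []
    for entry in value.split():
        address, _, cores = entry.partition(",")
        host, _, port = address.rpartition(":")
        if not host or not port or not cores:
            raise ValueError(f"expected HOST:PORT,CORES but got {entry!r}")
        instances.append(EmuComp(host, int(port), int(cores)))
    return instances
```

Summing the cores field over the result gives the number of CPU cores the gateway can hand out, 12 for the example string above.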


EaaS Image Archive
The eaas-image-archive package installs a basic image archive service.


You need to provide a path to the image archive (currently file-system based).

If the path exists, it is assumed that there is a valid (file-based) image archive. If the path does not exist, a new file-based image archive will be instantiated and pre-populated with simple example images.

Finally, one needs to decide how the image data is delivered to the emulators. Currently, two options are available: the network block device protocol (NBD) and HTTP. NBD is usually faster and more bandwidth-efficient than HTTP, but requires that EmuComp instances are able to connect to the image archive through port 10809. Using HTTP only requires that port 80 is accessible for EmuComp instances.
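If you are unsure which option works in your network, you can probe the two ports from an EmuComp machine before deciding. The sketch below only tests whether a TCP connection can be opened; it does not speak NBD or HTTP, and the helper names are made up for this example.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_transport(archive_host):
    """Prefer NBD (port 10809), fall back to HTTP (port 80), else None."""
    if can_connect(archive_host, 10809):
        return "nbd"
    if can_connect(archive_host, 80):
        return "http"
    return None
```

Run choose_transport with the address of your image archive from each EmuComp node; a None result usually points to a firewall between the two machines.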

EaaS Object-Archive
The eaas-object-archive package installs a basic object archive facade.

The object archive facade is capable of managing multiple object archives. At this point, only the directory where the individual object-archive configurations can be found has to be set.

EaaS Software-Archive
The eaas-software-archive currently provides a thin meta-data layer on top of an object-archive. We will update information on using the software-archive soon. Currently WIP.


EaaS Server
The eaas-server package is also a dependency package, installed by one of the functional modules. The server package installs and configures the EaaS application server. Optionally an upstart service will be installed, such that the EaaS services are started when the machine is booted.

Note: the eaas-server package does not start the server during or after installation. You need to either reboot the machine or start the server manually (service bwfla start).

EaaS Workflows
The eaas-workflows package installs and configures a web-based UI implementing example workflows.


Configure the EaaS Gateway to be used. Provide IP/FQDN and port to a gateway instance. If the current instance has been configured to act as EaaS Gateway, the values are preset to its current configuration.

Configure the EaaS Image Archive to be used. Provide IP/FQDN and port to an image archive instance. If the current instance has been configured to act as EaaS Image Archive, the values are preset to its current configuration.

Configure the EaaS Object Archive to be used. Provide IP/FQDN and port to an object archive instance. If the current instance has been configured to act as EaaS Object Archive, the values are preset to its current configuration.

Category: R&D

EaaS Object Archive

Thursday, March 31st, 2016 | Author:

The EaaS object archive (or technically the EaaS object archive facade) is one key component to support seamless integration of institutional object-repositories with EaaS and to make EaaS a cost-effective and scalable access solution for born-digital content.

By design, EaaS separates an emulated computer system (emulation environment) and user objects. The emulation environment consists of the emulator configuration in combination with a bootable disk image containing an installed operating system with software applications etc. The user objects, on the other hand, are all media objects such as CD-ROMs or individual files that are to be rendered in the emulated computer system. Objects and emulation environment are combined only if a user requests a certain object to be rendered using a suitable emulation environment.

The main rationale for this design is the complexity and cost of preservation planning: while the number of objects is in the thousands or millions, the number of necessary emulation environments is rather small, typically about 10-20 “base” environments representing typical computer systems of technological epochs. By separating objects from their rendering environments, emulation-based preservation planning strategies can focus on a small set of emulation environments that have to be kept alive. The objects remain in a dedicated repository.

To make objects accessible with EaaS we have designed a flexible “object archive facade” to translate between the technical requirements of the emulation framework and a specific object repository.

Interfaces and Data Types

To allow EaaS to retrieve and render user objects, an object repository provider has to implement a simple Java interface:

import java.util.List;

public interface DigitalObjectArchive
{
    // returns a description of all files that belong to the given object
    FileCollection getObjectReference(String objectId);

    // object archive identification
    String getName();

    // optional, returns a list of object IDs
    List<String> getObjectList();
}

The most important method to be implemented is getObjectReference(), which takes an object ID and returns a FileCollection description of the object. The FileCollection represents all individual files of an object as referenceable URLs. Its JAXB XML representation looks like this:

<FileCollection id="OID1">
    <FileCollectionEntry 
       id="CD1"
       url="https://repo/id=CD1.iso" 
       type="CDROM" />
    [...]
</FileCollection>

Each FileCollectionEntry represents a media image (currently only floppy, disk and cdrom are supported) as a URL to the data stream. The URL may use the http(s), nbd or file transport protocols and should ideally support random access to the data stream (i.e. for HTTP(S), the server has to support HTTP Range Requests). The URLs provided by the FileCollection need to be directly accessible by the EaaS infrastructure. Hence, a repository-specific implementation of getObjectReference() either provides accessible links to the internal data streams for a given object, or retrieves the object to temporary storage and creates filesystem links (in the case of direct file access) to these files.
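To make the idea more tangible, here is a rough Python sketch of what a file-backed getObjectReference() has to produce. It is not the Java implementation used in EaaS. It assumes objects are stored as one directory per object with iso/ and floppy/ subdirectories, it assumes "FLOPPY" as the type value for floppy images (only "CDROM" appears in the example above), and it uses the file name without extension as the entry id.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Assumed mapping from subdirectory name to the entry's type attribute.
MEDIA_TYPES = {"iso": "CDROM", "floppy": "FLOPPY"}


def file_collection_xml(archive_dir, object_id):
    """Describe all media images of one object as a FileCollection document."""
    obj_dir = Path(archive_dir) / object_id
    if not obj_dir.is_dir():
        raise KeyError(object_id)
    root = ET.Element("FileCollection", id=object_id)
    for subdir, media_type in MEDIA_TYPES.items():
        folder = obj_dir / subdir
        if not folder.is_dir():
            continue
        for image in sorted(folder.iterdir()):
            # file:// URLs only work if EaaS can reach the same filesystem.
            ET.SubElement(root, "FileCollectionEntry",
                          id=image.stem,
                          url=image.resolve().as_uri(),
                          type=media_type)
    return ET.tostring(root, encoding="unicode")
```

A repository that serves its data over HTTP would emit https URLs instead of file URLs here; the structure of the document stays the same.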

Category: R&D

Unifying Access to Virtual Disk Images – The EaaS Way

Tuesday, March 01st, 2016 | Author:

With the multitude of emulators EaaS currently supports – what is the best/optimal/ideal/future-proof file format for disk images? Until now, we had a simple answer to that: the best option is the format your desired emulator supports. If you want to use Qemu, you have a very wide range of choices available. Since most emulators support RAW disk images, should that be the preferred choice in general? Unfortunately, if you want to use VirtualBox, you cannot use RAW images; your image needs to be in VDI or VMDK format. You can still use RAW images for archiving purposes and convert images on demand from RAW to VDI when the environment is started, for instance using qemu-img. This, however, implies copying (migrating) the RAW file, and in the case of a typical Windows XP image this means copying about 20 GB of data. This is not only inefficient in terms of disk and bandwidth use (most of the time, the environment will not need all of the 20 GB), it also impairs the user experience, as we have to wait for the image migration to finish before starting the environment. A further set of problematic images are forensically packaged images in EnCase (EWF) or AFF format. For the preservation community, these image formats are of interest because they embed fixity information as part of the disk image itself. In order to start such an image with VirtualBox, we first have to extract the RAW image and convert it to VDI in a second step.

As emulation is what we are good at, it would of course be nice to just emulate all the different image file formats for our emulators and have a simple storage file format such as RAW or EWF.

xmount is a FUSE driver that emulates different disk image formats (VDI/VHD, VMDK and, of course, RAW) without the need for a complete copy. Hence, one can FUSE-mount a RAW image file as VDI using xmount. The resulting VDI file, however, is only virtual. All conversions required to turn the RAW file into a VDI are made on-the-fly. Unfortunately, the input formats supported by xmount are limited as it only supports EWF, AFF and RAW images as input. Additionally, all data written by the emulator to the virtual VDI file are stored separately in a proprietary block-diff file which was initially designed to be discarded once the image is unmounted again. Nevertheless, xmount is a good start but not yet sufficient for the purposes of EaaS.

The EaaS image archive exports images read-only. As a first step, a writeable layer is required to make a disk image usable with an emulator. For this purpose, a new empty qcow2 disk image is created, linking to a backing file from our image archive. Any data written by the emulator is stored in the qcow2 image; any modified data is then also read from the qcow2 image, but unmodified data is read directly from the image archive.

The following example creates an empty writeback image in the qcow2 format:
qemu-img create -f qcow2 -o backing_file=http://archive/object.raw writeback.cow
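The read and write rules of such a writeback image can be illustrated with a toy model. This is only a conceptual sketch: real qcow2 images work on clusters with on-disk lookup tables, and the class below is invented for illustration and has nothing to do with Qemu's actual code.

```python
class CowOverlay:
    """Toy model of a writeable overlay on top of a read-only backing image."""

    def __init__(self, backing: bytes, block_size: int = 512):
        self.backing = backing      # stands in for the image archive, never modified
        self.block_size = block_size
        self.blocks = {}            # block index -> modified block contents

    def read_block(self, index: int) -> bytes:
        # Modified blocks come from the overlay, everything else from the backing file.
        if index in self.blocks:
            return self.blocks[index]
        start = index * self.block_size
        return self.backing[start:start + self.block_size]

    def write_block(self, index: int, data: bytes) -> None:
        # Writes never touch the backing image, they only land in the overlay.
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data
```

The overlay only grows with the blocks that were actually written, which is why a derivative of a 20 GB base image can stay small.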

To support this workflow, xmount required some extensions. To support qcow2 and all its features, for instance using an HTTP URL as backing file, we have added the Qemu block driver to xmount as an input library. Through this, xmount’s support of input formats has been widely improved, allowing us to directly mount all image file types supported by Qemu and export them as RAW, VDI/VHD or VMDK.

Because we want changes to be stored back into our qcow2 image to create derivatives of an environment (e.g. by installing additional software to a bare operating system image), we also added a write through option to xmount. This causes write operations not to end in xmount’s own proprietary file but to go back through Qemu’s block driver. In case of a qcow2 image as input, this means changes are written to this same image, preserving the link between the modified environment and its backing file.

The following command:

mkdir mountDir
xmount --in qemu writeback.cow --out vdi --cache writethrough --inopts qemuwritable=true mountDir

first creates a target directory (a mountpoint) and then mounts the writeback.cow image to mountDir. In this directory a file called object.raw.vdi can be found and can be used as a virtual disk by VirtualBox, e.g. reading data from the file http://archive/object.raw. Any data written is stored in the writeback image.

Source code at Github: https://github.com/eaas-framework/xmount

There are also pre-built debs for Ubuntu 14.04 available. First add our repository to your sources list:

echo "deb http://packages.bw-fla.uni-freiburg.de/ trusty bwfla" > /etc/apt/sources.list.d/bwfla.list

Prioritize the installation of our packages

echo -e "Package: *\nPin: release c=bwfla\nPin-Priority: 1001" > /etc/apt/preferences.d/pin-bwfla.pref

And finally update the packages list and install xmount and its dependencies

apt-get update
apt-get install qemu-utils qemu-block-extra libxmount-input-qemu xmount libcurl3 libcurl3-gnutls

Category: R&D

Curate Gear 2016

Thursday, January 21st, 2016 | Author:

Slides of this year’s Curate Gear demo are online.

The demo script can be downloaded from our GitHub page.

Category: Events, R&D

Boot to Emulation – EaaS as a Local Option (Beta)

Wednesday, October 21st, 2015 | Author:

Complementary to the release of the EaaS Docker containers, we’ve created a self-contained USB live system. The USB live system boots a computer directly and runs emulated environments on local hardware. Running emulators on local machines (e.g. standard PCs) can be an interesting alternative for reading-room setups or museum displays, where cluster or cloud computing options are not suitable. Local execution of emulators allows peripherals such as joysticks, printers and CRT monitors to be connected, and also improves the user experience for some applications (e.g. games, software-based art, etc.) by providing native fullscreen and reduced (input) latency.

The current live-system offers three different options:
  • A complete self-contained system
  • A self-contained system, which integrates with an existing EaaS setup
  • A boot-to-emulator system, suitable for public displays etc., which directly boots into a preconfigured emulation environment

System requirements
  • at least 2 GB (4 GB recommended) of RAM
  • boot option from USB (USB 3.0 recommended)
  • a USB pendrive/stick, at least 8 GB
  • optionally, a wired network card

The Self-contained EMiL Live-System
First, download the USB image here (http://bw-fla.uni-freiburg.de/usb-demo.img) and write it to a USB pendrive. We recommend using a fast USB 3.0 stick with at least 8 GB capacity.

To write the image to the USB drive, we recommend that Linux and Mac OS X users use “dd”, e.g.:

sudo dd if=/home/klaus/usb-demo.img of=/dev/<your usb device>

Windows users may use a tool like the Win32 Disk Imager (http://sourceforge.net/projects/win32diskimager/) or similar tools.

For a fully self-contained setup, just boot directly from the USB stick. We have preloaded the stick with some simple examples for demo purposes. A short cheat-sheet:
  • stop an emulator with CTRL-ALT-ESC
  • toggle between fullscreen and web view with CTRL-ALT-F
In the non-fullscreen mode, the user may have options to cite an environment, create a screenshot, change a medium, etc.

Add your own images
If you want to add your own images, you can mount the USB stick on your desktop computer and you’ll see two partitions. The first partition contains a read-only Ubuntu-based live distribution; the second partition, called “emil-data”, contains two folders:
  • configs/  contains user-writeable configuration files
  • image-archive/ contains a valid image-archive structure with some examples
You can copy your disk images to the image-archive/images/base folder and create meta-data accordingly. We will write a follow-up article on creating appropriate meta-data.
Currently, the second partition is rather small, but it can be resized: write the USB image to a large pen drive, delete the second partition and re-create it. Make sure that you retain the configs and image-archive directories and set the label of the second partition to “emil-data”. Any filesystem supported by Linux should be OK. We’ve chosen the proprietary filesystem exFAT (https://en.wikipedia.org/wiki/ExFAT) to support virtual disk images larger than 4 GB and to be compatible with all major desktop operating systems.

Integrate with an EaaS Environment
Furthermore, the USB live system integrates well with an existing EaaS environment, e.g. for hybrid (local / web) usage or, as described in the following example, as a curation and efficient deployment tool in a reading-room environment.
To maintain your emulated environments centrally, you first need to set up the EaaS workflows and image-archive. Ideally, you should start by using our pre-packaged Docker containers (see also: http://bw-fla.uni-freiburg.de/wordpress/?p=817):
cd image-archive/nbd-export/ 
ln -s ../images/base/doom.raw 
ln -s ../images/base/hatari_tOS206us.img 
ln -s ../images/base/qemu-i386-DOS_6.20_CDROM.raw

Finally run:

./run-full-setup.sh --public-ip-port 192.168.99.100:8080 --docker  eaas/bwfla:demo-august-15 --archive-dir /Users/klaus/Downloads/image-archive

with a valid IP for your machine and archive-dir pointing to your image-archive.

Now you can use the bwFLA workflows via web browser (e.g. open http://192.168.99.100:8080), but you also have a public image-archive running, serving the exported environments. To use this archive from the USB stick:
  • delete the image-archive folder from the second partition
  • edit configs/remote/WorkflowsConf.xml: set the <archiveGW> value to the IP and port of your docker instance. Make sure that the machine booting from USB has a cable network connection and the network is configured via DHCP. Also make sure that the USB machine is able to reach your Docker instance.
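If you have to prepare several sticks, editing the file by hand gets tedious. The following sketch changes the value with Python's standard XML library. It rests on assumptions: that <archiveGW> is a plain, un-namespaced element below the root of WorkflowsConf.xml, and that the expected value has the form IP:PORT. Please check both against your own file first, and note that ElementTree drops XML comments when it rewrites a file.

```python
import xml.etree.ElementTree as ET


def set_archive_gateway(conf_path, host, port):
    """Set the text of the first <archiveGW> element to "host:port"."""
    tree = ET.parse(conf_path)
    node = tree.getroot().find(".//archiveGW")
    if node is None:
        raise ValueError("no <archiveGW> element found in " + str(conf_path))
    node.text = f"{host}:{port}"  # assumed value format, see the note above
    tree.write(conf_path, encoding="utf-8", xml_declaration=True)
```

Keep a copy of the original file on the stick until you have verified that the live system still boots with the modified configuration.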

Boot to Emulator
Finally, the USB stick can be used for booting directly into a specific environment. To do so, simply put a file named “environment-id.txt” into the top-level directory of the second (“emil-data”) partition. The file should contain only the ID of the environment to load. You can find the ID of an environment in its meta-data.
Note: this version is not tamper-proof. It is not recommended to use it for public displays. If you need a tamper-proof version please contact us.
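Creating the marker file for boot-to-emulator mode can be scripted as well, for example when preparing several sticks. This is a minimal sketch: the function name is made up, and the mount point of the “emil-data” partition depends on your operating system. It writes the ID without a trailing newline, since whether the live system tolerates extra whitespace is not something we can promise here.

```python
from pathlib import Path


def set_boot_environment(data_partition, env_id):
    """Write environment-id.txt into the top level of the emil-data partition."""
    target = Path(data_partition) / "environment-id.txt"
    target.write_text(str(env_id).strip(), encoding="utf-8")
    return target
```

On a Linux desktop the call could look like set_boot_environment("/media/user/emil-data", "4404"), where both the mount point and the ID are examples only.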

The next steps ™
  • Improve usability and workflows
  • The current version is static in particular w.r.t. emulator curation. The next version will support centrally maintained, containerized components, in particular emulators. When the system starts it will check for updated software packages and will download new components if required.
  • Update of available workflows
  • Deployment for reading-rooms via PXE

Category: bwFLA Projekt, DP Projekte, R&D

Emulation as a Service as a Docker (Beta)

Thursday, October 08th, 2015 | Author:

Finally, Isgandar found some time to play with Docker. Docker, or containers in general, allow installation and configuration effort to be shifted from end-users (admins) to the developers. As the bwFLA EaaS framework is not easy to install, that seemed a perfect fit.

Some prerequisites:

For this example we chose to divide & conquer all EaaS components, i.e. every EaaS component is deployed as an individual container. This way, we can explain the functionality of each container and the configuration becomes readable. All explanations refer to the runner scripts from our GitHub repository.

Common parameters (mandatory for all scripts)
  • We start all scripts with sudo. If you are unsure, you can run the container in non-privileged mode, although some functionality will then not be available (e.g. uploading files directly into the emulated environment).
  • Choose the flavour / version the script should start with, e.g. --docker eaas/bwfla:demo-august-15
  • As we package application servers, each component needs to be network accessible. Docker takes care of temporarily setting ‘iptables’ rules so that the client doesn’t need to map ‘host <-> guest’ ports (i.e. NAT). Make sure that your firewall doesn’t interfere with the rules set up by Docker. Any IP valid for any network interface of the host machine, including 127.0.0.1, should be good. Example: --public-ip-port 1.2.3.4:8080

Currently, the EaaS framework consists of at least four components (actually there are five, but the object-archive has not been packaged yet):
  • EmuComp – A container capable of running emulators. Typically such containers are deployed in cluster or cloud environments.
     sudo ./run-flavor-emucomp.sh --docker eaas/bwfla:demo-august-15 --public-ip-port 1.2.3.4:8081

    This script starts the ‘emucomp’ module, which is responsible for running individual emulators on dedicated compute nodes (possibly in a cluster/cloud environment). As of now, the following emulators are supported in the Docker container: Qemu, SheepShaver, Basilisk, DosBox, Hatari.

  • EaaS – A container acting as gateway, assigning user requests to EmuComps in a cloud / cluster deployment. This version only allows a fixed EmuComp cluster. For dynamic cloud allocation please ask.
     sudo ./run-flavor-eaas.sh --public-ip-port 1.2.3.4:8082 --docker eaas/bwfla:demo-august-15 --emucomp 8.8.8.8:8080,4 --emucomp 9.9.9.9:8080,8

    At least one ‘--emucomp <VALUE>’ has to be specified. Repeat the argument if multiple ‘emucomp’ modules have to be connected to this ‘eaas’ module. The value of the argument is composed of the IP:PORT of the ‘emucomp’ to be connected, followed by a comma and the number of sessions it should support (usually set to the node’s CPU count).

  • Image-Archive– The image archive component manages virtual environments (disk images, as well as corresponding meta-data).
    sudo ./run-flavor-imagearchive.sh --docker eaas/bwfla:demo-august-15 --public-ip-port 1.2.3.4:8083 --archive-dir /mnt/data/image-archive

    This script runs the ‘image-archive’ module. It requires the location of the image archive directory as a parameter. The ‘nbd-export’ directory should not contain any symbolic links that point to locations outside of the image-archive directory (i.e. use only relative paths). This is due to the fact that the image-archive is mounted inside the Docker container, which in turn has no access to the host’s file system.

    Important note:  make sure that the public-ip-port setting chosen for your image-archive instance is accessible by the workflow module and emucomps, in particular make sure that port 10809 is not firewalled. 

    Update: An example image-archive is available for download (26 Mb). It contains 3 example images and meta-data. To use the archive, unpack and export all image files within the images/base directory using symbolic links (make sure you use relative links, see example below). If your OS does not support symbolic links, just copy the images to the nbd-export directory. 

    tar xf image-archive.tgz
    # export images via nbd
    cd image-archive/nbd-export/ 
    ln -s ../images/base/doom.raw 
    ln -s ../images/base/hatari_tOS206us.img 
    ln -s ../images/base/qemu-i386-DOS_6.20_CDROM.raw

     

  • Workflows – Frontend UI and ready-made workflows. The workflows connect the EaaS gateway (and its configured EmuComps) with user content, such as the image-archive. It contains sample preservation workflows which include archival “ingest/access” of digital objects, full disk images, any accompanying software/libraries. This contains exactly the same functionality as our demo site.
    sudo ./run-flavor-workflows.sh --docker eaas/bwfla:demo-august-15 --public-ip-port 1.2.3.4:8083 --image-archive 1.2.3.4:8083 --eaas-gateway 1.2.3.4:8082 [--object-files test/object-files --base-uri http://1.2.3.4/objects/]

    This script starts the ‘Workflows’ module, which represents a reference implementation of the bwFLA/EaaS API.

    The ‘--image-archive’ should point to a location of an image-archive, which contains base images of emulated systems, their derivatives, etc.

    The ‘--eaas-gateway’ should point to a location of an EaaS module, which will serve as the main ‘Facade’ for accepting and performing emulation tasks by using one or more ‘emucomp’ modules on dedicated compute nodes.

    The ‘--object-files’ should point to a directory containing user objects that will appear in the ‘ingest’ workflow. The directory should contain objects in the form “OBJECT_NAME/OBJECT_NAME.iso”, e.g. ‘test/object-files/OBJECT1/OBJECT1.iso’. The ‘--base-uri’ must be specified if and only if ‘--object-files’ was specified. This argument defines the URL prefix under which the objects can be downloaded via HTTP (the ‘emucomp’ module needs to be able to download the object to inject it into the environment).

    NOTE: we will publish more information on the object-archive structure here. 
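Since links that leave the image-archive directory are a common reason for images not showing up inside the container, it can be worth checking the nbd-export directory before starting the image-archive script. The following sketch flags absolute links and links that resolve outside the archive; find_bad_links is a hypothetical helper and not part of the runner scripts.

```python
import os
from pathlib import Path


def find_bad_links(archive_dir):
    """Return names of symlinks in nbd-export that are absolute or leave the archive."""
    archive = Path(archive_dir).resolve()
    bad = []
    for entry in sorted((archive / "nbd-export").iterdir()):
        if not entry.is_symlink():
            continue  # copied image files are fine as they are
        target = os.readlink(entry)
        resolved = (entry.parent / target).resolve()
        inside = resolved == archive or archive in resolved.parents
        if os.path.isabs(target) or not inside:
            bad.append(entry.name)
    return bad
```

An empty list means every link is relative and stays inside the archive, so it should still resolve once the directory is mounted into the container.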

Category: bwFLA Projekt, R&D

Imaging (old) IDE disks – Harder than imagined

Wednesday, September 16th, 2015 | Author:

Despite the formerly widespread use of (parallel) IDE disks in x86 computers, there seem to be a couple of compatibility issues with these devices. While, for example, the physical 40-pin interface did not change, the logical layer did.

Background

The system to be preserved was an integral part of a large-scale local language research program. It was set up in 1993 as a test case for computerized local language research and language/dialect mapping. The setup consisted of one server machine running OS/2 and hosting an IBM DB2 database, and six client machines offering access to the database over a LAN. The LAN ran TCP/IPv4 over Token Ring. All machines were x86 hardware, featuring 486DX2 CPUs in the clients and a Pentium OverDrive CPU (upgraded from a 486) in the server. The machines were equipped with 8 to 16 MByte of RAM. The server hard disk was a 1.1 GByte SCSI disk; the client hard disks were 240 MByte IDE disks. All clients were identically equipped, and the installations on the clients were originally identical.

Five identical IDE disks, manufactured in 1993

Experimental Setup

Parallel IDE was gradually phased out and replaced by SATA in the mid-2000s. Thus, this kind of connector is typically unavailable in new x86 machines. In the experiments a couple of different IDE implementations were used:

  • Intel 865 chipset with a BIOS from 2005
  • Intel 875 chipset with a BIOS from 2004
  • Two port onboard controller (same mainboard with Intel 875)
  • NVidia nForce chipset of 2005
  • Lindy cable multi (physical) IDE port to USB 2.0 adaptor
  • Davicontrol PCI dual-port IDE adaptor (Silicon Image PCI0680 Ultra ATA-133)
The IDE disks mainly considered were 240 MByte Quantum disks. The disks, taken from five client machines, were numbered 1 … 5. For the system imaging procedure a fairly recent 3.2 Linux kernel (Ubuntu 12.04 LTS) was used. A preliminary requirement for disk imaging is that the system recognizes the disk properly:
  • The USB adaptor “saw” the disk, but was unable to produce a proper capacity reading (it guessed 2 TByte)
    [  300.633955] scsi 6:0:0:0: Direct-Access     QUANTUM  19234688         0    PQ: 0 ANSI: 2 CCS
    [  300.634368] sd 6:0:0:0: Attached scsi generic sg2 type 0
    [  300.634944] sd 6:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
    [  300.635821] sd 6:0:0:0: [sdc] Using 0xffffffff as device size
    [ 300.635839] sd 6:0:0:0: [sdc] 4294967296 512-byte logical blocks: (2.19 TB/2.00 TiB)
    ...
  • Newer IDE adaptors, e.g. the onboard controller, did not recognize the disk at all
  • Several disks were recognized by the Intel 865 and 875 controllers, but two failed (disks #2 and #4)
  • The failed disk #4 was properly recognized on the nForce and Davicontrol controllers

The disks were not properly recognized in every boot-up cycle. They need a certain time to spin up before they answer properly. Sometimes they “hang”, which is indicated by a permanently lit disk activity LED. To see whether the disks were recognized by the operating system (Linux), check the kernel messages, which show which disks are visible to the system. After that, the fdisk command, run with administrator privileges, should give a proper listing of the disk’s partition table.
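
As a rough sketch of that check, the kernel messages can be filtered for the bracketed device names that the sd driver prints (such as [sdc] in the log excerpt above). The function name list_sd_devices is made up for this illustration.

```shell
# Read kernel messages on stdin and print the sd device names they mention,
# one per line, without duplicates.
list_sd_devices() {
    grep -o '\[sd[a-z][a-z]*]' | tr -d '[]' | sort -u
}
```

Typical use would be dmesg | list_sd_devices, followed by sudo fdisk -l /dev/sdX for each reported device, where sdX stands for whatever name the kernel assigned.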

Some tips to deal with “hanging” disks:

  • Most of the disks were not detected on every boot-up. Pausing the machine and rebooting usually helps to get them detected eventually.
  • If a disk is not properly recognized during boot-up, unloading and reloading the IDE controller kernel module triggers the recognition again. This usually helps.
  • The detection rate on the Davicontrol adaptor differed between machines. The BIOS and the bus device order (initialization in a different order) seem to influence the process significantly.
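
The module reload mentioned in the tips could look roughly like the sketch below. The module name is an assumption and depends on the controller: pata_sil680 is the libata driver for Silicon Image 680 based cards and ata_piix the one for Intel PIIX/ICH controllers, but the post does not say which driver was actually loaded. The function name reload_ide_module and the DRY_RUN switch are made up for this example.

```shell
# Unload and reload a (P)ATA controller kernel module to trigger a new
# device scan. Must be run as root. With DRY_RUN=1 the commands are only
# printed. The module name passed in is an assumption of the caller.
reload_ide_module() {
    module="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "modprobe -r $module"
        echo "modprobe $module"
        return 0
    fi
    # Removal fails if the module is still in use, e.g. by a mounted disk.
    modprobe -r "$module" && sleep 2 && modprobe "$module"
}
```

After a call such as reload_ide_module pata_sil680, the kernel messages should show whether the disk was detected this time.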

The tool of choice to produce identical copies of block devices in Linux/Unix systems is dd (or ddrescue). In its standard configuration it reads the block device in 512-byte blocks and writes them to a file (if asked to). After proper recognition the disk was accessible through the whole-disk device, e.g. /dev/sda, and through a device for each partition, e.g. /dev/sda1…5 (numbering depending on the partitions detected).

Every disk except disk 3 was read from beginning to end with dd if=/dev/sda of=image-file. This procedure copies everything, including the master boot record and the partition table. dd finished without any errors in every run on every machine. The machine log did not show any errors either. Thus, it was concluded that the process had run flawlessly. Unfortunately, the simple partition check on the image file, fdisk -l image-file, did not produce the proper partition listing. This was at first assumed to be a deficiency of the tool. However, when the resulting system image was booted in an emulator, random filesystem errors occurred.

Investigating this issue further showed that each dd run produced a different md5sum, although neither the system nor dd reported any errors. We cross-checked the results with hexdump, which showed different content for each run. However, we could rule out faulty hard disks because the original hardware worked well with the original disks. The only remaining source of errors was the imaging process itself.
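
To get a feeling for how much two imaging runs differ, the differing byte positions can be counted. This sketch uses cmp rather than the hexdump comparison described above, and the function name count_diff_bytes is made up for the illustration.

```shell
# Print the number of byte positions at which two image files differ.
# cmp -l emits one line per differing byte; wc -l counts those lines.
count_diff_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l
}
```

For two runs stored as run1.img and run2.img (placeholder names), count_diff_bytes run1.img run2.img prints 0 only if the files agree over their common length; a difference in file size has to be checked separately.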

To rule out a faulty version of the Linux kernel, the experiments were repeated on a three-year-older kernel, with similar results. To check for implementation flaws in the IDE driver, even older versions were booted, but then the hardware was not fully recognized and a disk dump was impossible.

After repeatedly getting different results from reading the disks with the i865 chipset (the default), other IDE controllers were used. The AMD/nForce system, from about the same era as the i865 system, behaved pretty much the same, but was able to read disk #4, which was inaccessible on the i865 (the reason is not totally clear: we might not have tried hard enough to register the disk with the machine by restarting, delaying boot-up or reloading the required kernel modules). Over a couple of runs, fdisk produced different results for the same disk, as did the dd runs. The produced images also differed from the images produced on the i865 system.

The Davicontrol adaptor was a recent addition to our hardware pool and was used next. So far it has never failed to produce a proper partition table reading of the different disks. The produced images were exactly the same across different runs (verified with md5sum). The partition listing of the resulting image files looked as expected (identical to the reading from the original source). The same was true for mounting the partitions (none of the filesystem errors encountered with the other images occurred). Thus, this setup appears to produce the expected results and to have none of the hardware issues of the setups used before. Nevertheless, there might still be hidden errors.

Conclusion

dd does exactly the job expected of it: imaging block devices to files. If the input is corrupted for some reason (but not reported as an error by the operating system), dd has no means to detect such an issue. The system imaging therefore has to be verified thoroughly, using different tools and methods, to ensure correctness.

Lessons learned:

  1. As long as the md5sum (or similar fixity information) is not gathered from a different source, it is (almost) useless to compare the disk’s md5sum with the imaging result
  2. Do at least two consecutive imaging runs and verify that the results are identical
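
The second point can be scripted along the following lines. This is a minimal sketch, not the procedure used in the experiments: the function name image_twice and the .run1/.run2 file suffixes are made up, and md5sum is assumed to be available.

```shell
# Image SOURCE twice with dd and succeed only if both runs yield the
# same checksum. Writes PREFIX.run1 and PREFIX.run2.
image_twice() {
    src="$1"
    out="$2"
    dd if="$src" of="$out.run1" bs=512 2>/dev/null || return 2
    dd if="$src" of="$out.run2" bs=512 2>/dev/null || return 2
    sum1=$(md5sum < "$out.run1" | cut -d' ' -f1)
    sum2=$(md5sum < "$out.run2" | cut -d' ' -f1)
    if [ "$sum1" = "$sum2" ]; then
        echo "identical: $sum1"
    else
        echo "MISMATCH between runs"
        return 1
    fi
}
```

A call such as image_twice /dev/sda disk1 has to be run as root. On a real device the second read may be served from the page cache instead of the disk, so it can make sense to add GNU dd's iflag=direct or to drop the caches between the runs. Identical runs are a necessary check, not a proof that the image is correct.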

The encountered errors once again highlight the importance of a hardware archive offering various options for the same task. Adaptors, controllers and system buses fall out of use, and the resulting device obsolescence might prevent the proper preservation of a system.

 

by Dirk von Suchodoletz & Klaus Rechert

Category: R&D

bwFLA EaaS: Releasing Digital Art into the Wild

Saturday, December 21st, 2013 | Author:

With the bwFLA Emulation-as-a-Service you can enable users to view your (interactive) objects without actually giving the environment and object to the user. This is a nice feature, especially for digital art and similar works: you can let an almost unlimited number of people view, use and interact with a piece of digital art without being able to copy it. The owner remains in control of the object and is able to restrict access at any time.

Jon Thomson and Alison Craighead (www.thomson-craighead.net) nicely prepared and integrated two of their art pieces for public access using the bwFLA EaaS infrastructure. Please take a look at:

http://www.thomson-craighead.net/docs/thal.html

and

http://www.triggerhappy.org

Please be patient; it may take a minute or so to load. More information can be found here: http://thomson-craighead.blogspot.de/2013/12/thalamus-now-emulated-online.html

Category: bwFLA Projekt, R&D

Symposium “Open Data – Closed Data” at the HTW in Berlin

Saturday, September 28th, 2013 | Author:

On September 27 and 28, 2013, a symposium on “Open Data – Closed Data – Leaked Data” took place at the Hochschule für Technik und Wirtschaft in Berlin. It was organized by the GI division »Informatik und Gesellschaft« (Computer Science and Society). The first day dealt with the question of the accessibility (or inaccessibility) of data in research and politics. “Informatik und Gesellschaft” addresses the societal contexts, including the non-technical ones, and their planning, design and construction. This applies in particular to the data that arise (intentionally and unintentionally) and to the tension between open and closed data. more…

Category: Events