EaaS Object Archive

Thursday, March 31st, 2016

The EaaS object archive (or, technically, the EaaS object archive facade) is a key component for integrating institutional object repositories seamlessly with EaaS and for making EaaS a cost-effective and scalable access solution for born-digital content.

By design, EaaS separates an emulated computer system (the emulation environment) from user objects. The emulation environment consists of the emulator configuration in combination with a bootable disk image containing an installed operating system and software applications. The user objects, on the other hand, are all media objects, such as CD-ROMs or individual files, that are to be rendered in the emulated computer system. An object and an emulation environment are combined only when a user requests that the object be rendered in a suitable emulation environment.

The main rationale for this design is the complexity and cost of preservation planning: while the number of objects is in the thousands or millions, the number of necessary emulation environments is rather small, typically about 10-20 “base” environments representing typical computer systems of technological epochs. By separating objects from their rendering environments, emulation-based preservation planning strategies can focus on a small set of emulation environments that have to be kept alive. The objects remain in a dedicated repository.

To make objects accessible with EaaS, we have designed a flexible “object archive facade” that translates between the technical requirements of the emulation framework and a specific object repository.

Interfaces and Data Types

To allow EaaS to retrieve and render user objects, an object repository provider has to implement a simple Java interface:

import java.util.List;

public interface DigitalObjectArchive
{
    // returns a description of all files belonging to the given object
    FileCollection getObjectReference(String objectId);

    // object archive identification
    String getName();

    // optional, returns a list of object IDs
    List<String> getObjectList();
}
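As an illustration of what a repository provider might write, here is a minimal sketch of an implementation backed by an in-memory map. The FileCollection and FileCollectionEntry classes below are simplified, hypothetical stand-ins for the real EaaS data types, and InMemoryObjectArchive, register() and the choice to return null for an unknown ID are assumptions made for this example, not part of EaaS.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the EaaS FileCollectionEntry type.
final class FileCollectionEntry {
    final String id;
    final String url;
    final String type;

    FileCollectionEntry(String id, String url, String type) {
        this.id = id;
        this.url = url;
        this.type = type;
    }
}

// Hypothetical stand-in for the EaaS FileCollection type.
final class FileCollection {
    final String id;
    final List<FileCollectionEntry> entries = new ArrayList<>();

    FileCollection(String id) {
        this.id = id;
    }
}

// The interface from the post, repeated so that the sketch compiles on its own.
interface DigitalObjectArchive {
    FileCollection getObjectReference(String objectId);
    String getName();
    List<String> getObjectList();
}

// Assumed example archive: keeps all object descriptions in memory.
final class InMemoryObjectArchive implements DigitalObjectArchive {
    private final String name;
    private final Map<String, FileCollection> objects = new LinkedHashMap<>();

    InMemoryObjectArchive(String name) {
        this.name = name;
    }

    // Helper for this sketch only: makes an object known to the archive.
    void register(FileCollection collection) {
        objects.put(collection.id, collection);
    }

    @Override
    public FileCollection getObjectReference(String objectId) {
        // Assumption: an unknown ID yields null.
        return objects.get(objectId);
    }

    @Override
    public String getName() {
        return name;
    }

    @Override
    public List<String> getObjectList() {
        // Read-only snapshot of the IDs in registration order.
        return Collections.unmodifiableList(new ArrayList<>(objects.keySet()));
    }
}
```

A real implementation would replace the map lookup with a query against the institutional repository and build the FileCollection from its answer.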

The most important method to implement is getObjectReference(), which takes an object ID and returns a FileCollection description of the object. The FileCollection represents all individual files of an object as referenceable URLs. Its JAXB XML representation looks like this:

<FileCollection id="OID1">
    <FileCollectionEntry 
       id="CD1"
       url="https://repo/id=CD1.iso" 
       type="CDROM" />
    [...]
</FileCollection>
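To make the structure concrete, the following sketch reads such a document and lists its entries. EaaS itself uses JAXB for this mapping; the sketch uses the DOM parser that ships with the JDK purely to stay self-contained, and the class name FileCollectionXmlReader and the output format are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of EaaS.
final class FileCollectionXmlReader {

    // Returns one "id type url" string per FileCollectionEntry element.
    static List<String> readEntries(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            NodeList nodes = doc.getDocumentElement()
                    .getElementsByTagName("FileCollectionEntry");

            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element entry = (Element) nodes.item(i);
                result.add(entry.getAttribute("id") + " "
                        + entry.getAttribute("type") + " "
                        + entry.getAttribute("url"));
            }
            return result;
        } catch (Exception e) {
            // Malformed XML or parser setup problems end up here.
            throw new IllegalArgumentException("not a readable FileCollection document", e);
        }
    }
}
```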

Each FileCollectionEntry represents a media image (currently only floppy, disk and cdrom are supported) as a URL to the data stream. The URL may use the http(s), nbd or file transport protocol and should ideally support random access to the data stream (i.e. for HTTP(S), the server has to support HTTP Range Requests). The URLs provided by the FileCollection need to be directly accessible by the EaaS infrastructure. Hence, a repository-specific implementation of getObjectReference() either provides accessible links to the internal data streams of a given object, or retrieves the object to temporary storage and creates filesystem links (in the case of direct file access) to these files.
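The transport requirements above can be sketched as two small checks an implementer might run before handing a URL to EaaS: one that accepts only the http, https, nbd and file schemes, and one that builds the value of an HTTP Range header for a random-access read. The class EntryUrlCheck and its method names are hypothetical and not part of EaaS. The sketch also does not contact any server, so it cannot tell whether a given server actually honours Range requests.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of EaaS.
final class EntryUrlCheck {
    // Transport protocols named in the post.
    private static final Set<String> SCHEMES =
            new HashSet<>(Arrays.asList("http", "https", "nbd", "file"));

    // True if the URL parses and uses one of the supported schemes.
    static boolean hasSupportedScheme(String url) {
        try {
            String scheme = new URI(url).getScheme();
            return scheme != null && SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Value of an HTTP Range header asking for `length` bytes starting at
    // `offset`; both byte positions in the header are inclusive.
    static String rangeHeader(long offset, long length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and length > 0");
        }
        return "bytes=" + offset + "-" + (offset + length - 1);
    }
}
```

For example, EntryUrlCheck.rangeHeader(0, 512) yields "bytes=0-511", which requests the first 512 bytes of an image.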

Category: R&D