Wednesday, September 16th, 2015 | Author: Klaus Rechert
Despite the formerly widespread use of parallel IDE (PATA) disks in x86 computers, there seem to be several compatibility issues with these devices. While the physical interface (e.g. the 40-pin connector) did not change, the logical layer did.
The system to be preserved was an integral part of a large-scale local language research program. It was set up in 1993 as a test case for computerized local language research and language/dialect mapping. The setup consisted of one server machine running OS/2 with an IBM DB2 database and six client machines offering access to the database over a LAN. The LAN ran TCP/IPv4 over Token Ring. All machines were x86 hardware, with 486DX2 CPUs in the clients and a Pentium OverDrive CPU (upgraded from a 486) in the server. The machines were equipped with 8 to 16 MByte of RAM. The server hard disk was a 1.1 GByte SCSI disk; the client hard disks were 240 MByte IDE disks. All clients were identically equipped, and their installations were originally identical.
Parallel IDE was gradually phased out and replaced by SATA in the mid-2000s. Consequently, this kind of connector is typically unavailable on new x86 machines. In the experiments, several different IDE implementations were used:
- Intel 865 chipset with a BIOS from 2005
- Intel 875 chipset with a BIOS from 2004
- Two-port onboard controller (on the same mainboard as the Intel 875)
- NVIDIA nForce chipset from 2005
- Lindy multi-port (physical) IDE to USB 2.0 cable adaptor
- Davicontrol PCI dual-port IDE adaptor (Silicon Image PCI0680 Ultra ATA 166)
- The USB adaptor “saw” the disk, but was unable to produce a proper capacity reading (it guessed 2 TByte):
[ 300.633955] scsi 6:0:0:0: Direct-Access QUANTUM 19234688 0 PQ: 0 ANSI: 2 CCS
[ 300.634368] sd 6:0:0:0: Attached scsi generic sg2 type 0
[ 300.634944] sd 6:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
[ 300.635821] sd 6:0:0:0: [sdc] Using 0xffffffff as device size
[ 300.635839] sd 6:0:0:0: [sdc] 4294967296 512-byte logical blocks: (2.19 TB/2.00 TiB) ...
- Newer IDE adaptors, e.g. the two-port onboard controller, did not recognize the disks at all
- Several disks were recognized by the Intel 865 and 875 controllers, but two failed (disks #2 and #4)
- Disk #4, which had failed there, was properly recognized by the nForce and Davicontrol controllers
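The capacity in the kernel log can be checked by hand. Apparently the adaptor reported a last logical block address of 0xffffffff, the largest value the 10-byte READ CAPACITY command can return, which yields exactly 2^32 blocks of 512 bytes, i.e. 2 TiB. That is a placeholder value, not the size of one of the 240 MByte client disks. A short Python sketch of the arithmetic (the names are our own):

```python
# Recomputes the capacity shown in the kernel log (assumption: the adaptor
# returned 0xffffffff as the last logical block address, the largest value
# READ CAPACITY(10) can report).
LAST_LBA = 0xFFFFFFFF
BLOCK_SIZE = 512  # bytes per logical block, as logged


def capacity_bytes(last_lba: int, block_size: int) -> int:
    """Capacity implied by a last-LBA value and a logical block size."""
    # Blocks are numbered from 0, so the block count is last_lba + 1.
    return (last_lba + 1) * block_size


size = capacity_bytes(LAST_LBA, BLOCK_SIZE)
print(size // BLOCK_SIZE)          # 4294967296 blocks
print(size)                        # 2199023255552 bytes
print(f"{size / 10**12:.3f} TB")   # 2.199 TB
print(f"{size / 2**40:.2f} TiB")   # 2.00 TiB
```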
The disks were not properly recognized in every boot cycle: they need some time to spin up before they answer properly. Sometimes they “hang”, which is indicated by a permanently lit disk activity LED. The kernel messages show whether a disk was recognized by the operating system (Linux) and which disks are visible to it. After that, running fdisk with administrator privileges should produce a proper listing of the disk’s partition table.
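As a complement to reading the kernel messages, a minimal Python sketch can list the block devices the kernel currently sees by parsing _/proc/partitions_ (Linux-specific; sizes there are given in 1 KiB blocks). The function name is hypothetical; a disk that has not been recognized simply does not show up in the result.

```python
from pathlib import Path


def visible_block_devices(text: str) -> dict[str, int]:
    """Map device name -> size in 1 KiB blocks, parsed from /proc/partitions."""
    devices: dict[str, int] = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) != 4:
            continue  # skip the blank line after the header
        _major, _minor, blocks, name = fields
        devices[name] = int(blocks)
    return devices


proc = Path("/proc/partitions")
if proc.exists():  # only present on Linux
    for dev_name, kib in visible_block_devices(proc.read_text()).items():
        print(f"{dev_name}: {kib} KiB")
```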
Some tips to deal with “hanging” disks:
- Most of the disks were not detected on every boot. Pausing the machine and rebooting usually got them recognized eventually.
- If a disk is not properly recognized during boot, unloading and reloading the IDE controller’s kernel module usually triggers recognition.
- The detection rate of the Davicontrol adaptor varied between machines. The BIOS and the bus device order (i.e. the order in which devices are initialized) seem to influence the process significantly.
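The reload tip can be scripted. The following is a minimal Python sketch, not the procedure actually used here: it calls modprobe to unload and reload a driver and retries until a presence check succeeds. The module name is an assumption (e.g. ata_piix for many Intel chipsets, pata_sil680 for a SiI 0680 card under libata), as are the retry count and the settle delay.

```python
import subprocess
import time
from typing import Callable

# Hypothetical default: the right module depends on the controller in use.
MODULE = "pata_sil680"


def reload_module(module: str = MODULE) -> None:
    """Unload and reload an IDE controller driver (requires root).

    modprobe -r fails while a disk on that controller is still in use,
    e.g. mounted; check=True turns such a failure into an exception.
    """
    subprocess.run(["modprobe", "-r", module], check=True)
    subprocess.run(["modprobe", module], check=True)


def wait_for_disk(is_present: Callable[[], bool],
                  reload: Callable[[], None],
                  attempts: int = 5,
                  settle_seconds: float = 10.0) -> bool:
    """Reload the driver until the disk shows up or attempts run out."""
    for _ in range(attempts):
        if is_present():
            return True
        reload()
        time.sleep(settle_seconds)  # give the disk time to spin up
    return is_present()
```

For example, wait_for_disk(lambda: os.path.exists("/dev/sdb"), reload_module) would retry up to five times; the device path is a placeholder that must match the actual setup.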
The tool of choice for producing identical copies of block devices on Linux/Unix systems is dd (or ddrescue). In its default configuration, it reads the block device in 512-byte blocks and writes them to a file (if an output file is given). Once properly recognized, the disk was available both as the whole-disk device, e.g. _/dev/sda_, and as one device per partition, e.g. _/dev/sda1…5_ (numbering depends on the partitions detected).
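To illustrate what dd does in its default configuration, here is a minimal Python sketch that copies a device or file in 512-byte blocks. Unlike dd, it also hashes the data while copying (an addition of ours); like dd, it has no way of telling whether the bytes the kernel hands over are correct. The function name is hypothetical.

```python
import hashlib

BLOCK_SIZE = 512  # dd's default block size


def image_device(source: str, target: str, block_size: int = BLOCK_SIZE) -> str:
    """Copy source to target block by block (like dd if=... of=...)
    and return the MD5 hex digest of the data that was copied."""
    digest = hashlib.md5()
    with open(source, "rb") as src, open(target, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # end of device or file
                break
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()
```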
Every disk except disk #3 was read from beginning to end with dd if=/dev/sda of=image-file. This copies everything, including the master boot record with its partition table. dd completed without errors in every run on every machine, and the system log showed no errors either, so it was concluded that the process had run flawlessly. Unfortunately, a simple partition check on the image file (fdisk -l image-file) did not produce the proper partition listing. This was initially attributed to a deficiency of the tool. However, when the resulting system image was booted in an emulator, random filesystem errors occurred.
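fdisk -l reads the partition table from the first sector of the image. A minimal Python sketch of the same sanity check, assuming a classic DOS/MBR layout: it reads the four primary entries starting at byte offset 446 and requires the 0x55AA boot signature at offset 510. It does not follow extended partitions, so logical partitions such as _/dev/sda5_ are not listed; all names are our own.

```python
import struct
from typing import NamedTuple


class Partition(NamedTuple):
    index: int       # 1..4, as in /dev/sda1 ... /dev/sda4
    bootable: bool   # status byte 0x80
    type_id: int     # e.g. 0x07 HPFS/NTFS, 0x05 extended
    start_lba: int
    sectors: int


def read_mbr_partitions(image_path: str) -> list[Partition]:
    """Parse the four primary entries of an MBR partition table."""
    with open(image_path, "rb") as f:
        mbr = f.read(512)
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no valid MBR boot signature")
    partitions = []
    for i in range(4):
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        status, type_id = entry[0], entry[4]
        # Start LBA and sector count are little-endian 32-bit integers.
        start_lba, sectors = struct.unpack("<II", entry[8:16])
        if type_id != 0:  # type 0 marks an unused slot
            partitions.append(
                Partition(i + 1, status == 0x80, type_id, start_lba, sectors))
    return partitions
```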
Investigating these issues further showed that each dd run produced a different md5sum, while neither the system nor dd reported any errors. We cross-checked the results with hexdump, which showed different content for each run. However, we could rule out faulty hard disks, because the original hardware worked well with the original disks. The source of the errors therefore had to lie within the imaging process.
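A per-sector comparison shows where two supposedly identical runs diverge, which says more than a single mismatching checksum (for instance, whether the differences are scattered or clustered). A minimal Python sketch, assuming two image files of the same disk; the function name is hypothetical.

```python
from itertools import count

SECTOR = 512


def differing_sectors(path_a: str, path_b: str, sector: int = SECTOR) -> list[int]:
    """Return the indices of sectors whose content differs between two images.

    Sectors present in only one of the images count as differing.
    """
    diffs = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        for index in count():
            block_a, block_b = a.read(sector), b.read(sector)
            if not block_a and not block_b:  # both images exhausted
                break
            if block_a != block_b:
                diffs.append(index)
    return diffs
```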
To rule out a faulty version of the Linux kernel in use, the experiments were repeated on a kernel three years older, with similar results. To check for implementation flaws in the IDE driver, even older versions were booted, but then the hardware was not fully recognized and a disk dump was impossible.
After repeatedly getting different results when reading the disks with the (default) i865 chipset, other IDE controllers were used. The AMD/nForce system, from about the same era as the i865 system, behaved much the same, but was able to read disk #4, which was inaccessible on the i865. (The reason is not entirely clear: we might not have tried hard enough to get the disk registered on that machine by restarting, delaying boot-up or reloading the required kernel modules.) After a couple of runs, fdisk produced different results for the same disk, and so did the dd runs. The resulting images also differed from those produced on the i865 system.
The Davicontrol adaptor, a recent addition to our hardware pool, was used next. So far, it has never failed to produce a proper partition table listing for any of the disks. The images it produced were identical across runs (verified with md5sum). The partition listings of the resulting image files looked as expected, identical to the readings from the original disks. The same was true for mounting the partitions: none of the filesystem errors seen with the other images occurred. This strongly suggests that the setup produces the expected results and is free of the hardware issues of the previous setups. Nevertheless, hidden errors may still remain.
dd does exactly what is expected: it images block devices to files. If the input is corrupted for some reason but the operating system does not report an error, dd has no means of detecting the issue. The resulting system images must therefore be verified thoroughly, using different tools and methods, to ensure correctness.
- Unless the md5sum (or similar fixity information) of the disk is obtained from an independent source, comparing it with the md5sum of the image is (almost) useless, since both are read through the same, possibly faulty, path
- Do at least two consecutive imaging runs and verify that the results are identical
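The second rule is easy to automate. A minimal Python sketch, assuming the runs were written to separate image files; it hashes each file in chunks (so large images need little memory) and reports whether all digests agree. MD5 matches the md5sum used above; hashlib.sha256 could be swapped in. The function names are our own.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def runs_identical(paths: list[str]) -> bool:
    """True if all imaging runs share one digest; needs at least two runs."""
    if len(paths) < 2:
        raise ValueError("need at least two imaging runs to compare")
    return len({file_md5(p) for p in paths}) == 1
```

Identical digests across runs only show that the reads were consistent; as noted above, they do not prove that the data matches what is physically on the disk.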
The errors encountered once again highlight the importance of a hardware archive offering several options for the same task. Adaptors, controllers and system buses fall out of use, and the resulting device obsolescence may prevent the proper preservation of a system.
by Dirk von Suchodoletz & Klaus Rechert