Archive for August, 2008

Checking devices for bad sectors

I recently had a friend contact me because he was getting an error similar to the following in his Red Hat Linux system log (I didn’t save the error while debugging the problem, so I grabbed this one from the web):

kernel: disk I/O error: dev 08:01, sector 25590410
kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 28000002

At first glance, I thought the disk drive had failed, and told him to back up all of his data to safe media. Once the data was backed up, I decided to run a full SMART self-test on the disk drive to check the drive’s health:

$ smartctl -t long /dev/hda

smartctl version 5.36 [sparc-sun-solaris2.10] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 84 minutes for test to complete.
Test will complete after Sun Aug 27 19:41:01 2006

Use smartctl -X to abort test.
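
Once the test has finished, its result and the drive's reallocated sector count can be read back with smartctl. The following is a minimal sketch rather than the exact commands we ran: the helper names are made up for illustration, and the awk filter assumes the usual ten-column attribute table that smartctl -A prints for ATA drives, with the raw value in the last column.

```shell
#!/bin/sh
# Hypothetical helpers; the device name /dev/hda is taken from the example above.

# Print the drive's self-test log, which lists the outcome of each test run.
selftest_log() {
    smartctl -l selftest "$1"
}

# Read `smartctl -A` output on stdin and print the raw value of the
# Reallocated_Sector_Ct attribute (assumed to be the 10th column).
reallocated_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Usage (needs root and a SMART-capable drive):
#   selftest_log /dev/hda
#   smartctl -A /dev/hda | reallocated_count
```

Comparing the count before and after an event such as a reboot shows whether the drive has remapped another sector.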

The SMART long test completed successfully, but dd was still failing when it tried to read sector 25590410 (we weren’t using dd’s continue-on-error option, conv=noerror). Since all modern disk drive controllers contain logic to remap faulty sectors when they are detected, and the number of reallocated sectors reported by smartctl was well below the manufacturer’s failure threshold, I wondered if the sector was “stuck.” To test my theory, I booted from a Linux CD and ran the Linux badblocks utility on the disk partition (I didn’t save the badblocks output from his drive, so the following is a sample from another machine):

$ badblocks -sv /dev/hda

Checking blocks 0 to 8192016
Checking for bad blocks (read-only test): 222400/ 8192016

Badblocks completed successfully, and an fsck of the file system reported that it was clean. We also used debugfs, the ext3 file system debugger, to see if a file was using the block. It wasn’t, so my theory is that the errors occurred when a new file was being created. Next we rebooted the system, and the number of reallocated sectors reported by smartmontools had increased by one. This completely surprised me, and I am still not sure why the disk controller didn’t remap the sector while we were booted from the disk drive. I had fun debugging this problem and learning about how IDE disk drives work.
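
The checks described above can be sketched as a few shell helpers. This is an illustration under stated assumptions, not a record of what we typed: the function names are hypothetical, the sector size is assumed to be 512 bytes, and the sector in the kernel message is assumed to be counted from the start of the whole disk (pass 0 as the partition start if it is relative to the partition). The partition start of 63 and the 4096-byte block size in the usage notes are placeholder values; the real ones come from fdisk -lu and dumpe2fs -h.

```shell
#!/bin/sh
# Hypothetical helpers for chasing a single suspect sector.

# Try to read one 512-byte sector from a device (or file); dd exits
# non-zero if the read fails.
read_sector() {
    dd if="$1" of=/dev/null bs=512 skip="$2" count=1 2>/dev/null
}

# Convert a disk sector number into a file system block number.
#   $1 = sector, $2 = first sector of the partition, $3 = fs block size
sector_to_fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

# Usage (as root; 63 and 4096 are placeholder values):
#   read_sector /dev/hda 25590410 || echo "sector is unreadable"
#   blk=$(sector_to_fs_block 25590410 63 4096)
#   debugfs -R "icheck $blk" /dev/hda1      # which inode, if any, owns the block
#   debugfs -R "ncheck <inode>" /dev/hda1   # map that inode number to a path
```

If icheck reports that no inode owns the block, the sector sits in free space, which matches what we saw.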


Building a network storage appliance

I came across two interesting storage appliance projects this week:

The openfiler project:
http://www.openfiler.com/

The freenas project:
http://www.freenas.org/

If you’re looking for a solution to host numerous terabytes of data at home, these might be for you.


Locating the device that contains an EXT3 label

On most Linux hosts, the first field of the /etc/fstab file contains labels instead of disk partitions. This simplifies file system management, since you don’t have to update the fstab file if you move a drive to a new controller, or add additional drives to a system. If you want to locate the partition that is associated with a label, you can use the findfs utility:

$ /sbin/findfs LABEL=/
/dev/hda1

You can also use the findfs utility to locate a partition by UUID:

$ /sbin/findfs UUID=b4ce6d24-000c-45a3-8258-cbf9f826c0ce
/dev/hda1

The findfs utility is extremely useful, and is just one of a number of cool programs (others include blkid, e2label, partinfo and findsuper) in the e2fsprogs package!
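
To see how every label or UUID in fstab resolves to a device, findfs can be run once per entry. A small sketch, assuming a conventional whitespace-separated fstab; the function name fstab_tags is invented for this example:

```shell
#!/bin/sh
# Hypothetical helper: print the LABEL=/UUID= specifiers and mount points
# found in an fstab-style file, skipping comments and plain device paths.
fstab_tags() {
    awk '$1 ~ /^(LABEL|UUID)=/ { print $1, $2 }' "$1"
}

# Usage: resolve each specifier to its partition with findfs.
#   fstab_tags /etc/fstab | while read -r tag mnt; do
#       printf '%s -> %s (%s)\n' "$tag" "$(/sbin/findfs "$tag")" "$mnt"
#   done
```

Entries that already name a device such as /dev/hda2 are left out, since there is nothing to resolve.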


Monitoring system utilization on Linux hosts

I am always on the lookout for tools to analyze system performance. One nifty tool I recently came across is atop, an advanced system performance monitor for Linux. When atop is run, it displays overall system utilization in the header, and per-process CPU, memory, network or disk utilization in the body (you need to patch your kernel to get disk and network utilization).
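
A couple of typical invocations, sketched as shell helpers. The function names and the raw file path are placeholders of mine; the interval and sample arguments and the -w and -r options are standard atop usage.

```shell
#!/bin/sh
# Hypothetical wrappers around common atop invocations.

# Watch the system live, refreshing every $1 seconds (default 5).
atop_live() {
    atop "${1:-5}"
}

# Record $3 samples at $2-second intervals into raw file $1 for later review.
atop_record() {
    atop -w "$1" "$2" "$3"
}

# Replay a previously recorded raw file.
atop_replay() {
    atop -r "$1"
}

# Usage:
#   atop_record /tmp/atop.raw 10 6    # one minute of data
#   atop_replay /tmp/atop.raw
```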