Network RAID Storage: Proof of Concept

I recently discovered a post on Hack a Day linking to a proof-of-concept how-to on setting up a software RAID on FTP servers. While the guide is a simple approach to running a network-based RAID 5 configuration, a number of the tools the original developer used are less than optimal. First, the setup requires both Windows and Linux, meaning you need either two physical machines or a virtual machine in your configuration. The second, and larger, problem is that it is restricted to RAID 5 and FTP servers. This article is my attempt to address both of these issues.

The goals of this project are simple: establish a software RAID configuration that supports all of the common RAID levels, works across network share protocols, works with any number of network shares, works with different types of network shares simultaneously, and is based entirely on open source software. Simple.

To start, this proof of concept has two network shares mounted, an NFS share and an SMB share:

MotomaSTYLE raiddemo # ls -la
total 4
drwxr-xr-x  4 root root 80 Dec 27 21:49 ./
drwxr-xr-x 33 root root 80 Dec 27 21:49 ../
drwxr-xr-x  2 root root 80 Dec 27 21:49 nfsmountpoint/
drwxr-xr-x  2 root root 80 Dec 27 21:49 smbmountpoint/

Software RAID is nothing new in the Linux world and has been around for ages in the form of the raidtools package. Raidtools itself is difficult to use and maintain, and its feature set is limited. For those reasons, this project relies on the mdadm tool to build the network RAID. The benefits of mdadm are that it supports all of the major RAID levels, works on any properly partitioned device, and is a breeze to use.

The difficult part of this project is allocating space on the remote servers in such a way that it can be fed to mdadm as devices. Luckily, "everything is a file" in Linux, and devices such as hard disks are no exception. If hard disks can be treated as files, there is no reason why files cannot be treated as hard disks. The first task is to create a regular file of the appropriate size on each share. In this example, each "disk" file will be approximately 1GB (with count=0, dd writes no data and only sets the file length through seek):

MotomaSTYLE raiddemo # dd of=nfsmountpoint/image bs=1024 count=0 seek=1M
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000592534 seconds, 0.0 kB/s
MotomaSTYLE raiddemo # dd of=smbmountpoint/image bs=1024 count=0 seek=1M
0+0 records in
0+0 records out
0 bytes (0 B) copied, 29.2139 seconds, 0.0 kB/s
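
As a quick check of what this dd invocation does, here is a minimal sketch that repeats it in a scratch directory instead of on a network share. The scratch path and variable names are illustrative assumptions, and `stat -c` assumes GNU coreutils. Because count=0 writes no blocks and seek=1M only positions 1M blocks of 1024 bytes into the output, the file should come out 1 GiB long while occupying little or no space on filesystems that support sparse files.

```shell
# Create the same kind of "disk" file in a temporary directory (hypothetical location).
workdir=$(mktemp -d)
img="$workdir/image"

# count=0: copy nothing; seek=1M: set the output length to 1M blocks of bs=1024 bytes.
dd of="$img" bs=1024 count=0 seek=1M 2>/dev/null

apparent_bytes=$(stat -c %s "$img")     # file length in bytes
allocated_blocks=$(stat -c %b "$img")   # blocks actually allocated (filesystem dependent)

echo "apparent size:    $apparent_bytes bytes"
echo "allocated blocks: $allocated_blocks"

# Remove the scratch directory again.
rm -rf "$workdir"
```

The allocated block count depends on the filesystem, so no particular value is promised for it; a share that does not support sparse files may allocate the full size.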

The next step is to partition each "disk" file so that mdadm can use it in the RAID. Using fdisk, the proper partition type is fd, "Linux raid autodetect" (the session for the SMB image is shown; the NFS image is partitioned the same way):

MotomaSTYLE raiddemo # fdisk smbmountpoint/image
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
You must set cylinders.
You can do this from the extra functions menu.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): x
Expert command (m for help): c
Number of cylinders (1-1048576): 1024
Expert command (m for help): r
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1024, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-1024, default 1024): 1024
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 22: Invalid argument.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
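
The cylinder count entered in this session can be sanity-checked against the size of the image file. The sketch below is only arithmetic, and it assumes the geometry that fdisk reports elsewhere in this article (255 heads, 63 sectors per track, 512-byte sectors); the variable names are made up for illustration. It suggests that 1024 cylinders describe considerably more space than a 1GB file holds, and that 130 is the largest whole number of cylinders that fits.

```shell
# Geometry assumed from the fdisk output in this article.
heads=255
sectors_per_track=63
sector_bytes=512

cylinders_entered=1024                 # value typed at the "Number of cylinders" prompt
file_bytes=$((1024 * 1024 * 1024))     # image size created with bs=1024 seek=1M

cylinder_bytes=$((heads * sectors_per_track * sector_bytes))
described_bytes=$((cylinders_entered * cylinder_bytes))
cylinders_that_fit=$((file_bytes / cylinder_bytes))

echo "bytes per cylinder:       $cylinder_bytes"
echo "size described by table:  $described_bytes"
echo "whole cylinders in file:  $cylinders_that_fit"
```

In this walkthrough the mismatch appears to be harmless: mdadm later reports a member size of roughly 1GB, presumably because a loop device cannot extend past the end of its backing file.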

At this point the files are "partitioned"; however, they are not yet in a usable state. The partitions cannot be fed to mdadm until they are recognized as devices. For this, two loop devices are made using the mknod tool. The major number for a loop device is 7; the minor numbers were chosen to avoid existing loop devices, as the test machine already had 0-99:

MotomaSTYLE raiddemo # mknod -m660 loopnfs b 7 100
MotomaSTYLE raiddemo # mknod -m660 loopsmb b 7 101
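
To see which major and minor numbers a device node carries (for example, when picking minors that do not collide with existing nodes), GNU stat can print them. This is a small sketch, not part of the original procedure: it inspects /dev/null only because that node exists on any Linux system without root, and the variable names are illustrative. Pointing `node` at loopnfs should report major 7 and minor 100.

```shell
# Device node to inspect; /dev/null is a stand-in that exists everywhere on Linux.
node=/dev/null

major_hex=$(stat -c %t "$node")   # %t: major device number, in hexadecimal
minor_hex=$(stat -c %T "$node")   # %T: minor device number, in hexadecimal

# Convert the hexadecimal values to decimal for readability.
major=$((0x$major_hex))
minor=$((0x$minor_hex))

echo "$node: major $major, minor $minor"
```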

In order to attach the "partitions" of the "disks" to the loop devices, the offset of each partition must be known. Fdisk will display this information, in sectors, with the -ul flags:

MotomaSTYLE raiddemo # fdisk -ul nfsmountpoint/image
You must set cylinders.
You can do this from the extra functions menu.
Disk nfsmountpoint/image: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders, total 0 sectors
Units = sectors of 1 * 512 = 512 bytes
              Device Boot      Start         End      Blocks   Id  System
nfsmountpoint/image1              63    16450559    8225248+  fd  Linux raid autodetect
MotomaSTYLE raiddemo # fdisk -ul smbmountpoint/image
You must set cylinders.
You can do this from the extra functions menu.
Disk smbmountpoint/image: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders, total 0 sectors
Units = sectors of 1 * 512 = 512 bytes
              Device Boot      Start         End      Blocks   Id  System
smbmountpoint/image1              63    16450559    8225248+  fd  Linux raid autodetect

Multiplying the Start value by the unit size gives the byte offset of the partition. In this case 63 * 512 yields 32256, the number to feed to losetup when binding the partitions to the loop devices:

MotomaSTYLE raiddemo # losetup -o 32256 loopnfs nfsmountpoint/image
MotomaSTYLE raiddemo # losetup -o 32256 loopsmb smbmountpoint/image
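
The offset passed to losetup can also be derived in the shell instead of by hand. The following sketch takes the partition line exactly as fdisk printed it in this article and assumes the Start sector is the second field, which holds only when the Boot column is empty; the variable names are illustrative.

```shell
# Partition line copied from the fdisk -ul output shown in this article.
partition_line='nfsmountpoint/image1 63 16450559 8225248+ fd Linux raid autodetect'
sector_bytes=512    # from "Units = sectors of 1 * 512 = 512 bytes"

# With no boot flag present, the Start sector is the second whitespace-separated field.
start_sector=$(printf '%s\n' "$partition_line" | awk '{print $2}')

# Byte offset of the partition inside the image file.
offset=$((start_sector * sector_bytes))

echo "losetup offset: $offset"
```

This prints `losetup offset: 32256`, the value used in the losetup commands.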

One last thing that mdadm requires to set up a software RAID is an "md" device node. Once again mknod is used, this time with a major number of 9 and a minor number that will not conflict with existing devices:

MotomaSTYLE raiddemo # mknod raiddev b 9 100

Now combine the "disks" in a software RAID. In this test I will create (-C) a RAID level 5 (-l5) device (raiddev) from two (-n2) existing devices (loopnfs and loopsmb):

MotomaSTYLE raiddemo # mdadm -C -v raiddev -l5 -n2 loopnfs loopsmb
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: size set to 1048448K
mdadm: array raiddev started.

At this point in the test system, the two files located on the network shares (nfsmountpoint/image and smbmountpoint/image) are mapped to loop devices (loopnfs and loopsmb), which have been combined in a software RAID to form one device (raiddev). Before the new disk can be used, it must be formatted:

MotomaSTYLE raiddemo # mke2fs raiddev
mke2fs 1.38 (30-Jun-2005)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
131072 inodes, 262112 blocks
13105 blocks (5.00%) reserved for the super user
First data block=0
8 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 20 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

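
The figures reported by mdadm and mke2fs can be cross-checked with a little arithmetic. RAID 5 provides the capacity of n-1 members, so an array of two members offers the space of one. The sketch below uses only numbers from the outputs in this article; the variable names are illustrative.

```shell
devices=2              # -n2 in the mdadm command
member_kib=1048448     # "size set to 1048448K" reported by mdadm
block_kib=4            # mke2fs block size of 4096 bytes

# RAID 5 usable capacity: (number of members - 1) * member size.
usable_kib=$(( (devices - 1) * member_kib ))

# Number of 4 KiB filesystem blocks that fit in that capacity.
fs_blocks=$(( usable_kib / block_kib ))

echo "usable capacity: ${usable_kib} KiB"
echo "4 KiB blocks:    ${fs_blocks}"
```

The block count works out to 262112, matching the "262112 blocks" line printed by mke2fs.
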
And then mounted:

MotomaSTYLE raiddemo # mkdir raidmount
MotomaSTYLE raiddemo # mount raiddev raidmount/
MotomaSTYLE raiddemo # ls -la raidmount
total 20
drwxr-xr-x 3 root root  4096 Dec 27 22:25 ./
drwxr-xr-x 5 root root   160 Dec 27 22:27 ../
drwx------ 2 root root 16384 Dec 27 22:25 lost+found/

There you have it. If you have followed along, you now have a network-based RAID device. If you find this article useful, interesting, offensive, or stupid, please leave me a note or send me an email.

Comments

  • This is the first step-by-step discussion I have found about how to set up network RAID. I am trying to do this for an iSCSI SAN I am creating. Being a Linux newbie, I am unsure how to apply your solution to my problem. Currently, I have two CentOS servers running iSCSI acting as initiators and two storage boxes running iSCSI target. Ideally, there would be n storage boxes and, using network RAID, only n-1 boxes would be required for the data to stay available. I have successfully made both storage boxes appear as disks via iSCSI on both servers; now I have to use software RAID to make one network RAID device, which I will then use inside GFS. It seems you are building a RAID device using files "masquerading" as disks, but I have iSCSI disks, so should I create files that use all the drive space on my iSCSI disks and follow your steps, or can I shortcut these steps as I have disks available? I intend to do some research to better understand what you are doing, but I thought there was no harm in asking. Thanks in advance for any help you can give.

    Posted by Michael, 29/05/2008 1:16pm (3 months ago)

  • Is there a benchmark or something to see how well it performs?

    Posted by , 29/05/2008 1:15pm (3 months ago)

  • Man, this is sweet, I've got a bunch of crap machines laying around and now I can use them to provide redundancy via a network RAID setup.

    Thanks for the great article!

    Posted by chrisd, 29/05/2008 1:15pm (3 months ago)

  • Hi
    I have realized the same concept with iSCSI protocol.

    The last step is to put the softraid under heartbeat control to have a fault-tolerant system.

    Posted by , 29/05/2008 1:14pm (3 months ago)

  • This is just wonderful, you answered my question.
    We have some brand new PCs in the office with disk space to spare. I thought, what a waste, as I have a shortage of disk space on the Linux server I put together from abandoned parts. My first thought was to use them as backup. Secondly, I wondered if it would be possible to use RAID to add some safety. I had no idea how to tackle this and lacked the time to research the idea.
    You not only saved me the time I lack, but I now know that it is indeed possible.
    I'm eager to try out your proof of concept when I'm back at the office next week.

    Posted by , 29/05/2008 1:06pm (3 months ago)

  • This would be pretty sweet with gmailfs or similar "mount free storage" hacks, combining into larger container, potentially with redundancy for possible losses if spread wide enough. Might also cut back on time/overfrequent access by RAIDing over several sources instead of bottlenecking it through one. I'm going to have to play some, I never realized how cool software RAID had gotten. FUSE rocks so hard. Thanks for the tips.

    Posted by , 29/05/2008 1:06pm (3 months ago)

  • why would someone find this offensive?

    Posted by , 29/05/2008 1:05pm (3 months ago)

  • Hi, can I "implement" your proof of concept into my original please?
    thanks, wonderer

    Posted by wonderer, 29/05/2008 1:05pm (3 months ago)
