Do you need a file server on the cheap that is easy to set up, "rock solid" reliable, and has email alerting? This guide will show you how to use Ubuntu, software RAID, and Samba to accomplish just that.

Overview

Despite the recent buzz about moving everything to the "almighty" cloud, sometimes you may not want your information on someone else's server, or it may simply be unfeasible to download the volumes of data you require from the internet every time (for example, image deployment). So before you clear out a place in your budget for a storage solution, consider a configuration that is licensing free with Linux.

With that said, going cheap/free does not mean "throwing caution to the wind". To that end, we will note points to be aware of and configurations that should be put in place, in addition to using software RAID, to achieve the maximum price-to-reliability ratio.

About software RAID

As the name implies, this is a RAID (Redundant Array of Inexpensive Disks) setup that is done completely in software instead of with a dedicated hardware card. The main advantage of this is cost, as the dedicated card is an added premium to the base configuration of the system. The main disadvantages are performance and some reliability, as such a card usually comes with its own RAM and CPU to perform the calculations required for the redundancy math, data caching for increased performance, and an optional backup battery that keeps unwritten operations in the cache until power has been restored after an outage.

With a software RAID setup you're sacrificing some of the system's CPU performance in order to reduce total system cost, but with today's CPUs the overhead is relatively negligible (especially if you're going to mainly dedicate this server to be a "file server"). As far as disk performance goes, there is a penalty... however I have never encountered a bottleneck in the server's disk subsystem severe enough to note how profound it is. The Tom's Hardware guide "Tom's goes RAID5" is an oldie-but-goodie, exhaustive article on the subject, which I personally use as a reference; just take the benchmarks with a grain of salt, as it covers the Windows implementation of software RAID (as with everything else, I'm sure Linux is much better :P).

Prerequisites

  • Patience young one, this is a long read.
  • It is assumed you know what RAID is and what it is used for.
  • This guide was written using Ubuntu Server 9.10 x64, therefore it is assumed that you have a Debian based system to work with as well.
  • You will see me use VIM as the editor program; this is just because I'm used to it... you may use any other editor that you'd like.
  • The Ubuntu system I used for writing this guide was installed on a disk-on-key. Doing so allowed me to use sda1 as part of the RAID array, so adjust accordingly to your setup.
  • Depending on the type of RAID you want to create you will need at least two disks on your system; in this guide we are using five drives (sda through sde, matching the array-creation command later on). A quick way to list the candidate drives is shown right after this list.
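If you are not sure which device names your drives received, you can list every block device the kernel sees. The drive names used throughout this guide (sda through sde) are just what this particular system happened to have, so adjust to your own output:

        sudo fdisk -l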

Choosing the disks that make the array

The first step in avoiding a trap is knowing of its existence (Thufir Hawat, Dune).

Choosing the disks is a vital step that should not be taken lightly, and you would be wise to capitalize on yours truly's experience and heed this warning:

Do NOT use "consumer grade" drives to create your array, use "server grade" drives!!!!!!

Now I know what you're thinking: didn't we say we are going to go on the cheap? Yes we did, but this is exactly one of the places where doing so is reckless and should be avoided. Despite their attractive price, consumer grade hard drives are not designed for 24/7 always-on use. Trust me, yours truly has tried this for you. At least four consumer grade drives in the three servers I have set up like this (due to budget constraints) failed after about 1.5 ~ 1.8 years from the server's initial launch day. While there was no data loss, because the RAID did its job well and survived... moments like this shorten the life expectancy of the sysadmin, not to mention the down time for the company during server maintenance (something which may end up costing more than the higher grade drives).

Some may say that there is no difference in failure rate between the two types. That may be true, but despite these claims, server grade drives still come with stricter S.M.A.R.T. thresholds and more QA behind them (as can be observed by the fact that they are not released to the market as soon as the consumer drives are), so I still highly recommend that you fork out the extra $$$ for the upgrade.
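Whichever drives you end up with, it is worth keeping an eye on their S.M.A.R.T. data. The smartmontools package (an addition of mine, not part of the original setup) lets you query it; something along these lines should work on a Debian based system:

        sudo aptitude install smartmontools
        sudo smartctl -H /dev/sdb      # quick overall health verdict
        sudo smartctl -a /dev/sdb      # full S.M.A.R.T. attribute dump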

Choosing the RAID level

While I'm not going to go into all of the options available (this is very well documented in the RAID Wikipedia entry), I do feel it is worth saying that you should always opt for at least RAID 6 or better (we will be using Linux RAID10). This is because when a disk fails, there is a higher chance of a neighboring disk failing as well, and then you have a "two disk" failure on your hands. Moreover, if you're going to use large drives the chance of failure is higher, as larger disks have a higher data density on the platter's surface. IMHO, disks of 2T and beyond will always fall into this category, so be aware.

Let's get cracking

Partitioning disks

While in Linux/GNU we could use the entire block device for storage, we will use partitions because doing so makes it easier to use disk rescue tools in case the system goes bonkers. We are using the "fdisk" program here, but if you're going to use disks larger than 2T you will need a partitioning program that supports GPT partitioning, like parted.

        sudo fdisk /dev/sdb
    

Note: I have observed that it is possible to make the array without changing the partition type, but because this is the way described all over the net I'm going to follow suit (again, when using the entire block device this is unnecessary).

Once in fdisk the keystrokes are:

n          ; for a new partition

enter

p          ; for a primary partition

enter

1          ; number of partition

enter    ; accept the default

enter    ; accept the default

t          ; to change the type

fd        ; sets the type to "Linux raid autodetect" (type fd)

w         ; write changes to disk and exit

Rinse and repeat for all the disks that will be part of the array.
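For disks larger than 2T, where fdisk will not do, a roughly equivalent GPT layout can be created with parted, which was mentioned above. This is only a sketch, the device name is an example, and the start/end syntax may differ slightly between parted versions:

        sudo parted --script /dev/sdb mklabel gpt
        sudo parted --script /dev/sdb mkpart primary 0% 100%
        sudo parted --script /dev/sdb set 1 raid on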

Creating a Linux RAID10 array

The advantage of using "Linux raid10" is that it knows how to take advantage of an odd number of disks to boost performance and resiliency even further than vanilla RAID10, and the "10" array can be created in one single step.

Create the array from the disks we have prepared in the last step by issuing:

        sudo mdadm --create /dev/md0 --chunk=256 --level=10 -p f2 --raid-devices=5 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 --verbose
    

Note: This is all one line, even though the representation above breaks it into two.

Let's break the parameters down:

  • "--chunk=256" -  The size of bytes the raid stripes are broken to, and this size is recommended for new/large disks (the 2T drives used to make this guide were without a doubt in that category).
  • "--level=10" - Uses the Linux raid10 (if a traditional raid is required, for what ever reason, you would have to create two arrays and join them).
  • "-p f2" - Uses the "far" rotation plan see note below for more info and "2" tells that the array will keep two copies of the data.

Note: We use the "far" plan because it causes the physical data layout on the disks to NOT be the same. This helps in the situation where the hardware of one of the drives fails due to a manufacturing fault (and don't think "this won't happen to me", like yours truly did). Because the two disks are of the same make and model, have been used in the same fashion, and traditionally have kept the data in the same physical locations, the risk exists that the drive holding the copy of the data has failed too, or is close to it, and will not provide the required resiliency until a replacement disk arrives. The "far" plan puts the copy of the data in a completely different physical location on the copy drives; in addition, you should use disks that are not close to each other within the computer case. More information can be found here and in the links below.

Once the array has been created it will start its synchronization process. While you may wish to wait for tradition's sake (as this may take a while), you can start using the array immediately.

The progress can be observed using:

        watch -d cat /proc/mdstat
    
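You can also ask mdadm for a summary of the new array, which is a handy way to confirm that the level, chunk size and "far 2" layout came out as intended:

        sudo mdadm --detail /dev/md0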

Create the mdadm.conf Configuration File

While Ubuntu has been shown to simply scan and activate the array automatically on startup, for completeness' sake and as a courtesy to the next sysadmin we will create the file. Your system doesn't create it automatically, and trying to remember all the components/partitions of your RAID set is a waste of the sysadmin's sanity. This information can, and should, be kept in the mdadm.conf file. The formatting can be tricky, but fortunately the output of the mdadm --detail --scan --verbose command provides it for you.

Note: It has been said that "most distributions expect the mdadm.conf file in /etc/, not /etc/mdadm. I believe this is an 'Ubuntu-ism' to have it as /etc/mdadm/mdadm.conf". Because we are using Ubuntu here, we will just go with it.

        sudo sh -c 'mdadm --detail --scan --verbose > /etc/mdadm/mdadm.conf'
    

IMPORTANT! You need to remove one "0" from the newly created file, because the syntax resulting from the command above isn't completely correct (GNU/Linux isn't an OS yet).

If you want to see the problem that this wrong configuration causes, you can issue the "scan" command at this point, before making the adjustment:

        sudo mdadm --examine --scan
    

To overcome this, edit the file /etc/mdadm/mdadm.conf and change:

        metadata=00.90
    

To read:

        metadata=0.90
    

Running the mdadm --examine --scan command now should return without an error.
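If you prefer to make that one-character fix non-interactively, a sed one-liner along these lines (assuming the file contains the exact "metadata=00.90" string shown above) should do it:

        sudo sed -i 's/metadata=00.90/metadata=0.90/' /etc/mdadm/mdadm.conf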

Filesystem setup on the array

I used ext4 for this example because, for me, it builds upon the familiarity of the ext3 filesystem that came before it while promising better performance and features.

I suggest taking the time to investigate what filesystem better suits your needs and a good start for that is our "Which Linux File System Should You Choose?" article.

        sudo mkfs.ext4 /dev/md0
    

Note: In this case I didn't partition the resulting array because I simply didn't need to at the time, as the requesting party specifically asked for at least 3.5T of contiguous space. With that said, had I wanted to create partitions, I would have had to use a GPT-capable partitioning utility like "parted".

Mounting

Create the mount point:

        sudo mkdir /media/raid10
    

Note: This can be any location, the above is only an example.

Because we are dealing with an "assembled device", we will not use the filesystem's UUID that is on the device for mounting (as recommended for other types of devices in our "what is the linux fstab and how does it work" guide), since the system may actually see part of the filesystem on an individual disk and try to incorrectly mount it directly. To overcome this, we want to explicitly wait for the device to be "assembled" before we try mounting it, so we will use the assembled array's name ("md0") within fstab to accomplish this.

Edit the fstab file:

        sudo vim /etc/fstab
    

And add to it this line:

        /dev/md0 /media/raid10/ ext4 defaults 1 2
    

Note: If you change the mount location or filesystem from the example, you will have to adjust the above accordingly.

Use mount with the "mount all" parameter (-a) to simulate a system boot, so you know that the configuration is working correctly and that the RAID device will be automatically mounted when the system restarts:

        sudo mount -a
    

You should now be able to see the array mounted with the "mount" command with no parameters.
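For example, something like the following should show the array and how much of the new space is free (the mount point matches the example above):

        mount | grep md0
        df -h /media/raid10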

Email Alerts for the RAID Array

Unlike with hardware RAID arrays, with a software array there is no controller that will start beeping to let you know when something has gone wrong. Email alerts are therefore going to be our only way of knowing when something has happened to one or more disks in the array, which makes this the most important step.

Follow the "How To Setup Email Alerts on Linux Using Gmail or SMTP" guide and when done come back here to perform the RAID specific steps.
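If the guide above did not already have you set it, note that mdadm reads the destination address for its alerts from a MAILADDR line in /etc/mdadm/mdadm.conf, so make sure one is present. The address below is only a placeholder:

        MAILADDR your-address@example.com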

Confirm that mdadm can Email

The command below tells mdadm to fire off just one test email and exit.

        sudo mdadm --monitor --scan --test --oneshot
    

If successful, you should receive an email detailing the array's condition.

Set the mdadm configuration to send an Email on startup

While not an absolute must, it is nice to get an update from the machine from time to time, letting us know that the email ability is still working and what the array's condition is. You're probably not going to be overwhelmed by emails, as this setting only affects startups (of which there shouldn't be many on a server).

Edit the mdadm configuration file:

        sudo vim /etc/default/mdadm
    

Add the --test parameter to the DAEMON_OPTIONS section so that it would look like:

        DAEMON_OPTIONS="--syslog --test"
    

You may restart the machine just to make sure you're "in the loop", but it isn't a must.

Samba Configuration

Installing Samba on a Linux server enables it to act like a Windows file server. So, in order to make the data we are hosting on the Linux server available to Windows clients, we will install and configure Samba.

It's funny to note that the package name of SaMBa is a pun on Microsoft's file sharing protocol, SMB (Server Message Block).

In this guide the server is used for testing purposes, so we will enable access to its share without requiring a password. You may want to dig a bit more into how to set up permissions once setup is complete.

Also, it is recommended that you create a non-privileged user to be the owner of the files. In this example we use the "geek" user we created for this task. Explanations of how to create a user and manage ownership and permissions can be found in our "Create a New User on Ubuntu Server 9.10" and "The Beginner's Guide to Managing Users and Groups in Linux" guides.

Install Samba:

        sudo aptitude install samba
    

Edit the samba configuration file:

        sudo vim /etc/samba/smb.conf
    

Add a share called "general" that will grant access to the mount point "/media/raid10/general" by appending the below to the file.

        [general]
        path = /media/raid10/general
        force user = geek
        force group = geek
        read only = No
        create mask = 0777
        directory mask = 0777
        guest only = Yes
        guest ok = Yes

The settings above make the share accessible without a password to anyone and make the user "geek" the default owner of the files.
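Note that the directory behind the share has to exist and be owned by that user before clients can write to it. Assuming the mount point and the "geek" user from above, that would look roughly like:

        sudo mkdir -p /media/raid10/general
        sudo chown geek:geek /media/raid10/general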

For your reference, this smb.conf file was taken from a working server.

Restart the samba service for the settings to take effect:

        sudo /etc/init.d/samba restart
    

Once done you can use the testparm command to see the settings applied to the samba server.
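For example, running it with the -s switch prints the processed configuration without pausing for a keypress:

        testparm -s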

That's it, the server should now be accessible from any Windows box using:

        \\server-name\general
    

Troubleshooting

When you need to troubleshoot a problem or a disk has failed in an array, I suggest referring to the mdadm cheat sheet (that's what I do...).

In general, you should remember that when a disk fails you need to "remove" it from the array, shut down the machine, replace the failing drive with a new one, and then "add" the new drive to the array after you have created the appropriate disk layout (partitions) on it, if necessary.
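As a rough sketch of that flow (the failed device name here is only an example; the full details are in the mdadm cheat sheet referenced above):

        sudo mdadm /dev/md0 --fail /dev/sdc1      # mark the dying disk as failed
        sudo mdadm /dev/md0 --remove /dev/sdc1    # pull it out of the array
        # ...power down, swap the drive, partition it as before, then:
        sudo mdadm /dev/md0 --add /dev/sdc1       # add the replacement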

Once that's done you may want to make sure that the array is rebuilding and watch the progress with:

        watch -d cat /proc/mdstat
    

Good luck! :)

References:

mdadm cheat sheet

RAID levels break down

Linux RAID10 explained

mdadm command man page

mdadm configuration file man page

Partition limitations explained


Using software RAID won't cost much... Just your VOICE ;-)