Brief overview: this is about a Linux home server called Amahi. I'm trying to figure out whether it suits my needs and whether I can migrate to it from Windows Home Server. Currently, the server is installed in VirtualBox.
The task: simulate an unrecoverable hard disk failure of one of the non-system disks, i.e. one of the disks that contain your data and participate in mirroring, managed by Greyhole.
Environment:
VirtualBox with Fedora 12 and Amahi on top.
3 hard disks attached to the virtual machine. One is the system disk; the second and third contain user data, managed by Greyhole.
MediaWiki installed, just to check what happens to applications.
Scenario:
Perform hard-reset of the system, simulating power failure
Remove the third disk
Recover the system
Add replacement disk
So now the story. After I shut the system down and removed the third hard disk, I started the system and got a not-so-nice error during boot: the system couldn't find a device and could not continue to load. I was asked to fix the problems or reboot. Since rebooting doesn't help, I understood that recovery would not be easy, or at least not automatic. I had actually hoped that the system would recognize the missing disk, warn me, and continue. Remember, I'm not talking about the disk where the system is installed; it is just one of the data disks.
Because I'm not a Linux pro, more like a newbie (the only Linux command I always remember is "dir", since it exists in MS-DOS as well), I had to dig to find the steps that recover the system and let it continue. It turns out there is a special file that lists all the devices to mount at startup, and all I need to do is edit it and remove the line for the missing drive. Here are the steps:
1. After the system starts, you will be notified about the missing device, and the boot sequence will stop at a prompt asking you to fix the problems.
Type your root password to get to the console.
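2. The root file system is most likely mounted read-only, so we need to remount it read-write, as we are going to change one of the system files. Do this by typing the following command (-n skips updating /etc/mtab, which cannot be written while the file system is read-only):
mount -n -o remount,rw /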
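If you are not sure whether the root file system is currently read-only, the mount table can tell you. Below is a minimal sketch, assuming the kernel mount table is readable at /proc/mounts; mount_mode is a hypothetical helper name, not a system command.

```shell
# Print "rw" or "ro" for a mount point, based on a mount table in
# /proc/mounts format (device, mount point, fs type, options, ...).
# mount_mode is a hypothetical helper, not part of any distribution.
mount_mode() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    # The last matching line wins, since later mounts shadow earlier ones.
    awk -v mp="$mount_point" '
        $2 == mp {
            n = split($4, opts, ",")
            mode = ""
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
        }
        END { if (mode != "") print mode }
    ' "$mounts_file"
}

# Possible use at the repair prompt (needs root, so left commented out):
#   [ "$(mount_mode /)" = "ro" ] && mount -n -o remount,rw /
```

The second argument exists only so the function can be tried against a sample file instead of the live system.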
3. Open the "/etc/fstab" file for editing with the following command:
nano -Bw /etc/fstab
In my case, the missing drive is shown in the last line.
The -B switch tells nano to keep a backup copy (/etc/fstab~) when you save the file, just in case. The -w switch turns off line wrapping, so long fstab lines are not broken up.
4. Find the line with your missing drive and remove it completely. Hit Control+O and then Enter to save your changes, and Control+X to exit the editor.
Note: if you get an error saying something like "Cannot write file, the system is read-only", it means the remount command from step 2 didn't work. Exit the editor and try it again. Don't miss the trailing slash; it is the root file system path.
5. Hit Control+D to restart the system. It should boot properly now. At least we have a running system again, so let's continue fixing it and adding a replacement disk.
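As an alternative to deleting the line by hand, the entry can be commented out so it is easy to restore later. This is only a sketch of that idea, not what I did above: disable_fstab_entry is a hypothetical helper, and the device path in the usage line is an assumption based on my volume names, so check what your own fstab actually contains.

```shell
# disable_fstab_entry FILE DEVICE
# Comments out every active line in FILE whose first field is DEVICE,
# after saving a copy as FILE.bak. Hypothetical helper, not a system tool.
disable_fstab_entry() {
    fstab=$1
    device=$2
    cp -p "$fstab" "$fstab.bak" || return 1
    awk -v dev="$device" '
        $1 == dev { print "#" $0; next }   # active entry for the device
        { print }                          # everything else is unchanged
    ' "$fstab.bak" > "$fstab"
}

# Possible use once / is writable (assumed device path, left commented out):
#   disable_fstab_entry /etc/fstab /dev/mapper/vg_hda-lv_data1
```

Lines that are already commented out are left alone, because their first field starts with "#" and so never equals the device name.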
6. Start LVM (Logical Volume Management). It can be found under System -> Administration. You should see your failed disk as "unknown device" in the tree.
This is not good and we need to repair it.
CAUTION! BE VERY CAUTIOUS IN THIS STEP! YOU MAY CAUSE A LOSS OF DATA IF YOU REMOVE THE WRONG VOLUME! You've been warned.
First, we should remove the logical volume. In my case, the volume that points to the failed physical device is "lv_data1". In your case it may be something else; figure out which one it is and delete it by selecting it in the tree and clicking "Remove Logical Volume".
7. Now we need to remove the physical drive. Start a console and su to root (i.e. type "su" at the command line, without the quotes, then your root password). Type the following command, which will remove missing devices from the volume group:
vgreduce --removemissing vg_hda
Change "vg_hda" to the name of your volume group, the one that contains the missing device.
Reload LVM (View -> Reload) and you should not see any "unknown device" entries anymore. Our system is fully repaired.
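Steps 6 and 7 can also be done entirely from the console. The sketch below prints the commands by default and only runs them when DRY_RUN=0 is set; run and remove_failed_volume are hypothetical helper names, and vg_hda / lv_data1 are the names from my setup, so substitute your own. The same caution applies: removing the wrong volume loses data.

```shell
# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# remove_failed_volume VG LV (hypothetical helper)
remove_failed_volume() {
    vg=$1
    lv=$2
    run lvs -o +devices "$vg"            # show which volumes sit on which disks
    run lvremove "$vg/$lv"               # remove the volume on the failed disk
    run vgreduce --removemissing "$vg"   # drop the missing physical volume
    run pvs                              # list the remaining physical volumes
}

# Dry run, prints the four commands without touching anything:
#   remove_failed_volume vg_hda lv_data1
# Real run, as root:
#   DRY_RUN=0 remove_failed_volume vg_hda lv_data1
```

Reading the dry-run output before doing the real run is a cheap way to double-check the volume names.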
8. The next step is to install a replacement disk. If you don't have one, just stop here, as there is nothing more to do at this point.
Shut down the system, insert your new disk and start again as usual. You can follow the guide posted on Amahi's wiki, but it takes the command-line path, which I don't really like, so if you want to do everything with the UI, continue reading.
9. You will see your new drive in the "Uninitialized Entities" group. Go ahead, select it and hit "Initialize Entity".
The drive will be moved to the "Unallocated Volumes" group.
Hit "Add to Existing Volume Group", select your group and add the drive to it. Now our group has expanded with new unused space.
Select "Logical View" and hit "Create New Logical Volume". A dialog for adding a new volume will appear. Fill in the details and remember to mount your new volume somewhere under "/var/hda/files". In my case, I mount it at "/var/hda/files/drives/sdc1".
Select the file system (Ext4), check both "Mount" and "Mount when rebooted", and click OK. This may be a lengthy operation, so be patient.
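For the curious, the UI actions in step 9 correspond roughly to the LVM commands below. This is my own rough sketch, not the procedure from the Amahi wiki: print_add_disk_plan is a hypothetical helper that only prints the commands for review, /dev/sdc and lv_data2 are assumed names, and the fstab line at the end is my guess at what "Mount when rebooted" amounts to.

```shell
# print_add_disk_plan DISK VG LV MOUNTPOINT (hypothetical helper)
# Prints the commands for review; it does not run any of them.
print_add_disk_plan() {
    disk=$1
    vg=$2
    lv=$3
    mnt=$4
    cat <<EOF
pvcreate $disk
vgextend $vg $disk
lvcreate -l 100%FREE -n $lv $vg
mkfs.ext4 /dev/$vg/$lv
mkdir -p $mnt
mount /dev/$vg/$lv $mnt
echo '/dev/$vg/$lv $mnt ext4 defaults 1 2' >> /etc/fstab
EOF
}

# Example with assumed names; check the disk name before running anything:
#   print_add_disk_plan /dev/sdc vg_hda lv_data2 /var/hda/files/drives/sdc1
```

In order, the printed commands initialize the disk as a physical volume, add it to the volume group, create a logical volume from all the free space, format it as Ext4, mount it, and add the mount to fstab.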
10. Go to your HDA by navigating to http://hda and add your new volume to the storage pool as usual. Check the pool configuration of each folder and we are done.
11. Optionally, you may want to force Greyhole to resynchronize all data and copy it wherever needed by executing the following command in a console:
greyhole --fsck
That's it, we are done! And please remember, I'm a total noob in Linux, so if you find any issue in what I wrote above, feel free to post about it in the comments.