October 21, 2022

Switching from JBOD to LVM RAID5

Before getting into the nitty-gritty, I want to make sure we have some understanding of the tools in place. For the sake of argument, we're going to assume three disks with 4TB of capacity each.

SMART

Self-Monitoring, Analysis, and Reporting Technology (SMART) is a feature of disks that provides a means of monitoring disk health.

This is usually done on the fly during reads of a sector. When a sector read fails, that sector is marked as invalid. The next write to that sector will actually be reallocated to a spare sector on the disk. The problem is that by the time this is identified, the data is typically already lost. The reallocation simply lets another sector take the failed sector's place to avoid issues in the future.

Disks can also support short and long self-tests that periodically scan sectors so problems can be caught before data is lost.
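
For reference, here's a minimal sketch of checking a disk by hand with smartctl (from the smartmontools package), assuming the disk in question is /dev/sdb:

  smartctl -H /dev/sdb            # overall health assessment
  smartctl -a /dev/sdb            # full report, including reallocated sector counts
  smartctl -t short /dev/sdb      # kick off a short self-test
  smartctl -t long /dev/sdb       # kick off a long (full surface scan) self-test
  smartctl -l selftest /dev/sdb   # view the self-test log afterwards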

RAID

Redundant Array of Independent Disks (RAID) can provide redundancy, a performance improvement, or both to a group of disks. The disks are combined so that the operating system sees a single accessible disk. RAID is never an alternative to backups. There are more RAID types you can look into on your own, but below is the gist you need to understand the rest of this post.

JBOD

Just a bunch of disks. This is simply mounting disks in your computer and treating them as completely independent. With this setup, you can utilize any combination of disk sizes. In our case, that's 12 TB of capacity.

RAID0

To improve performance, data is striped across the disks so they can be read from and written to in parallel. This appears as a single disk with 12 TB of capacity. Because of the striping, a single disk failure is the equivalent of losing all data across all disks.

RAID1

To support redundancy, data is mirrored across disks. So if a disk fails, a copy of that data exists on another disk. However, only one copy's worth of capacity is usable: with a two-disk mirror that's half the total, and mirroring all three of our 4 TB disks together would leave just 4 TB usable.

RAID5

To support both performance and redundancy, data and its parity are striped across the disks. So if one disk fails, its data can be rebuilt using the parity. In our case, this means 8 TB of total capacity.
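
As a rule of thumb, with n equally sized disks the usable RAID5 capacity is (n − 1) × disk size. Here that's (3 − 1) × 4 TB = 8 TB, with one disk's worth of space spent on parity.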

LVM

Logical Volume Management (LVM) abstracts the physical layer from the file system layer, allowing you to easily grow and scale file systems without hard-mapping them to drives.
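
As a minimal sketch of the pieces involved (physical volumes, volume groups, and logical volumes), with a spare partition at /dev/sdX1 and names chosen purely for illustration:

  pvcreate /dev/sdX1                   # register the partition as a physical volume (PV)
  vgcreate myvg /dev/sdX1              # group one or more PVs into a volume group (VG)
  lvcreate -n mylv -l 100%FREE myvg    # carve a logical volume (LV) out of the VG
  mkfs.ext4 /dev/myvg/mylv             # put a filesystem on the LV
  mkdir -p /mnt/mylv
  mount /dev/myvg/mylv /mnt/mylv       # and mount it like any other block device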

Current Setup

I have four disks of different sizes. One 1TB, one 3 TB, and two 4 TB. To start with, my JBOD is just mounted as /mnt/drive1, /mnt/drive2, etc. To make life easier, I used symbolic links in my /shares directory to represent my different content: /shares/pictures/personal, /shares/backup, /shares/movies, etc. My NFS shares these with my other devices and Kubernetes (k8s) cluster.
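
For illustration, the mapping looks something like this (the specific drive paths and the export subnet here are made up):

  ln -s /mnt/drive2/movies /shares/movies
  ln -s /mnt/drive1/backup /shares/backup

  # /etc/exports entry to share it over NFS
  /shares 192.168.1.0/24(rw,sync,no_subtree_check)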

I have backups to Google Cloud, Blurays/DVDs, and other devices (using Syncthing) across the internet. I actually just bought a new 4TB drive to replace my 3TB. 

Problem

When a device starts to fail, my SMART daemon notifies me of the potential failure. From experience, one sector failing is not a big deal. But when several sectors start failing consistently, the drive will typically be dead within a year.
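
For reference, this kind of notification can come from smartd (part of smartmontools); a sketch of an /etc/smartd.conf line, where the device name and test schedule are just examples:

  # Monitor /dev/sdb: track all attributes (-a), run a short self-test nightly at 02:00,
  # a long self-test Saturdays at 03:00, and email root when something looks off.
  /dev/sdb -a -s (S/../.././02|L/../../6/03) -m root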

This actually happened to me. One of my drives got a bad sector, which had no file data stored on it, so it was easy to correct (write 0's to the sector). I was getting notified of impending drive failure for almost a year before it died. It only really died because I was trying to avoid pulling from my backups and was directly copying my files off the drive. I got a lot of files off the disk, but the drive eventually died while I was recovering the data.

Restoring my data from Blurays/DVDs will take me a lot of time. With 4 GB DVDs and 25 GB Blurays, restoring a 3 TB drive's worth of data means swapping through hundreds of discs (roughly 750 DVDs or 120 Blurays).

The real problem is the time to recover from disk failure. It's going to take me a long time to get that data back from the other physical disks. 

Solution

I analyzed my data usage across the drives and noticed I was doing some weird things. First, I would copy data (rsync) from different directories into my backup directory, and that backup directory automatically got backed up to other devices and the cloud. I can reduce data usage by backing up those directories directly instead of duplicating them locally first.
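
A rough sketch of the difference (paths and the remote host here are hypothetical):

  # Old pattern: duplicate into the local backup directory, which then gets synced elsewhere
  rsync -a /shares/pictures/ /shares/backup/pictures/

  # New pattern: back the source directory up directly to its destination
  rsync -a /shares/pictures/ backuphost:/backups/pictures/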

Secondly, I had only used a little less than 4 TB of data across my drives, so having 13 TB of drives for 4 TB of data is kind of a waste of money. As a result, I'm going with LVM RAID5 across the three 4 TB drives, giving me 8 TB of redundant capacity plus the standalone 1 TB drive. The real challenge will be migrating the data.

Migration

The real challenge lies in moving the data to the new setup on the fly. At a high level, here are the steps involved (a few of these were still pending when this was written).
  1. Reduce existing filesystems. (Shrinking ext4 requires the filesystem to be unmounted and checked first, then remounted.)
    1. umount /src/dir
    2. e2fsck -f /dev/sdb1
    3. resize2fs /dev/sdb1 2048G
  2. Change partitions to 2TB and create new LVM partitions for the remaining space.
    1. sgdisk --backup /sdb.original.backup /dev/sdb
    2. gdisk /dev/sdb
      1. delete partition
      2. create new partition with 2TB
      3. create 2nd partition with remaining using 8e00 type
    3. partprobe
  3. Create PVs for the new partitions and add them to a new copy VG.
    1. pvcreate /dev/sdb2
    2. vgcreate copy /dev/sdb2 ...
  4. Create a new copy LV into the copy VG.
    1. lvcreate -n main -l 100%FREE copy
  5. Create filesystem and mount
    1. mkfs.ext4 /dev/copy/main
    2. mkdir -p /mnt/copy
    3. mount /dev/copy/main /mnt/copy
  6. Copy all the existing files to the copy LV.
    1. rsync --times -rav /src/dir /mnt/copy/
  7. Reboot just to be safe and mount drives back
    1. reboot
    2. mount /dev/copy/main /mnt/copy
  8. Replace old partitions with LVM partitions.
    1. umount /src/dir
    2. sgdisk --backup /sdb.phase1.backup /dev/sdb
    3. gdisk /dev/sdb
      1. delete partition 1
      2. create new partition of 8e00
  9. Create PVs for the old partitions and add them to disks VG
    1. pvcreate /dev/sdb1
    2. vgcreate disks /dev/sdb1 ... 
  10. Create a RAID5 LV on the disks VG and wait until the LV is ready.
    1. lvcreate --type raid5 -i 2 -l 100%FREE -n data disks
    2. lvs
      1. Wait until Cpy%Sync column is 100%. This took me 3 days to complete for three 2TB partitions.
  11. Make filesystem and copy all the data from the copy LV.
    1. mkfs.ext4 /dev/mapper/disks-data
    2. mkdir -p /shares
    3. mount /dev/mapper/disks-data /shares
    4. rsync --times -rav /mnt/copy/ /shares/
  12. Reboot just to be safe and mount drives back
    1. reboot
    2. mount /dev/copy/main /mnt/copy
    3. mount /dev/mapper/disks-data /shares
  13. Remove the copy LV, VG, and PV.
    1. umount /mnt/copy
    2. lvchange -an copy/main
    3. lvremove copy/main
    4. vgremove copy
    5. pvremove /dev/sdb2 ...
  14. Update the partitions so that only one LVM partition exists for the entire disk.
    1. sgdisk --backup /sdb.phase2.backup /dev/sdb
    2. gdisk /dev/sdb
      1. Remove both partitions
      2. Create a single 8e00 full disk partition
    3. partprobe
  15. Update the PVs to match their partitions and increase the LV to match the VG.
    1. pvresize /dev/sdb1
    2. lvextend -l +100%FREE disks/data
  16. Wait for LV to sync
    1. lvs
  17. Extend the filesystem
    1. umount /shares
    2. fsck -f /dev/mapper/disks-data
    3. resize2fs /dev/mapper/disks-data 
    4. fsck -f /dev/mapper/disks-data
    5. mount /dev/mapper/disks-data /shares
  18. Update /etc/fstab accordingly (see the example entry after this list)
    1. blkid
  19. Reboot to confirm everything is okay
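
For reference, the /etc/fstab entry from step 18 ends up looking something like this; the UUID is a placeholder and should be whatever blkid reports for /dev/mapper/disks-data:

  # /etc/fstab (example entry; replace the UUID with the one from blkid)
  UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /shares  ext4  defaults,nofail  0  2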

1 comment:

SuperFlyXXI said...

Working on the following to add a disk.

1. gdisk /dev/sdf (create a single new partition of type 8e00)
2. pvcreate /dev/sdf1
3. vgs
4. vgextend disks /dev/sdf1
5. lvs -o name,vgname,segtype,attr,size,stripesize,data_stripes
- shows the current number of data stripes (2)
6. lvchange --syncaction check /dev/disks/data
- forces a check (scrub) of the array's status
7. umount /shares
8. lvconvert --stripes 3 /dev/disks/data
9. lvchange --syncaction check /dev/disks/data
10. fsck -f /dev/mapper/disks-data
11. resize2fs /dev/mapper/disks-data
12. fsck -f /dev/mapper/disks-data
13. mount -av
