SMART
Self-Monitoring, Analysis, and Reporting Technology (SMART) is a feature of disks that provides a means of monitoring disk health.
This is usually done on the fly during reads of a sector. When a sector read fails, that sector is marked as invalid. The next write to that sector will actually be reallocated to a spare sector on the disk. The problem is that by the time this is identified, the data is typically already lost. The reallocation lets another sector take the failed sector's place to avoid issues in the future.
Disks can also run short and long self tests to periodically scan sectors before an issue occurs.
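For example, the smartmontools package can query a disk's health and kick off self tests. A minimal sketch, assuming the disk in question is /dev/sdb (adjust the device name for your system):
- smartctl -H /dev/sdb  # overall health assessment
- smartctl -A /dev/sdb  # attribute table, including reallocated sector counts
- smartctl -t short /dev/sdb  # start a short self test
- smartctl -t long /dev/sdb  # start a long (full surface) self test
- smartctl -l selftest /dev/sdb  # view self test results
A SMART daemon such as smartd watches the same attributes in the background and can send notifications when they degrade.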
RAID
Redundant Array of Independent Disks (RAID) can provide redundancy, performance improvement, or both to a group of disks. The disks are combined so that the operating system sees a single accessible disk. RAID is never an alternative to backups. There are more RAID types that you can look into on your own, but below is the gist you need to understand the rest of this post.
JBOD
Just a bunch of disks. This is just mounting disks into your computer and treating them as completely independent. With this setup, you can utilize disks of any size and combination. So in our case, 12 TB of capacity.
RAID0
To improve performance, data is striped across the disks so they can be read from and written to in parallel. This appears as a single disk with 12 TB of capacity. Because of the striping, a single disk failure is the equivalent of losing all data across all the disks.
RAID1
To support redundancy, data is mirrored across disks. So if a disk fails, a copy of that data exists on another disk. However, this means only half of the total capacity can be used, or 6 TB total.
RAID5
To support both performance and redundancy, data and its parity are striped across the disks. So if one disk fails, its data can be rebuilt using the parity. In our case, this means 8 TB total capacity.
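As a rough rule of thumb for N equal disks of size S: JBOD and RAID0 give N × S of usable space, a two-way RAID1 mirror gives (N × S) / 2, and RAID5 gives (N - 1) × S, since one disk's worth of space goes to parity. Mixed-size disks like mine complicate the math, because each stripe or mirror set is limited by its smallest member.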
LVM
Logical Volume Management (LVM) abstracts the physical layer from the file system layer, allowing you to easily grow and scale file systems without hard-mapping them to specific drives.
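The layering goes physical volume (PV) -> volume group (VG) -> logical volume (LV) -> file system, which is the same pattern used in the migration below. A minimal sketch with made-up names (/dev/sdX1, myvg, mylv):
- pvcreate /dev/sdX1  # mark the partition as an LVM physical volume
- vgcreate myvg /dev/sdX1  # pool one or more PVs into a volume group
- lvcreate -n mylv -l 100%FREE myvg  # carve a logical volume out of the pool
- mkfs.ext4 /dev/myvg/mylv  # put a file system on the LV
- mount /dev/myvg/mylv /mnt/data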
Current Setup
I have four disks of different sizes: one 1 TB, one 3 TB, and two 4 TB. To start with, my JBOD disks are just mounted as /mnt/drive1, /mnt/drive2, etc. To make life easier, I used symbolic links in my /shares directory to represent my different content: /shares/pictures/personal, /shares/backup, /shares/movies, etc. My NFS server shares these with my other devices and Kubernetes (k8s) cluster.
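For illustration, the mapping looks something like this (which drive holds which content is made up here, not my exact layout):
- mkdir -p /shares/pictures
- ln -s /mnt/drive1/pictures/personal /shares/pictures/personal
- ln -s /mnt/drive2/backup /shares/backup
- ln -s /mnt/drive3/movies /shares/movies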
I have backups to Google Cloud, Blurays/DVDs, and other devices (using Syncthing) across the internet. I actually just bought a new 4TB drive to replace my 3TB.
Problem
When a device starts to fail, my SMART daemon notifies me of potential device failure. From experience, one sector failing is not a big deal. But when several sectors start failing consistently, the drive will typically be dead within a year.
This actually happened to me. One of my devices got a bad sector which had no file data stored on it, so it was easy to correct (write 0's to the sector). I was getting notified of a drive failure for almost a year before it died. It only really died because I was trying to avoid pulling from my backups by directly copying my files off the drive. I got a lot of files off the disk, but the drive eventually died while I was recovering the data.
Restoring my data from Blurays/DVDs will take me a lot of time. With 4 GB DVDs and 25 GB Blurays, restoring a 3 TB drive's worth of data means a lot of discs.
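To put rough numbers on it: 3 TB is about 3000 GB, which works out to roughly 3000 / 25 = 120 Blurays or 3000 / 4 = 750 DVDs, before even counting the time to find, load, and verify each disc.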
The real problem is the time to recover from disk failure. It's going to take me a long time to get that data back from the other physical disks.
Solution
Migration
- Reduce the existing filesystems so they fit within a 2 TB partition. The filesystem has to be unmounted and checked before resize2fs will shrink it.
- umount /dev/sdb1
- e2fsck -f /dev/sdb1
- resize2fs /dev/sdb1 2048G
- Change partitions to 2TB and create new LVM partitions for the remaining space.
- sgdisk --backup /sdb.original.backup /dev/sdb
- gdisk /dev/sdb
- delete the existing partition
- create a new 2 TB partition
- create a 2nd partition with the remaining space, using type 8e00 (Linux LVM)
- partprobe
- Create PVs for the new partitions and add them to a new copy VG.
- pvcreate /dev/sdb2
- vgcreate copy /dev/sdb2 ...
- Create a new copy LV into the copy VG.
- lvcreate -n main -l 100%FREE copy
- Create filesystem and mount
- mkfs.ext4 /dev/copy/main
- mkdir -p /mnt/copy
- mount /dev/copy/main /mnt/copy
- Copy all the existing files to the copy LV.
- rsync --times -rav /src/dir /mnt/copy/
- Reboot just to be safe and mount drives back
- reboot
- mount /dev/copy/main /mnt/copy
- Replace old partitions with LVM partitions.
- umount /src/dir
- sgdisk --backup /sdb.phase1.backup /dev/sdb
- gdisk /dev/sdb
- delete partition 1
- create new partition of 8e00
- Create PVs for the old partitions and add them to disks VG
- pvcreate /dev/sdb1
- vgcreate disks /dev/sdb1 ...
- Create a RAID5 LV on the disks VG and wait until the LV is ready.
- lvcreate --type raid5 -i 2 -l 100%FREE -n data disks
- lvs
- Wait until Cpy%Sync column is 100%. This took me 3 days to complete for three 2TB partitions.
- Make filesystem and copy all the data from the copy LV.
- mkfs.ext4 /dev/mapper/disks-data
- mkdir -p /shares
- mount /dev/mapper/disks-data /shares
- rsync --times -rav /mnt/copy /shares
- Reboot just to be safe and mount drives back
- reboot
- mount /dev/copy/main /mnt/copy
- mount /dev/mapper/disks-data /shares
- Remove the copy LV, VG, and PV.
- umount /mnt/copy
- lvchange -an copy/main
- lvremove copy/main
- vgremove copy
- pvremove /dev/sdb2 ...
- Update the partitions so that only one LVM partition exists for the entire disk.
- sgdisk --backup /sdb.phase2.backup /dev/sdb
- gdisk /dev/sdb
- Remove both partitions
- Create a single 8e00 full disk partition
- partprobe
- Update the PVs to match their partitions and increase the LV to match the VG.
- pvresize /dev/sdb1
- lvextend -l +100%FREE disks/data
- Wait for LV to sync
- lvs
- Extend the filesystem
- umount /shares
- fsck -f /dev/mapper/disks-data
- resize2fs /dev/mapper/disks-data
- fsck -f /dev/mapper/disks-data
- mount /dev/mapper/disks-data /shares
- Update /etc/fstab accordingly (an example entry is shown after this list)
- blkid
- Reboot to confirm everything okay
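For reference, the fstab entry for the new RAID5 LV would look something like the line below. The UUID is a placeholder; use whatever blkid reports for /dev/mapper/disks-data.
- UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /shares ext4 defaults 0 2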
1 comment:
Working on the following to add a disk.
1. gdisk /dev/sdf
- create a single full-disk partition of type 8e00
2. pvcreate /dev/sdf1
3. vgs
4. vgextend disks /dev/sdf1
5. lvs -o name,vgname,segtype,attr,size,stripesize,data_stripes
- shows the current number of stripes (2)
6. lvchange --syncaction check /dev/disks/data
- forces a consistency check (scrub) of the RAID LV
7. umount /shares
8. lvconvert --stripes 3 /dev/disks/data
9. lvchange --syncaction check /dev/disks/data
10. fsck -f /dev/mapper/disks-data
11. resize2fs /dev/mapper/disks-data
12. fsck -f /dev/mapper/disks-data
13. mount -av