So your reshape/grow crashed for some reason. mdadm says this:
# mdadm -A --scan
mdadm: Failed to restore critical section for reshape, sorry.
Possibly you needed to specify the --backup-file
# mdadm -A --scan --verbose
mdadm: /dev/md/5 is identified as a member of /dev/md/3, slot 3.
mdadm: /dev/md/4 is identified as a member of /dev/md/3, slot 2.
mdadm: /dev/md/1 is identified as a member of /dev/md/3, slot 0.
mdadm: /dev/md/2 is identified as a member of /dev/md/3, slot 1.
mdadm:/dev/md/3 has an active reshape - checking if critical section needs to be restored
mdadm: Failed to find backup of critical section
mdadm: Failed to restore critical section for reshape, sorry.
Possibly you needed to specify the --backup-file
Limit the damage
The first step is to limit the damage. We may have to experiment a little, and we really do not want to restore a backup just because we changed a few MB. You can overlay a device with a file: writes go to the overlay file, and reads try the overlay file first and then the actual device. The files can be sparse files and thus only take up as much space as is written to them.
## Set the devices we want to overlay
DEVICES="/dev/md/1 /dev/md/2 /dev/md/4 /dev/md/5"
## Create a /dev/loop for each of the files
parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7 {#}' ::: $DEVICES
## Create some sparse files. If 4000G is too big, lower that
parallel truncate -s4000G overlay-{/} ::: $DEVICES
## Setup the overlay
parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' ::: $DEVICES
## Make a variable with the overlay devices
OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
## Print them
echo $OVERLAYS
## Check the status on the overlay devices
dmsetup status
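To see for yourself that a sparse file only uses the space written to it, here is a minimal sketch you can run as a normal user in a scratch directory. It assumes GNU coreutils (`truncate`, `stat -c`, `dd`); the helper name `sparse_report` and the file name `overlay-demo` are made up for illustration and are not part of the recovery itself.

```shell
# Hypothetical helper: print "apparent_bytes allocated_bytes" for a file.
sparse_report() {
    apparent=$(stat -c %s "$1")   # logical length of the file
    blocks=$(stat -c %b "$1")     # number of blocks actually allocated
    bsize=$(stat -c %B "$1")      # size in bytes of each block counted by %b
    echo "$apparent $((blocks * bsize))"
}

# Demo in a throwaway directory; it does not touch any real device.
tmp=$(mktemp -d)
truncate -s 100M "$tmp/overlay-demo"
sparse_report "$tmp/overlay-demo"    # large apparent size, little or nothing allocated

# Write 1 MiB in the middle of the file, like a write landing in the overlay.
dd if=/dev/zero of="$tmp/overlay-demo" bs=1M count=1 seek=50 conv=notrunc,fsync 2>/dev/null
sparse_report "$tmp/overlay-demo"    # allocation grows roughly by what was written
rm -r "$tmp"
```

The exact allocated numbers depend on the file system, so treat them as indicative only.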
You will later undo the overlay files with:
## Don't do this now.
parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES
parallel losetup -d ::: /dev/loop[0-9]*
Check you get the same error:
# mdadm -A /dev/md3 $OVERLAYS
mdadm: Failed to restore critical section for reshape, sorry.
Possibly you needed to specify the --backup-file
Check how far the reshape got (Reshape pos'n):
# mdadm -E $OVERLAYS
/dev/mapper/1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : 7529d9f4:3e69776c:df6bf129:ffd1f902
Name : lemaitre:3 (local to host lemaitre)
Creation Time : Mon Nov 5 17:52:58 2012
Raid Level : raid4
Raid Devices : 5
Avail Dev Size : 54698242800 (26082.16 GiB 28005.50 GB)
Array Size : 109396484096 (104328.62 GiB 112022.00 GB)
Used Dev Size : 54698242048 (26082.15 GiB 28005.50 GB)
Data Offset : 16 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f4d73202:5d6af3b9:c13d9a84:ef67abc0
Reshape pos'n : 109169573888 (104112.22 GiB 111789.64 GB)
Update Time : Wed Jul 10 09:51:37 2013
Checksum : f16a7a58 - correct
Events : 15431828
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAA. ('A' == active, '.' == missing)
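If you want the position as a percentage, a small sketch like the following can compute it from the `mdadm -E` output. It assumes that `Array Size` and `Reshape pos'n` are printed in the same unit, as they appear to be in the output above; the function name `reshape_percent` is made up for this example.

```shell
# Hypothetical helper: read `mdadm -E` output on stdin and print how far
# the reshape got, as a percentage of the array size.
reshape_percent() {
    awk -F' : ' '
        /Array Size/  { size = $2 + 0 }   # leading number of the value field
        /Reshape pos/ { pos  = $2 + 0 }
        END {
            if (size > 0 && pos > 0) printf "%.2f\n", 100 * pos / size
            else exit 1                   # fields not found
        }'
}
```

You would use it as `mdadm -E /dev/mapper/1 | reshape_percent`. If several devices are examined at once, the values from the last one win.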
Assembling still does not work with --force and --run:
# mdadm --assemble --force --run --verbose $OVERLAYS
mdadm: device /dev/mapper/1 exists but is not an md array.
root@lemaitre:/lemaitre-internal# mdadm --assemble --force --run --verbose /dev/md3 $OVERLAYS
mdadm: looking for devices for /dev/md3
mdadm: /dev/mapper/1 is identified as a member of /dev/md3, slot 0.
mdadm: /dev/mapper/2 is identified as a member of /dev/md3, slot 1.
mdadm: /dev/mapper/4 is identified as a member of /dev/md3, slot 2.
mdadm: /dev/mapper/5 is identified as a member of /dev/md3, slot 3.
mdadm:/dev/md3 has an active reshape - checking if critical section needs to be restored
mdadm: Failed to find backup of critical section
mdadm: Failed to restore critical section for reshape, sorry.
Possibly you needed to specify the --backup-file
Try --invalid-backup
The next step is to try to assemble without the backup file:
# mdadm --assemble --verbose --invalid-backup --force /dev/md3 $OVERLAYS
mdadm: looking for devices for /dev/md3
mdadm: /dev/mapper/1 is identified as a member of /dev/md3, slot 0.
mdadm: /dev/mapper/2 is identified as a member of /dev/md3, slot 1.
mdadm: /dev/mapper/4 is identified as a member of /dev/md3, slot 2.
mdadm: /dev/mapper/5 is identified as a member of /dev/md3, slot 3.
mdadm:/dev/md3 has an active reshape - checking if critical section needs to be restored
mdadm: Failed to find backup of critical section
mdadm: continuing without restoring backup
mdadm: added /dev/mapper/2 to /dev/md3 as 1
mdadm: added /dev/mapper/4 to /dev/md3 as 2
mdadm: added /dev/mapper/5 to /dev/md3 as 3
mdadm: no uptodate device for slot 4 of /dev/md3
mdadm: added /dev/mapper/1 to /dev/md3 as 0
mdadm: array: Cannot grow - need backup-file
mdadm: failed to RUN_ARRAY /dev/md3: No such file or directory
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md3 : active (read-only) raid4 dm-0[0] dm-3[3] dm-2[4] dm-1[1]
109396484096 blocks super 1.2 level 4, 512k chunk, algorithm 5 [5/4] [UUUU_]
Looks good but is read-only. Make it read-write with:
# mdadm --readwrite /dev/md3
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md3 : active raid4 dm-0[0] dm-3[3] dm-2[4] dm-1[1]
109396484096 blocks super 1.2 level 4, 512k chunk, algorithm 5 [5/4] [UUUU_]
[===================>.] reshape = 99.7% (27294226944/27349121024) finish=30.4min speed=30013K/sec
Much better. But the reshape is going into the overlay files, so we may run out of disk space. Slow down the reshape for now:
# echo 0 > /proc/sys/dev/raid/speed_limit_max
# echo 0 > /proc/sys/dev/raid/speed_limit_min
# sleep 30
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md3 : active raid4 dm-0[0] dm-3[3] dm-2[4] dm-1[1]
109396484096 blocks super 1.2 level 4, 512k chunk, algorithm 5 [5/4] [UUUU_]
[===================>.] reshape = 99.8% (27295675392/27349121024) finish=1670176.0min speed=0K/sec
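To keep an eye on how full the overlays are, you can parse `dmsetup status`. This is only a sketch: it assumes the snapshot status line has the form `name: start length snapshot used/total metadata` (sector counts), and the function name `overlay_usage` is made up for this example.

```shell
# Hypothetical helper: read `dmsetup status` output on stdin and print the
# fill level of every snapshot target as "name: NN.N%".
overlay_usage() {
    awk '$4 == "snapshot" && $5 ~ /^[0-9]+\/[0-9]+$/ {
        split($5, a, "/")                  # a[1] = sectors used, a[2] = total sectors
        printf "%s %.1f%%\n", $1, 100 * a[1] / a[2]
    }'
}
```

Run it as `dmsetup status | overlay_usage`. Note that the total is the size of the sparse overlay file, not the free space on the disk that holds it, so also check `df` on that file system.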
Great. Now let us assess the damage.
fsck
See what fsck says:
fsck /dev/md3
If the file system uses xfs:
xfs_repair /dev/md3
Because we are doing this on overlay files there is no need to do a read-only fsck.
mount
Assuming the fsck completed and fixed any errors, mount the file system.
mkdir /mnt/disk
mount /dev/md3 /mnt/disk
XFS sometimes likes to be unmounted and mounted again before first use, so:
umount /dev/md3
mount /dev/md3 /mnt/disk
Look around in /mnt/disk. Check /mnt/disk/lost+found:
find /mnt/disk/lost+found
If there are no files there, fsck did not rescue any files. That is typically a good sign, as that can mean that no directories were corrupt. Now check if the disk usage is what you expect:
# df /mnt/disk
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md3 109394397184 48950231280 60444165904 45% /mnt/disk
That looks good: I expected around 45% used, so if anything got lost it would be small things. And if fsck did not complain at all, then nothing was lost.
Doing it for real
Now it is time to disable the overlay files and do the same for real. You should have taken notes of the exact steps that worked for you earlier. If you did not, remove and add the overlay and do it again.
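One way to take those notes automatically is to run each command through a small wrapper that logs it first. This is just a sketch: the function name `run` and the file name `recovery-notes.txt` are made up, and the wrapper does not see shell redirections, so a line such as `echo 0 > /proc/sys/dev/raid/speed_limit_max` would have to be run as `run sh -c 'echo 0 > /proc/sys/dev/raid/speed_limit_max'`.

```shell
# Hypothetical notes file; override NOTES before calling run if you like.
NOTES=${NOTES:-./recovery-notes.txt}

# Log the command with a timestamp, then execute it unchanged.
run() {
    printf '%s  %s\n' "$(date '+%F %T')" "$*" >> "$NOTES"
    "$@"
}
```

For example, `run mdadm --assemble --verbose --invalid-backup --force /dev/md3 $OVERLAYS` appends the command line to the notes file and then runs it.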
umount /mnt/disk
mdadm --stop /dev/md3
# Remove overlay
parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES
parallel losetup -d ::: /dev/loop[0-9]*
# Re-do the steps that worked for you
# For me it was:
DEVICES="/dev/md/1 /dev/md/2 /dev/md/4 /dev/md/5"
mdadm --assemble --verbose --invalid-backup --force /dev/md3 $DEVICES
cat /proc/mdstat
mdadm --readwrite /dev/md3
echo 0 > /proc/sys/dev/raid/speed_limit_max
echo 0 > /proc/sys/dev/raid/speed_limit_min
sleep 30
cat /proc/mdstat
xfs_repair /dev/md3
mkdir /mnt/disk
mount /dev/md3 /mnt/disk
umount /dev/md3
mount /dev/md3 /mnt/disk
find /mnt/disk/lost+found
df /mnt/disk
umount /dev/md3
Now let the reshape complete:
echo 30000 > /proc/sys/dev/raid/speed_limit_max
echo 30000 > /proc/sys/dev/raid/speed_limit_min