Intro
As an IT guy, I have started to build up a small personal lab, and when something breaks, I always try to see it as a fun way to gain some experience and put on my troubleshooting hat. The fun stops, however, when you realize that you might have lost all of the personal data on your NAS.
Background
This is what happened on my Synology DS415play after a power cut.
It started with disk errors on one of the disks, and the Synology itself began acting strangely. At times it became unresponsive and unreachable: ssh connections were rejected, shares couldn’t be accessed, the web interface was down, and DSM assistant couldn’t even reach or see the Synology. A soft reset was not possible through any of the interfaces, and the Synology also didn’t reboot when I pushed the power button. So not an ideal situation. This all happened after an update, but I think the failure of one of my 4 disks may also have played a part. I hoped that it was busy with some consistency checks, but after letting it do its own thing for a few days, I could hear the disks spinning up and down every 10 minutes. So the Synology seemed to be stuck in a sort of boot loop.
So, stuck in a boot loop with no way to reach the web interface or a console, I was getting pretty nervous. Luckily, I knew which disk was having problems, so I decided to pull out the probably defective disk. I replaced it, but there was still no improvement. Then I decided to boot the Synology with only 3 disks, leaving out both the faulty disk and its replacement. This time the Synology finally booted up, but the next panic attack started when I saw the screen above. Logic said that my data was still there, since I used a RAID 5 (SHR) setup with 4 disks and only 1 disk was broken. I also knew that I had removed the right disk and hadn’t touched the others. With my Synology up and running again, it was time to troubleshoot.
Getting my Synology volume back to work / mounted again.
Unfortunately, I’ve lost some of the screenshots that I made of the console, but I still know the commands and steps that I used. First I connected to my Synology with ssh and checked whether there was still a volume present, by scanning for volume groups (VG) with the command:
Syn-vSAM> lvm vgscan
Reading all physical volumes. This may take a while...
Found volume group "vg1000" using metadata type lvm2
So this was good news: it saw a volume group called “vg1000”. I then tried to activate it with the command:
Syn-vSAM> lvm vgchange -a y vg1000
1 logical volume(s) in volume group "vg1000" now active
So the Logical Volume in the volume group vg1000 became active.
A small step in the recovery process, but a big step for getting my hopes up.
After that I tried to mount the volume with the command.
Syn-vSAM> mount /dev/vg1000/lv /volume1
mount: mounting /dev/vg1000/lv on /volume1 failed: No such device
But unfortunately, this didn’t work. The next command gave some good insight about the logical structure of the PV, LV and VG.
Syn-vSAM> vgdisplay -v
Finding all volume groups
Finding volume group "vg1000"
--- Volume group ---
VG Name vg1000
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 4
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 5.90 TB
PE Size 4.00 MB
Total PE 1376322
Alloc PE / Size 1376322 / 5.90 TB
Free PE / Size 0 / 0
VG UUID SSc872-duUFD-D8dbds……..
--- Logical volume ---
LV Name /dev/vg1000/lv
VG Name vg1000
LV UUID SSc872-duUFD-D8dbds……..
LV Write Access read/write
LV Status available
# open 0
LV Size 5.90 TB
Current LE 1376322
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 4096
Block device 253:0
--- Physical volumes ---
PV Name /dev/md2
PV UUID Sodi8-YHNCi-….
PV Status allocatable
Total PE / Free PE 1376322/ 0
The data was still there as I could see that the Logical Volume Manager (LVM) still had the PV, LV, and VG mapped. So most probably there was some corruption that blocked the mounting of the volume.
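Because the only physical volume here is /dev/md2, it can also help to confirm the state of the RAID array underneath LVM before going further. The following is a minimal sketch, assuming the usual Linux /proc/mdstat layout in which each array reports its device counts as [configured/active]. The function name degraded_arrays is made up for illustration.

```shell
# Print every md array that is running with fewer devices than configured.
# Reads mdstat-formatted text on stdin.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md2 : active raid5 ..."
        /^md[0-9]+ :/ { name = $1 }
        # Status lines carry a "[configured/active]" pair such as "[4/3]"
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), counts, "/")
            if (counts[2] + 0 < counts[1] + 0)
                print name, counts[2] "/" counts[1]
        }
    '
}

# Usage on the NAS:
#   degraded_arrays < /proc/mdstat
```

An array running on 3 of its 4 devices is reported as "md2 3/4", which is what you would expect after pulling one disk from a 4-disk set. A healthy array prints nothing.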
Syn-vSAM> cd /dev/vg1000/
Syn-vSAM> ls -la
drwxr-xr-x 4 root root 45 Sep 22 15:35 .
drwxr-xr-x 15 root root 13000 Sep 22 15:35 ..
lrwxrwxrwx 1 root root 25 Sep 22 15:35 lv -> /dev/mapper/vg1000-lv
After that I checked the mapper.
Syn-vSAM> ls -la /dev/mapper/vg1000-lv
brw------- 1 root root 253, 0 Sep 22 15:41 /dev/mapper/vg1000-lv
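I noticed the 0 and, after some googling, took it to mean 0 bytes. (In a device listing, “253, 0” is actually the major and minor device number rather than a size.) I found a post describing a similar issue, which strengthened my suspicion that the file system on the LV was corrupted.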
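For a device node, ls -l prints the major and minor device numbers in the column where a regular file would show its size, so that column says nothing about how big the volume is. Here is a small sketch for reading the real size instead, assuming the kernel exposes the device under /sys/class/block with its size in 512-byte sectors (standard on Linux). The name dm-0 for the LV is an assumption based on the 253:0 block device shown earlier. The function name and its optional second argument are hypothetical, and the second argument exists only so the function can be tried against a test directory.

```shell
# Print the size of a block device in bytes, read from sysfs.
#   $1 = kernel device name, e.g. dm-0
#   $2 = sysfs root (optional, defaults to /sys)
dev_size_bytes() {
    name="$1"
    sysroot="${2:-/sys}"
    size_file="$sysroot/class/block/$name/size"
    [ -r "$size_file" ] || return 1
    # The size file counts 512-byte sectors. awk keeps the multiplication
    # safe on shells with 32-bit arithmetic.
    awk '{ printf "%.0f\n", $1 * 512 }' "$size_file"
}

# Usage on the NAS:
#   dev_size_bytes dm-0
```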
So it was time for a file system check. First I looked up the file system type:
Syn-vSAM> cat /etc/fstab
none /proc proc defaults 0 0
/dev/root / ext4 defaults 1 1
/dev/vg1000/lv /volume1 ext4 usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,synoacl 0 0
With that I knew that the volume was ext4, and thus I could start the file system check with:
Syn-vSAM> fsck.ext4 /dev/vg1000/lv
After that I got a lot of (scary) messages, which almost gave me the feeling that the volume was not recoverable. The Synology spent a long time checking the file system, constantly prompting me whether I wanted to take corrective action for the corrupted data, to which I replied yes every time. After several hours, it was finally done.
I tried to mount the volume once more, but this time read-only, just to be sure.
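Before letting fsck change anything, it can be reassuring to run a read-only pass first and look at the exit status. The sketch below decodes the exit status bits documented in the e2fsck manual page. The function name describe_e2fsck_status is made up, and the commented usage assumes the same /dev/vg1000/lv device as above.

```shell
# Translate an e2fsck exit status (a bit mask) into readable text.
describe_e2fsck_status() {
    rc="$1"
    if [ "$rc" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in "1:errors corrected" \
                "2:errors corrected, reboot recommended" \
                "4:errors left uncorrected" \
                "8:operational error" \
                "16:usage or syntax error" \
                "32:cancelled by user" \
                "128:shared library error"; do
        bit="${pair%%:*}"
        text="${pair#*:}"
        # Append the text for every bit that is set in the status.
        if [ $((rc & bit)) -ne 0 ]; then
            out="${out:+$out; }$text"
        fi
    done
    echo "$out"
}

# Usage on the NAS:
#   fsck.ext4 -n /dev/vg1000/lv   # -n: open read-only, answer "no" to everything
#   describe_e2fsck_status $?
#   fsck.ext4 -y /dev/vg1000/lv   # -y: answer "yes" to every prompt
```

The -y flag avoids confirming each repair by hand for hours, at the price of accepting every fix e2fsck proposes, so a backup or image beforehand is the safer order of operations.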
Syn-vSAM> mount -o ro /dev/vg1000/lv
After that I listed the contents with “ls” and finally saw that the volume was mounted successfully.
Happy happy joy joy ^_^.
Then I rebooted the Synology so that it could mount the volume normally. Once the Synology had rebooted, I had my volume back. The only thing left to do was to rebuild the RAID onto a new replacement disk.
After that, everything was good 😊.
Samir is the author of vSAM.Pro & a Life enthusiast who works as a consultant in the field of IT. With a great passion for Tech & Personal Development, he loves to help people with their problems, but also inspire them with a positive outlook on life.
Besides that, he is also a big sports and music junkie who loves to spend a big chunk of his time producing music or physically stretching himself.
Thanks
It helped me a lot !!
Just note that I had to free the device mapper before fsck. Otherwise fsck refuses to run:
/dev/vg1000/lv is in use.
e2fsck: Cannot continue, aborting.
So with :
dmsetup info -c
brw------- 1 root root 253, 0 Jun 15 13:36 /dev/dm-0
brw------- 1 root root 253, 1 Jun 15 13:36 /dev/dm-1
ls -la /sys/dev/block/253\:0/holders
lrwxrwxrwx 1 root root 0 Jun 15 16:54 dm-1 -> ../../dm-1
253 is the major and 0 the minor device number for me.
I can check which mapper I need to remove, and then :
dmsetup remove /dev/dm-1
Now the logical volume is ready to be checked:
fsck.ext4 /dev/vg1000/lv
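The holders lookup described above can be wrapped in a small helper. This is only a sketch: holders_of is a made-up name, and the optional second argument (an alternative sysfs root) exists purely so the function can be exercised without real devices.

```shell
# List the devices stacked on top of a block device.
#   $1 = major:minor of the device, e.g. 253:0
#   $2 = sysfs root (optional, defaults to /sys)
holders_of() {
    devnum="$1"
    sysroot="${2:-/sys}"
    dir="$sysroot/dev/block/$devnum/holders"
    [ -d "$dir" ] || return 1
    for entry in "$dir"/*; do
        # An empty directory leaves the glob unexpanded; skip that case.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        basename "$entry"
    done
}

# Usage on the NAS:
#   holders_of 253:0
```

Anything it prints is a device that still holds the volume open. Only remove a mapping with dmsetup remove once you are sure what it is.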
@Laurent, thank you for this information. I didn’t encounter that problem, but thank you for sharing this with me. Will be really helpful for other people who have this issue, so thank you for this. Were you able to recover your data afterwards?
Best Regards,
Samir
Thank you both, guys! That helped me too.
Happy To hear ^_^
Just checking in to say.. THANK YOU! I had a power outage and when the system came back up, my volume was crashed. You saved me. Time to get a UPS and cloud backup, thanks again!
Hey Eric,
Great to hear, happy that this post could help you and thank you for your feedback as well 😊.
Don’t forget that you can also back up your data to an external HDD with Synology.
I always make an offline backup every month as well, in case my complete system crashes or gets hit by one of those infamous crypto lockers. It makes you less vulnerable when something happens to your Synology.
Have a great day.
Dear Samir,
Your post gives me some hope.
I’m in exactly the same situation, but with a BTRFS file system, not ext4.
I’ve been looking for a solution for days now and nobody wants to help me.
For the moment I’m desperate, facing the loss of all my family pictures, movies, device backups, etc… nearly 4 TB of life here.
I opened a ticket with Synology support, but as my NAS is out of warranty, they don’t want to help me.
I’ve seen other people on the Synology forum who received remote help, where the technician mounted a crashed SHR1 BTRFS volume read-only so that the data could be copied to another place.
Would you be able to help me?
Dear Pascal,
I’m so sorry to hear this. To help you properly, I need to know a little bit more than this. At the same time, I would like to give you some advice that may be much better for you.
First off, your data is probably still there; it is probably just not readable. I don’t know what happened, but there is a big chance that some sectors became corrupt, which prevents the Synology’s operating system from booting up. The first and best tip I can give to prevent further damage is: take your time!!!
If you try to fix it in haste, you’ll probably end up damaging it even further. I know it is difficult, but try to stay calm with the thought that your data is probably still there and you just can’t access it at the moment.
Now, there are a few paths that you can follow in this scenario.
Path 1: you try to fix the Synology, and try to access the files after you fixed the Synology.
Path 2: If you can still access the Synology over ssh and run a fair number of commands, you could try to copy the files over to another device. This is probably the method you referred to earlier and, if it is possible, your best, quickest and safest option.
Path 3: you try to access the files with external recovery tools and make a RAW image, which you then work on instead of the original disks.
Path 1: This is probably the quickest way, but it can also be one of the most harmful. If you don’t know what you are doing, stop making adjustments. There were a lot of times in my life when I had to repair something that I didn’t have the proper skill set for at that moment. You can then choose to take your time and learn that skill set, or let someone else do it.
Path 2: If this is possible, it is one of the best options for securing your data. If you can simply copy the files to another device, you’re good.
Path 3: This is probably the most intensive one, but it is also the best option for saving as much data as possible. It will cost you a lot of storage space on another device, since a RAW image is really big. But it also gives you something to go back to in case you fuck something up.
But in all honesty, I don’t know how difficult it is to make a RAW image of a RAID system. For a single disk, this is really simple. You hook the hard drive up to a Linux machine, you make sure it is either not mounted or mounted read-only, and you make a raw image. I have even done this with defective disks. Since you are probably using some form of RAID, taking each disk separately and copying it probably won’t work. It may work as a backup for restoring the RAID in the future, but I wouldn’t bet on it or try it if you are not sure.
So, first of all, stay calm and don’t stress. Think about how important that data is and how confident you feel troubleshooting it yourself. If you don’t feel confident enough, you can choose to learn it or just pay someone to fix it. However, never experiment on that personal data while you are still learning. Try things out somewhere else first. If IT is not your thing, then please pay someone to fix it, since they will have the proper tools (both hardware and software) to do it. To me, personal data means a lot, and I’m willing to pay for it or to take a lot of time to fix it. For my parents, I’m trying to fix a PC that had a cryptolocker on it. The project has already taken more than a few years. But since almost no one can fix those issues, I’m willing to take my time for that and to learn.
Hope this advice could help you in any kind of way.
Let me know how this is working out for you.
Samir
Thank you, Samir, for all this advice.
I did a lot of research and I now have a copy of all the data that I really need to keep.
The next step is to try to fix the crashed volume state on my Synology.
My Synology can still boot, as it has two volumes.
One is a basic single-disk SSD ext4 volume, where all my applications and Docker are installed.
The second is a 4-disk btrfs SHR1 RAID volume, where all my data is. The RAID and the disks’ SMART status show as healthy, but the volume stays crashed in Storage Manager and the shared folders are not shown.
I logged in over SSH as root and
used “vgchange -ay” and “mount -o recovery,ro /dev/vg1000/lv /volume2” to mount it in read-only mode, then did an “rsync” to copy the data to another NAS.
Now I want to try to resolve the crashed volume situation in DSM.
Do you think you can help me with this?
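For readers in the same spot, the read-only copy described in this comment can be sketched as a small script. The commands and mount options are the ones quoted in the comment. The function names, the DRY_RUN switch and the destination in the usage line are hypothetical, and the -a flag for rsync is an assumption, since the comment doesn’t say which options were used.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Activate the volume group, mount the LV read-only, copy everything off.
#   $1 = logical volume, $2 = mount point, $3 = rsync destination
rescue_copy() {
    lv="$1"
    mnt="$2"
    dest="$3"
    run vgchange -ay || return 1
    # "recovery,ro" are the btrfs mount options quoted in the comment above.
    run mount -o recovery,ro "$lv" "$mnt" || return 1
    run rsync -a "$mnt/" "$dest"
}

# Preview the commands first, without touching anything:
#   DRY_RUN=1 rescue_copy /dev/vg1000/lv /volume2 backup@othernas:/volume1/rescue/
```

Keeping the mount read-only means the copy cannot make the damaged file system any worse, which is why it is worth doing before any repair attempt.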
I can’t promise you anything, but I will take a look and try to help you as well as I can.
Send me an e-mail at:
info@vsam.pro
Please give a clear description of the status it is currently in and the errors that it shows, and send me as many screenshots as possible. The more info, the better I can help.
Best Regards,
Samir
I’m now trying to fix the btrfs file system, as the unclean shutdown of my system corrupted the superblock, and apparently a few blocks fail the parent transid check.
Here is what I just tried:
btrfs rescue super-recover /dev/vg1000/lv
No success!
btrfs rescue zero-log /dev/vg1000/lv
No success!
btrfs rescue chunk-recover /dev/vg1000/lv
This one takes a while and is still running… fingers crossed.
For other readers,
Happy to confirm that Pascal was able to recover all of his data.
Hi
My volume suddenly crashed, but the storage pool was OK. All drives OK. The volume was in read-only mode, and I copied the important files before doing anything else.
I ran your commands and all went OK, but the mount command said the volume was already mounted, and I couldn’t umount it because it said it was in use.
I was so scared to do anything, because I have 2-disk fault tolerance and the volume crashed out of nowhere.
Then I rebooted and everything was fine: volume green, all working properly?!
What was the reason, then? I don’t want this to happen when I need it the most, so I want to find the cause. Could it be bad RAM? The controller?
In dmesg I found many log lines like this:
BTRFS warning (device dm-0): csum failed ino 2810 off 36864 csum 2795832003 expected csum 0
should I stop using this box?
Is BTRFS safe to use or will it cause more of this in the future?
thanks and sorry for the anger.
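To find out which files such csum warnings refer to, the inode numbers can be pulled out of the kernel log. Here is a sketch, assuming log lines in the format quoted in the comment above. csum_failed_inodes is a made-up name, and /volume1 in the usage comment is an assumed mount point.

```shell
# Read kernel log text on stdin and print each inode number that appears
# in a btrfs "csum failed" warning, once, in numeric order.
csum_failed_inodes() {
    sed -n 's/.*csum failed.* ino \([0-9][0-9]*\) .*/\1/p' | sort -un
}

# Usage on the NAS:
#   dmesg | csum_failed_inodes
#   btrfs inspect-internal inode-resolve 2810 /volume1
```

The second command maps an inode number back to a path. Inode numbers are per subvolume, so the path argument has to point into the subvolume the warning came from.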
I’m sorry, I missed this comment completely.
For all other readers, just a simple tip. If you have already experienced some instability or downtime, don’t be lazy and don’t gamble on it staying stable. Immediately make a backup of all your data, which is something you should always do anyway. Having some redundancy is nice, but it doesn’t mean you are safe from every disaster. If a power cut happens, a crypto locker gets into your Synology, or several disks fail at the same time (which happens often, because the disks are usually all bought from the same vendor at the same time), you can lose everything. That hurts most when it is the place where you keep all of your personal stuff.
Hope I could help others with this comment and hope you all recover your data
Thanks a lot Samir for your work and sharing. Unfortunately my DSM volume crashed last week-end. I have followed your post and my volume is back online. You saved my day & my photos.
My wife thanks you too 🙂
Awesome to hear, really happy for you and your wife. Wish you the best.
Hello Samir,
Thank you for this tutorial. I followed it up to this part:
root@dsm01:/# cat /etc/fstab
none /proc proc defaults 0 0
/dev/root / ext4 defaults 1 1
root@dsm01:/# fsck.ext4 /dev/vg1/volume_1
e2fsck 1.42.6 (21-Sep-2012)
ext2fs_open2: Bad magic number in super-block
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/vg1/volume_1
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193
What else should I do?
Any hint is greatly appreciated!
Daniel
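For readers who hit the same message: the first thing worth confirming is that the volume really is ext4, since the fstab output in this comment has no line for the volume, and a btrfs volume would also fail with a bad magic number. If it is ext4, e2fsck -b expects the block number of a backup superblock, and the first backup sits at a location that depends on the file system’s block size. Below is a sketch of that lookup, using the locations given in the e2fsck manual page. first_backup_superblock is a made-up name.

```shell
# Print the block number of the first backup superblock for a given
# ext2/3/4 block size (values as listed in the e2fsck manual page).
first_backup_superblock() {
    case "$1" in
        1024) echo 8193 ;;
        2048) echo 16384 ;;
        4096) echo 32768 ;;
        *) echo "unsupported block size: $1" >&2; return 1 ;;
    esac
}

# Usage on the NAS (the 4096 block size is an assumption; check yours):
#   fsck.ext4 -b "$(first_backup_superblock 4096)" /dev/vg1/volume_1
```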