Crashed Synology Volume, and how to Restore (DS415 Play)

Intro

As an IT guy, I have started to build up a little personal lab, and when something breaks, I always try to see it as a fun way to gain some experience and put on my troubleshooting hat. The fun stops, however, when you realize that you might have lost all of the personal data on your NAS.

Background

This is what happened on my Synology DS415play after a power cut.

It started with my Synology reporting disk errors on one of the disks, and the Synology itself was showing some weird problems. It sometimes became unresponsive and unreachable: ssh was rejected, shares couldn’t be accessed, the web interface was down, and DSM assistant couldn’t even reach or see the Synology. A soft reset was not possible through one of the interfaces, and the Synology also didn’t reboot when I pushed the power button. So not an ideal situation. This all happened after an update, but I think the failure of one of my 4 disks may also have played a part. I hoped that it was busy with some consistency checks, but after letting it do its own thing for a few days, I noticed from the sound it was making that the disks spun up and down every 10 minutes. So the Synology seemed to be stuck in a sort of boot loop.

Stuck in a boot loop, with no way to reach the web interface or a console, I was getting pretty nervous. Luckily, I knew which disk was having problems, so I decided to pull out the probably defective disk. I replaced the disk, but there was still no improvement. Then I decided to boot without either the faulty or the replacement disk, so the Synology started with only 3 disks. After that the Synology finally booted up, but the next panic attack started when DSM showed the volume as crashed. Logic said that my data was still there, since I used a RAID 5 (SHR) setup with 4 disks and only 1 disk was broken. I also knew that I had removed the right disk and that I hadn’t touched the others. With my Synology up and running again, it was time to troubleshoot.

Getting my Synology volume back to work / mounted again.

Unfortunately, I’ve lost some of the screenshots that I made of the console, but I still know the commands and steps that I used. First I connected to my Synology with ssh and checked whether there was still a volume present. So I scanned for volume groups (VG) with the command:

Syn-vSAM> lvm vgscan
Reading all physical volumes.  This may take a while...
Found volume group "vg1000" using metadata type lvm2

So this was good news: it saw a volume group called “vg1000”. I then tried to activate it with the command:

Syn-vSAM> lvm vgchange -a y vg1000
  1 logical volume(s) in volume group "vg1000" now active

So the Logical Volume in the volume group vg1000 became active.
A small step in the recovery process, but a big step for getting my hopes up.
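
The two LVM steps so far can be wrapped in a small script. This is only a sketch under my own assumptions: the run helper and the DRY_RUN variable are illustrative names, not something DSM provides, and by default the script only prints what it would do instead of touching anything.

```shell
#!/bin/sh
# Sketch: scan for volume groups, activate one, then list its logical volumes.
# DRY_RUN and run() are illustrative helpers (assumptions), not DSM tools.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Always print the command; only execute it when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

activate_vg() {
    vg="$1"
    run lvm vgscan &&
    run lvm vgchange -a y "$vg" &&
    run lvm lvscan
}

# Volume group name taken from the vgscan output above.
activate_vg vg1000
```

Running it with DRY_RUN=0 as root would execute the same commands I typed by hand; keeping the dry run as the default makes it harder to fire off something by accident on a box that is already in trouble.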
After that I tried to mount the volume with the command.

Syn-vSAM> mount /dev/vg1000/lv /volume1
mount: mounting /dev/vg1000/lv on /volume1 failed: No such device

But unfortunately, this didn’t work. The next command gave some good insight about the logical structure of the PV, LV and VG.

Syn-vSAM> vgdisplay -v
Finding all volume groups
Finding volume group "vg1000"
--- Volume group ---
VG Name               vg1000
System ID
Format                lvm2
Metadata Areas        1
Metadata Sequence No  4
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                1
Open LV               0
Max PV                0
Cur PV                1
Act PV                1
VG Size               5.90 TB
PE Size               4.00 MB
Total PE              1376322
Alloc PE / Size       1376322 / 5.90 TB
Free  PE / Size       0 / 0
VG UUID               SSc872-duUFD-D8dbds……..

  --- Logical volume ---
LV Name                /dev/vg1000/lv
VG Name                vg1000
LV UUID                SSc872-duUFD-D8dbds……..
LV Write Access        read/write
LV Status              available
# open                 0
LV Size                5.90 TB
Current LE             1376322
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     4096
Block device           253:0

  --- Physical volumes ---
PV Name               /dev/md2
PV UUID               Sodi8-YHNCi-….
PV Status             allocatable
Total PE / Free PE    1376322/ 0

The data was still there as I could see that the Logical Volume Manager (LVM) still had the PV, LV, and VG mapped. So most probably there was some corruption that blocked the mounting of the volume.

Syn-vSAM> cd /dev/vg1000/
Syn-vSAM> ls -la
drwxr-xr-x    4 root     root            45 Sep 22 15:35 .
drwxr-xr-x   15 root     root         13000 Sep 22 15:35 ..
lrwxrwxrwx    1 root     root            25 Sep 22 15:35 lv -> /dev/mapper/vg1000-lv

After that I checked the mapper.

Syn-vSAM> ls -la /dev/mapper/vg1000-lv
brw-------    1 root     root      253,   0 Sep 22 15:41 /dev/mapper/vg1000-lv

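I noticed the 0, which after some googling I took to mean 0 bytes. Strictly speaking, the “253, 0” that “ls” prints for a block device is its major and minor device number rather than a size, as one of the comments below also points out. Still, I found a post describing a similar issue, and together with the failed mount it confirmed my suspicion: the file system on the LV was corrupted.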
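
For reference, in the long listing of a device node the column that normally holds the file size holds two numbers separated by a comma: the major and the minor device number. A minimal sketch to pull them out of such a line (devnums is just an illustrative helper name; the commented commands at the end are how I would expect to read the real size, assuming blockdev is available on the box):

```shell
#!/bin/sh
# Sketch: extract the major/minor device numbers from an `ls -l` line
# of a block or character device. devnums() is an illustrative helper name.
devnums() {
    # For device nodes, field 5 is "major," and field 6 is "minor".
    awk '$1 ~ /^[bc]/ { sub(",", "", $5); print "major=" $5 " minor=" $6 }'
}

# Fed with the listing from this post:
echo 'brw-------    1 root     root      253,   0 Sep 22 15:41 /dev/mapper/vg1000-lv' | devnums
# prints: major=253 minor=0

# The actual size comes from the kernel, for example (as root on the NAS):
#   blockdev --getsize64 /dev/mapper/vg1000-lv   # size in bytes
#   cat /sys/class/block/dm-0/size               # size in 512-byte sectors
```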

So time to do a file system check.

Syn-vSAM> cat /etc/fstab
none /proc proc defaults 0 0
/dev/root / ext4 defaults 1 1
/dev/vg1000/lv /volume1 ext4 usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,synoacl 0 0
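
The third field of each fstab line is the file system type, so it can also be pulled out directly for a given mount point. A minimal sketch (fstype_for is an illustrative name of my own, and it reads fstab-formatted text on standard input):

```shell
#!/bin/sh
# Sketch: print the file system type that fstab records for a mount point.
# fstype_for() is an illustrative helper; it reads fstab text on stdin.
fstype_for() {
    # Skip comment lines, match on the mount point (field 2), print the type (field 3).
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { print $3 }'
}

# On the NAS: fstype_for /volume1 < /etc/fstab
# Here, fed with the fstab contents from this post:
fstype_for /volume1 <<'EOF'
none /proc proc defaults 0 0
/dev/root / ext4 defaults 1 1
/dev/vg1000/lv /volume1 ext4 usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,synoacl 0 0
EOF
# prints: ext4
```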

With that I knew that the volume was ext4, and thus I could start the file system check with:

Syn-vSAM> fsck.ext4 /dev/vg1000/lv

After that I got a lot of (scary) messages, which almost gave me the feeling that the volume was not recoverable. For a long time the Synology was checking the file system and constantly asking me whether I wanted to take corrective action on the corrupted data, to which I replied yes every time. After several hours, it was finally done.
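
In hindsight, a more cautious sequence would be a read-only pass first (the -n option of e2fsck opens the file system read-only and answers “no” to every question), followed by the real repair, where -y answers “yes” automatically instead of prompting for hours. The exit status of e2fsck is a sum of bit flags; the sketch below decodes it. describe_fsck_status is an illustrative name of my own, and the commented commands assume the same device path as above.

```shell
#!/bin/sh
# Sketch: decode the exit status of e2fsck / fsck.ext4, which is a sum of
# bit flags as listed in the e2fsck man page.
# describe_fsck_status() is an illustrative helper name.
describe_fsck_status() {
    status="$1"
    [ "$status" -eq 0 ] && { echo "no errors"; return 0; }
    for pair in "1:errors corrected" \
                "2:errors corrected, reboot recommended" \
                "4:errors left uncorrected" \
                "8:operational error" \
                "16:usage or syntax error" \
                "32:cancelled by user" \
                "128:shared library error"; do
        bit="${pair%%:*}"
        if [ $((status & bit)) -ne 0 ]; then
            echo "${pair#*:}"
        fi
    done
    return 0
}

# On the NAS (as root, volume not mounted), a cautious sequence could be:
#   fsck.ext4 -n /dev/vg1000/lv; describe_fsck_status $?   # look, don't touch
#   fsck.ext4 -y /dev/vg1000/lv; describe_fsck_status $?   # repair, answer yes
describe_fsck_status 4
# prints: errors left uncorrected
```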

I tried to mount the volume once more, but this time in read-only. Just to be sure.

Syn-vSAM> mount -o ro /dev/vg1000/lv

After that I listed the content with “ls” and I finally saw that it was mounted successfully.
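
Besides listing the content, the mount options can be checked to confirm that the volume really is read-only. A minimal sketch, assuming /proc/mounts-style text on standard input (mount_mode is an illustrative name, and the sample line is made up for the example):

```shell
#!/bin/sh
# Sketch: report whether a mount point is mounted read-only ("ro"),
# read-write ("rw"), or not at all, based on /proc/mounts-style text on stdin.
# mount_mode() is an illustrative helper name.
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            mode = "rw"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") mode = "ro"
        }
        END { print (mode == "" ? "not mounted" : mode) }'
}

# On the NAS: mount_mode /volume1 < /proc/mounts
# Example with a made-up /proc/mounts line:
echo '/dev/vg1000/lv /volume1 ext4 ro,relatime 0 0' | mount_mode /volume1
# prints: ro
```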

Happy happy joy joy ^_^.

Then I rebooted the Synology so that it could mount the volume normally. Once the Synology was rebooted, I had my volume back. The only thing left to do was rebuilding the RAID onto the new replacement disk.
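
While the RAID is degraded or rebuilding, /proc/mdstat shows a member map such as [UUU_], where each underscore stands for a missing or not yet synced disk. A minimal sketch to list the affected arrays (degraded_arrays is an illustrative name, and the sample text is made up to resemble a 4-disk RAID 5 with one member missing):

```shell
#!/bin/sh
# Sketch: list md arrays whose member map in /proc/mdstat (e.g. [UUU_])
# contains an underscore, meaning at least one member is missing.
# degraded_arrays() is an illustrative helper; it reads mdstat text on stdin.
degraded_arrays() {
    awk '
        /^md/ { name = $1 }
        match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
        }'
}

# On the NAS: degraded_arrays < /proc/mdstat
# Example with made-up mdstat text:
degraded_arrays <<'EOF'
md2 : active raid5 sda5[0] sdc5[2] sdb5[1]
      5860115712 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
EOF
# prints: md2
```

Once the rebuild has finished, the map should read [UUUU] again and the function should print nothing for that array.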

After that everything was good 😊.


17 thoughts on “Crashed Synology Volume, and how to Restore (DS415 Play)”

  1. Thanks
    It helped me a lot !!

    Just note that I had to free the device mapper before fsck. If not, fsck refuses to run:
    /dev/vg1000/lv is in use.
    e2fsck: Cannot continue, aborting.

    So with :
    dmsetup info -c
    brw------- 1 root root 253, 0 Jun 15 13:36 /dev/dm-0
    brw------- 1 root root 253, 1 Jun 15 13:36 /dev/dm-1

    ls -la /sys/dev/block/253\:0/holders
    lrwxrwxrwx 1 root root 0 Jun 15 16:54 dm-1 -> ../../dm-1

    253 is the major and 0 the minor device number for me

    I can check which mapper I need to remove, and then :
    dmsetup remove /dev/dm-1

    Now the logical volume is ready to be checked:
    fsck.ext4 /dev/vg1000/lv

    1. @Laurent, thank you for this information. I didn’t encounter that problem, but thank you for sharing this with me. Will be really helpful for other people who have this issue, so thank you for this. Were you able to recover your data afterwards?

      Best Regards,
      Samir

      1. Thank you both, guys! That helped me too.

        1. Happy To hear ^_^

  2. Just checking in to say.. THANK YOU! I had a power outage and when the system came back up, my volume was crashed. You saved me. Time to get a UPS and cloud backup, thanks again!

    1. Hey Eric,
      Great to hear, happy that this post could help you and thank you for your feedback as well 😊.

      Don’t forget that you can also back up your data to an external HDD with Synology.
      I also keep an offline backup every month, in case my complete system crashes or gets hit by one of those infamous crypto lockers. It makes you less vulnerable when something happens to your Synology.

      Have a great day.

  3. Dear Samir,
    Your post gives me some hope.
    I’m in exactly the same situation, but with a Btrfs file system, not ext4.
    I’ve been looking for a solution for days now and nobody wants to help me.
    At the moment I’m desperate, facing the loss of all my family pictures, movies, device backups, etc.: nearly 4 TB of my life is on there.
    I opened a ticket with Synology support, but as my NAS is out of warranty, they don’t want to help me.
    I’ve seen other people on the Synology forum who received remote help, where the technician mounted a crashed SHR-1 Btrfs volume read-only so that the data could be copied to another place.
    Would you be able to help me?

    1. Dear Pascal,

      I’m so sorry to hear this. To help you properly, I need to know a little bit more than this. At the same time, I would like to give you some advice that may be more useful to you.

      First off, your data is probably still there, but it is probably just not readable. I don’t know what happened, but there is a big chance that some sectors simply became corrupt, which prevents the operating system of the Synology from booting up. Now, the first and best tip I can give to prevent further damage is to take your time!!!
      If you try to fix it in haste, you’ll probably end up damaging it even further. I know it is difficult, but try to stay calm with the thought that your data is probably still there; you just can’t access it currently.

      Now there are a few paths that you can follow in this scenario.
      Path 1: You try to fix the Synology, and access the files after you have fixed it.
      Path 2: If you can still access the Synology with ssh and use a fair number of commands, you could try to copy the files over to another device. This is probably the method you referred to earlier and, if possible, your best, quickest and safest option.
      Path 3: You try to access the files with external recovery tools and make a RAW image, which you then work on.

      Path 1: Is probably the quickest way, but it can also be one of the most harmful ones. If you don’t know what you are doing, stop making adjustments. There were a lot of times in my life that I had to repair something for which, at that moment, I didn’t have the proper skill set. You can then choose to take your time and learn that skill set, or let someone else do it.

      Path 2: If this is possible, it is one of the best options for securing your data. If you can simply copy the files to another device, you’re good.

      Path 3: Is probably the most intense one, but it is also the best option for saving as much data as possible. It will cost you a lot of storage space on another device, since a RAW image is really big. But it will also give you something to go back to, in case you fuck something up.

      But in all honesty, I don’t know how difficult it is to make a RAW image of a RAID system. For a single disk, this is really simple: you hook the hard drive up to a Linux machine, make sure it is not mounted and is only read from, and make a raw image. I have even done this with defective disks. Since you are probably using some form of RAID, copying each disk separately probably won’t work. It may work as a backup for restoring the RAID in the future, but I wouldn’t bet on it or try it if you are not sure.

      So, first stay calm and don’t stress. Think about how important that data is and how confident you feel troubleshooting it yourself. If you don’t feel confident enough, you can choose to learn it or just pay someone to fix it. However, never experiment on that personal data while you are still learning; test on something else first. If IT is not your thing, then please pay someone to fix it, since they will have the proper tools (both hardware and software) to do it. To me, personal data means a lot, and I’m willing to pay for it or to take a lot of time to fix it. For my parents, I am trying to fix a PC that had a cryptolocker on it. That project has already been going on for more than a few years. But since almost no one can fix those issues, I’m willing to take my time for it and to learn.

      Hope this advice could help you in any kind of way.
      Let me know how this is working out for you.

      Samir

      1. Thank you, Samir, for all this advice.
        I did a lot of research and I now have a copy of all the data that I really need to keep.
        The next step now is to try to fix the crashed volume state on my Synology.
        My Synology can still boot, as it has two volumes:
        one basic single-disk SSD ext4 volume, where all my applications and Docker are installed,
        and a second volume, Btrfs on a 4-disk SHR-1 RAID, where all my data lives. It shows a good state for the RAID and the disks’ SMART status, but the volume stays crashed in Storage Manager and the shared folders are not shown.
        I logged in over SSH as root and
        used “vgchange -ay” and “mount -o recovery,ro /dev/vg1000/lv /volume2” to mount it in read-only mode, then did an “rsync” to copy the data to another NAS.
        Now I want to try to resolve the crashed volume situation in DSM.
        Do you think you can help me with this?

        1. I can’t promise you anything, but I will try to help you as well as I can.
          Send me an e-mail at:
          info@vsam.pro

          Please give a clear description of the current status and the errors it shows, and send me as many screenshots as possible. The more info, the better I can help.

          Best Regards,
          Samir

        2. I’m now trying to fix the Btrfs file system, as the unclean shutdown of my system corrupted the fs superblock, and apparently a few blocks fail the parent transid check.

          Here is what I just tried:
          btrfs rescue super-recover /dev/vg1000/lv
          No success!

          btrfs rescue zero-log /dev/vg1000/lv
          No success!

          btrfs rescue chunk-recover /dev/vg1000/lv
          This takes a while and is still running… fingers crossed.

          1. For other readers,
            Happy to confirm that Pascal was able to recover all of his data.

  4. Hi
    My volume suddenly crashed, but the storage pool was OK. All drives OK. The volume was in read-only mode, and I copied the important files before doing anything else.
    I ran your commands, all OK, but at the mount step it said the volume was already mounted, and I couldn’t umount it because it said it was in use.
    I was so scared to do anything, because I have 2-disk fault tolerance and the volume crashed out of nowhere.
    Then I rebooted and everything was fine: volume green, all working properly?!
    What was the reason then? I don’t want this to happen when I need it the most, so I want to find the cause. Could it be bad RAM? The controller?
    In dmesg I found many log lines like this:
    BTRFS warning (device dm-0): csum failed ino 2810 off 36864 csum 2795832003 expected csum 0
    Should I stop using this box?
    Is BTRFS safe to use or will it cause more of this in the future?
    Thanks, and sorry for the anger.

    1. I’m sorry, I missed this comment completely.

      For all other readers, just a simple tip. If you have already experienced some instability or downtime, don’t be lazy and don’t gamble on it staying stable. Immediately make a backup of all your data, which is advice you should always follow anyway. Having some redundancy is nice, but it doesn’t mean you are safe from every disaster. If a power cut happens, a crypto locker gets onto your Synology, or several disks fail at the same time (which happens often, because most of the time all disks are bought from the same vendor at the same time), you can lose everything. That is especially painful if it is the place where you keep all of your personal stuff.

      Hope I could help others with this comment, and I hope you all recover your data.

  5. Thanks a lot Samir for your work and sharing. Unfortunately my DSM volume crashed last week-end. I have followed your post and my volume is back online. You saved my day & my photos.

    My wife thanks you too 🙂

    1. Awesome to hear, really happy for you and your wife. Wish you the best.

  6. Hello Samir,
    Thank you for this tutorial. I followed it up to this part:

    root@dsm01:/# cat /etc/fstab
    none /proc proc defaults 0 0
    /dev/root / ext4 defaults 1 1
    root@dsm01:/# fsck.ext4 /dev/vg1/volume_1
    e2fsck 1.42.6 (21-Sep-2012)
    ext2fs_open2: Bad magic number in super-block
    fsck.ext4: Superblock invalid, trying backup blocks…
    fsck.ext4: Bad magic number in super-block while trying to open /dev/vg1/volume_1
    The superblock could not be read or does not describe a correct ext2
    filesystem. If the device is valid and it really contains an ext2
    filesystem (and not swap or ufs or something else), then the superblock
    is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193

    What more should I do?

    Any hint is greatly appreciated!
    Daniel
