Topic: How to recover a failed system

SteveH2
How to recover a failed system
« on: April 12, 2016, 07:57:04 am »
So, my Vera Edge, running the latest version of the software and with only a dozen or so lighting controllers and relays connected to it, has just failed for the second time with the Z-Wave light off. As a result I can't get into the FE. The logs showed nothing significant, to my eyes, apart from a reference to a failed Sonos device.

After the first failure I tried the various remedies scattered around this forum, but eventually could only recover it by doing a hard reset and restoring a previous backup. This took 3 attempts, including powering off the box, removing the Ethernet cable etc., but eventually the system restarted. This recovered most of the devices, but 3 Fibaro devices had disappeared and needed re-pairing (which meant lifting floorboards to get at them).

The second time I found another 'solution' on the forum, which consisted of replacing an existing file, 'user_data.json.lzo', with an earlier ('.5') file. I have no idea what this file is, but the change seemed easily reversible, so I went ahead and applied it. The Z-Wave light on the controller immediately came on and all was well. Unfortunately the forum posting offered no background on how or why this 'fix' might work.
I am finding the Vera Edge controller unreliable and am nervous about adding more devices and complexity without a rock-solid backup and recovery process in place, but I can't find any consistent documentation on such a process. Does anyone on this forum know of, or have, such a thing? I am looking for something that explains what is backed up and why, how to take the backup, and then how to recover the system efficiently (without having to re-sync a bunch of devices).

Z-Waver
Re: How to recover a failed system
« Reply #1 on: April 12, 2016, 08:49:59 am »
Backups:
When working correctly, Vera backs up every night and uploads a copy of the backup to the MiOS servers. The backup is a .tar.gz file that contains everything of any significance to a Vera, including all configuration files, app files, serial numbers, and so on. It also includes a file that is a dump of the Z-Wave chip inside Vera, which holds all of the Z-Wave network information: House#, devices, and everything else about the Z-Wave network. This file is very important.

When you include or exclude a device, it usually triggers an event that tells Vera to dump the Z-Wave chip data to the dump file, making a backup of the Z-Wave network. This has been shown not to work every time, which results in backups that contain an old (sometimes very old) dump file with incomplete network information. Restoring such a backup will naturally result in missing devices similar to what you saw. That said, I cannot say for certain that this was the cause in your case; there are several other possibilities.

What I recommend is that whenever you add a new Z-Wave device, you take a manual backup, which saves a copy of the backup to your local computer. When taking the manual backup, it is very important to also back up the Z-Wave network: this creates a new, recent dump of the Z-Wave chip, so you have a complete backup of your system at that point in time. This new dump will also be included in the automatic nightly backup to the Vera servers.

Restores:
When restoring from backup, the process restores all configuration files on Vera. This includes apps, scenes, everything, EXCEPT the Z-Wave network. In order to restore the dump file to the Z-Wave chip, you must check the box that says to also restore the Z-Wave network.

Performing a full restore, including the Z-Wave network, is absolutely complete. You could take a new Vera with different serial numbers and passwords, perform a full restore, and when it is done everything will match the old Vera, including serial numbers, passwords, everything. The two machines will essentially be identical clones of each other.

Having done all this, it is possible to still have Z-Wave devices missing. The devices will exist in the Z-Wave chip and Vera is therefore aware of them, but if Vera can't communicate with them or if the routing has changed/broken, or if they're just flaky, they can disappear. At this point you have to re-include them to get them back. I'd guess that the most common cause of this scenario is poor signal/route quality.

SteveH2
Re: How to recover a failed system
« Reply #2 on: April 12, 2016, 02:21:30 pm »
Z-Waver, many thanks, I appreciate you taking the time to write that out. Just one clarification, please: when I take a manual Z-Wave backup and then a controller backup, I see that the controller backup ("ha-gateway-backup...") ends up in my local Downloads folder (I'm working in Ubuntu), but I don't see where the Z-Wave backup is stored on my local PC?

RichardTSchaefer
Re: How to recover a failed system
« Reply #3 on: April 12, 2016, 03:09:49 pm »
The Z-Wave backup is stored inside the controller backup.

In fact, every controller backup contains a Z-Wave backup.
It's just that the Z-Wave backup may have been made quite a while ago.

If you have Z-Wave problems you may need an OLD Z-Wave backup.

When you create a Z-Wave backup, the NEXT controller backup will include the new Z-Wave backup.
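Since the Z-Wave dump travels inside the controller backup, one way to see how old it is, is to open the downloaded archive and compare file timestamps. Below is a minimal Python sketch using only the standard library's `tarfile` module, assuming the download is an ordinary gzip-compressed tar (the thread describes it as a .tar.gz). The function name `inspect_backup` and the default `dump_pattern` of `*.dump` are hypothetical: I don't know exactly what Vera names the dump file inside the archive, so list the archive first and adjust the pattern to match.

```python
import fnmatch
import sys
import tarfile
from datetime import datetime, timezone


def inspect_backup(path, dump_pattern="*.dump"):
    """Summarise a controller backup archive (.tar.gz / .tgz).

    Returns every regular file whose path matches dump_pattern (with its
    modification time) and how many days older the newest match is than
    the newest file in the archive. A large gap hints that the bundled
    Z-Wave dump predates your more recent configuration changes.

    ASSUMPTION: dump_pattern="*.dump" is a guess at the dump file's name;
    check the archive listing and adjust it for your own backup.
    """
    # "r:*" opens the archive for reading and detects gzip automatically.
    with tarfile.open(path, "r:*") as tar:
        files = [m for m in tar.getmembers() if m.isfile()]

    if not files:
        raise ValueError("archive contains no regular files")

    newest_overall = max(m.mtime for m in files)
    dumps = sorted(
        (m for m in files if fnmatch.fnmatch(m.name, dump_pattern)),
        key=lambda m: m.mtime,
    )
    if not dumps:
        return {"dumps": [], "dump_lag_days": None}

    # Seconds between the newest file overall and the newest matching dump.
    lag_seconds = newest_overall - dumps[-1].mtime
    return {
        "dumps": [
            (m.name, datetime.fromtimestamp(m.mtime, tz=timezone.utc))
            for m in dumps
        ],
        "dump_lag_days": lag_seconds / 86400,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    report = inspect_backup(sys.argv[1])
    for name, when in report["dumps"]:
        print(f"{when:%Y-%m-%d %H:%M} UTC  {name}")
    if report["dump_lag_days"] is None:
        print("No file matched the dump pattern; adjust dump_pattern.")
    else:
        print(f"Newest dump is {report['dump_lag_days']:.1f} days older "
              "than the newest file in the backup.")
```

Run it as `python inspect_backup.py <your backup file>`. The timestamps are only those recorded in the archive, so a large lag is a hint, not proof, that the bundled Z-Wave backup predates your last Include/Exclude.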

SteveH2
Re: How to recover a failed system
« Reply #4 on: April 14, 2016, 06:37:13 am »
Richard, thank you. If I'm understanding you correctly, you are saying that the only way to get a Z-Wave backup is to create one manually, after which the nightly backup that the controller takes automatically (or at least it does on my controller) incorporates it. I think that means it is good practice to manually back up the Z-Wave network each day, so that if the system was working yesterday and I decide to do a restore, I know I am falling back to a known-good controller backup and a known-good Z-Wave backup? I appreciate that many things might conspire to cause a system failure, but (assuming the system logs are silent) is that, simplistically speaking, valid?

Z-Waver
Re: How to recover a failed system
« Reply #5 on: April 14, 2016, 11:51:24 am »
A backup of the Z-Wave network is supposed to be triggered when you Include or Exclude a Z-Wave device. Once that backup of the Z-Wave network/chip has occurred and the dump file has been created on Vera, there is no need to do it again until the next Include/Exclude. The only time meaningful changes are made to the Z-Wave network/chip is when you Include/Exclude a device.

Unfortunately, the automatic backup is not always triggered, and does not always work when it is. For this reason, I recommend manually running a Z-Wave network/chip backup every time you successfully Include/Exclude a Z-Wave device.

There is no reason at all to perform a Z-Wave network backup every day, unless you are Including/Excluding every day.
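The rule of thumb above (a Z-Wave backup only goes stale after an Include/Exclude) can be written down as a tiny check. This is an illustrative Python sketch, not anything Vera provides: the function name `zwave_backup_due` is hypothetical, and you would have to supply the two timestamps yourself, for example from your own notes of when you last paired a device and when you last ran the manual Z-Wave backup.

```python
from datetime import datetime
from typing import Optional


def zwave_backup_due(
    last_include_exclude: Optional[datetime],
    last_zwave_backup: Optional[datetime],
) -> bool:
    """Return True if a manual Z-Wave network backup is worth taking.

    Hypothetical helper encoding the advice in this thread: the Z-Wave chip
    only changes meaningfully on Include/Exclude, so the existing dump is
    stale only when the latest Include/Exclude is newer than the latest
    Z-Wave backup. None means "never happened / not recorded".
    """
    if last_include_exclude is None:
        # No Include/Exclude on record, so there is nothing new to capture.
        return False
    if last_zwave_backup is None:
        # Devices have been paired, but no Z-Wave backup exists at all.
        return True
    return last_include_exclude > last_zwave_backup


if __name__ == "__main__":
    # Paired a device on April 14, last Z-Wave backup on April 12: due.
    print(zwave_backup_due(datetime(2016, 4, 14, 9, 0), datetime(2016, 4, 12, 8, 0)))  # True
    # Backup taken after the last pairing: not due, however many days pass.
    print(zwave_backup_due(datetime(2016, 4, 12, 8, 0), datetime(2016, 4, 12, 9, 0)))  # False
```

Note that elapsed time alone never makes the backup due, which is exactly why a daily Z-Wave backup adds nothing unless you are including or excluding devices daily.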

SteveH2
Re: How to recover a failed system
« Reply #6 on: April 14, 2016, 01:06:09 pm »
Ok, makes sense.  Thanks again

jmassination
Re: How to recover a failed system
« Reply #7 on: February 08, 2018, 08:29:15 am »
Just to add to this (I know it's an old thread): my VeraLite stopped working and I ended up having to do a factory reset and restore my backup file and Z-Wave network (I'm thankful that they are stored online, because I didn't have a backup file!). When I restored, about 30 of my 60-70 devices were missing. I found this thread, where Z-Waver mentioned that you may have to re-include them. That would have been a ton of work, and I'm guessing I would have had to redo all my PLEGs and logic to account for the new device numbers, names and so on. Thankfully, it fixed itself overnight: I slept on it, and this morning all my devices were back. So perhaps it just had to poll them all or something. I'm not sure, but they're back. I thought I'd share this in case someone else with a similar issue goes searching and finds this thread.