Sunday, July 14, 2013

Fix For: Doubletake Server Protection - Start Failed!



If you use the Doubletake disaster recovery solution in your environment, this notorious error is something you will see again and again when you try to create and start a new protection job.
This is what the support guys say about the error: there are two common causes.


The first is having .NET 4 installed on the server. You would need 5.2.2 to work effectively with .NET 4, as there are some other issues you can encounter. If you cannot use 5.2.2, just remove .NET 4.

The other is a bad route selected in the job workflow. On the confirmation screen of the workflow you have the option to change the configuration of 2-3 settings. In the top one (not sure what the section is called, but it has compression and such in it), at the bottom, you need to make sure that you select an actual IP address on the target to route to. Sometimes the wizard picks 0.0.0.0 as the IP by default, and that won’t work.
Doubletake Protection Job- Start Failed!


Well, he is partially correct. This does happen because of a bad route. But even after you delete the old job and recreate it with the correct routing interface, the job will still fail to start. After many failed attempts, this is how I got it to work.

Start fresh, if you can. If you have other healthy protection jobs running side by side, this option is not viable. In that case, remove only the files associated with the failed job instead of the files for all jobs.

1. Delete the failed job.

Select the failed job in the Doubletake RecoverNow console under Monitor Jobs and trash it.
Stop the Doubletake service on the source server. This completes the trash job process.

2. Delete all configuration/log/temp files on the source server

On the source server, go to the Doubletake installation directory:
C:\Program Files\Vision Solutions\Double-Take

Delete all files not related to the actual installation. If you know the installation date of the application, remove all files created after that date.

Then go to C:\Program Files\Vision Solutions\Double-Take\Service\Data
and delete all files in this directory.
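
To make the "created after the installation date" check less error-prone, here is a minimal sketch that only lists candidate files and deletes nothing. The function name `files_modified_after` is made up for this example, and it compares modification time rather than creation time (an assumption on my part, since modification time is reported consistently across platforms).

```python
from datetime import datetime
from pathlib import Path


def files_modified_after(directory, cutoff):
    """Return files directly inside `directory` modified after `cutoff`.

    Hypothetical helper: it only lists candidates and never deletes.
    Subdirectories are skipped, so check them separately.
    """
    cutoff_ts = cutoff.timestamp()
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_mtime > cutoff_ts
    )
```

For example, `files_modified_after(r"C:\Program Files\Vision Solutions\Double-Take", datetime(2013, 1, 1))` would list the files changed since a (placeholder) install date of 1 January 2013. Review the list by hand before deleting anything.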

3. On the target/repository server, go to the Doubletake installation directory

C:\Program Files\Vision Solutions\Double-Take

Delete all files not related to the actual installation. If you know the installation date of the application, remove all files created after that date. If you have other healthy jobs running, be careful not to delete files associated with those jobs.

Files to remove: <source server name>.db, <source server name>.shr, <source server name>.xfp

Then go to C:\Program Files\Vision Solutions\Double-Take\Service\Data
and delete the <Job*>.dat and <Job*>.xml files associated with the failed job. If you are starting fresh, delete all files in this directory.
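
If you would rather not hunt for these files by eye, the sketch below collects them for review. Both function names are hypothetical, and the `Job*.dat` / `Job*.xml` patterns assume the job files really do start with "Job", which is my reading of the naming above. It cannot tell which job files belong to the failed job, so that part still needs a manual check.

```python
from pathlib import Path

# Extensions of the per-source-server files listed above.
SERVER_FILE_SUFFIXES = (".db", ".shr", ".xfp")


def server_files(install_dir, source_server):
    """Return the existing <source server name>.db/.shr/.xfp files."""
    install_dir = Path(install_dir)
    candidates = [install_dir / f"{source_server}{suffix}"
                  for suffix in SERVER_FILE_SUFFIXES]
    return [p for p in candidates if p.is_file()]


def job_data_files(data_dir):
    """Return Job*.dat and Job*.xml files in the Service\\Data directory.

    Assumption: job files start with "Job". All matching files are returned,
    including those of healthy jobs, so filter the result by hand.
    """
    data_dir = Path(data_dir)
    found = []
    for pattern in ("Job*.dat", "Job*.xml"):
        found.extend(data_dir.glob(pattern))
    return sorted(found)
```

Neither function deletes anything. Print the results, confirm they belong to the failed job, and only then remove them.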

4. Remove the old, mirrored copy of the source server from the repository.

You may need to log in as the local administrator, take ownership of the whole folder and its subfolders, and then delete everything.

5. Start the Doubletake service on the source server

Once you have removed all references to the old job, create the protection job again.
In the last window of the Server protection wizard, provide the actual main IP address of the repository server.
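
Since the wizard can silently default to 0.0.0.0, it may be worth sanity-checking the address before you type it in. This is a small sketch using Python's standard `ipaddress` module. The function name `is_usable_route_ip` is made up here, and also rejecting loopback, link-local and multicast addresses is my own assumption about what counts as an "actual" IP.

```python
import ipaddress


def is_usable_route_ip(value):
    """Return True if `value` looks like a real, routable host address.

    Rejects 0.0.0.0 (unspecified) and, as an extra assumption, loopback,
    link-local (169.254.x.x) and multicast addresses.
    """
    try:
        addr = ipaddress.ip_address(value.strip())
    except ValueError:
        return False  # not an IP address at all
    return not (
        addr.is_unspecified
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
    )
```

With this, `is_usable_route_ip("0.0.0.0")` is False, while a normal address such as `192.168.1.20` passes.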

Cross your fingers and press Finish to start the job.

This may or may not work. Sometimes the job will sit at the "Starting Protection..." stage for a while and then fail.

If the job fails again, try the steps below.

1. Stop the Doubletake service on the source server.

2. Go to the repository and browse to the place where the failed job stored its mirrored copy of the source server.

3. There should be a file called VRASSM.xml in the root of that directory under the source server name folder.

4. Open this file in Notepad and remove all interfaces, leaving just the network interface you are planning to use. This is the same interface you selected in the last step of the protection job creation wizard.

Remove rogue interfaces from the VRASSM.xml file


5. Save the file and restart the Doubletake service on the target server and then on the source server.

6. Check the Doubletake RecoverNow console and see if the status of the job has changed from "Communication Error" to "Calculating".
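
Before hand-editing VRASSM.xml in step 4, it is safer to keep a copy and to see which addresses the file actually mentions. The sketch below does only that. It does not modify the XML, because I am not assuming anything about its layout. Both function names are hypothetical, and it assumes the file is readable as UTF-8.

```python
import re
import shutil
from pathlib import Path

# Loose IPv4 matcher: four dot-separated number groups. It can also match
# things that merely look like addresses (e.g. four-part version strings).
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def backup_file(path):
    """Copy `path` to `<name>.bak` next to it and return the backup path."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)
    return backup


def ipv4_addresses_in(path):
    """Return the distinct IPv4-looking strings in the file, in file order.

    Assumption: the file is UTF-8 (or ASCII) text.
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    seen = []
    for match in IPV4_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen
```

Run `backup_file` first, then compare the output of `ipv4_addresses_in` with the interface you picked in the wizard. Anything else in the list is a candidate to remove by hand.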

Alternatively, you can stop pulling your hair out trying to get it working and call the superb customer support line, so a consultant can have a look at the mess you have created and fix it remotely.