veritas

Solution for VSS exceptions with VMware guests / VM tools

Solution for VSS exceptions with VMware guests / VM tools

Today’s blog is about “Solution for VSS exceptions with VMware guests / VM tools” and was initially posted by myself here: https://vox.veritas.com/t5/Backup-Exec/Solution-for-VSS-exceptions-with-VMware-guests-VM-tools/td-p/829072 – it actually became a KB entry for an as of today older version of Veritas Backup Exec – but I did not want to leave it out of my blog.. Here is the link to the KB article: https://www.veritas.com/docs/000009506

Here is the article I wrote – starting with a description of the actual issue:

We have been experiencing VSS issues with VMware guests in regards to the installed Backup Exec Agent and a previously installed VMware Tools VSS option.

Uninstalling the VMware Tools VSS option in various ways including restarts did not fix our issues. If you search the internet for solutions, you find many attempts but no real solution or explanation.

One of our admins spend several hours with the Veritas support without a solution, he was about to escalate the issue with them, when we found the root cause and could actually fix our issues.

First the steps to solve this:

  1. Uninstall the VMware Tools VSS option (no restart will be requiered)
  2. Make sure VMware VSS service was deleted
    1. If this is not the case, you might need to do so manually and remove additional DLL’s etc. as well as restart the system, but this is independent from this solution
  3. You might have already done steps 1 and 2 but you still get VSS exceptions from the backup that says you have more than one VSS agent installed:
    1. V-79-8192-38331 – The backup has detected that both the VMware VSS Provider and the Symantec VSS Provider have been installed on the virtual machine ‘hostname’. However, only one VSS Provider can be used on a virutal machine. You must uninstall the VMware VSS Provider.
    2. Now you wonder what causes this and you get stuck
    3. You could uninstall the Veritas/Symantec Backup Exec Agent and only back the system up per VMDK
    4. You would lose the GRT / granular backup / restore capabilities
  4. Check your registry for the following reg key
    1. HKLM\System\CurrentControlSet\Services\BeVssProviderConflict
    2. If this key exists, but your VMware VSS provider is uninstalled, you need to follow up with step 5
  5. Open a notepad as administrator
  6. Open this file in the notepad
    1. C:\ProgramData\VMware\VMware Tools\manifest.txt
  7. Search for the following two entries:
    1. Vcbprovider_2003.installed
    2. vcbprovider.installed
  8. Make sure both of them are set to FALSE, most likely one of them is TRUE
  9. Run a test backup
    1. This test backup now should not show the exception anymore
    2. The registry key should vanish (refresh/press F5) without you taking action

So what happened?

You uninstalled the VMware Tools VSS provider, but this manifest file did not get updated. We actually could see that it sometimes does get updated and sometimes does not. This seems to be some kind of issue with the VMware Tools uninstalled/installer.

But why this manifest.txt file?

As we found out, there scripts that get executed by Symantec/Veritas Backup Exec before the backup. You might find them in two locations, and it seems to depend a bit on the Windows version which script is executed (at which location). You could edit them both and just undo the checks in the scripts, but this wouldn’t be correct. It is more correct to update the manifest.txt file. If you want to, you can check the date/time of the manifest.txt file before you change it – you might see it was not updated while you uninstalled the VMware Tools VSS provider (assuming you did only do this and not do additional installs/uninstalls within the VMware Tools / please note as well that this only is true when you still experienced those issues).

Now, back to those scripts, you find them here:

  • C:\Windows
  • C:\Program Files\Symantec\Backup Exec\RAWS\VSS Provider

The name of the script that matters:

  • pre-freeze-script.bat

This script checks several DLLs, registry entries, paths and on Windows 2008 and newer the ProgramData-Path for this specific manifest.txt file and the two entries mentioned.

Once you uninstall the VMware VSS provider, and the file did not get updated, you might see this issue and wonder how to solve it. The solution is to simply update it to mirror the uninstallation of the VMware VSS provider (vcbprovider). We double checked this with several installations and could see if the file actually gets updated, those two values are set to FALSE, if it doesn’t, at least one of the values remains true, what causes the pre-freeze-script.bat to write the registry key mentioned earlier and therefor causing the issue in the backup – exceptions.

If you still have the same issues after updating the manifest.txt, simply check all the DLL’s that are mentioned in the script and make sure they don’t exist. You might also consider to manually delete the registry-key (it seems to be just a dumy-key) to make sure there is no issue that prevents the script from deleting it. Make sure it does not re-appear after a backup! Otherwise you might still have some DLLs left on your system that cause the script to re-create the registry key.

Hope this helps a few of you out there. This was an ongoing issue for a while and I came accross those issues many times ever since Windows 2008. This applies to Windows 2008 R2, Windows 2012, Windows 2012 R2 and pretty sure to Windows 2016 as well.

It helped us getting rid of those issues completely and actually not even needing a single restart of the guest VM to solve the issues (removing the VMware VSS provide did not need a restart).

PRTG – Veritas Backup Exec monitoring

This was originally posted by myself here: https://kb.paessler.com/en/topic/58233-symantec-backupexec-monitoring#reply-262024

We monitor backups a little bit more advanced – I thought I should share this knowledge as well…

Single Job Monitoring: Monitor a single job and it’s results – allows you configure the Job-Name with a PRTG filter value in the SQL sensor. The results will include various values – most notable are:

  • FinalJobStatus (as text)
  • TotalDataSizeBytes
  • TotalNumberOfDirectories
  • TotalNumberOfFiles
  • TotalRateMBMin

In order to implement this sensor, add a SQLv2 Sensor and configure like this:

  • Database: BEDB
  • Instance Name: BKUPEXEC (in most cases)
  • Use input parameter: specify the exact job name
  • DBNull = Error

Channels: Most channels are rather simple to configure, they are counters, SpeedDisk or BytesDisk – as PRTG has those channel types integrated already. The special channel is FINALJOBSTATUS – in order to have this working you will need the “backupexec.jobstatus.ovl” file in your %programfilesx86%\PRTG…\Lookups directory – see below for the file.

SQL Script for single job monitoring:

The backupexec.jobstatus.ovl file:

Another SQL script we use is the one below – this actually approaches the whole monitoring more in an overview – it still depends on the JobHistory table, meaning, the job must have been running.. in theory you could work around this and actually get information on the scheduler etc.. the script below is a pure example.

Finally I wanted to mention what are our real challenges are, and we don’t yet have a really good solution: Our backup runs FULL starting Friday evening… during the week we run incremental backups. Now the incremental backups are not as critical… so let’s focus on the weekends.

What happened every now and then was that e.g. only a few tapes where write able and other might still have been locked or one of our libraries jammed etc..

In the end, it means – we e.g. came in Monday morning and discovered that 50+ % of the backups did not run.

Now, the question is, how do you monitor this. There are about a 150 jobs – they are stacked on each other. In theory I expect let say 5 running jobs, 0 completed and a 145 pending – starting Friday night – over the weekend this number will change constantly.

What I did not yet find is that a good solution that when Backup Exec waits for user interaction like insert tapes, offline library, etc. does wait for user interaction.

As well as the fact that I can’t tell PRTG on Friday I expect e.g. 150 jobs pending, on Saturday 1 PM the number should be more like 75 jobs pending and on Sunday 6 AM is should be down to 50 pending and Sunday 8 PM it should be 0 pending and 150 successful.

This is very granular, making it hard to find a solution. The jobs in our case will not finish – they are within their weekend time-window and will not be auto cancelled and therefor only manually looking in to Backup Exec will tell you if we are making progress or not.

It could be a solution to constantly see if the Total Bytes backed up goes up – but this again is challenging, we would need to compare values over time.. PRTG is as far as I know not directly able to do so and this would mean we would need to have a temp file with values form the last check in some kind of script or database that we would compare too…

So far I did not come up with the ultimate solution – every now and then I think about it a little more.. but well, I am not there yet.