PRTG – Veritas Backup Exec monitoring

This was originally posted by myself here:

We monitor backups a little bit more advanced – I thought I should share this knowledge as well…

Single Job Monitoring: Monitor a single job and it’s results – allows you configure the Job-Name with a PRTG filter value in the SQL sensor. The results will include various values – most notable are:

  • FinalJobStatus (as text)
  • TotalDataSizeBytes
  • TotalNumberOfDirectories
  • TotalNumberOfFiles
  • TotalRateMBMin

In order to implement this sensor, add a SQLv2 Sensor and configure like this:

  • Database: BEDB
  • Instance Name: BKUPEXEC (in most cases)
  • Use input parameter: specify the exact job name
  • DBNull = Error

Channels: Most channels are rather simple to configure, they are counters, SpeedDisk or BytesDisk – as PRTG has those channel types integrated already. The special channel is FINALJOBSTATUS – in order to have this working you will need the “backupexec.jobstatus.ovl” file in your %programfilesx86%\PRTG…\Lookups directory – see below for the file.

SQL Script for single job monitoring:

The backupexec.jobstatus.ovl file:

Another SQL script we use is the one below – this actually approaches the whole monitoring more in an overview – it still depends on the JobHistory table, meaning, the job must have been running.. in theory you could work around this and actually get information on the scheduler etc.. the script below is a pure example.

Finally I wanted to mention what are our real challenges are, and we don’t yet have a really good solution: Our backup runs FULL starting Friday evening… during the week we run incremental backups. Now the incremental backups are not as critical… so let’s focus on the weekends.

What happened every now and then was that e.g. only a few tapes where write able and other might still have been locked or one of our libraries jammed etc..

In the end, it means – we e.g. came in Monday morning and discovered that 50+ % of the backups did not run.

Now, the question is, how do you monitor this. There are about a 150 jobs – they are stacked on each other. In theory I expect let say 5 running jobs, 0 completed and a 145 pending – starting Friday night – over the weekend this number will change constantly.

What I did not yet find is that a good solution that when Backup Exec waits for user interaction like insert tapes, offline library, etc. does wait for user interaction.

As well as the fact that I can’t tell PRTG on Friday I expect e.g. 150 jobs pending, on Saturday 1 PM the number should be more like 75 jobs pending and on Sunday 6 AM is should be down to 50 pending and Sunday 8 PM it should be 0 pending and 150 successful.

This is very granular, making it hard to find a solution. The jobs in our case will not finish – they are within their weekend time-window and will not be auto cancelled and therefor only manually looking in to Backup Exec will tell you if we are making progress or not.

It could be a solution to constantly see if the Total Bytes backed up goes up – but this again is challenging, we would need to compare values over time.. PRTG is as far as I know not directly able to do so and this would mean we would need to have a temp file with values form the last check in some kind of script or database that we would compare too…

So far I did not come up with the ultimate solution – every now and then I think about it a little more.. but well, I am not there yet.