Backup Monitoring: Part 2—Defining Success

In part one of this blog series, I discussed backup monitoring and the ability to manage by exception (if you haven’t read it yet, you can find it here). In this installment, we’ll start building custom filters with the Advanced Search query language so you can define your own measure of success or failure. Why bother? MSPs and IT departments looking to reduce costs and improve efficiency both benefit from that flexibility. It can mean the difference between creating and working a service ticket and not needing one at all, which saves both time and resources. So, launch your browser, log in to https://backup.management, and follow along as we dive in.

Advanced search

We’ve previously covered some basic backup console navigation including interactive doughnut widgets, built-in dashboard views, turning on filters, adding columns, saving views, and emailing views. Now, we’re going to expand on this by looking at how you can use the Advanced Search query language to build even more specific filters based on statistics—like date, time, duration, size, error count, etc. This makes it possible to review performance, usage, deduplication ratios, and under-protected devices. This and future blogs will walk you through creating basic and advanced search queries you can string together later to get a view of the exact devices you want to manage.

Building advanced search queries

To enable Advanced Search, click the right drop-down arrow in the Normal Search bar (it’s situated to the left of the Select Filter dialog). Then select Advanced Search. You can easily build Advanced Search expressions one at a time by selecting and editing a filter and then toggling between Normal Search and Advanced Search to see how the filter expression is constructed.

You can find reference documentation on the Advanced Search filter expressions here. It lists the short code and data type for each column in the management console. This will be useful later as we build our first expressions, and it gives you a sneak peek at the type of data you can work with. You can also search for a column name in the Columns drop-down to see its short code.

You can work with individual expressions, filter by time and size, group expressions in parentheses, and join them with AND/OR conditions to fine-tune your results. You can access reference documentation on the Advanced Search filter syntax here.
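To see why grouping matters, here is a minimal Python sketch that models each expression as a plain boolean for one hypothetical device. The variable names are illustrative stand-ins, not console fields. The sketch makes no claim about how Advanced Search ranks AND against OR; it uses explicit parentheses in both cases so that only the grouping differs.

```python
# Hypothetical device: a workstation whose last successful backup is stale.
# Each boolean stands in for one Advanced Search expression from this post.
is_server = False      # stands in for OT == 2
stale_checkin = False  # stands in for TS < 1.days().ago()
stale_backup = True    # stands in for TL < 2.days().ago()

# A AND (B OR C): only servers with a stale check-in or a stale backup match.
server_view = is_server and (stale_checkin or stale_backup)

# (A AND B) OR C: any device with a stale backup matches, server or not.
mixed_view = (is_server and stale_checkin) or stale_backup

print(server_view)  # False: the workstation stays out of the server view
print(mixed_view)   # True: the same workstation now shows up
```

Same three conditions, different parentheses, different result set, which is why it pays to group deliberately.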

Last successful session time

Filtering by the Last Successful Session time of a data source or device lets you know if you’re operating within the SLAs you established with your customers. It also lets you set priority for resolving issues. For example, a device that hasn’t backed up in the past 24 hours may not be as critical as a device that hasn’t backed up in more than 48 hours. While these default time-based filters may be good enough thresholds during the work week, they can generate false positives when it comes to devices that are purposely offline over holidays, long weekends, or employee vacations. Let’s start to build out some basic time-based expressions.
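The 24-hour versus 48-hour prioritization can be sketched in Python as a small tiering function. The tier names, and the convention of passing None for a device with no successful session (TL == 0 in the console), are assumptions for illustration; this function is not part of SolarWinds Backup.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backup_priority(last_success: Optional[datetime], now: datetime) -> str:
    """Return a hypothetical priority tier based on time since the last successful session."""
    if last_success is None:
        return "never"  # no successful session at all (TL == 0 in the console)
    age = now - last_success
    if age > timedelta(hours=48):
        return "high"   # more than 48 hours without a successful backup
    if age > timedelta(hours=24):
        return "low"    # between 24 and 48 hours: worth watching, less urgent
    return "ok"


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
print(backup_priority(now - timedelta(hours=30), now))  # low
print(backup_priority(now - timedelta(hours=60), now))  # high
print(backup_priority(None, now))                       # never
```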

Expression Description
TL == 0 (Total last successful session Equal to Never)
TL < 72.hours().ago() (Total last successful session Prior to Last 72 hours)
TL < 4.days().ago() (Total last successful session Prior to Last 4 days)
TL > 2.weeks().ago() (Total last successful session During the Last 2 weeks)
TL == 0 OR TL < 48.hours().ago() (Total last successful session Equal to Never OR Prior to Last 48 hours)
XL < 3.days().ago() (Total last successful Exchange session Prior to Last 3 days)
RFL > 24.hours().ago() (Restore Total last successful File and folders session During the Last 24 hours)
CD < 2.weeks().ago() AND TL == 0 (Creation date Prior to Last 2 weeks AND Total last successful session time Equal to Never)
TS > 1.days().ago() (Timestamp During the Last 1 day)
TS < 2.days().ago() (Timestamp Prior to Last 2 days)
TS > 36.hours().since(TL) AND TS < 2.days().ago() (Timestamp More than 36 hours Since Total last successful session AND Timestamp Prior to Last 2 days)

In addition to Total Last Successful Session backup time (TL), there are individual expressions for each of the supported backup and restore data sources. This way you can also filter by Exchange Last Successful Session backup time (XL) or Files and Folders Last Successful Session restore time (RFL) by swapping in the appropriate short code for the desired data source.
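Because only the short code changes between data sources, the time-based expressions are easy to generate. Here is a minimal Python sketch. It assumes you only need the three codes used in this post (TL, XL, RFL) and the three units shown in the table (hours, days, weeks). The helper name `older_than` is hypothetical.

```python
# "Last successful session" short codes taken from the examples in this post.
LAST_SUCCESS_CODES = {
    "total": "TL",
    "exchange": "XL",
    "restore_files_and_folders": "RFL",
}
UNITS = {"hours", "days", "weeks"}


def older_than(source: str, amount: int, unit: str) -> str:
    """Build an expression matching sessions older than `amount` `unit` for one data source."""
    if unit not in UNITS:
        raise ValueError(f"unsupported unit: {unit}")
    code = LAST_SUCCESS_CODES[source]
    return f"{code} < {amount}.{unit}().ago()"


print(older_than("total", 72, "hours"))   # TL < 72.hours().ago()
print(older_than("exchange", 3, "days"))  # XL < 3.days().ago()
```

The output is plain text for pasting into Advanced Search; the helper does not talk to the console.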

Another useful time-based expression is Creation Date (CD), which tells you how much time has passed since initial deployment. One possible use case is ignoring newly deployed devices until they are more than two weeks old and still have no successful backup to the cloud. Next, Timestamp (TS) shows when the backup client last checked in with the cloud service, so you can monitor backup success in relation to whether the device is on and connected to the internet.
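Both use cases can be modeled with plain epoch-second integers. This is a sketch under stated assumptions. It treats a value of 0 as "never," as TL == 0 does in the table. It also reads 36.hours().since(TL) as "36 hours after the last successful session," which is an interpretation of the table's description rather than documented behavior. The function names are hypothetical.

```python
DAY = 86_400   # seconds in one day
HOUR = 3_600   # seconds in one hour


def new_device_never_backed_up(cd: int, tl: int, now: int) -> bool:
    """Model of CD < 2.weeks().ago() AND TL == 0 (0 is treated as 'never')."""
    return cd < now - 14 * DAY and tl == 0


def online_but_not_backing_up(ts: int, tl: int) -> bool:
    """Assumed model of TS > 36.hours().since(TL): checked in well after the last success."""
    return ts > tl + 36 * HOUR


now = 1_700_000_000
# Deployed 20 days ago, never backed up: past the grace period, so flag it.
print(new_device_never_backed_up(now - 20 * DAY, 0, now))    # True
# Deployed 5 days ago, never backed up: still inside the two-week grace period.
print(new_device_never_backed_up(now - 5 * DAY, 0, now))     # False
# Checked in 1 hour ago, last success 3 days ago: online, yet no recent backup.
print(online_but_not_backing_up(now - HOUR, now - 3 * DAY))  # True
```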

Note that while the < and > symbols may look backwards in some of these examples, they’re not. For expressions using the .ago() function, we count backwards from the current time to set a reference point. We then compare the number of seconds elapsed since January 1, 1970 (Epoch time) to see which happened first, and an older event always has a smaller number of elapsed seconds. So in the second example above (TL < 72.hours().ago()), we view only devices whose last successful session timestamp is smaller (that is, earlier) than the point 72 hours before the current time.
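A worked example with made-up numbers shows why TL < 72.hours().ago() points the right way. This Python sketch treats .ago() as "current time minus the interval" in epoch seconds, as the paragraph above describes. The specific timestamps are hypothetical.

```python
HOUR = 3_600  # seconds in one hour

now = 1_700_000_000          # hypothetical current time, in seconds since 1970-01-01
cutoff = now - 72 * HOUR     # the reference point that 72.hours().ago() represents

tl_stale = now - 80 * HOUR   # last successful session 80 hours ago
tl_fresh = now - 10 * HOUR   # last successful session 10 hours ago

print(cutoff)                # 1699740800
print(tl_stale < cutoff)     # True: older events have smaller epoch values
print(tl_fresh < cutoff)     # False: this device backed up within 72 hours
```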

Total status

While SolarWinds® Backup considers a status of Completed with errors to be a successful backup job, you and your end customers may not. It often depends on the type and importance of the backup device. Is it a workstation or a server? Is it one error or 500? Is it a temp file or a SQL database? Is it an old file server or the CEO’s laptop? Any of these questions may make you want to look a little closer at that Backup status column and start building expressions that help you differentiate.

Expression Description
T0 == 0 (Total status Equal to No backup)
T0 == 1 (Total status Equal to In process)
T0 == 5 OR T0 == 8 (Total status Equal to Completed OR Total status Equal to Completed with errors)
T0 == 11 (Total status Equal to No selection)
T0 != 1 AND T0 != 5 AND T0 != 8 (Total status Not Equal to In process AND Not Equal to Completed AND Not Equal to Completed with errors)
T0 == 2 OR T0 == 3 OR T0 == 6 OR T0 == 7 OR T0 == 13 OR T0 == 10 OR T0 == 12 OR T0 == 9 (Total Status Equal to Failed OR Aborted OR Interrupted OR Not started OR Blocked OR Over quota OR Restarted OR In process with fault)
S0 == 2 (System state status Equal to Failed)
H0 == 8 OR H0 == 2 (Hyper-V status Equal to Completed with errors OR Failed)
RH0 == 8 OR RH0 == 2 (Restore Hyper-V status Equal to Completed with errors OR Failed)
T7 == 1 (Total errors Equal to 1)
T7 >= 100 (Total errors Greater than or Equal to 100)
H7 <= 5 (Hyper-V errors Less than or Equal to 5)
G7 > 10 (Office 365 Exchange errors Greater than 10)
RN7 > 2 (Restore Network shares errors Greater than 2)
OT == 1 (OS type Equals Workstation)
OT == 2 (OS type Equals Server)
OT == 1 AND TS < 9.days().ago() (OS type Equals Workstation AND Timestamp Prior to Last 9 days)
OT == 2 AND (TS < 1.days().ago() OR TL < 2.days().ago()) (OS type Equals Server AND (Timestamp Prior to Last 1 day OR Total last successful session Prior to Last 2 days))

In addition to Total status for backup (T0), there are individual expressions for each of the supported backup and restore data sources. This way you can also filter by System state status for last backup (S0), Hyper-V status for last backup (H0), or Hyper-V restore status (RH0) by swapping in the appropriate short code for the desired data source.
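The status values in the table above, combined with short-code swapping, lend themselves to a small expression builder. Here is a minimal Python sketch. It assumes the numeric codes exactly as they appear in this post’s table; the mapping is read from that table, not from separate documentation. The helper name `status_is_any` is hypothetical.

```python
# Status codes as listed in the Total status table above.
STATUS = {
    "No backup": 0,
    "In process": 1,
    "Failed": 2,
    "Aborted": 3,
    "Completed": 5,
    "Interrupted": 6,
    "Not started": 7,
    "Completed with errors": 8,
    "In process with fault": 9,
    "Over quota": 10,
    "No selection": 11,
    "Restarted": 12,
    "Blocked": 13,
}


def status_is_any(field: str, *names: str) -> str:
    """Build 'FIELD == n OR FIELD == m ...' for the given status names.

    `field` is a status short code such as T0 (total), S0, H0 or RH0.
    """
    return " OR ".join(f"{field} == {STATUS[name]}" for name in names)


print(status_is_any("T0", "Completed", "Completed with errors"))
# T0 == 5 OR T0 == 8
print(status_is_any("H0", "Completed with errors", "Failed"))
# H0 == 8 OR H0 == 2
```

Working from names rather than raw numbers makes a long OR chain, like the Failed/Aborted/Interrupted row, easier to review before you paste it.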

If you want to dig deeper and break down the Total status Equal to Failed (T0 == 2) and Total status Equal to Completed with errors (T0 == 8) results, you can use the Total errors (T7) expression to filter by an exact count or by counts greater than or less than a specific value. Just like the prior examples, there are also expressions for filtering backup and restore errors for each of the individual data sources.
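The same breakdown can be pictured with a few hypothetical device rows. This Python sketch keeps only Failed (2) and Completed with errors (8) results and groups them by Total errors (T7). Its buckets echo the table (exactly 1, 100 or more). The device names, counts, and bucket labels are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (device name, T0 total status, T7 total errors)
devices = [
    ("ws-01", 8, 1),
    ("fs-02", 8, 420),
    ("sql-03", 2, 12),
    ("ws-04", 5, 0),   # Completed: excluded from the breakdown
]


def error_bucket(errors: int) -> str:
    """Map an error count to a coarse bucket."""
    if errors >= 100:
        return "100+"
    if errors == 1:
        return "1"
    if errors == 0:
        return "0"
    return "2-99"


buckets = defaultdict(list)
for name, status, errors in devices:
    if status in (2, 8):  # T0 == 2 OR T0 == 8
        buckets[error_bucket(errors)].append(name)

print(dict(buckets))
# {'1': ['ws-01'], '100+': ['fs-02'], '2-99': ['sql-03']}
```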

Earlier I hinted that Operating System type (OT) can also be a factor in prioritization, and that errors and failures on a Desktop OS (OT == 1) might be resolved at a lower priority than those on a Server OS (OT == 2). For example, a laptop or desktop that is offline with no backups for two or three days over the weekend may not justify a support ticket. When you factor in user holidays and vacations, you might even justify waiting until it’s been offline for more than nine days. A Server OS, however, probably shouldn’t be offline for more than 24 hours and shouldn’t go a weekend without a successful backup.
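The workstation and server rules from the last two rows of the table can be expressed as one Python predicate. This sketch assumes epoch-second timestamps and the thresholds used in this post: 9 days offline for workstations, and for servers either 1 day without a check-in or 2 days without a successful backup. `needs_ticket` is a hypothetical name.

```python
DAY = 86_400  # seconds in one day


def needs_ticket(ot: int, ts: int, tl: int, now: int) -> bool:
    """Model of the OS-specific rows in the table above.

    ot: OS type (1 = workstation, 2 = server); ts: last check-in; tl: last successful session.
    """
    if ot == 1:  # OT == 1 AND TS < 9.days().ago()
        return ts < now - 9 * DAY
    if ot == 2:  # OT == 2 AND (TS < 1.days().ago() OR TL < 2.days().ago())
        return ts < now - 1 * DAY or tl < now - 2 * DAY
    return False  # other OS types are outside this sketch


now = 1_700_000_000
# Laptop offline for a 3-day weekend: no ticket yet.
print(needs_ticket(1, now - 3 * DAY, now - 3 * DAY, now))  # False
# Server online, but its last successful backup was 3 days ago: ticket.
print(needs_ticket(2, now - 600, now - 3 * DAY, now))      # True
```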

Wrapping up

Assuming you followed along in the console, you’ll probably want to combine or tweak the values in a few of the expressions we’ve built and then paste them into Advanced Search. After each paste, click Save View and name the view before clicking Save as New. When you’re done, check that your new views appear in both the Dashboard views drop-down on the left and the Save views drop-down on the right.
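If you assemble several expressions before pasting them, a small helper can keep each group in parentheses so the combined filter means what you intend. Here is a minimal Python sketch. `combine` is a hypothetical helper that only builds text to paste into Advanced Search, and it assumes the parser accepts parentheses around any group, as the grouped server example in the table suggests.

```python
def combine(op: str, *exprs: str) -> str:
    """Join expressions with AND or OR and wrap the group in parentheses."""
    if op not in ("AND", "OR"):
        raise ValueError("op must be 'AND' or 'OR'")
    return "(" + f" {op} ".join(exprs) + ")"


# Rebuild the server row from the table.
stale_server = "OT == 2 AND " + combine("OR", "TS < 1.days().ago()", "TL < 2.days().ago()")

# Merge the workstation and server rules into a single view.
both_tiers = combine(
    "OR",
    combine("AND", "OT == 1", "TS < 9.days().ago()"),
    combine("AND", "OT == 2", "TS < 1.days().ago()"),
)

print(stale_server)
# OT == 2 AND (TS < 1.days().ago() OR TL < 2.days().ago())
print(both_tiers)
# ((OT == 1 AND TS < 9.days().ago()) OR (OT == 2 AND TS < 1.days().ago()))
```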

If I’ve piqued your interest and you have further questions, please stay tuned for the next blog post where we’ll continue to define and construct these advanced filters. Alternatively, feel free to join me during an upcoming Backup Office Hours session or reach out to me on Twitter.

Eric Harless is the Head Backup Nerd at SolarWinds MSP. Eric has worked with SolarWinds Backup since 2013 and has more than 25 years of data protection industry experience in sales, support, marketing, systems engineering, and product management.

You can follow Eric on Twitter at @backup_nerd
