Monitoring and Automation

What does the industry want in a monitoring tool?
A tool that monitors everything in the environment, determines the root cause of errors, and resolves the issue, all automatically.

Of course… such a tool does not exist.

So what is the next best thing (that does exist)?
A set of monitoring tools that can be deployed and integrated with an analytics tool to help identify the root cause, plus a system that lets us predetermine the best course of action for specific root causes and automate their resolution.

Although no out-of-the-box solution can do this, our team has the expertise to deploy the right set of tools for your specific environment and make it work.

We can help address many scenarios with a monitoring and automation solution.
Here is an example of how we can reduce the time to resolution and improve user confidence.

My system is down!
— IT Manager

Situation
The data path between an application and a remote data store is down.

Current Response
Operations starts executing a response script for this statement: What application are you using? What is not working? What were you doing when it stopped working? And so on. Once Operations determines that the application is indeed not populating fields, they open a trouble ticket for Tier 2 to investigate. Tier 2 reviews the application's logs, determines that a query to a remote data store is failing, and passes the ticket to Tier 3 support to troubleshoot the query.

Simplified Solution
Active monitoring executes queries against remote data stores to constantly verify they are working. When the monitoring application recognizes that a remote data store is down, it creates a ticket, assigns the ticket to Tier 3 support, and creates an alert for Operations to let them know that an issue with a remote data store may affect specific applications in a very specific way. If Operations gets the same call, they can confirm with the end user that they are indeed having this issue and let them know the problem is being addressed.
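
To make the flow above concrete, here is a minimal Python sketch of one active check. It is an illustration only, not a description of any particular monitoring product: the names `DataStoreCheck`, `Ticket`, `Alert`, and `run_check`, the "tier3" and "operations" labels, and the in-memory lists standing in for a ticketing system and an alerting channel are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Ticket:
    summary: str
    assigned_to: str


@dataclass
class Alert:
    audience: str
    message: str


@dataclass
class DataStoreCheck:
    # Hypothetical description of one monitored remote data store.
    name: str
    probe: Callable[[], None]  # runs a test query; raises an exception on failure
    affected_apps: List[str] = field(default_factory=list)


def run_check(check: DataStoreCheck, tickets: List[Ticket], alerts: List[Alert]) -> bool:
    """Run one probe. On failure, open a Tier 3 ticket and alert Operations."""
    try:
        check.probe()
        return True
    except Exception as exc:
        # Skip Tier 1 and Tier 2 triage: the failing component is already known.
        tickets.append(Ticket(
            summary=f"Remote data store '{check.name}' failed test query: {exc}",
            assigned_to="tier3",
        ))
        apps = ", ".join(check.affected_apps) or "unknown"
        alerts.append(Alert(
            audience="operations",
            message=(f"Data store '{check.name}' is down; affected applications: "
                     f"{apps}. Ticket assigned to Tier 3."),
        ))
        return False


if __name__ == "__main__":
    def failing_probe() -> None:
        # Stand-in for a real test query that cannot reach the data store.
        raise ConnectionError("test query timed out")

    tickets: List[Ticket] = []
    alerts: List[Alert] = []
    check = DataStoreCheck("orders-db", failing_probe, ["order-entry"])
    run_check(check, tickets, alerts)
    print(tickets[0].summary)
    print(alerts[0].message)
```

In practice the probe would run on a schedule, and the monitoring tool would need to suppress duplicate tickets while an incident is still open; both are left out here to keep the sketch short.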