Use Case: Anomaly Detection
A common pain point for IT administrators is that they cannot detect an anomaly or its root cause quickly, because searching through the enormous volume of logs collected from device operating systems, applications, and management systems takes so long. Once the relevant information has been found, it still has to be analyzed to identify the root cause of the problem. When this cannot be done in a timely manner, the delay can cause damage to the business.
The Netka AIOps Platform performs all of these tasks, making it easier to determine the root cause of a problem. Using Machine Learning technology, the example below shows how an anomaly occurring in an IT system can be discovered within hundreds of thousands of log and metric records. In the image, the platform detects an anomaly made up of three metrics and one log:
- One device shows CPU utilization reaching 100%, a level it has not approached during other periods
- One interface shows a sudden drop in input traffic
- One interface shows a sudden drop in output traffic
- One type of log message occurs many times, even though it has never appeared in earlier periods
Because the platform was able to find these four anomalies within hundreds of thousands of records, the administrator could determine the cause of the problem in near real-time, drastically reducing the time needed to fix the issue.
The platform can detect log and metric anomalies across multiple data types, including: CPU utilization, memory utilization, disk usage, traffic utilization, error rate, discard rate, CRC, RTT, packet loss, Quality of Experience value, latency, temperature, humidity, voltage, electric current, power factor, contactor relay status, and door close status. No thresholds need to be configured. The platform uses Machine Learning technology to find log and metric deviations over time, and displays the results on the same time axis. This allows the user to quickly see what went wrong and when.
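To illustrate the general idea of finding deviations without a fixed per-metric threshold and lining them up on one time axis, here is a minimal Python sketch. It is not the Netka AIOps Platform's algorithm, which is not described here. It swaps in a simple rolling median and MAD (median absolute deviation) score as a stand-in for the Machine Learning step. The function names, the `window` and `cutoff` parameters, and the sample data are all hypothetical.

```python
from statistics import median

def rolling_anomalies(values, window=12, cutoff=6.0):
    """Return indices whose value deviates strongly from the preceding window.

    Assumption: a robust z-score (median/MAD) stands in for the ML model.
    No absolute limit such as "CPU > 90%" is set; each point is compared
    only with the recent behavior of the same series.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        center = median(history)
        mad = median([abs(v - center) for v in history])
        # Floor the scale so a perfectly flat history does not flag tiny noise
        # (the 1% floor is an arbitrary choice for this sketch).
        scale = max(1.4826 * mad, 0.01 * max(abs(center), 1.0))
        if abs(values[i] - center) / scale > cutoff:
            flagged.append(i)
    return flagged

def anomalies_by_time(series):
    """Group anomalies from several series by time index (a shared time axis)."""
    timeline = {}
    for name, values in series.items():
        for i in rolling_anomalies(values):
            timeline.setdefault(i, []).append(name)
    return dict(sorted(timeline.items()))

# Hypothetical samples: a CPU spike and an input-traffic drop at the same time.
cpu = [20, 22, 21, 19, 23, 20, 21, 22, 20, 21, 19, 22, 21, 100, 22]
in_traffic = [500, 510, 495, 505, 498, 502, 507, 499, 503, 501, 496, 504, 500, 5, 498]

print(anomalies_by_time({"cpu": cpu, "in_traffic": in_traffic}))
# prints {13: ['cpu', 'in_traffic']}
```

Grouping by time index is what makes the result useful for root-cause analysis: a CPU spike and a traffic drop that land on the same index are shown together instead of as two unrelated alerts. The same approach could be applied to log data by counting occurrences of each log type per time interval and scoring the counts as a series.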