Anomaly detection covers a large number of data analytics use cases. Here, however, anomaly detection refers specifically to the detection of unexpected events, be they cardiac episodes, mechanical failures, hacker attacks, or fraudulent transactions.
The unexpected character of the event means that no such examples are available in the data set. Classification solutions generally require a set of examples for all involved classes. So, how do we proceed in a case where no examples are available? It requires a little change in perspective.
In this case, we can only train a machine learning model on nonfailure data; that is, on data that describes the system operating in normal conditions. Whether the input data represents an anomaly or just regular operation can only be evaluated in deployment, after the prediction has been made. The idea is that a model trained on normal data can only predict the next normal sample. If the system is no longer working in normal conditions, the input data will not describe a correctly working system, and the model prediction will stray from reality. The error between the actual sample and the predicted sample can then tell us something about the underlying system’s condition.
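To make the idea concrete, here is a minimal sketch in plain Python. It is not the specific model this article has in mind: as a stand-in, it fits a tiny second-order autoregressive predictor by least squares on a noise-free sine wave that plays the role of "normal" data, then compares the prediction error on that signal with the error on a hypothetical faulty signal oscillating at a different frequency. The function names, the signals, and the choice of an autoregressive model are all assumptions made for illustration.

```python
import math

def fit_ar2(series):
    """Least-squares fit of x[t] ~ a1*x[t-1] + a2*x[t-2] (2x2 normal equations)."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(series)):
        x1, x2, y = series[t - 1], series[t - 2], series[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * y
        b2 += x2 * y
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the two coefficients
    a1 = (b1 * s22 - s12 * b2) / det
    a2 = (s11 * b2 - s12 * b1) / det
    return a1, a2

def mean_prediction_error(series, coeffs):
    """Mean absolute error between each actual sample and its one-step prediction."""
    a1, a2 = coeffs
    errors = [
        abs(series[t] - (a1 * series[t - 1] + a2 * series[t - 2]))
        for t in range(2, len(series))
    ]
    return sum(errors) / len(errors)

# Assumed signals: "normal" operation has period 20, the "faulty" one period 8.
normal = [math.sin(2 * math.pi * t / 20) for t in range(200)]
faulty = [math.sin(2 * math.pi * t / 8) for t in range(200)]

coeffs = fit_ar2(normal)  # trained on normal data only
normal_error = mean_prediction_error(normal, coeffs)
faulty_error = mean_prediction_error(faulty, coeffs)

print(f"error on normal signal: {normal_error:.6f}")
print(f"error on faulty signal: {faulty_error:.6f}")
```

The predictor never sees a failure during training, yet its error is close to zero on the normal signal and clearly larger on the faulty one. Real sensor data is noisy, so in practice the gap would be less clean than in this idealized example.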
In IoT (Internet of Things) data, signal time series are produced by sensors strategically located on or around a mechanical device or component. A time series is the sequence of values of a variable over time. In this case, the variable describes a mechanical property of the device, and it is measured via one or more sensors. Usually, the mechanical device is working correctly. As a consequence, we have tons of samples for the device working in normal conditions and close to zero examples of device failure. This is especially true if the device plays a critical role in a mechanical chain, because it is then usually retired before any failure can happen and compromise the whole machinery.
Thus, we can only train a machine learning model on a number of time series describing a system working as expected. The model will be able to predict the next sample in the time series when the system works properly, because this is how it was trained. We then calculate the distance between the predicted sample and the real sample, and from that distance we conclude whether everything is working as expected or whether there is reason for concern.
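The last step, turning a distance into a decision, could be sketched as follows. The text does not prescribe a decision rule, so this example assumes a common one: the alarm threshold is set to the mean plus three standard deviations of the prediction errors observed on normal data. The helper names and all the numbers are made up for illustration, and the predictions are taken as given rather than produced by a real model.

```python
import statistics

def calibrate_threshold(normal_errors, k=3.0):
    """Assumed rule: threshold = mean + k * std of errors seen on normal data."""
    return statistics.mean(normal_errors) + k * statistics.pstdev(normal_errors)

def flag_anomalies(actual, predicted, threshold):
    """Return the indices where |actual - predicted| exceeds the threshold."""
    return [
        i
        for i, (a, p) in enumerate(zip(actual, predicted))
        if abs(a - p) > threshold
    ]

# Illustrative prediction errors measured while the system was known to be healthy.
normal_errors = [0.02, 0.03, 0.01, 0.02, 0.04, 0.03, 0.02, 0.01]
threshold = calibrate_threshold(normal_errors)

# Illustrative deployment data: real sensor readings and the model's predictions.
actual = [1.00, 1.02, 0.99, 1.45, 1.60, 1.01]
predicted = [1.01, 1.00, 1.00, 1.02, 1.01, 1.00]

alarms = flag_anomalies(actual, predicted, threshold)
print(f"threshold: {threshold:.4f}")
print(f"samples raising concern: {alarms}")
```

The absolute difference is only one possible distance, and the factor `k` trades false alarms against missed anomalies. Both would need tuning against the tolerance of the actual application.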