Dark Reading is part of the Informa Tech Division of Informa PLC

Threat Intelligence

3/13/2019
02:30 PM
Rosaria Silipo
Commentary

IoT Anomaly Detection 101: Data Science to Predict the Unexpected

Yes! You can predict the chance of a mechanical failure or security breach before it happens. Part one of a two-part series.

Data science and artificial intelligence (AI) techniques have been applied successfully for a number of years to predict or detect all kinds of events in very different domains.

If you run a quick web search on "machine learning use cases," you will find pages and pages of links to documents describing machine learning (ML) algorithms to detect or predict some kind of event group in some kind of data domain.

Generally, the key to a successful machine learning-based application is a sufficiently general training set. The ML model, during training, should have a sufficient number of available examples to learn about each event group. This is one of the key points to any data science project: the availability of a sufficiently large number of event examples to train the algorithm.

Applying Machine Learning to IoT Event Prediction
Can security teams apply a machine learning algorithm to predict or recognize deterioration of mechanical pieces, or to detect cybersecurity breaches? The answer is yes! Data science techniques have already been used successfully in the fields of IoT and cybersecurity. For example, a classic use of machine learning in IoT is demand prediction. How many customers will visit the restaurant this evening? How many cartons of milk will be sold? How much energy will be consumed tomorrow? Knowing the numbers in advance allows for better planning.

Healthcare is another very common use of data science in IoT. Many sports and fitness applications and devices monitor our vital signs, making an abundance of data available in near real time that can be studied and used to assess a person's health condition.

Another common case study in IoT is predictive maintenance. The capability to predict if and when a mechanical piece will need maintenance leads to an optimal maintenance schedule and lets the machinery be used for its full lifespan. Considering that many machine parts are sophisticated and expensive, this is not a small advantage. This approach works well if a data set is available, and even better if the data set has been labeled. Labeled data means that each vector of numbers describing an event has been preassigned to a given class of events.
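To make the idea of labeled data concrete, here is a minimal sketch in Python. Each event is a vector of numbers paired with a preassigned class, and a very simple nearest-centroid classifier is trained on those pairs. The sensor values, the class names, and the functions `fit_centroids` and `predict` are hypothetical and chosen only for illustration; a real predictive maintenance project would likely use a more capable model and scaled features.

```python
from math import dist
from statistics import fmean

# Hypothetical labeled events: (vibration amplitude, temperature in C) -> class.
labeled_events = [
    ((0.21, 61.0), "normal"),
    ((0.19, 59.5), "normal"),
    ((0.23, 62.1), "normal"),
    ((0.87, 88.4), "needs_maintenance"),
    ((0.91, 91.0), "needs_maintenance"),
]


def fit_centroids(events):
    """Average the vectors of each class into one centroid per class."""
    by_label = {}
    for vector, label in events:
        by_label.setdefault(label, []).append(vector)
    return {
        label: tuple(fmean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def predict(centroids, vector):
    """Assign a new vector to the class with the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


centroids = fit_centroids(labeled_events)
new_event_class = predict(centroids, (0.85, 90.0))
```

The sketch only works because both classes have examples in the training data, which is exactly the condition that anomaly detection problems fail to meet.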

Anomaly Discovery: Looking for the Unexpected
A special branch of data science, however, is dedicated to discovering anomalies. What is an anomaly? An anomaly is an extremely rare episode, hard to assign to a specific class, and hard to predict. It is an unexpected event, unclassifiable with current knowledge. It's one of the hardest use cases to crack in data science because:

  • The current knowledge is not enough to define a class.
  • More often than not, no examples are available in the data to describe the anomaly.

So, the problem of anomaly detection can be easily summarized as looking for an unexpected, abnormal event of which we know nothing and of which we have no data examples. As hopeless as this may seem, it is not an uncommon use case.

  • Fraudulent transactions, for example, rarely happen and often occur in unexpected ways.
  • Expensive mechanical pieces in IoT will break at some point without much indication of how they will break.
  • A new arrhythmic heartbeat with an unrecognizable shape sometimes shows up in ECG traces.
  • A cybersecurity threat might appear and not be easily recognized because it has never been seen before.

In these cases, the classic data science approach, based on a set of labeled data examples, cannot be applied. The solution to this problem is a twist on the usual algorithm learning from examples.

Anomaly Detection in IoT


Anomaly detection problems do not offer a classic training set with labeled examples for both classes: a signal from a normally functioning system and a signal from a system with an anomaly. In this case, we can only train a machine learning model on a training set with "normal" examples and use a distance measure between the original signal and the predicted signal to trigger an anomaly alarm.
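The following Python sketch illustrates this idea under simplifying assumptions. The "model" is just a moving average over the last `k` values, standing in for whatever predictor would be trained in practice, and the alarm threshold is derived from the prediction errors observed on normal data only. The signal values, the function names, and the three-standard-deviation rule are all assumptions made for the example, not the author's method.

```python
from statistics import fmean, pstdev


def predict_next(window):
    """Stand-in predictor: the next value is the mean of the last k values."""
    return fmean(window)


def prediction_errors(signal, k):
    """Distance between each observed value and its predicted value."""
    return [
        abs(signal[i] - predict_next(signal[i - k:i]))
        for i in range(k, len(signal))
    ]


def fit_threshold(normal_signal, k=3, n_sigmas=3.0):
    """Learn an alarm threshold from a signal known to be normal."""
    errors = prediction_errors(normal_signal, k)
    return fmean(errors) + n_sigmas * pstdev(errors)


def detect(signal, threshold, k=3):
    """Return the indices whose prediction error exceeds the threshold."""
    errors = prediction_errors(signal, k)
    return [i + k for i, error in enumerate(errors) if error > threshold]


# Hypothetical sensor values: training data contains normal behavior only.
normal_signal = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
new_signal = [1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.8, 1.0, 1.05]

threshold = fit_threshold(normal_signal)
alarms = detect(new_signal, threshold)
```

With these values the first alarm is raised at the spike at index 6. The points immediately after it may also be flagged, because the prediction window still contains the outlier; a production system would need to handle that case explicitly.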


In IoT data, signal time series are produced by sensors strategically located on or around a mechanical component. A time series is the sequence of values of a variable over time. In this case, the variable describes a mechanical property of the object, and it is measured via one or more sensors.
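As a small illustration, a time series can be represented in Python as a list of (timestamp, value) pairs ordered by time, with one series per sensor. The sensor names, timestamps, and readings below are hypothetical, and `to_time_series` is a helper written only for this sketch.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw readings as they might arrive: (sensor, timestamp, value).
readings = [
    ("vibration", datetime(2019, 3, 13, 14, 30, 2), 0.23),
    ("temperature", datetime(2019, 3, 13, 14, 30, 1), 61.0),
    ("vibration", datetime(2019, 3, 13, 14, 30, 1), 0.21),
    ("vibration", datetime(2019, 3, 13, 14, 30, 3), 0.22),
    ("temperature", datetime(2019, 3, 13, 14, 30, 2), 61.4),
]


def to_time_series(raw_readings):
    """Group readings by sensor and order each group by timestamp."""
    series = defaultdict(list)
    for sensor, timestamp, value in sorted(raw_readings, key=lambda r: r[1]):
        series[sensor].append((timestamp, value))
    return dict(series)


series = to_time_series(readings)
vibration_values = [value for _, value in series["vibration"]]
```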

Usually, the mechanical piece is working correctly. As a consequence, we have tons of examples for the piece working in normal conditions and close to zero examples for the piece failure. This is especially true if the piece plays a critical role in a mechanical chain because it is usually retired before any failure happens and compromises the whole machinery.

In IoT, a critical problem is to predict the chance of a mechanical failure before it actually happens. In this way, we can use the mechanical piece throughout its entire life cycle without endangering the other pieces in the mechanical chain. This task of predicting possible signs of mechanical failure is called anomaly detection in predictive maintenance.


Rosaria Silipo, Ph.D., principal data scientist at KNIME, is the author of 50+ technical publications, including her most recent book "Practicing Data Science: A Collection of Case Studies". She holds a doctorate degree in bio-engineering and has spent more than 25 years ...