Dark Reading is part of the Informa Tech Division of Informa PLC

Analytics

10/7/2015
02:15 PM
Stephen Newman
Commentary

Intro To Machine Learning & Cybersecurity: 5 Key Steps

Software-based machine learning attempts to emulate the same process that the brain uses. Here's how.

The technology industry loves throwing around the term machine learning (ML). It’s used in a variety of contexts, from technology providers claiming to have "invented the math" behind machine learning to others applying the label to less-than-scientific outcomes. That loose usage doesn’t help, because the science of ML as it applies to cybersecurity is probably one of the most complex and least understood topics today. To bring some clarity, let’s walk through the five key steps you’ll need to take to develop and operationalize a true ML system capable of predicting an outcome based on the data it trains on.

What is machine learning?
According to WhatIs.com, “Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.” Rather than following static program instructions, ML systems use algorithms to build a model from example inputs in order to make data-driven predictions or decisions.

The human brain is, perhaps, the best example of a machine learning system. Your brain rapidly collects information and automatically processes it. Without you being aware, it corroborates independent data points. Then, based on past experiences, your brain contrasts known and unknown information. From there, you reach a conclusion that produces an emotion, decision, or course of action.

For example, imagine that you’re taking a walk downtown in a city you’ve never visited before. You turn onto a new block and take in your surroundings. Immediately, your brain gets to work determining whether or not it is safe to walk down the street. It makes note of the time of day – it’s mid-day and the sun is out. There are other people casually walking around. The sidewalks are clean, and the storefronts are well maintained. Your brain processes all of this information and concludes that you can safely walk down the street. You’re not necessarily aware of this conclusion, but your subconscious has taken in the information to make the decision.

As you turn down another block, your brain takes in new information. It’s still light out, but the surroundings look drab and gray. None of the businesses are open, and the area is deserted. In a split second your brain compares this information to what you already know and perceives potential danger, resulting in the decision to turn back.

The 5-step ML process
Software-based machine learning attempts to emulate the same process that the brain uses. At a high level, the process consists of five steps:

Table 1:

| Step | Brain | Software-based ML |
|---|---|---|
| 1. Define the problem. | Your subconscious wants to determine whether it is safe to walk down the street. | Engineers determine what problem they’re trying to solve. Example: Can we determine whether a domain is malicious? |
| 2. Harvest data. | Your brain gathers data about the environment. | Engineers harvest data into a data store (e.g., Hadoop cluster) for analysis. Data is unbiased and unfiltered; more training data leads to better models. |
| 3. Create features. | Your brain describes differences in information: bright vs. drab, populated vs. deserted, etc. | Data scientists and threat researchers identify features that describe differences between classes or categories of data (e.g., benign or malicious). |
| 4. Build and validate models. | Your brain assigns weights to the various factors, like the presence of other people on the street or whether the shops are open or closed. It then determines what the combination of these factors means. | Data scientists build ML systems that take in multiple features extracted from real-world activity to form a statistical model that describes the situation. |
| 5. Operationalize. | Your brain makes a decision about whether to proceed or turn around. These decisions form new models for future decisions. | The statistical models allow decisions to be made. The ML system is deployed where it continually observes new data that are used to retrain the model. It becomes smarter and better at predictive analysis over time. |
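
Steps 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method any particular product uses: the three features (name length, digit ratio, character entropy), the tiny hand-made training set, the scaling constants, and the function names `domain_features`, `train`, and `predict` are all invented for the example.

```python
import math
from collections import Counter


def domain_features(domain):
    """Step 3: describe a domain with numeric features (illustrative choices)."""
    label = domain.split(".")[0]          # text before the first dot
    n = max(len(label), 1)                # avoid dividing by zero
    counts = Counter(label)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    digit_ratio = sum(ch.isdigit() for ch in label) / n
    # Divide by rough maxima so all features are on a similar scale.
    return [n / 20.0, digit_ratio, entropy / 4.0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=2000, lr=0.5):
    """Step 4: fit logistic-regression weights with stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y                   # gradient of the log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, domain):
    """Return the model's estimated probability that a domain is malicious."""
    weights, bias = model
    x = domain_features(domain)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Made-up toy data: 0 = benign, 1 = malicious. Real systems need far more.
TRAINING = [
    ("example.com", 0), ("weather.com", 0), ("news.org", 0), ("library.net", 0),
    ("x9k2q7zp4v1m.com", 1), ("a8f3k29dj2l0q.net", 1),
    ("qz7w1x93kd8v.org", 1), ("p0o9i8u7y6t5.com", 1),
]

model = train([domain_features(d) for d, _ in TRAINING],
              [y for _, y in TRAINING])

for name in ("example.com", "x9k2q7zp4v1m.com"):
    print(name, round(predict(model, name), 3))
```

Validation, the other half of step 4, would score the model on labeled domains it has never seen. The loop above only checks it against its own training data.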

Using machine learning for cybersecurity
ML is actively being used today to solve advanced threat problems such as identifying infected machines on the corporate network. For example, a system can watch the traffic going to and from connected devices. Some of the outbound traffic may go to potentially malicious websites, but that one piece of evidence doesn’t prove that the device is infected. It must be viewed and weighed in context with other evidence.
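
One common way to weigh several pieces of evidence together is to add up log-likelihood ratios, as a naive Bayes model does. The sketch below assumes the pieces of evidence are independent of one another. The prior and the likelihood ratios are invented placeholders rather than measured values, and `posterior_infected` is a hypothetical helper name.

```python
import math


def posterior_infected(prior, likelihood_ratios):
    """Combine a prior probability of infection with independent evidence.

    Each likelihood ratio is P(evidence | infected) / P(evidence | clean).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)       # independent evidence adds in log space
    return 1.0 / (1.0 + math.exp(-log_odds))


PRIOR = 0.01  # assumed share of infected devices (placeholder)

# One weak signal, e.g. traffic to a questionable site (placeholder ratio).
single = posterior_infected(PRIOR, [5.0])

# The same signal plus two more that point the same way (placeholder ratios).
combined = posterior_infected(PRIOR, [5.0, 8.0, 10.0])

print(f"one signal: {single:.3f}, three signals: {combined:.3f}")
```

With these placeholder numbers, a single signal leaves the estimate low, while three agreeing signals push it well above one half.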

A laptop on the network is communicating with a website, maybebad.com. The site isn’t necessarily malicious, but it isn’t benign, either. An ML system understands domain reputations and assigns a value of "grey" to maybebad.com’s reputation. As the system studies the communications between the user’s laptop and the website, patterns arise. Over a three-hour window, the laptop contacts the website every 19 minutes on average, with a standard deviation of 38 seconds. Using ML classifiers, we can determine whether this behavior is statistically more likely to be initiated by malware or by the user. In this case it is more likely malware-driven: timing that regular suggests the laptop is talking to a potentially malicious website in an automated fashion.
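
The timing pattern in this example can be measured with a few lines of Python. In this rough sketch, a fixed threshold on the coefficient of variation (standard deviation divided by mean) stands in for the trained classifier described above. The timestamps are synthetic, and the `max_cv` and `min_visits` values and the function names are assumptions made for illustration.

```python
import statistics
from itertools import accumulate


def interval_stats(timestamps):
    """Return (mean, stdev) in seconds of the gaps between consecutive visits."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return statistics.mean(gaps), statistics.stdev(gaps)


def looks_automated(timestamps, max_cv=0.1, min_visits=5):
    """Flag visit timing that is suspiciously regular.

    max_cv and min_visits are hypothetical thresholds, not tuned values.
    """
    if len(timestamps) < min_visits:
        return False                      # too few visits to judge
    mean_gap, stdev_gap = interval_stats(timestamps)
    return mean_gap > 0 and (stdev_gap / mean_gap) < max_cv


# Synthetic gaps in seconds: roughly every 19 minutes with small jitter.
BEACON_GAPS = [1140, 1100, 1180, 1120, 1160, 1140, 1190, 1090, 1140]
# Synthetic gaps for a person browsing in irregular bursts.
HUMAN_GAPS = [30, 600, 45, 2400, 120, 15, 3600, 300]

beacon_times = [0] + list(accumulate(BEACON_GAPS))
human_times = [0] + list(accumulate(HUMAN_GAPS))

print("beacon-like traffic automated?", looks_automated(beacon_times))
print("human-like traffic automated?", looks_automated(human_times))
```

For the figures in the example above, a 38-second standard deviation on a 19-minute (1,140-second) mean is a coefficient of variation of roughly 0.03, far more regular than a person clicking links would normally be.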

Examples like this demonstrate how ML can empower threat researchers and security professionals in the fight against advanced persistent threats. Going forward, it will be exciting to see what ML systems can do.

 

Stephen Newman brings over 17 years of technology innovation and leadership to Damballa. He has designed products and product strategies for leading, innovative technologies throughout his career. Since joining Damballa in 2009, his team has successfully built upon the ...