Dark Reading is part of the Informa Tech Division of Informa PLC

10/6/2015
10:30 AM
Jay Jacobs
Commentary

Don’t Be Fooled: In Cybersecurity, Big Data Is Not the Goal

With any hyped-up topic, there is a disconnect between what people think can be accomplished and what actually can be. The latest example is “Big Data.” What sets big data apart is that, when executed properly, modern data analysis can look like nothing short of magic to outsiders. It’s this perception of magic that makes it so attractive to businesses looking to pull the next rabbit out of a hat.

What’s also unique about “big data” is the way the term is perceived and used in the security industry, which is very different from how data scientists approach it. This post is an attempt to bridge that divide and look at what happens behind the scenes with “big data.” When it comes down to it, big data is not a goal; it is logistics, and many of security’s big data problems are being solved with small data solutions.

Big data is just hiding the small data

In many instances, “Big Data Analytics” is an embellishment, a little white lie told with good intentions. In reality, the primary task performed directly on big data is figuring out how to turn it into small data. Often, big data is reduced by counting, comparing, or some other aggregation. Another technique is to pull a small subset, or sample, from the large data set to get something more manageable. Either way, many big data jobs exist to produce small data.
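
Pulling a sample from a data set too large to load at once is a well-studied problem. The sketch below uses reservoir sampling (Algorithm R), one standard way to draw a fixed-size uniform random sample in a single pass. The article does not say which sampling method it has in mind; the function name and parameters here are illustrative assumptions.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw k items uniformly at random from a stream of unknown length in one pass.

    Algorithm R: memory stays O(k) however long the stream is. If the stream
    has fewer than k items, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1) by overwriting a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Sample 1,000 record IDs from a stream of half a million without storing them all.
sample = reservoir_sample(iter(range(500_000)), k=1_000, seed=42)
print(len(sample), sample[:5])
```

Because every item ends up in the sample with the same probability k/N, the result can stand in for the full data set in exploratory analysis, as long as uniform sampling suits the question being asked.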

For example, if you are working with log data from thousands of systems, you may want to produce a small data file in which each system is reduced to a single line with counts of sources and login statuses (or whatever is being measured). Don’t be fooled by the word “small” here: the output may still be in the megabyte or even gigabyte range, which may be just small enough to load into memory on a laptop for analysis.
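
Here is a minimal sketch of that reduction, assuming a hypothetical record layout of (system ID, source IP, login status); real log formats will differ. It streams through the events once and keeps only per-system state, so memory grows with the number of systems and distinct sources rather than with the number of events.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical log record layout: (system_id, source_ip, login_status).
LogRecord = Tuple[str, str, str]


def summarize_by_system(records: Iterable[LogRecord]) -> Dict[str, dict]:
    """Reduce many login events to one summary row per system."""
    sources: Dict[str, Set[str]] = defaultdict(set)
    statuses: Dict[str, Counter] = defaultdict(Counter)
    for system_id, source_ip, status in records:
        sources[system_id].add(source_ip)
        statuses[system_id][status] += 1
    return {
        system_id: {
            "events": sum(counts.values()),
            "distinct_sources": len(sources[system_id]),
            "statuses": dict(counts),
        }
        for system_id, counts in statuses.items()
    }


events = [
    ("web-01", "10.0.0.5", "success"),
    ("web-01", "10.0.0.9", "failure"),
    ("web-01", "10.0.0.5", "failure"),
    ("db-01", "10.0.0.7", "success"),
]
for system_id, row in sorted(summarize_by_system(events).items()):
    print(system_id, row)
```

In practice the `events` list would be replaced by a generator reading log files line by line, and the summary rows written out as the “small data” file.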

If that deflated the mystique of big data for you, don’t worry: there is a small set of big data implementations that do perform analytics at scale. One example is building a unique, specific model for each application across all of your servers. (This is a technique we use to develop security ratings.) While each application may represent small data, doing the analysis across all applications is a genuine challenge. Finally, there is a very small sliver of analytics that performs complex computations at scale, but these cases are rare, and chances are your problems are just not that special.
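
Structurally, “one model per application” is a split-apply-combine job. The sketch below uses a deliberately simple stand-in for a model: the mean and standard deviation of a daily event count per application, fit on historical data and then used to flag new days that deviate by more than a z-score threshold. The article does not describe the models behind its security ratings; the metric, the threshold, and the function names are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical observation layout: (application, day, daily_event_count).
Observation = Tuple[str, str, float]
Model = Tuple[float, float]  # (mean, standard deviation)


def fit_per_application(history: Iterable[Observation]) -> Dict[str, Model]:
    """Split observations by application and fit a tiny baseline model to each group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for app, _day, value in history:
        groups[app].append(value)
    models: Dict[str, Model] = {}
    for app, values in groups.items():
        if len(values) >= 2:  # stdev needs at least two points
            models[app] = (statistics.mean(values), statistics.stdev(values))
    return models


def flag_anomalies(
    new_obs: Iterable[Observation], models: Dict[str, Model], threshold: float = 3.0
) -> List[Observation]:
    """Return new observations more than `threshold` standard deviations from their app's baseline."""
    flagged: List[Observation] = []
    for app, day, value in new_obs:
        model = models.get(app)
        if model is None:
            continue  # no baseline for this application yet
        mean, sd = model
        if sd > 0 and abs(value - mean) / sd > threshold:
            flagged.append((app, day, value))
    return flagged


history = [
    ("vpn", "d1", 100.0), ("vpn", "d2", 102.0), ("vpn", "d3", 98.0),
    ("vpn", "d4", 101.0), ("vpn", "d5", 99.0),
    ("mail", "d1", 1000.0), ("mail", "d2", 1100.0), ("mail", "d3", 900.0),
]
models = fit_per_application(history)
today = [("vpn", "d6", 150.0), ("vpn", "d7", 101.0), ("mail", "d4", 1150.0), ("mail", "d5", 1400.0)]
print(flag_anomalies(today, models))
```

Each application is fit independently, which is what makes the pattern easy to spread across many workers, and also why the overall job can be big even when every individual model is small.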

Big data enables [over]confidence

While much big data analysis is actually performed on small data, that doesn’t mean it’s the same old small data analysis. Big data brings its own set of challenges, because most classical statistics were developed with pencil and paper on just a few dozen samples, perhaps a few hundred. The good news is that most of these techniques actually improve with more data. Analyses can find ever more subtle effects as more data is used: smaller differences can be discovered, and more nuanced patterns can be detected. This is where some of the magic comes from: there can be big gains from small advantages. However, this is not without side effects. One of the classic measures of statistical significance, the p-value, often becomes uninformative on large samples. Because analysis can detect smaller and smaller differences in big data, the flip side is that even tiny, practically meaningless differences come out as statistically significant. As a result, big data can fool anyone unaware of this effect.
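
A quick way to see the p-value problem is to hold a difference fixed and vary only the sample size. The sketch below applies a standard pooled two-proportion z-test to two made-up rates, 10.0% and 10.5%, once with 1,000 observations per group and once with 1,000,000. The rates are illustrative, not real security data.

```python
import math


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for H0: p1 == p2 using the pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # For standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Same made-up rates (10.0% vs 10.5%) at two sample sizes per group.
results = {}
for n, x1, x2 in [(1_000, 100, 105), (1_000_000, 100_000, 105_000)]:
    results[n] = two_proportion_p_value(x1, n, x2, n)
    print(f"n = {n:>9,} per group: p = {results[n]:.3g}")
```

With 1,000 per group the half-point gap is nowhere near significant (p is roughly 0.7); with a million per group the same gap yields a vanishingly small p-value. The difference did not become any more important, only easier to detect, which is why effect sizes and confidence intervals deserve at least as much attention as p-values on large data.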

Big data, small samples

Even though big data still has that new-hype smell about it, underlying the hype are techniques that can be traced back centuries, and they all point to a single constant: good data analysis. It’s a hard lesson to learn, but good data analysis is a different skill from domain expertise. In other words, the skills to be a security expert (for example) do not translate into being able to understand and extract meaning from security data. Nor is good data analysis an intuitive skill; it is not picked up through proximity to data day after day. It is learned only through intentional study of statistics and related fields.

As an example, in the age of big data, where a million data points is labeled small, a sample of a few hundred or even a few thousand seems meager, perhaps pathetic, to the uninitiated. I’ve seen people dismiss research results because they thought the samples were too small (and those samples numbered over a thousand). We must keep in mind that data analysis is not intuitive: even though we have a lot of data, it may not take a lot of data to provide insight or support a business decision. So shake off the notion that big data is a goal, and get down to the business of learning from data.
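
How much does a sample of a thousand actually tell you? Under the textbook assumptions of a simple random sample and the normal approximation, the 95% margin of error for an estimated proportion is about 1.96 * sqrt(p * (1 - p) / n). The sketch below evaluates it at the worst case, p = 0.5; it says nothing about bias, which no sample size fixes.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion estimated from n random samples.

    Assumes a simple random sample and the normal approximation; p = 0.5 is the worst case.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 1_000, 10_000, 1_000_000):
    print(f"n = {n:>9,}: +/- {margin_of_error(n):.1%}")
```

At n = 1,000 the margin is about plus or minus 3.1 percentage points, and going to 10,000 only narrows it to about 1.0. Precision improves with the square root of n, so each additional order of magnitude of data buys less than the one before.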

Jay Jacobs has over 15 years of experience within IT and information security with a focus on cryptography, risk, and data analysis. Most recently, he has joined BitSight Technologies, the Standard in Security Ratings, as their Senior Data Scientist. Previously, he was a Data ...