Dark Reading is part of the Informa Tech Division of Informa PLC


Partner Perspectives: Connecting marketers to our tech communities.
4/27/2016 10:00 AM
Vincent Weafer, Partner Perspectives

10 Questions To Ask Yourself About Securing Big Data

Big data introduces new wrinkles for managing data volume, workloads, and tools. Securing increasingly large amounts of data begins with a good governance model across the information life cycle. From there, you may need specific controls to address various vulnerabilities. Here is a set of questions to help ensure that you have everything covered.

1. What is your high-risk and high-value data?

Data classification is labor intensive, but you have to do it. It just makes sense: The most valuable or sensitive data requires the highest levels of security. Line-of-business teams have to collaborate with legal and security personnel to get this right. A well-defined classification system should be paired with determination of data stewardship. If everybody owns the data, nobody is really accountable for its care and appropriate use, and it will be more difficult to apply information lifecycle policies.

2. What is your policy for data retention and deletion?

Every company needs clear directions on which data is kept, and for how long. Like any good policy, it needs to be clear, so everyone can follow it. And it needs to be enforced, so they will.
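A retention policy like this can be enforced mechanically. As a minimal sketch (the classifications and retention windows below are illustrative, not prescriptive), a retention check can be as simple as comparing a record's age against a schedule keyed by data classification:

```python
from datetime import datetime, timedelta

# Hypothetical retention schedule: classification -> maximum age.
RETENTION = {
    "financial": timedelta(days=7 * 365),   # e.g. a regulatory minimum
    "operational": timedelta(days=2 * 365),
    "telemetry": timedelta(days=90),
}

def is_expired(classification: str, created: datetime, now: datetime) -> bool:
    """Return True when a record has outlived its retention window."""
    max_age = RETENTION.get(classification)
    if max_age is None:
        # Unclassified data has no defined policy; flag it rather than guess.
        raise ValueError(f"no retention policy for {classification!r}")
    return now - created > max_age
```

Note how the check refuses to decide for unclassified data: that ties the retention policy back to the classification work from question 1.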

More data means more opportunity, but it can also mean more risk. The first step to reducing that risk is to get rid of what you don't need. This is a classic tenet of information lifecycle management. If data doesn't have a purpose, it's a liability. One idea for reducing that liability with regard to privacy is to apply de-identification techniques before storing data. That way you can still look for trends, but the data can't be linked to any individual. De-identification might not be appropriate for every business need, but it can be a useful approach to have in your toolbox.
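One common de-identification technique is pseudonymization with a keyed hash. Here is a minimal Python sketch, assuming the secret key is managed separately from the de-identified data store (the key value below is a placeholder):

```python
import hashlib
import hmac

# Placeholder only: in practice the key would live in a key vault,
# kept apart from the data it protects.
SECRET_KEY = b"replace-with-managed-secret"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a keyed hash (pseudonym).

    The same input always maps to the same token, so trend analysis
    over the data still works, but the token cannot be linked back
    to an individual without the key.
    """
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

A keyed hash (HMAC) rather than a plain hash matters here: without the key, an attacker cannot simply hash a list of candidate identifiers and match them against the stored tokens.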

3. How do you track who accesses which data?

Tracking which data you hold, and who has access to it, is a foundational element of security. As your analytics programs become more successful, you are likely to be exposed to more sensitive data, so tools and storage mechanisms should have that tracking capability built in from the beginning. After all, if you don't have the right tracking tools in place at the outset, it's hard to add them after the fact.
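A minimal sketch of that kind of built-in tracking, with illustrative field and function names, might look like this: every access appends a structured audit record, and the question in this section's title becomes a one-line query.

```python
from datetime import datetime, timezone

def log_access(audit_log: list, user: str, dataset: str, action: str) -> None:
    """Append a structured, timestamped audit record (append-only by convention)."""
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "dataset": dataset,
        "action": action,  # e.g. "read", "export", "delete"
    })

def who_accessed(audit_log: list, dataset: str) -> set:
    """Answer the governance question directly: who touched this dataset?"""
    return {rec["user"] for rec in audit_log if rec["dataset"] == dataset}
```

In a real deployment the log would go to tamper-evident storage rather than an in-memory list, but the shape of the records is the point: if every access path writes them from day one, the reporting comes for free.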

4. Are users creating copies of your corporate data?

Of course they are. Data tends to be copied. A department might want a local copy of a database for faster analysis. A single user might decide to put some data in an Excel spreadsheet, and so on.

So the next question to ask yourself is this: what is the governance model for this process, and how are policies for control passed through to the new copy and the maintainer of this resource? Articulating a clear answer for your company will help prevent sensitive data from leaking out by gradually passing into less secure repositories.

5. What types of encryption and data integrity mechanisms are required?

Beyond technical issues of cryptographic strength, hashing and salting and so on, here are sometimes-overlooked questions to address:

- Is your encryption setup truly end-to-end, or is there a window of vulnerability between data capture and encryption, or at the point when data is decrypted for analysis? A number of famous data breaches have occurred when hackers grabbed data at the point of capture.

- Does your encryption method work seamlessly across all databases in your environment?

- Do you store and manage your encryption keys securely, and who has access to those keys?

Encryption protects data from theft, but doesn’t guarantee its integrity. Separate data integrity mechanisms are required for some use cases, and become increasingly important as data volumes grow and more data sources are incorporated. For example, to mitigate the risk of data poisoning or pollution, a company can implement automatic checks flagging incoming data that doesn’t match the expected volume, file size or pattern.
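The volume and size checks described above can be sketched as follows. The thresholds are illustrative; a real pipeline would derive the expected values from historical baselines per data source:

```python
def flag_anomalous_batch(record_count: int, total_bytes: int,
                         expected_count: int, expected_bytes: int,
                         tolerance: float = 0.5) -> list:
    """Return the reasons an incoming data batch looks suspicious.

    An empty list means the batch falls within tolerance of the
    expected volume and size; anything else should be quarantined
    for review before it pollutes downstream analysis.
    """
    reasons = []
    if abs(record_count - expected_count) > tolerance * expected_count:
        reasons.append("record count outside expected range")
    if abs(total_bytes - expected_bytes) > tolerance * expected_bytes:
        reasons.append("batch size outside expected range")
    return reasons
```

Simple as it is, a check like this catches both accidental pollution (a feed that silently doubles) and deliberate poisoning attempts that change the shape of the incoming data.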

6. If your algorithms or data analysis methods are proprietary, how do you protect them?

Protecting proprietary discoveries? That’s old hat. What’s easier to miss is the way you arrive at those discoveries. In a competitive industry, a killer algorithm can be a valuable piece of intellectual property.

The data and systems get most of the glory, but analysis methods may deserve just as much protection, with both technical and legal safeguards. Have you vetted and published a plan for securely handling this type of information?

7. How do you validate the security posture of all physical and virtual nodes in your analysis computing cluster?

Big-data analysis often relies on the power of distributed computing. A rogue or infected node can cause your cluster to spring a data leak. Hardware-based controls deserve consideration.

8. Are you working with data generated by Internet of Things sensors?

The key with IoT is to ensure that data is consistently secured from the edge to the data center, with a particular eye on privacy-related data. IoT sensors may present their own security challenges. Are all gateways or other edge devices adequately protected? Industrial devices can be difficult to patch and may have a less mature vulnerability management process.

9. What role does the cloud play in your analytics program?

You'll want to review the contractual obligations and internal policies of any provider hosting or processing your data. It's important to know which physical locations they will use, and whether all those facilities have consistent physical (not just logical) security controls. And of course, the geographic locations may impact your regulatory compliance programs.

10. Which individuals in your IT organization are developing security skills and knowledge specific to your big-data tool set?

Over time, your project list, data sets, and toolbox are likely to grow. The more in-house knowledge you develop, the better your own security questions will be. 

Vincent Weafer is Senior Vice President of Intel Security, managing more than 350 researchers across 30 countries. He's also responsible for managing millions of sensors across the globe, all dedicated to protecting our customers from the latest cyber threats.
Comments
Peter Fretty, 4/27/2016 1:44:57 PM
IoT necessity
Great advice. There is a huge need for a focus on developing and strengthening big data governance and security strategies, especially as we head into the IoT realm. Peter Fretty, IDG blogger for SAS Big Data Forum 