Big data promises to revolutionize the way we do business. It also has the potential to eviscerate our privacy and security unless we manage it properly. Big data security issues can have embarrassing and potentially costly regulatory and legal ramifications further down the line. By identifying and addressing those big data problems at the design stage, companies can save themselves—and potentially their customers—a lot of pain later.
Big data projects pile gigabytes of data into large data ‘lakes,’ and then analyze it in different ways. It is tempting to treat this data as an interchangeable commodity, with every record seemingly alike. In practice, data lakes include structured data, such as customer transactions and sensor outputs, along with unstructured data ranging from social media posts to news articles.
Some of this data can be highly sensitive, and companies must protect it so that customer information doesn’t leak out.
One approach is pseudonymization. This process, explicitly referred to in the EU’s forthcoming General Data Protection Regulation (GDPR), involves stripping out sensitive fields like name and address from a data record. The sensitive information is then stored separately in a protected area and linked back to the stripped records using unique IDs.
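The split described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the field names, the `pseudo_id` key, and the in-memory `vault` dictionary are all assumptions made for the example. In practice the vault would be a separately secured store.

```python
import uuid

# Assumed names of the sensitive fields; a real schema will differ.
SENSITIVE_FIELDS = ("name", "address")


def pseudonymize(record, vault):
    """Move sensitive fields into `vault` and return the stripped record."""
    pseudo_id = uuid.uuid4().hex  # random ID that reveals nothing by itself
    vault[pseudo_id] = {k: record[k] for k in SENSITIVE_FIELDS if k in record}
    cleaned = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    cleaned["pseudo_id"] = pseudo_id
    return cleaned


def reidentify(cleaned, vault):
    """Rejoin a stripped record with its vault entry (needs vault access)."""
    restored = {k: v for k, v in cleaned.items() if k != "pseudo_id"}
    restored.update(vault[cleaned["pseudo_id"]])
    return restored
```

Only the stripped records go into the data lake. Anyone who needs the full record has to be granted access to the vault as well.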
Another protective measure is proper user authentication. Information should be available on a need-to-know basis, and a robust identity and access management (IAM) system can help stop unauthorized users from accessing it.
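As a rough sketch of the need-to-know idea, the snippet below filters a record down to the fields a role is allowed to see. The role names and field names are hypothetical, and a real IAM system would also handle authentication, auditing, and much more than this lookup table.

```python
# Hypothetical mapping of roles to the fields each one needs to see.
ROLE_FIELDS = {
    "analyst": {"purchase", "region"},
    "support": {"purchase", "region", "email"},
}


def visible_fields(record, role):
    """Return only the fields that `role` is permitted to read."""
    allowed = ROLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {k: v for k, v in record.items() if k in allowed}
```

Defaulting unknown roles to an empty set means a typo or a missing role denies access rather than granting it.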
Integrate your user authentication solution with encryption to protect data both in transit and at rest. Encryption remains important in big data environments even after pseudonymization, because it is often possible to re-identify records after the sensitive fields have been removed by examining the remaining fields in combination. That’s how researchers identified customers in an anonymized Netflix® data set in 2007, which created legal problems for the video company.
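One way to gauge that re-identification risk is to count how many records share each combination of the remaining fields. The sketch below is an illustrative check, not a complete privacy analysis, and the field names passed in are whatever the caller assumes to be identifying in combination.

```python
from collections import Counter


def group_sizes(records, fields):
    """Count how many records share each combination of `fields`."""
    return Counter(tuple(r[f] for f in fields) for r in records)


def unique_records(records, fields):
    """Return records whose combination of `fields` appears exactly once."""
    sizes = group_sizes(records, fields)
    return [r for r in records if sizes[tuple(r[f] for f in fields)] == 1]
```

A record that is the only one with its particular combination of, say, postal code and birth year can be singled out even though its name has been removed.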
Encrypting data provides an extra layer of protection for large data sets. It comes with its own problems, though, because analytics software frequently chops up data and distributes it between large numbers of clustered systems. This means it isn’t just the users that must be trusted; the systems must also be able to trust each other and ensure that they are not passing data to imposters.
Administrators can adopt complementary approaches here. Firstly, instead of relying on host systems for cryptographic security, use a cryptographic ‘shell,’ which decrypts data based on an access policy embedded in it. Secondly, have each clustered system participating in a parallel big data analytics job verify itself using its own cryptographic key.
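The second approach, in which each node proves its identity with its own key, can be illustrated with a simple challenge-response exchange. This sketch swaps in HMAC with a per-node shared secret because it needs only the Python standard library. It is an assumption made for illustration. A real cluster would more likely rely on certificates or an established protocol. The node IDs and the key registry are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_challenge():
    """Create a fresh random challenge so old responses cannot be replayed."""
    return secrets.token_bytes(32)


def sign_challenge(node_key, challenge):
    """Run on the worker node: prove possession of the node's key."""
    return hmac.new(node_key, challenge, hashlib.sha256).digest()


def verify_node(registered_keys, node_id, challenge, response):
    """Run on the coordinator: check the response against the node's key."""
    key = registered_keys.get(node_id)
    if key is None:
        return False  # unknown node, treat as an imposter
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)  # constant-time compare
```

The coordinator would send a new challenge before handing a node any part of the job, and pass data along only if `verify_node` returns `True`.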
All this will help to avert big data security issues, but unless companies secure the software tools that they are using, they could find themselves back at square one.
The cloud-native tools commonly used to process big data are new to many technology staff, who are often specialists in data science rather than in deploying open source big data tools. Simple misconfiguration of NoSQL databases has left hundreds of thousands of records exposed online, and similar mistakes occur in public cloud environments.
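A basic pre-deployment check can catch the most common of these mistakes. The sketch below audits a generic configuration dictionary. The keys `bind_address`, `auth_enabled`, and `tls_enabled` are hypothetical and do not correspond to the settings of any particular database, so they would need mapping to the real options of whatever tool is in use.

```python
def audit_datastore_config(config):
    """Return warnings for a hypothetical datastore configuration dict."""
    warnings = []
    # Missing settings are treated as unsafe so that gaps are reported.
    if config.get("bind_address", "0.0.0.0") == "0.0.0.0":
        warnings.append("listens on all network interfaces")
    if not config.get("auth_enabled", False):
        warnings.append("authentication is disabled")
    if not config.get("tls_enabled", False):
        warnings.append("traffic is not encrypted in transit")
    return warnings
```

Running a check like this in a deployment pipeline, and failing the build when the list is not empty, is one way to turn security guidance into routine practice.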
This makes employee training and a solid DevOps strategy two of the strongest weapons in your cybersecurity arsenal when rolling out big data projects. Teams that are tightly coordinated and understand the security implications of the tools they are using can prevent some of the more egregious security problems that might otherwise befall a big data project.
What developments could help to prevent big data security problems in the future? One particularly promising technology is homomorphic encryption. This nascent area of research enables software to process data without decrypting it at all, further decreasing its vulnerability to attack. Some products are already starting to use this technology, and more will likely follow.
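To make the idea concrete, here is a toy version of the Paillier scheme, one well-known additively homomorphic cryptosystem. It is chosen here purely for illustration and is not necessarily what any product uses. The primes are tiny and hard-coded, so this offers no real security. It only shows that two encrypted numbers can be added without decrypting either one. It assumes Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

# Toy parameters: real keys use primes that are hundreds of digits long.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse of LAM mod N (Python 3.8+)


def encrypt(m):
    """Encrypt an integer 0 <= m < N using the generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1  # random value in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    """Recover the plaintext from a ciphertext."""
    return ((pow(c, LAM, N_SQ) - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N_SQ
```

For example, `decrypt(add_encrypted(encrypt(120), encrypt(35)))` returns 155, even though the function doing the addition never sees 120 or 35.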
As cybersecurity technologies evolve to support cloud-based big data processing, we can do a lot ourselves by following best practices and thinking our data architecture through before crunching numbers in volume. That will help to keep big data privacy issues from your door, and help ensure that you protect your business assets appropriately.
Danny Bradbury has been a technology journalist since 1989. He writes for titles including the Guardian newspaper and Canada’s National Post®. Danny specialises in areas including cybersecurity and cryptocurrency. He authors the About Bitcoin website, and also wrote a regular blog on technology for children called Kids Tech News. You can follow Danny on Twitter® at @DannyBradbury
© 2017 SolarWinds MSP UK Ltd. All rights reserved.