Can data science save social media?
Bob Ackerman Jr.
Contributor
Robert Ackerman Jr. is the founder and a managing director of Allegis Capital, an early-stage cybersecurity venture firm, and a founder of DataTribe, a startup “studio” for fledgling cyber startups staffed by former government technology innovators and cybersecurity professionals.
The unfettered internet is too often used for malicious purposes and is frequently woefully inaccurate. Social media — especially Facebook — has failed miserably at protecting user privacy and blocking miscreants from sowing discord.
That’s why CEO Mark Zuckerberg was just forced to testify about user privacy before both houses of Congress. And now governmental regulation of Facebook and other social media appears to be a fait accompli.
At this key juncture, the crucial question is whether regulation, in concert with Facebook’s promises to aggressively mitigate its weaknesses, will correct the privacy abuses while still letting Facebook fulfill its goal of giving people the power to build transparent communities and bring the world closer together.
The answer is maybe.
What has not been said is that Facebook must embrace data science methodologies initially created in the bowels of the federal government to help protect its two billion users. Simultaneously, Facebook must still enable advertisers — its sole source of revenue — to get the user data required to justify their expenditures.
Specifically, Facebook must promulgate and embrace what is known in high-level security circles as homomorphic encryption (HE), often considered the “Holy Grail” of cryptography, and data provenance (DP). HE would enable Facebook, for example, to generate aggregated reports about its users’ psychographic profiles so that advertisers could still accurately target groups of prospective customers without knowing their actual identities.
Meanwhile, data provenance — the process of tracing and recording true identities and the origins of data and its movement between databases — could unearth the true identities of Russian perpetrators and other malefactors, or at least identify unknown provenance, adding much-needed transparency in cyberspace.
Both methodologies are extraordinarily complex. IBM and Microsoft, in addition to the National Security Agency, have been working on HE for years, but the technology has suffered from significant performance challenges. Progress is being made, however. IBM, for example, has been granted a patent on a particular HE method (a strong hint it’s seeking a practical solution) and last month proudly announced that its rewritten HE library now works up to 75 times faster. Maryland-based ENVEIL, a startup staffed by the former NSA HE team, has broken the performance barriers required to produce a commercially viable version of HE, benchmarking millions of times faster than IBM in tested use cases.
How homomorphic encryption would help Facebook
HE is a technique used to operate on and draw useful conclusions from encrypted data without decrypting it, simultaneously protecting the source of the information. It is useful to Facebook because its massive inventory of personally identifiable information is the foundation of the economics underlying its business model. The more comprehensive the data sets about individuals, the more precisely advertising can be targeted.
HE could keep Facebook information safe from hackers and inappropriate disclosure while still extracting the essence of what the data tells advertisers. It would encrypt data into strings of numbers, do math on those encrypted strings, then decrypt the results to get the same answer it would have gotten if the data had never been encrypted at all.
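To make the encrypt, compute, decrypt idea concrete, here is a minimal sketch of an additively homomorphic scheme (textbook Paillier) in Python. It is an illustration only, not the method used by Facebook, Google, IBM or ENVEIL: the function names are hypothetical, the primes are far too small for real security, and it supports only addition of encrypted values rather than arbitrary computation.

```python
import math
import secrets

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair. The default primes are illustrative and insecure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because we use g = n + 1
    return (n, n + 1), (lam, mu)  # (public key, private key)

def encrypt(pub, m):
    """Encrypt an integer m (0 <= m < n) under the public key."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def add_encrypted(pub, c1, c2):
    """Multiplying two ciphertexts yields an encryption of the sum."""
    n, _ = pub
    return (c1 * c2) % (n * n)

def decrypt(pub, priv, c):
    """Recover the plaintext with the private key."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

if __name__ == "__main__":
    pub, priv = keygen()
    # Hypothetical data: one flag per ad viewer, 1 if they later bought in a store.
    flags = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
    encrypted_flags = [encrypt(pub, f) for f in flags]
    # The aggregator sees only ciphertexts, yet can total them.
    total = encrypted_flags[0]
    for c in encrypted_flags[1:]:
        total = add_encrypted(pub, total, c)
    print("purchasers:", decrypt(pub, priv, total), "of", len(flags))
```

In this sketch, whoever does the adding never sees an individual flag; only the holder of the private key can read the result, and even then only the aggregate count.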
A particularly promising sign for HE emerged last year, when Google revealed a new marketing measurement tool that relies on this technology to allow advertisers to see whether their online ads result in in-store purchases.
Unearthing this information requires analyzing data sets belonging to separate organizations, even though these organizations have pledged to protect the privacy and personal information of the data subjects. HE sidesteps this conflict by generating aggregated, non-specific reports about the comparisons between these data sets, so that no individual record is ever exposed.
In pilot tests, HE enabled Google to successfully analyze encrypted data about who clicked on an advertisement in combination with another encrypted multi-company data set that recorded credit card purchase records. With this data in hand, Google was able to provide reports to advertisers summarizing the relationship between the two databases to conclude, for example, that five percent of the people who clicked on an ad wound up purchasing in a store.
Data provenance
Data provenance has a markedly different core principle. It’s based on the fact that digital information is atomized into 1s and 0s with no intrinsic truth. The binary digits exist only to disseminate information, whether accurate or wholly fabricated. A well-crafted lie can easily be indistinguishable from the truth and distributed across the internet. What counts is the source of these 1s and 0s. In short, is it legitimate? What is the history of the 1s and 0s?
The art market, as an example, deploys DP to combat fakes and forgeries of the world’s greatest paintings, drawings and sculptures. It uses DP techniques to create a verifiable chain of custody for each piece of artwork, preserving the integrity of the market.
Much the same thing can be done in the online world. For example, a Facebook post referencing a formal statement by a politician, with an accompanying photo, would have provenance records directly linking the post to the politician’s press release and even the specifics of the photographer’s camera. The goal — again — is ensuring that data content is legitimate.
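One simple way to picture such provenance records is a hash-linked chain of custody, sketched below in Python. This is an assumption-laden illustration rather than a description of any deployed system: the record fields and function names are hypothetical, and a real system would also need digital signatures to prove who created each record, which this sketch omits.

```python
import hashlib
import json

def record_hash(record):
    """Deterministic SHA-256 digest of a provenance record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_record(chain, actor, action, content):
    """Add a record that fingerprints the content and links to the previous record."""
    body = {
        "actor": actor,    # hypothetical label for who handled the data
        "action": action,  # hypothetical label for what they did
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "prev_hash": chain[-1]["hash"] if chain else None,
    }
    entry = dict(body, hash=record_hash(body))
    chain.append(entry)
    return entry

def verify_chain(chain):
    """Return True only if every record is intact and correctly linked."""
    prev = None
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or record_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

if __name__ == "__main__":
    chain = []
    append_record(chain, "photographer-camera", "capture", b"raw photo bytes")
    append_record(chain, "press-office", "publish", b"text of the press release")
    append_record(chain, "social-post", "reference", b"text of the post")
    print("chain intact:", verify_chain(chain))
    chain[1]["actor"] = "impostor"  # tamper with the history
    print("after tampering:", verify_chain(chain))
```

Because each record embeds the hash of the one before it, altering any step in the history invalidates every later link, which is what makes the chain of custody checkable.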
Companies such as Walmart, Kroger, British-based Tesco and Swedish-based H&M, an international clothing retailer, are using or experimenting with new technologies to provide provenance data to the marketplace.
Let’s hope that Facebook and its social media brethren begin studying HE and DP thoroughly and implement them as soon as feasible. Other strong measures, such as the upcoming implementation of the European Union’s General Data Protection Regulation, which will use a big stick to secure personally identifiable information, should essentially be cloned in the U.S. Best of all, however, would be multiple avenues to enhance user privacy and security while preventing breaches in the first place. Nothing less than the long-term viability of social media giants is at stake.