
Data Privacy in a World of Outsourced Artificial Intelligence

Published on
April 27, 2017

Artificial intelligence (AI) and deep learning can lead to powerful business insights. Many executives are ready to harness this technology, but one main challenge holds them back: talent. Hiring technical talent for cybersecurity is hard enough in itself; hiring technical talent for AI is a much bigger challenge.

The UK's National Health Service (NHS) recently faced this problem. Computer vision techniques have demonstrated tremendous results in identifying specific types of illness from scans of a patient's body, and artificial intelligence has a strong track record of effectively predicting medical conditions such as cancer, heart attacks, and many other image-based diagnoses.

Medical information is particularly sensitive to medical organizations like the NHS, but it is also among the most lucrative types of PII to cybercriminals. Many freely available AI/machine learning software packages exist, such as Theano, Torch, CNTK, and TensorFlow. Despite the availability of these tools, many organizations like the NHS do not have sufficient access to experts who can run powerful machine learning tools. Without this type of collaboration, many illnesses may go unidentified and people could die. So the NHS* decided to partner with DeepMind, a company acquired by Alphabet/Google. The University of Cambridge and the Economist wrote an article detailing many aspects of the contract.

As a result, DeepMind gets access to 1.6 million medical records and a neat application of its technology, in addition to undisclosed funding. This data includes blood tests, medical diagnostics, and historical patient records, but also even more sensitive data such as HIV diagnoses and prior drug use. Deep learning, the sub-discipline of machine learning involved here, is particularly dependent on having a large data corpus.

When an organization is faced with the choice of outsourcing sensitive information to experts, what are its options? Any organization outsourcing information should redact all personally identifiable information, such as names and personal identifiers. These can instead be represented by a pseudonym, a unique mapping (such as a hash function) where the link between the unique identifier and the PII is held only by the trusted entity (the NHS in this case). Furthermore, semi-sensitive information that would have value to the ML model should be abstracted. For example, geographical location may be a powerful indicator of an illness, but the raw data could be used to reverse-engineer a given patient's PII. In this case, binning the information so that only a little fidelity is lost is an effective trade-off between empowering the AI's predictive power and protecting patient confidentiality; grouping specific addresses into zip codes or counties, for instance, strikes a reasonable balance.
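As a minimal sketch of the two techniques above, here is how pseudonymization and location binning might look in Python. The field names, the secret key, and the UK-style postcode record are hypothetical; a keyed hash (HMAC) is used so that outsiders without the key cannot brute-force pseudonyms back to identifiers.

```python
import hashlib
import hmac

# Hypothetical secret held only by the trusted entity (e.g. the NHS).
SECRET_KEY = b"held-only-by-the-trusted-entity"

def pseudonymize(patient_id: str) -> str:
    """Deterministic keyed hash: same ID always maps to the same
    pseudonym, but the mapping cannot be reversed without the key."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:16]

def bin_location(postcode: str) -> str:
    """Coarsen a full postcode to its district (the part before the
    space), trading a little fidelity for confidentiality."""
    return postcode.split()[0]

# Hypothetical raw record and its redacted form for the outside party.
record = {"patient_id": "NHS-1234567", "postcode": "NW3 2QG", "diagnosis": "X"}
redacted = {
    "pseudonym": pseudonymize(record["patient_id"]),
    "region": bin_location(record["postcode"]),
    "diagnosis": record["diagnosis"],
}
```

The redacted record keeps the signal the model needs (diagnosis, coarse region) while the raw identifier and address never leave the trusted entity.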

The trade-off between security and predictive power will likely be a challenging problem for data owners. AI is able to combine many weak signals and often reach surprising conclusions. In one study, CMU researchers found that social security numbers were surprisingly predictable: their algorithms could often reconstruct an SSN from information such as birthdate and gender. Guaranteeing that AI cannot reconstruct your PII is therefore an unsolved problem, and likely very dependent on the data. However, best-effort strategies like those outlined above can mitigate most concerns.

In the future this picture may change significantly. Recent developments in federated learning may allow for increased flexibility, letting data remain on premises while models are trained. A related technology, homomorphic encryption, has been in the works for far longer: computations occur on encrypted data without the data ever being decrypted, which would significantly reduce the security concern. We are still years away from technology solving this problem directly, and in the interim the promised benefits of AI are too great for most organizations to wait.
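To make the homomorphic property concrete, here is a toy illustration using textbook RSA, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product, so the result is computed without ever decrypting the inputs. The tiny parameters are for demonstration only; practical homomorphic encryption schemes are very different and far stronger.

```python
# Textbook RSA with deliberately tiny parameters (illustration only).
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)  # private exponent (modular inverse, Python 3.8+)

def enc(m: int) -> int:
    return pow(m, e, n)

def dec(c: int) -> int:
    return pow(c, d, n)

# The party holding only ciphertexts multiplies them...
a, b = 7, 6
product_ct = (enc(a) * enc(b)) % n

# ...and the key holder decrypts the result: a * b, computed
# without the plaintexts ever being exposed to the computing party.
assert dec(product_ct) == a * b
```

Fully homomorphic schemes extend this idea to arbitrary computations (both addition and multiplication), which is what would let an outside party train a model on encrypted medical records.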

At Anomali, we deal with sensitive information regularly, helping many organizations around the world winnow down data from across the enterprise and focus on the applicable security threats. We address privacy with on-premises deployments such as Anomali Match, and with tight access controls and data isolation, such as the Trusted Circles feature for sharing threat intelligence in our Threat Intelligence Platform, ThreatStream.

*The agreement was signed by the Royal Free NHS Trust, a small subordinate component of the much larger NHS. The Royal Free Trust comprises three hospitals in London.
