What You Need To Know About Confidential Computing For Data Science

Adam Lieberman, head of artificial intelligence and machine learning at Finastra.

Data science and artificial intelligence allow us to complete complex tasks, create streamlined automation and develop smarter products and services. However, our ability to solve important, challenging problems around the world and in the field of finance often depends on gaining access to data that is distributed and siloed.

Securing access to data can be difficult, and moving physical datasets can be even harder. Useful datasets may be riddled with personally identifiable information (PII), and gaining access often requires stringent partnership agreements and NDAs. It can also take extensive collaboration and multiple hand-offs between teams.

When machine learning engineers and data scientists can’t solve a problem, they often say, “We don’t have the right data.” But the majority of the time, the data exists—it is just not readily accessible. So, how can we realistically bring all the world’s data together in a completely secure and private way to encourage collaboration for solving the most pressing problems?

The answer lies in the world of confidential computing and remote data science, specifically the concept of federated data networks. I believe confidential computing and federated data networks (FDNs) will be hugely significant to the future of remote data science—both for enhancing data security and for revolutionizing data sharing in the digital age.

Confidential Computing And Federated Data Networks Defined

Confidential computing is a technology that keeps data protected even while it is being processed by a computer or is otherwise in use, typically by running the computation inside a hardware-based trusted execution environment. Combined with encryption of data at rest, this enhances overall data security by maintaining the confidentiality of sensitive information and protecting it against unauthorized access, both during computation and in storage.

Federated data networks are systems made up of distinct networks that give parties access to data without sharing it directly. They are commonly used in industries such as healthcare, finance and government, where multiple parties need to draw on sensitive data to support research, analysis or decision-making.

To put it more simply, an FDN is like a team of libraries, where each library (node) retains control of its books (data). When you have a question, instead of going to each library, the libraries share their information with each other to answer your question, but the books never leave their respective libraries. This way, libraries can maintain control over their own books while still contributing to a collective knowledge base.

Now, imagine that these libraries want to ensure that the information about their books remains private, even while sharing it. This is where confidential computing comes in: it keeps data encrypted and protected while it is being processed. So, even when the libraries are sharing book information, the details are kept secret.
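The library analogy can be sketched in a few lines of Python. This is a minimal illustration, not a real FDN implementation: the `Node` class, its `local_summary` method and the sample numbers are all hypothetical, and the only point is that each node answers a question with an aggregate while its raw records stay where they are.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    """A hypothetical data holder; `records` is never handed to the caller."""
    name: str
    records: List[float]

    def local_summary(self) -> Tuple[float, int]:
        # Only an aggregate (sum and count) is returned, never the raw rows.
        return sum(self.records), len(self.records)


def federated_mean(nodes: List[Node]) -> float:
    """Combine per-node summaries into a global mean."""
    total, count = 0.0, 0
    for node in nodes:
        node_sum, node_count = node.local_summary()
        total += node_sum
        count += node_count
    if count == 0:
        raise ValueError("no records available across nodes")
    return total / count


# Made-up sample data for three "libraries".
nodes = [
    Node("library_a", [10.0, 20.0, 30.0]),
    Node("library_b", [40.0]),
    Node("library_c", [50.0, 60.0]),
]
global_mean = federated_mean(nodes)
print(global_mean)
```

In this sketch the aggregates themselves are still visible to whoever combines them. Protecting that step as well is the part confidential computing is meant to address.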

Use Cases

Federated data networks and confidential computing work together in scenarios where multiple parties want to collaborate on data but also want to maintain privacy. For example, in healthcare, different hospitals might want to collaborate for research purposes but need to ensure patient data remains confidential. Here, the hospitals can form a federated network and use confidential computing to analyze the data securely.

Similarly, in the world of financial services, FDNs and confidential computing can come in handy for banks that may want to work together to combat something like money laundering without explicitly sharing their data with one another.
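As a rough illustration of how several banks might compute a joint figure without revealing their individual numbers, the sketch below uses additive secret sharing, one building block of secure multiparty computation. It is an assumption chosen for illustration, not a description of any specific product: the function names, the modulus and the sample totals are hypothetical, and a real deployment would also need authenticated channels and protection against colluding parties.

```python
import secrets
from typing import List

# Modulus for share arithmetic; assumed to be larger than any real total.
PRIME = 2**61 - 1


def make_shares(value: int, n_parties: int) -> List[int]:
    """Split `value` into n random shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values: List[int]) -> int:
    """Compute the sum of all inputs from shares only."""
    n = len(private_values)
    # Row i holds the shares created by party i; column j is what party j receives.
    all_shares = [make_shares(v, n) for v in private_values]
    # Each party adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return sum(partial_sums) % PRIME


# Made-up counts of flagged transactions at three banks.
bank_totals = [120, 45, 300]
result = secure_sum(bank_totals)
print(result)
```

Each individual share looks like a random number, so no single partial sum reveals a bank's input, yet the partial sums add up to the true total.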

Accessing high-quality data, which is key to developing machine learning models, can also present challenges for financial services firms. FDNs can mitigate some of these issues by providing secure access to data that would otherwise be unattainable. Common data access obstacles that confidential computing can help data scientists address include:

• Data Privacy And Security: Financial data is sensitive and subject to strict privacy regulations, making it difficult to access and share. This can create legal and ethical challenges for companies looking to collect, store and use financial data for machine learning purposes.

• Data Quality: Financial data can be complex and messy, making it challenging to clean and pre-process for use in machine learning models. It may also be incomplete or contain errors, which can impact the accuracy and reliability of machine learning models.

• Data Silos: Financial data is often spread across multiple systems and databases, making it difficult to access and integrate. These silos can stand in the way of building comprehensive machine learning models that incorporate data from multiple sources.

• Competition And Proprietary Data: Financial institutions may be hesitant to share their data with competitors or third-party vendors, limiting access to proprietary data that could be used to develop more robust machine learning models.

Basically, FDNs allow multiple parties to work together without sharing their data directly, while confidential computing ensures that this collaboration happens securely and privately.

How To Get Started

Learning to develop with federated data networks starts with understanding the principles and technologies behind decentralized and distributed systems.

Step 1. Understand the decentralization basics. Dive into the core principles and fundamentals of distributed systems, peer-to-peer networks and decentralization. A core concept is how data is managed and distributed in these environments.

Step 2. Understand the federated learning basics. Learn about the principles of federated learning, where machine learning models are trained across decentralized devices, each using its own local data.
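To make the idea in Step 2 concrete, here is a toy sketch of federated averaging written in plain Python rather than with any particular framework. Everything in it is an assumption for illustration: the one-parameter model y = w * x, the learning rate, the number of rounds and the made-up client datasets. Each client trains on its own data, and only the resulting weight (not the data) is sent back to be averaged.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one client


def local_train(w: float, data: Dataset, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient-descent steps on one client's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(w_global: float, clients: List[Dataset], rounds: int = 50) -> float:
    """Repeatedly train locally, then average weights by client sample count."""
    for _ in range(rounds):
        updates = [(local_train(w_global, data), len(data)) for data in clients]
        total = sum(n for _, n in updates)
        w_global = sum(w * n for w, n in updates) / total
    return w_global


# Made-up client datasets that all follow y = 3 * x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
w = federated_average(0.0, clients)
print(round(w, 3))
```

The server in this sketch only ever sees model weights. Real systems usually add further protections, because model updates can still leak information about the underlying data.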

Step 3. Explore data security and privacy. Privacy-preserving techniques underpin federated data networks and are crucial for sharing sensitive data. Key concepts include secure multiparty computation, differential privacy and homomorphic encryption.
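Of the techniques named in Step 3, differential privacy is probably the easiest to try out by hand. The sketch below applies the Laplace mechanism to a simple counting query. The function names, the epsilon value and the sample data are hypothetical, and the sampler is a teaching aid rather than a hardened implementation (production libraries take extra care with floating-point behavior).

```python
import random
from typing import List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(flags: List[bool], epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing one record
    changes the true count by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(flags)
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(7)  # seeded only so the demo is repeatable
flags = [True, False, True, True, False]  # made-up records
noisy = private_count(flags, epsilon=0.5, rng=rng)
print(noisy)
```

A smaller epsilon means more noise and stronger privacy, and a larger epsilon means a more accurate but less private answer.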

Step 4. Play with the frameworks. Frameworks such as PySyft, Flower and TensorFlow Federated provide libraries for implementing federated learning models and spinning up decentralized data networks. This step involves creating some toy data or using a sample dataset, then digging into how FDNs are set up and how models are trained in a distributed manner.

Step 5. Find your application. With practice under your belt, spend time ideating use cases that require the technology. Federated data networks are applicable across a wide range of industries.

Step 6. Get involved. The community around decentralized data sharing and federated learning is growing. See how you can contribute to further your skills.

In my experience, leveraging the open-source community is crucial for not only understanding but also developing federated learning models and instantiating federated data networks. This community is tight-knit and eager to help propel the field!

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.