188 research outputs found
Towards a Homomorphic Machine Learning Big Data Pipeline for the Financial Services Sector
Machinelearning(ML)istodaycommonlyemployedintheFinancialServicesSector(FSS) to create various models to predict a variety of conditions ranging from financial transactions fraud to outcomes of investments and also targeted marketing campaigns. The common ML technique used for the modeling is supervised learning using regression algorithms and usually involves large amounts of data that needs to be shared and prepared before the actual learning phase. Compliance with privacy laws and confidentiality regulations requires that most, if not all, of the data must be kept in a secure environment, usually in-house, and not outsourced to cloud or multi-tenant shared environments. This paper presents the results of a research collaboration between IBM Research and Banco Bradesco SA to investigate approaches to homomorphically secure a typical ML pipeline commonly employed in the FSS industry.
We investigated and de-constructed a typical ML pipeline used by Banco Bradesco and applied Homo- morphic Encryption (HE) to two of the important ML tasks, namely the variable selection phase of the model generation task and the prediction task. Variable selection, which usually precedes the training phase, is very important when working with data sets for which no prior knowledge of the covariate set exists. Our work provides a way to define an initial covariate set for the training phase while preserving the privacy and confidentiality of the input data sets.
Quality metrics, using real financial data, comprising quantitative, qualitative and categorical features, demonstrated that our HE based pipeline can yield results comparable to state of the art variable selection techniques and the performance results demonstrated that HE technology has reached the inflection point where it can be useful in batch processing in a financial business setting
Ethics and Responsible AI Deployment
As Artificial Intelligence (AI) becomes more prevalent, protecting personal
privacy is a critical ethical issue that must be addressed. This article
explores the need for ethical AI systems that safeguard individual privacy
while complying with ethical standards. By taking a multidisciplinary approach,
the research examines innovative algorithmic techniques such as differential
privacy, homomorphic encryption, federated learning, international regulatory
frameworks, and ethical guidelines. The study concludes that these algorithms
effectively enhance privacy protection while balancing the utility of AI with
the need to protect personal data. The article emphasises the importance of a
comprehensive approach that combines technological innovation with ethical and
regulatory strategies to harness the power of AI in a way that respects and
protects individual privacy
Data Spaces
This open access book aims to educate data space designers to understand what is required to create a successful data space. It explores cutting-edge theory, technologies, methodologies, and best practices for data spaces for both industrial and personal data and provides the reader with a basis for understanding the design, deployment, and future directions of data spaces. The book captures the early lessons and experience in creating data spaces. It arranges these contributions into three parts covering design, deployment, and future directions respectively. The first part explores the design space of data spaces. The single chapters detail the organisational design for data spaces, data platforms, data governance federated learning, personal data sharing, data marketplaces, and hybrid artificial intelligence for data spaces. The second part describes the use of data spaces within real-world deployments. Its chapters are co-authored with industry experts and include case studies of data spaces in sectors including industry 4.0, food safety, FinTech, health care, and energy. The third and final part details future directions for data spaces, including challenges and opportunities for common European data spaces and privacy-preserving techniques for trustworthy data sharing. The book is of interest to two primary audiences: first, researchers interested in data management and data sharing, and second, practitioners and industry experts engaged in data-driven systems where the sharing and exchange of data within an ecosystem are critical
Taking Computation to Data: Integrating Privacy-preserving AI techniques and Blockchain Allowing Secure Analysis of Sensitive Data on Premise
PhD thesis in Information technologyWith the advancement of artificial intelligence (AI), digital pathology has seen significant progress in recent years. However, the use of medical AI raises concerns about patient data privacy. The CLARIFY project is a research project funded under the European Union’s Marie Sklodowska-Curie Actions (MSCA) program. The primary objective of CLARIFY is to create a reliable, automated digital diagnostic platform that utilizes cloud-based data algorithms and artificial intelligence to enable interpretation and diagnosis of wholeslide-images (WSI) from any location, maximizing the advantages of AI-based digital pathology.
My research as an early stage researcher for the CLARIFY project centers on securing information systems using machine learning and access control techniques. To achieve this goal, I extensively researched privacy protection technologies such as federated learning, differential privacy, dataset distillation, and blockchain. These technologies have different priorities in terms of privacy, computational efficiency, and usability. Therefore, we designed a computing system that supports different levels of privacy security, based on the concept: taking computation to data. Our approach is based on two design principles. First, when external users need to access internal data, a robust access control mechanism must be established to limit unauthorized access. Second, it implies that raw data should be processed to ensure privacy and security. Specifically, we use smart contractbased access control and decentralized identity technology at the system security boundary to ensure the flexibility and immutability of verification. If the user’s raw data still cannot be directly accessed, we propose to use dataset distillation technology to filter out privacy, or use locally trained model as data agent. Our research focuses on improving the usability of these methods, and this thesis serves as a demonstration of current privacy-preserving and secure computing technologies
Recommended from our members
FlexFHE: A System for Homomorphically Encrypting DNA and Operating on Encrypted Data Securely in Untrusted Environments
DNA data contains sensitive health information and personally identifiable data. Currently, even if DNA data is stored in encrypted databases, it must be decrypted for health professionals and researchers to analyze, which means that DNA data exists in plaintext on unsecured, untrusted servers and machines during analysis. This thesis describes a complete system for homomorphically encrypting DNA data in a trusted context and then running analytic operations on the encrypted DNA data in an untrusted context, thus allowing healthcare professionals and researchers to run both high volume analytics on many individuals’ sequenced DNA and run complex analytics on a single individual’s sequenced DNA without ever handling plaintext data.
Symmetric encryption is used as a mechanism for controlling which queries are made on the data. The threat model addressed by this system allows an authorized party to run only authorized queries on a genome, while restricting any additional access.
The system implemented achieves substring search, substring search with wildcards representing mutations, and percent match between two nucleotide sequences by converting genomic data into one-hot binary matrixes and encrypting each bit individually using OpenFHE’s LWE Encryption implemented using the CGGI scheme. While runtime for each operation is O(nm), each operation is maximally parallelized using OpenMP, thus allowing for accelerated performance on machines with multiple CPUs without the need for batching
Technologies and Applications for Big Data Value
This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications for the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas. The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to technology frameworks which are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is then arranged in two parts. The first part “Technologies and Methods” contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part “Processes and Applications” details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community's nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry. The book is of interest to two primary audiences, first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, and machine learning and AI. Second, practitioners and industry experts engaged in data-driven systems, software design and deployment projects who are interested in employing these advanced methods to address real-world problems
- …