430 research outputs found
Recommended from our members
Cryptographic approaches to security and optimization in machine learning
Modern machine learning techniques have achieved surprisingly good standard test accuracy, yet classical machine learning theory has been unable to explain the underlying reason behind this success. The phenomenon of adversarial examples further complicates our understanding of what it means to have good generalization ability. Classifiers that generalize well to the test set are easily fooled by imperceptible image modifications, which can often be computed without knowledge of the classifier itself. The adversarial error of a classifier measures the error under which each test data point can be modified by an algorithm before it is given as input to the classifier. Followup work has showed that a tradeoff exists between optimizing for standard generalization error versus for adversarial error. This calls into question whether standard generalization error is the correct metric to measure.
We try to understand the generalization capability of modern machine learning techniques through the lens of adversarial examples. To reconcile the apparent tradeoff between the two competing notions of error, we create new security definitions and classifier constructions which allow us to prove an upper bound on the adversarial error that decreases as standard test error decreases. We introduce a cryptographic proof technique by defining a security assumption in a simpler attack setting and proving a security reduction from a restricted black-box attack problem to this security assumption. We then investigate the double descent curve in the interpolation regime, where test error can continue to decrease even after training error has reached zero, to give a natural explanation for the observed tradeoff between adversarial error and standard generalization error.
The second part of our work investigates further this notion of a black-box model by looking at the separation between being able to evaluate a function and being able to actually understand it. This is formalized through the notion of function obfuscation in cryptography. Given some concrete implementation of a function, the implementation is considered obfuscated if a user cannot produce the function output on a test input without querying the implementation itself. This means that a user cannot actually learn or understand the function even though all of the implementation details are presented in the clear. As expected this is a very strong requirement that does not exist for all functions one might be interested in. In our work we make progress on providing obfuscation schemes for simple, explicit function classes.
The last part of our work investigates non-statistical biases and algorithms for nonconvex optimization problems. We show that the continuous-time limit of stochastic gradient descent does not converge directly to the local optimum, but rather has a bias term which grows with the step size. We also construct novel, non-statistical algorithms for two parametric learning problems by employing lattice basis reduction techniques from cryptography
SoK: Acoustic Side Channels
We provide a state-of-the-art analysis of acoustic side channels, cover all
the significant academic research in the area, discuss their security
implications and countermeasures, and identify areas for future research. We
also make an attempt to bridge side channels and inverse problems, two fields
that appear to be completely isolated from each other but have deep
connections.Comment: 16 page
Cloud-based homomorphic encryption for privacy-preserving machine learning in clinical decision support
While privacy and security concerns dominate public cloud services, Homomorphic Encryption (HE) is seen as an emerging solution that ensures secure processing of sensitive data via untrusted networks in the public cloud or by third-party cloud vendors. It relies on the fact that some encryption algorithms display the property of homomorphism, which allows them to manipulate data meaningfully while still in encrypted form; although there are major stumbling blocks to overcome before the technology is considered mature for production cloud environments. Such a framework would find particular relevance in Clinical Decision Support (CDS) applications deployed in the public cloud. CDS applications have an important computational and analytical role over confidential healthcare information with the aim of supporting decision-making in clinical practice. Machine Learning (ML) is employed in CDS applications that typically learn and can personalise actions based on individual behaviour. A relatively simple-to-implement, common and consistent framework is sought that can overcome most limitations of Fully Homomorphic Encryption (FHE) in order to offer an expanded and flexible set of HE capabilities. In the absence of a significant breakthrough in FHE efficiency and practical use, it would appear that a solution relying on client interactions is the best known entity for meeting the requirements of private CDS-based computation, so long as security is not significantly compromised. A hybrid solution is introduced, that intersperses limited two-party interactions amongst the main homomorphic computations, allowing exchange of both numerical and logical cryptographic contexts in addition to resolving other major FHE limitations. Interactions involve the use of client-based ciphertext decryptions blinded by data obfuscation techniques, to maintain privacy. This thesis explores the middle ground whereby HE schemes can provide improved and efficient arbitrary computational functionality over a significantly reduced two-party network interaction model involving data obfuscation techniques. This compromise allows for the powerful capabilities of HE to be leveraged, providing a more uniform, flexible and general approach to privacy-preserving system integration, which is suitable for cloud deployment. The proposed platform is uniquely designed to make HE more practical for mainstream clinical application use, equipped with a rich set of capabilities and potentially very complex depth of HE operations. Such a solution would be suitable for the long-term privacy preserving-processing requirements of a cloud-based CDS system, which would typically require complex combinatorial logic, workflow and ML capabilities
Automated Cloud Removal on High-Altitude UAV Imagery Through Deep Learning on Synthetic Data
New theories and applications of deep learning have been discovered and implemented within the field of machine learning recently. The high degree of effectiveness of deep learning models span across many domains including image processing and enhancement. Specifically, the automated removal of clouds, smoke, and haze from images has become a prominent and pertinent field of research. In this paper, I propose an analysis and synthetic training data variant for the All-in-One Dehazing Network (AOD-Net) architecture that performs better on removing clouds and haze; most specifically on high altitude unmanned aerial vehicles (UAVs) images
A Survey of Adversarial Machine Learning in Cyber Warfare
The changing nature of warfare has seen a paradigm shift from the conventional to asymmetric, contactless warfare such as information and cyber warfare. Excessive dependence on information and communication technologies, cloud infrastructures, big data analytics, data-mining and automation in decision making poses grave threats to business and economy in adversarial environments. Adversarial machine learning is a fast growing area of research which studies the design of Machine Learning algorithms that are robust in adversarial environments. This paper presents a comprehensive survey of this emerging area and the various techniques of adversary modelling. We explore the threat models for Machine Learning systems and describe the various techniques to attack and defend them. We present privacy issues in these models and describe a cyber-warfare test-bed to test the effectiveness of the various attack-defence strategies and conclude with some open problems in this area of research.
Recommended from our members
From Controlled Data-Center Environments to Open Distributed Environments: Scalable, Efficient, and Robust Systems with Extended Functionality
The past two decades have witnessed several paradigm shifts in computing environments. Starting from cloud computing which offers on-demand allocation of storage, network, compute, and memory resources, as well as other services, in a pay-as-you-go billingmodel. Ending with the rise of permissionless blockchain technology, a decentralized computing paradigm with lower trust assumptions and limitless number of participants. Unlike in the cloud, where all the computing resources are owned by some trusted cloud provider, permissionless blockchains allow computing resources owned by possibly malicious parties to join and leave their network without obtaining permission from some centralized trusted authority. Still, in the presence of malicious parties, permissionlessblockchain networks can perform general computations and make progress. Cloud computing is powered by geographically distributed data-centers controlled and managed by trusted cloud service providers and promises theoretically infinite computing resources. On the other hand, permissionless blockchains are powered by open networks of geographically distributed computing nodes owned by entities that are not necessarily known or trusted. This paradigm shift requires a reconsideration of distributed data management protocols and distributed system designs that assume low latency across system components, inelastic computing resources, or fully trusted computing resources.In this dissertation, we propose new system designs and optimizations that address scalability and efficiency of distributed data management systems in cloud environments. We also propose several protocols and new programming paradigms to extend the functionality and enhance the robustness of permissionless blockchains. The work presented spans global-scale transaction processing, large-scale stream processing, atomic transaction processing across permissionless blockchains, and extending the functionality and the use-cases of permissionless blockchains. In all these directions, the focus is on rethinking system and protocol designs to account for novel cloud and permissionless blockchain assumptions. For global-scale transaction processing, we propose GPlacer, a placement optimization framework that decides replica placement of fully and partial geo-replicated databases. For large-scale stream processing, we propose Cache-on-Track (CoT) an adaptive and elastic client-side cache that addresses server-side load-imbalances that occur in large-scale distributed storage layers. In permissionless blockchain transaction processing, we propose AC3WN, the first correct cross-chain commitment protocol that guarantees atomicity of cross-chain transactions. Also, we propose TXSC, a transactional smart contract programming framework. TXSC provides smart contract developers with transaction primitives. These primitives allow developers to write smart contracts without the need to reason about the anomalies that can arise due to concurrent smart contract function executions. In addition, we propose a forward-looking architecture that unifies both permissioned and permissionless blockchains and exploits the running infrastructure of permissionless blockchains to build global asset management systems
REDIR: Automated Static Detection of Obfuscated Anti-Debugging Techniques
Reverse Code Engineering (RCE) to detect anti-debugging techniques in software is a very difficult task. Code obfuscation is an anti-debugging technique makes detection even more challenging. The Rule Engine Detection by Intermediate Representation (REDIR) system for automated static detection of obfuscated anti-debugging techniques is a prototype designed to help the RCE analyst improve performance through this tedious task. Three tenets form the REDIR foundation. First, Intermediate Representation (IR) improves the analyzability of binary programs by reducing a large instruction set down to a handful of semantically equivalent statements. Next, an Expert System (ES) rule-engine searches the IR and initiates a sensemaking process for anti-debugging technique detection. Finally, an IR analysis process confirms the presence of an anti-debug technique. The REDIR system is implemented as a debugger plug-in. Within the debugger, REDIR interacts with a program in the disassembly view. Debugger users can instantly highlight anti-debugging techniques and determine if the presence of a debugger will cause a program to take a conditional jump or fall through to the next instruction
- …