330 research outputs found

    Type-based Dependency Analysis for JavaScript

    Full text link
    Dependency analysis is a program analysis that determines potential data flow between program points. While it is not a security analysis per se, it is a viable basis for investigating data integrity, for ensuring confidentiality, and for guaranteeing sanitization. A noninterference property can be stated and proved for the dependency analysis. We have designed and implemented a dependency analysis for JavaScript. We formalize this analysis as an abstraction of a tainting semantics. We prove the correctness of the tainting semantics, the soundness of the abstraction, a noninterference property, and the termination of the analysis.Comment: Technical Repor

    Survey: Leakage and Privacy at Inference Time

    Get PDF
    Leakage of data from publicly available Machine Learning (ML) models is an area of growing significance as commercial and government applications of ML can draw on multiple sources of data, potentially including users' and clients' sensitive data. We provide a comprehensive survey of contemporary advances on several fronts, covering involuntary data leakage which is natural to ML models, potential malevolent leakage which is caused by privacy attacks, and currently available defence mechanisms. We focus on inference-time leakage, as the most likely scenario for publicly available models. We first discuss what leakage is in the context of different data, tasks, and model architectures. We then propose a taxonomy across involuntary and malevolent leakage, available defences, followed by the currently available assessment metrics and applications. We conclude with outstanding challenges and open questions, outlining some promising directions for future research

    Hardening High-Assurance Security Systems with Trusted Computing

    Get PDF
    We are living in the time of the digital revolution in which the world we know changes beyond recognition every decade. The positive aspect is that these changes also drive the progress in quality and availability of digital assets crucial for our societies. To name a few examples, these are broadly available communication channels allowing quick exchange of knowledge over long distances, systems controlling automatic share and distribution of renewable energy in international power grid networks, easily accessible applications for early disease detection enabling self-examination without burdening the health service, or governmental systems assisting citizens to settle official matters without leaving their homes. Unfortunately, however, digitalization also opens opportunities for malicious actors to threaten our societies if they gain control over these assets after successfully exploiting vulnerabilities in the complex computing systems building them. Protecting these systems, which are called high-assurance security systems, is therefore of utmost importance. For decades, humanity has struggled to find methods to protect high-assurance security systems. The advancements in the computing systems security domain led to the popularization of hardware-assisted security techniques, nowadays available in commodity computers, that opened perspectives for building more sophisticated defense mechanisms at lower costs. However, none of these techniques is a silver bullet. Each one targets particular use cases, suffers from limitations, and is vulnerable to specific attacks. I argue that some of these techniques are synergistic and help overcome limitations and mitigate specific attacks when used together. My reasoning is supported by regulations that legally bind high-assurance security systems' owners to provide strong security guarantees. These requirements can be fulfilled with the help of diverse technologies that have been standardized in the last years. In this thesis, I introduce new techniques for hardening high-assurance security systems that execute in remote execution environments, such as public and hybrid clouds. I implemented these techniques as part of a framework that provides technical assurance that high-assurance security systems execute in a specific data center, on top of a trustworthy operating system, in a virtual machine controlled by a trustworthy hypervisor or in strong isolation from other software. I demonstrated the practicality of my approach by leveraging the framework to harden real-world applications, such as machine learning applications in the eHealth domain. The evaluation shows that the framework is practical. It induces low performance overhead (<6%), supports software updates, requires no changes to the legacy application's source code, and can be tailored to individual trust boundaries with the help of security policies. The framework consists of a decentralized monitoring system that offers better scalability than traditional centralized monitoring systems. Each monitored machine runs a piece of code that verifies that the machine's integrity and geolocation conform to the given security policy. This piece of code, which serves as a trusted anchor on that machine, executes inside the trusted execution environment, i.e., Intel SGX, to protect itself from the untrusted host, and uses trusted computing techniques, such as trusted platform module, secure boot, and integrity measurement architecture, to attest to the load-time and runtime integrity of the surrounding operating system running on a bare metal machine or inside a virtual machine. The trusted anchor implements my novel, formally proven protocol, enabling detection of the TPM cuckoo attack. The framework also implements a key distribution protocol that, depending on the individual security requirements, shares cryptographic keys only with high-assurance security systems executing in the predefined security settings, i.e., inside the trusted execution environments or inside the integrity-enforced operating system. Such an approach is particularly appealing in the context of machine learning systems where some algorithms, like the machine learning model training, require temporal access to large computing power. These algorithms can execute inside a dedicated, trusted data center at higher performance because they are not limited by security features required in the shared execution environment. The evaluation of the framework showed that training of a machine learning model using real-world datasets achieved 0.96x native performance execution on the GPU and a speedup of up to 1560x compared to the state-of-the-art SGX-based system. Finally, I tackled the problem of software updates, which makes the operating system's integrity monitoring unreliable due to false positives, i.e., software updates move the updated system to an unknown (untrusted) state that is reported as an integrity violation. I solved this problem by introducing a proxy to a software repository that sanitizes software packages so that they can be safely installed. The sanitization consists of predicting and certifying the future (after the specific updates are installed) operating system's state. The evaluation of this approach showed that it supports 99.76% of the packages available in Alpine Linux main and community repositories. The framework proposed in this thesis is a step forward in verifying and enforcing that high-assurance security systems execute in an environment compliant with regulations. I anticipate that the framework might be further integrated with industry-standard security information and event management tools as well as other security monitoring mechanisms to provide a comprehensive solution hardening high-assurance security systems

    Automatic Input Rectification

    Get PDF
    We present a novel technique, automatic input rectification, and a prototype implementation called SOAP. SOAP learns a set of constraints characterizing typical inputs that an application is highly likely to process correctly. When given an atypical input that does not satisfy these constraints, SOAP automatically rectifies the input (i.e., changes the input so that is satisfies the learned constraints). The goal is to automatically convert potentially dangerous inputs into typical inputs that the program is highly likely to process correctly. Our experimental results show that, for a set of benchmark applications (namely, Google Picasa, ImageMagick, VLC, Swfdec, and Dillo), this approach effectively converts malicious inputs (which successfully exploit vulnerabilities in the application) into benign inputs that the application processes correctly. Moreover, a manual code analysis shows that, if an input does satisfy the learned constraints, it is incapable of exploiting these vulnerabilities. We also present the results of a user study designed to evaluate the subjective perceptual quality of outputs from benign but atypical inputs that have been automatically rectified by SOAP to conform to the learned constraints. Specifically, we obtained benign inputs that violate learned constraints, used our input rectifier to obtain rectified inputs, then paid Amazon Mechanical Turk users to provide their subjective qualitative perception of the difference between the outputs from the original and rectified inputs. The results indicate that rectification can often preserve much, and in many cases all, of the desirable data in the original input

    Applications in security and evasions in machine learning : a survey

    Get PDF
    In recent years, machine learning (ML) has become an important part to yield security and privacy in various applications. ML is used to address serious issues such as real-time attack detection, data leakage vulnerability assessments and many more. ML extensively supports the demanding requirements of the current scenario of security and privacy across a range of areas such as real-time decision-making, big data processing, reduced cycle time for learning, cost-efficiency and error-free processing. Therefore, in this paper, we review the state of the art approaches where ML is applicable more effectively to fulfill current real-world requirements in security. We examine different security applications' perspectives where ML models play an essential role and compare, with different possible dimensions, their accuracy results. By analyzing ML algorithms in security application it provides a blueprint for an interdisciplinary research area. Even with the use of current sophisticated technology and tools, attackers can evade the ML models by committing adversarial attacks. Therefore, requirements rise to assess the vulnerability in the ML models to cope up with the adversarial attacks at the time of development. Accordingly, as a supplement to this point, we also analyze the different types of adversarial attacks on the ML models. To give proper visualization of security properties, we have represented the threat model and defense strategies against adversarial attack methods. Moreover, we illustrate the adversarial attacks based on the attackers' knowledge about the model and addressed the point of the model at which possible attacks may be committed. Finally, we also investigate different types of properties of the adversarial attacks

    Privacy-preserving data outsourcing in the cloud via semantic data splitting

    Full text link
    Even though cloud computing provides many intrinsic benefits, privacy concerns related to the lack of control over the storage and management of the outsourced data still prevent many customers from migrating to the cloud. Several privacy-protection mechanisms based on a prior encryption of the data to be outsourced have been proposed. Data encryption offers robust security, but at the cost of hampering the efficiency of the service and limiting the functionalities that can be applied over the (encrypted) data stored on cloud premises. Because both efficiency and functionality are crucial advantages of cloud computing, in this paper we aim at retaining them by proposing a privacy-protection mechanism that relies on splitting (clear) data, and on the distributed storage offered by the increasingly popular notion of multi-clouds. We propose a semantically-grounded data splitting mechanism that is able to automatically detect pieces of data that may cause privacy risks and split them on local premises, so that each chunk does not incur in those risks; then, chunks of clear data are independently stored into the separate locations of a multi-cloud, so that external entities cannot have access to the whole confidential data. Because partial data are stored in clear on cloud premises, outsourced functionalities are seamlessly and efficiently supported by just broadcasting queries to the different cloud locations. To enforce a robust privacy notion, our proposal relies on a privacy model that offers a priori privacy guarantees; to ensure its feasibility, we have designed heuristic algorithms that minimize the number of cloud storage locations we need; to show its potential and generality, we have applied it to the least structured and most challenging data type: plain textual documents
    • …
    corecore