69 research outputs found

    GitHub Considered Harmful? Analyzing Open-Source Projects for the Automatic Generation of Cryptographic API Call Sequences

    Full text link
    GitHub is a popular data repository for code examples. It is being continuously used to train several AI-based tools to automatically generate code. However, the effectiveness of such tools in correctly demonstrating the usage of cryptographic APIs has not been thoroughly assessed. In this paper, we investigate the extent and severity of misuses, specifically caused by incorrect cryptographic API call sequences in GitHub. We also analyze the suitability of GitHub data to train a learning-based model to generate correct cryptographic API call sequences. For this, we manually extracted and analyzed the call sequences from GitHub. Using this data, we augmented an existing learning-based model called DeepAPI to create two security-specific models that generate cryptographic API call sequences for a given natural language (NL) description. Our results indicate that it is imperative to not neglect the misuses in API call sequences while using data sources like GitHub, to train models that generate code.Comment: Accepted at QRS 202

    Dealing with Variability in API Misuse Specification

    Get PDF
    APIs are the primary mechanism for developers to gain access to externally defined services and tools. However, previous research has revealed API misuses that violate the contract of APIs to be prevalent. Such misuses can have harmful consequences, especially in the context of cryptographic libraries. Various API-misuse detectors have been proposed to address this issue - including CogniCrypt, one of the most versatile of such detectors and that uses a language (CrySL) to specify cryptographic API usage contracts. Nonetheless, existing approaches to detect API misuse had not been designed for systematic reuse, ignoring the fact that different versions of a library, different versions of a platform, and different recommendations/guidelines might introduce variability in the correct usage of an API. Yet, little is known about how such variability impacts the specification of the correct API usage. This paper investigates this question by analyzing the impact of various sources of variability on widely used Java cryptographic libraries (including JCA/JCE, Bouncy Castle, and Google Tink). The results of our investigation show that sources of variability like new versions of the API and security standards significantly impact the specifications. We then use the insights gained from our investigation to motivate an extension to the CrySL language (named MetaCrySL), which builds on meta-programming concepts. We evaluate MetaCrySL by specifying usage rules for a family of Android versions and illustrate that MetaCrySL can model all forms of variability we identified and drastically reduce the size of a family of specifications for the correct usage of cryptographic APIs

    A First Look at Android Applications in Google Play related to Covid-19

    Get PDF
    Due to the convenience of access-on-demand to information and business solutions, mobile apps have become an important asset in the digital world. In the context of the Covid-19 pandemic, app developers have joined the response effort in various ways by releasing apps that target different user bases (e.g., all citizens or journalists), offer different services (e.g., location tracking or diagnostic-aid), provide generic or specialized information, etc. While many apps have raised some concerns by spreading misinformation or even malware, the literature does not yet provide a clear landscape of the different apps that were developed. In this study, we focus on the Android ecosystem and investigate Covid-related Android apps. In a best-effort scenario, we attempt to systematically identify all relevant apps and study their characteristics with the objective to provide a First taxonomy of Covid-related apps, broadening the relevance beyond the implementation of contact tracing. Overall, our study yields a number of empirical insights that contribute to enlarge the knowledge on Covid-related apps: (1) Developer communities contributed rapidly to the Covid-19, with dedicated apps released as early as January 2020; (2) Covid-related apps deliver digital tools to users (e.g., health diaries), serve to broadcast information to users (e.g., spread statistics), and collect data from users (e.g., for tracing); (3) Covid-related apps are less complex than standard apps; (4) they generally do not seem to leak sensitive data; (5) in the majority of cases, Covid-related apps are released by entities with past experience on the market, mostly official government entities or public health organizations.Comment: Accepted in Empirical Software Engineering under reference: EMSE-D-20-00211R

    A Reevaluation Of Why Crypto-Detectors Fail: A Systematic Revaluation Of Cryptographic Misuse Detection Techniques

    Get PDF
    The correct use of cryptography is central to ensuring data security in modern software systems. Hence, several academic and commercial static analysis tools have been developed for detecting and mitigating crypto-API misuse. While developers are optimistically adopting these crypto-API misuse detectors (or crypto-detectors) in their software development cycles, this momentum must be accompanied by a rigorous understanding of their effectiveness at finding crypto-API misuse in practice. The original paper presents the MASC framework, which enables a systematic and data-driven evaluation of crypto-detectors using mutation testing. MASC was grounded in a comprehensive view of the problem space by developing a data-driven taxonomy of existing crypto-API misuse, containing 105 misuse cases organized among nine semantic clusters. 12 generalizable usage based mutation operators were developed and three mutation scopes that can expressively instantiate thousands of compilable variants of the misuse cases for thoroughly evaluating crypto-detectors. Using MASC, nine major crypto-detectors were evaluated and 19 unique, undocumented flaws that severely impact the ability of crypto-detectors to discover misuses in practice were found. For my thesis, I built upon this previous research and greatly expanded the MASC framework. MASC was expanded in all areas by adding new functionality, new operators, new misuses, and expanding the taxonomy. In addition, I reevaluated the most up-to-date versions of the original 9 crypto-detectors and evaluated 5 additional crypto-detectors. On top of this I also doubled the amount of applications I used to evaluate the tools. To analyze crypto-detectors I looked at both the 9 crypto-detectors evaluated in the original work and 5 new crypto-detectors. For the original crypto-detectors, I evaluated them with the updated MASC against their most up-to-date versions to determine if old flaws that have previously been fixed have a tendency to reappear. Both old and new crypto-detectors were evaluated with mutated Android and Java applications that were used in the original MASC paper, 15 newly mutated Android and Java applications, and minimal examples of cryptographic misuse

    RVSec : Runtime verification methods for high precision detection of cryptography API misuse

    Get PDF
    Dissertação (Mestrado em Informática) — Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, Brasília, 2022.O uso incorreto de APIs de criptografia pode causar vulnerabilidades em software. Portanto, recentemente, foram propostas ferramentas baseadas em análise estática para detecção de mau uso. Estes detectores encontram diversos maus usos, mas diferem em suas capacidades e limitações, alem de não detectarem alguns bugs. Neste trabalho, investigamos verificação em tempo de execução (RV, de Runtime Verification em Inglês) - como uma alternativa baseada em análise dinâmica para detectar mau uso de crypto APIs. RV monitora execução de programas em relação a especificações formais, e há evidência da eficiência e eficácia do seu uso na deteccção de bugs em software. Neste estudo empírico sobre a eficácia e eficiência da aplicação de análise dinâmica para detectar tais maus usos, nós propomos um um protótipo baseado em JavaMOP - uma implementação de Monitoring-Oriented Programming para Java - para realizar Verificação em Runtime (RV) de 22 classes da Java Cryptography Architecture (JCA). Desenvolvemos nossas especificações através da tradução manual de 22 especificações presentes em CrySL - o estado da arte em detecção estática - para nosso contexto MOP. Após conduzir um estudo comparativo de RVsec com o estado da arte em análise estática, nos calculamos métricas de acurácia - precision, recall e F-measure - bem como custos de execução e correlação com cobertura de testes. Nossos resultados suportam RV como um complemento efetivo para analisadores estáticos, uma vez que os métodos aqui propostos apresentam precisão comparável, quando não superior, sem incorrer em custos de execução proibitivos.Incorrect usage of cryptographic (crypto) APIs can cause software security vulnerabilities, but developers often find it difficult to reason about those APIs. To automate the detection of misuse, static-analysis based crypto API tools have been proposed. These detectors find many misuses, but they differ in strengths and weaknesses, and miss bugs. We investigate runtime verification (RV) as a dynamic-analysis based alternative for crypto API misuse detection. RV monitors program runs against formal specifications and was shown to be effective and efficient for amplifying the bug-finding ability of software tests. In this empirical study on the efficacy and efficiency of applying dynamic analysis to detect such misuses, we propose a prototype based on JavaMOP - a Java implementation of Monitoring-Oriented Programming - to perform Runtime Verification of 22 classes of the Java Cryptography Architecture (JCA). We developed our specifications by manually translating 22 specifications from CrySL - a state-of-the-art static detector - into our MOP context. Upon conducting a comparative assessment of the methods using three benchmarks provided in the literature, we evaluated accuracy metrics - precision, recall and F-measure - as well as runtime overhead cost and correlation to coverage. Our results support RV as an effective complement to static analysers, as the methods herein proposed presented competitive, when not superior, accuracy metrics, without incurring in prohibitive runtime overhead

    A Systematic Approach to Benchmark and Improve Automated Static Detection of Java-API Misuses

    Get PDF
    Today's software industry relies heavily on the reuse of existing software libraries. Such libraries provide the building blocks for modern software products. Reusing them allow developers to focus on innovation, while standing on the shoulders of giants. To use libraries effectively, developers need to know the Application Programming Interfaces (APIs) through which they communicate with the libraries. This includes both the APIs' semantics and the (implicit) usage constraints that come with them. In the face of the rapidly growing and evolving supply of software libraries, this has become a challenging task. As a result, incorrect usages of APIs, or API misuses, are a prevalent cause of software bugs, crashes, and vulnerabilities. In reaction to this problem, researchers have proposed a multitude of developer-assistance tools. One particular class of such tools automates the detection of API misuses in software code. We call these tools API-misuse detectors. Existing misuse detectors address different aspects of API misuse. However, no attempt has been made to systematically define the problem space of API misuse and to assess the prevalence of API misuses compared to other types of bugs. This makes it impossible to judge the relevance of research on API-misuse detection. Moreover, previous empirical evaluations of misuse detectors commonly measure the detectors' precision. However, since the studies use different datasets, it is unclear to which extend the results are comparable. It is also unclear where the detectors make trade-offs between their precision and their recall. In this thesis, we first present a systematic analysis of the problem of API misuse. We find that API misuse causes 9.1% of all software bugs in real-world projects, including many critical issues, such as program crashes, data loss, and security vulnerabilities. Then, we survey the literature to consolidate over a decade of research on API-misuse detection and build MUBench, a public automated benchmark for API-misuse detectors. This enables us to conduct the first-ever qualitative and quantitative comparison of existing misuse detectors. We find that these detectors have the potential to discover many API misuses, but suffer from extremely low precision and recall in practice. Finally, we systematically design MUDetect, a new API-misuse detector that addresses many of the problems of existing detectors. Using MUBench, we demonstrate that MUDetect clearly outperforms existing detectors with respect to both precision and recall. Our results provide strong evidence that, following our systematic approach, we can develop API-misuse detectors that are fit for practical application

    On the security of NoSQL cloud database services

    Get PDF
    Processing a vast volume of data generated by web, mobile and Internet-enabled devices, necessitates a scalable and flexible data management system. Database-as-a-Service (DBaaS) is a new cloud computing paradigm, promising a cost-effective and scalable, fully-managed database functionality meeting the requirements of online data processing. Although DBaaS offers many benefits it also introduces new threats and vulnerabilities. While many traditional data processing threats remain, DBaaS introduces new challenges such as confidentiality violation and information leakage in the presence of privileged malicious insiders and adds new dimension to the data security. We address the problem of building a secure DBaaS for a public cloud infrastructure where, the Cloud Service Provider (CSP) is not completely trusted by the data owner. We present a high level description of several architectures combining modern cryptographic primitives for achieving this goal. A novel searchable security scheme is proposed to leverage secure query processing in presence of a malicious cloud insider without disclosing sensitive information. A holistic database security scheme comprised of data confidentiality and information leakage prevention is proposed in this dissertation. The main contributions of our work are: (i) A searchable security scheme for non-relational databases of the cloud DBaaS; (ii) Leakage minimization in the untrusted cloud. The analysis of experiments that employ a set of established cryptographic techniques to protect databases and minimize information leakage, proves that the performance of the proposed solution is bounded by communication cost rather than by the cryptographic computational effort

    CONTACTLESS FINGERPRINT BIOMETRICS: ACQUISITION, PROCESSING, AND PRIVACY PROTECTION

    Get PDF
    Biometrics is defined by the International Organization for Standardization (ISO) as \u201cthe automated recognition of individuals based on their behavioral and biological characteristics\u201d Examples of distinctive features evaluated by biometrics, called biometric traits, are behavioral characteristics like the signature, gait, voice, and keystroke, and biological characteristics like the fingerprint, face, iris, retina, hand geometry, palmprint, ear, and DNA. The biometric recognition is the process that permits to establish the identity of a person, and can be performed in two modalities: verification, and identification. The verification modality evaluates if the identity declared by an individual corresponds to the acquired biometric data. Differently, in the identification modality, the recognition application has to determine a person's identity by comparing the acquired biometric data with the information related to a set of individuals. Compared with traditional techniques used to establish the identity of a person, biometrics offers a greater confidence level that the authenticated individual is not impersonated by someone else. Traditional techniques, in fact, are based on surrogate representations of the identity, like tokens, smart cards, and passwords, which can easily be stolen or copied with respect to biometric traits. This characteristic permitted a wide diffusion of biometrics in different scenarios, like physical access control, government applications, forensic applications, logical access control to data, networks, and services. Most of the biometric applications, also called biometric systems, require performing the acquisition process in a highly controlled and cooperative manner. In order to obtain good quality biometric samples, the acquisition procedures of these systems need that the users perform deliberate actions, assume determinate poses, and stay still for a time period. Limitations regarding the applicative scenarios can also be present, for example the necessity of specific light and environmental conditions. Examples of biometric technologies that traditionally require constrained acquisitions are based on the face, iris, fingerprint, and hand characteristics. Traditional face recognition systems need that the users take a neutral pose, and stay still for a time period. Moreover, the acquisitions are based on a frontal camera and performed in controlled light conditions. Iris acquisitions are usually performed at a distance of less than 30 cm from the camera, and require that the user assume a defined pose and stay still watching the camera. Moreover they use near infrared illumination techniques, which can be perceived as dangerous for the health. Fingerprint recognition systems and systems based on the hand characteristics require that the users touch the sensor surface applying a proper and uniform pressure. The contact with the sensor is often perceived as unhygienic and/or associated to a police procedure. This kind of constrained acquisition techniques can drastically reduce the usability and social acceptance of biometric technologies, therefore decreasing the number of possible applicative contexts in which biometric systems could be used. In traditional fingerprint recognition systems, the usability and user acceptance are not the only negative aspects of the used acquisition procedures since the contact of the finger with the sensor platen introduces a security lack due to the release of a latent fingerprint on the touched surface, the presence of dirt on the surface of the finger can reduce the accuracy of the recognition process, and different pressures applied to the sensor platen can introduce non-linear distortions and low-contrast regions in the captured samples. Other crucial aspects that influence the social acceptance of biometric systems are associated to the privacy and the risks related to misuses of biometric information acquired, stored and transmitted by the systems. One of the most important perceived risks is related to the fact that the persons consider the acquisition of biometric traits as an exact permanent filing of their activities and behaviors, and the idea that the biometric systems can guarantee recognition accuracy equal to 100\% is very common. Other perceived risks consist in the use of the collected biometric data for malicious purposes, for tracing all the activities of the individuals, or for operating proscription lists. In order to increase the usability and the social acceptance of biometric systems, researchers are studying less-constrained biometric recognition techniques based on different biometric traits, for example, face recognition systems in surveillance applications, iris recognition techniques based on images captured at a great distance and on the move, and contactless technologies based on the fingerprint and hand characteristics. Other recent studies aim to reduce the real and perceived privacy risks, and consequently increase the social acceptance of biometric technologies. In this context, many studies regard methods that perform the identity comparison in the encrypted domain in order to prevent possible thefts and misuses of biometric data. The objective of this thesis is to research approaches able to increase the usability and social acceptance of biometric systems by performing less-constrained and highly accurate biometric recognitions in a privacy compliant manner. In particular, approaches designed for high security contexts are studied in order improve the existing technologies adopted in border controls, investigative, and governmental applications. Approaches based on low cost hardware configurations are also researched with the aim of increasing the number of possible applicative scenarios of biometric systems. The privacy compliancy is considered as a crucial aspect in all the studied applications. Fingerprint is specifically considered in this thesis, since this biometric trait is characterized by high distinctivity and durability, is the most diffused trait in the literature, and is adopted in a wide range of applicative contexts. The studied contactless biometric systems are based on one or more CCD cameras, can use two-dimensional or three-dimensional samples, and include privacy protection methods. The main goal of these systems is to perform accurate and privacy compliant recognitions in less-constrained applicative contexts with respect to traditional fingerprint biometric systems. Other important goals are the use of a wider fingerprint area with respect to traditional techniques, compatibility with the existing databases, usability, social acceptance, and scalability. The main contribution of this thesis consists in the realization of novel biometric systems based on contactless fingerprint acquisitions. In particular, different techniques for every step of the recognition process based on two-dimensional and three-dimensional samples have been researched. Novel techniques for the privacy protection of fingerprint data have also been designed. The studied approaches are multidisciplinary since their design and realization involved optical acquisition systems, multiple view geometry, image processing, pattern recognition, computational intelligence, statistics, and cryptography. The implemented biometric systems and algorithms have been applied to different biometric datasets describing a heterogeneous set of applicative scenarios. Results proved the feasibility of the studied approaches. In particular, the realized contactless biometric systems have been compared with traditional fingerprint recognition systems, obtaining positive results in terms of accuracy, usability, user acceptability, scalability, and security. Moreover, the developed techniques for the privacy protection of fingerprint biometric systems showed satisfactory performances in terms of security, accuracy, speed, and memory usage
    corecore