700 research outputs found
Toward Data-Driven Discovery of Software Vulnerabilities
Over the years, Software Engineering, as a discipline, has recognized the potential for engineers to make mistakes and has incorporated processes to prevent such mistakes from becoming exploitable vulnerabilities. These processes span the spectrum from using unit/integration/fuzz testing, static/dynamic/hybrid analysis, and (automatic) patching to discover instances of vulnerabilities to leveraging data mining and machine learning to collect metrics that characterize attributes indicative of vulnerabilities. Among these processes, metrics have the potential to uncover systemic problems in the product, process, or people that could lead to vulnerabilities being introduced, rather than identifying specific instances of vulnerabilities. The insights from metrics can be used to support developers and managers in making decisions to improve the product, process, and/or people with the goal of engineering secure software.
Despite empirical evidence of metrics' association with historical software vulnerabilities, their adoption in the software development industry has been limited. The level of granularity at which the metrics are defined, the high false positive rate from models that use the metrics as explanatory variables, and, more importantly, the difficulty in deriving actionable intelligence from the metrics are often cited as factors that inhibit metrics' adoption in practice. Our research vision is to assist software engineers in building secure software by providing a technique that generates scientific, interpretable, and actionable feedback on security as the software evolves. In this dissertation, we present our approach toward achieving this vision through (1) systematization of vulnerability discovery metrics literature, (2) unsupervised generation of metrics-informed security feedback, and (3) continuous developer-in-the-loop improvement of the feedback.
We systematically reviewed the literature to enumerate metrics that have been proposed and/or evaluated to be indicative of vulnerabilities in software and to identify the validation criteria used to assess the decision-informing ability of these metrics. In addition to enumerating the metrics, we implemented a subset of these metrics as containerized microservices. We collected the metric values from six large open-source projects and assessed the metrics' generalizability across projects, application domains, and programming languages. We then used an unsupervised approach from the literature to compute threshold values for each metric and assessed the thresholds' ability to classify risk from historical vulnerabilities. We used the metrics' values, thresholds, and interpretation to provide developers with natural language feedback on security as they contributed changes, and used a survey to assess their perception of the feedback. We initiated an open dialogue to gain insight into their expectations of such feedback. In response to developer comments, we assessed the effectiveness of an existing vulnerability discovery approach (static analysis) and that of vulnerability discovery metrics in identifying risk from vulnerability-contributing commits.
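As a rough illustration of threshold-based risk classification, the sketch below derives a threshold from the distribution of a metric and flags files that exceed it. The abstract does not name the unsupervised approach it used, so the percentile rule here is only an assumption, and the function names, file names, and metric values are all hypothetical.

```python
from statistics import quantiles


def percentile_threshold(values, pct=90):
    """Return the pct-th percentile of the observed metric values.

    Assumption: a simple percentile cut-off stands in for the (unnamed)
    unsupervised thresholding approach mentioned in the abstract.
    """
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return cuts[pct - 1]


def flag_risky(metric_by_file, pct=90):
    """Map each file to True when its metric value exceeds the threshold."""
    threshold = percentile_threshold(list(metric_by_file.values()), pct)
    return {name: value > threshold for name, value in metric_by_file.items()}


if __name__ == "__main__":
    # Hypothetical per-file values of some code metric (e.g. churn).
    churn = {"auth.c": 120, "util.c": 8, "net.c": 45, "log.c": 3, "cfg.c": 11}
    for name, risky in flag_risky(churn, pct=75).items():
        print(f"{name}: {'above threshold' if risky else 'ok'}")
```

A percentile rule needs no labels, which is what makes it unsupervised; whether the flagged files actually coincide with historical vulnerabilities is a separate validation step.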
Studying JavaScript Security Through Static Analysis
As the Internet keeps growing, so does the interest of malicious actors. While the Internet has become a widespread and popular way to interconnect billions of people, this interconnectivity also simplifies the spread of malicious software. Specifically, JavaScript has become a popular attack vector, as it enables attackers to stealthily exploit bugs and further vulnerabilities to compromise the security and privacy of Internet users.
In this thesis, we approach these issues by proposing several systems to statically analyze real-world JavaScript code at scale. First, we focus on the detection of malicious JavaScript samples. To this end, we propose two learning-based pipelines, which leverage syntactic, control-flow, and data-flow based features to distinguish benign from malicious inputs. Subsequently, we evaluate the robustness of such static malicious JavaScript detectors in an adversarial setting. For this purpose, we introduce a generic camouflage attack, which consists of rewriting malicious samples to reproduce existing benign syntactic structures. Finally, we consider vulnerable browser extensions. In particular, we abstract an extension's source code at a semantic level, including control, data, and message flows, and pointer analysis, to detect suspicious data flows from and toward an extension's privileged context. Overall, we report on 184 Chrome extensions that attackers could exploit to, e.g., execute arbitrary code in a victim's browser.
Automated vulnerability detection in source code
Technological advances have facilitated instant global connectivity, transforming the way we interact with the world. Software, propelled by this evolution, plays a pivotal role in our daily lives, being present in virtually every facet of our existence. Programmers, who form the bedrock of the business structure, create source code comprising hundreds or even thousands of lines, encompassing essential functionalities for software to operate seamlessly. However, owing to the inherent complexity of these functionalities and their interdependencies, it is common for errors to escape notice in the code, inadvertently reaching the software production phase and resulting in code vulnerabilities. Each year, the number of identified software vulnerabilities, either publicly disclosed or discovered internally, increases. These vulnerabilities pose a significant risk of exploitation, potentially leading to data breaches or service interruptions. Therefore, the goal of this project is to develop a tool capable of analyzing code written in C and C++ to detect vulnerabilities before the code is deployed to end users. To achieve this goal, we leveraged existing work in this area by using a dataset of open-source functions written in C and C++. This dataset contains approximately 1.27 million functions categorized into five different Common Weakness Enumerations (CWEs). Preprocessing was performed to optimize the performance of the models used. The models were trained on function snippets only, without considering any external context of the code, thus simplifying the problem and increasing processing efficiency. The results obtained are promising, with the trained models showing high performance in identifying and classifying vulnerabilities. In addition, these results can serve as a benchmark for direct comparisons between different approaches.
Remedying Security Concerns at an Internet Scale
The state of security across the Internet is poor, and it has been so since the advent of the modern Internet. While the research community has made tremendous progress over the years in learning how to design and build secure computer systems, network protocols, and algorithms, we are far from a world where we can truly trust the security of deployed Internet systems. In reality, we may never reach such a world. Security concerns continue to be identified at scale throughout the software ecosystem, with thousands of vulnerabilities discovered each year. Meanwhile, attacks have become ever more frequent and consequential. As Internet systems will inevitably continue to be affected by newly found security concerns, the research community must develop more effective ways to remedy these issues. To that end, in this dissertation, we conduct extensive empirical measurements to understand how remediation occurs in practice for Internet systems, and explore methods for spurring improved remediation behavior. This dissertation provides a treatment of the complete remediation life cycle, investigating the creation, dissemination, and deployment of remedies. We start by focusing on security patches that address vulnerabilities, and analyze at scale their creation process, characteristics of the resulting fixes, and how these impact vulnerability remediation. We then investigate and systematize how administrators of Internet systems deploy software updates which patch vulnerabilities across the many machines they manage on behalf of organizations. Finally, we conduct the first systematic exploration of Internet-scale outreach efforts to disseminate information about security concerns and their remedies to system administrators, with an aim of driving their remediation decisions.
Our results show that such outreach campaigns can effectively galvanize positive reactions. Improving remediation, particularly at scale, is challenging, as the problem space exhibits many dimensions beyond traditional computer technical considerations, including human, social, organizational, economic, and policy facets. To make meaningful progress, this work uses a diversity of empirical methods, from software data mining to user studies to Internet-wide network measurements, to systematically collect and evaluate large-scale datasets. Ultimately, this dissertation establishes broad empirical grounding on security remediation in practice today, as well as new approaches for improved remediation at an Internet scale.
A Human-Centric Approach to Software Vulnerability Discovery
Software security bugs, referred to as vulnerabilities, persist as an important and costly challenge. Significant effort has been exerted toward automatic vulnerability discovery, but human intelligence generally remains required and will remain necessary for the foreseeable future. Therefore, many companies have turned to internal and external (e.g., penetration testing, bug bounties) security experts to manually analyze their code for vulnerabilities. Unfortunately, there are a limited number of qualified experts. Therefore, to improve software security, we must understand how experts search for vulnerabilities and how their processes could be made more efficient, by improving tool usability and targeting the most common vulnerabilities. Additionally, we seek to understand how to improve training to increase the number of experts.
To answer these questions, I begin with an in-depth qualitative analysis of secure development competition submissions to identify common vulnerabilities developers introduce. I found developers struggle to understand and implement complex security concepts, not recognizing how nuanced development decisions could lead to vulnerabilities. Next, using a cognitive task analysis to investigate experts' and non-experts' vulnerability discovery processes, I observed they use the same process, but differ in the variety of security experiences which inform their searches. Together, these results suggest that exposure to an in-depth understanding of potential vulnerabilities is essential for vulnerability discovery.
As a first step to leverage both experts and non-experts, I pursued two lines of work: education to support experience development, and improvements to interaction with vulnerability discovery automation. To improve vulnerability discovery tool interaction, I conducted observational interviews of experts' reverse engineering process, an essential and time-consuming component of vulnerability discovery. From this, I provide guidelines for more usable interaction design. For security education, I began with a pedagogical review of security exercises to identify their current strengths and weaknesses. I also developed a psychometric measure for secure software development self-efficacy to support comparisons between educational interventions.
Securing the Next Generation Web
With the ever-increasing digitalization of society, the need for secure systems is growing. While some security features, like HTTPS, are popular, securing web applications and the clients we use to interact with them remains difficult. To secure web applications we focus on both the client-side and server-side. For the client-side, mainly web browsers, we analyze how new security features might solve a problem but introduce new ones. We show this by performing a systematic analysis of the new Content Security Policy (CSP) directive navigate-to. In our research, we find that it does introduce new vulnerabilities, to which we recommend countermeasures. We also create AutoNav, a tool capable of automatically suggesting navigation policies for this directive. Finding server-side vulnerabilities in a black-box setting where there is no access to the source code is challenging. To improve this, we develop novel black-box methods for automatically finding vulnerabilities. We accomplish this by identifying key challenges in web scanning and combining the best of previous methods. Additionally, we leverage SMT solvers to further improve the coverage and vulnerability detection rate of scanners. In addition to browsers, browser extensions also play an important role in the web ecosystem. These small programs, e.g. AdBlockers and password managers, have powerful APIs and access to sensitive user data like browsing history. By systematically analyzing the extension ecosystem we find new static and dynamic methods for detecting both malicious and vulnerable extensions. In addition, we develop a method for detecting malicious extensions solely based on the meta-data of downloads over time. We analyze new attack vectors introduced by Google's new vehicle OS, Android Automotive. This is based on Android with the addition of vehicle APIs. Our analysis results in new attacks pertaining to safety, privacy, and availability.
Furthermore, we create AutoTame, which is designed to analyze third-party apps for vehicles for the vulnerabilities we found.
Security considerations in the open source software ecosystem
Open source software plays an important role in the software supply chain, allowing stakeholders to
utilize open source components as building blocks in their software, tooling, and infrastructure. But
relying on the open source ecosystem introduces unique challenges, both in terms of security and trust,
as well as in terms of supply chain reliability.
In this dissertation, I investigate approaches, considerations, and encountered challenges of stakeholders in the context of security, privacy, and trustworthiness of the open source software supply
chain. Overall, my research aims to empower and support software experts with the knowledge and
resources necessary to achieve a more secure and trustworthy open source software ecosystem. In the
first part of this dissertation, I describe a research study investigating the security and trust practices
in open source projects by interviewing 27 owners, maintainers, and contributors from a diverse set
of projects to explore their behind-the-scenes processes, guidance and policies, incident handling, and
encountered challenges, finding that participants' projects are highly diverse in terms of their deployed
security measures and trust processes, as well as their underlying motivations. More on the consumer
side of the open source software supply chain, I investigated the use of open source components in
industry projects by interviewing 25 software developers, architects, and engineers to understand their
projects' processes, decisions, and considerations in the context of external open source code, finding
that open source components play an important role in many of the industry projects, and that most
projects have some form of company policy or best practice for including external code. On the side of
end-user focused software, I present a study investigating the use of software obfuscation in Android
applications, which is a recommended practice to protect against plagiarism and repackaging. The
study leveraged a multi-pronged approach including a large-scale measurement, a developer survey, and
a programming experiment, finding that only 24.92% of apps are obfuscated by their developer, that
developers do not fear theft of their own apps, and have difficulties obfuscating their own apps. Lastly,
to involve end users themselves, I describe a survey with 200 users of cloud office suites to investigate
their security and privacy perceptions and expectations, with findings suggesting that users are generally
aware of basic security implications, but lack technical knowledge for envisioning some threat models.
The key findings of this dissertation include that open source projects have highly diverse security
measures, trust processes, and underlying motivations; that the projects' security and trust needs are
likely best met in ways that consider their individual strengths, limitations, and project stage, especially
for smaller projects with limited access to resources; and that open source components play an important
role in industry projects, and those projects often have some form of company policy or best
practice for including external code, but developers wish for more resources to better audit included
components.
This dissertation emphasizes the importance of collaboration and shared responsibility in building and maintaining the open source software ecosystem, with developers, maintainers, end users,
researchers, and other stakeholders alike ensuring that the ecosystem remains a secure, trustworthy, and
healthy resource for everyone to rely on.
Production of Innovations within Farmer–Researcher Associations Applying Transdisciplinary Research Principles
Small-scale farmers in sub-Saharan West Africa depend heavily on local resources and local knowledge. Science-based knowledge is likely to aid decision-making in complex situations. In this presentation, we highlight a FiBL-coordinated research partnership between three national producer organisations and national agriculture research bodies in Mali, Burkina Faso, and Benin. The partnership seeks to compare conventional, GMO-based, and organic cotton systems with regard to food security and climate change.
Towards an Improved Understanding of Software Vulnerability Assessment Using Data-Driven Approaches
Software Vulnerabilities (SVs) can expose software systems to cyber-attacks, potentially causing enormous financial and reputational damage for organizations. There have been significant research efforts to detect these SVs so that developers can promptly fix them. However, fixing SVs is complex and time-consuming in practice, and thus developers usually do not have sufficient time and resources to fix all SVs at once. As a result, developers often need SV information, such as exploitability, impact, and overall severity, to prioritize fixing more critical SVs. Such information, required for planning and prioritizing fixes, is typically provided in the SV assessment step of the SV lifecycle. Recently, data-driven methods have been increasingly proposed to automate SV assessment tasks. However, there are still numerous shortcomings with the existing studies on data-driven SV assessment that would hinder their application in practice. This PhD thesis aims to contribute to the growing literature in data-driven SV assessment by investigating and addressing the constant changes in SV data as well as the lack of consideration of source code and developers' needs for SV assessment that impede the practical applicability of the field. Particularly, we have made the following five contributions in this thesis. (1) We systematize the knowledge of data-driven SV assessment to reveal the best practices of the field and the main challenges affecting its application in practice. Subsequently, we propose various solutions to tackle these challenges to better support the real-world applications of data-driven SV assessment. (2) We first demonstrate the existence of the concept drift (changing data) issue in descriptions of SV reports that current studies have mostly used for predicting the Common Vulnerability Scoring System (CVSS) metrics.
We augment report-level SV assessment models with subwords of terms extracted from SV descriptions to help the models more effectively capture the semantics of ever-increasing SVs. (3) We also identify that SV reports are usually released after SV fixing. Thus, we propose using vulnerable code to enable earlier SV assessment without waiting for SV reports. We are the first to use Machine Learning techniques to predict CVSS metrics on the function level leveraging vulnerable statements directly causing SVs and their context in code functions. The performance of our function-level SV assessment models is promising, opening up research opportunities in this new direction. (4) To facilitate continuous integration of software code nowadays, we present a novel deep multi-task learning model, DeepCVA, to simultaneously and efficiently predict multiple CVSS assessment metrics on the commit level, specifically using vulnerability-contributing commits. DeepCVA is the first work that enables practitioners to perform SV assessment as soon as vulnerable changes are added to a codebase, supporting just-in-time prioritization of SV fixing. (5) Besides code artifacts produced from a software project of interest, SV assessment tasks can also benefit from SV crowdsourcing information on developer Question and Answer (Q&A) websites. We automatically retrieve large-scale security/SV-related posts from these Q&A websites. We then apply a topic modeling technique on these posts to distill developers' real-world SV concerns that can be used for data-driven SV assessment. Overall, we believe that this thesis has provided evidence-based knowledge and useful guidelines for researchers and practitioners to automate SV assessment using data-driven approaches. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
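The subword augmentation mentioned in contribution (2) can be pictured with a small sketch. The abstract does not say how subwords are extracted, so character n-grams with boundary markers are used here purely as an assumed, commonly seen choice; the function names and the sample description are hypothetical.

```python
import re
from collections import Counter


def char_ngrams(term, n_min=3, n_max=5):
    """Return character n-grams of a term, padded with boundary markers.

    Assumption: n-grams of length n_min..n_max stand in for the thesis's
    unspecified notion of "subwords".
    """
    padded = f"<{term.lower()}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def subword_features(description, n_min=3, n_max=5):
    """Count subword occurrences over all terms in an SV description."""
    counts = Counter()
    for term in re.findall(r"[A-Za-z0-9_]+", description):
        counts.update(char_ngrams(term, n_min, n_max))
    return counts


if __name__ == "__main__":
    # Hypothetical SV description text.
    text = "Buffer overflow in the parser allows remote code execution"
    for subword, count in subword_features(text).most_common(5):
        print(subword, count)
```

The appeal of subwords under concept drift is that a term never seen during training still shares many n-grams with known terms, so a model keeps some signal for it instead of treating it as entirely out of vocabulary.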