
    Classifying Web Exploits with Topic Modeling

    This short empirical paper investigates how well topic modeling and database meta-data characteristics can classify web and other proof-of-concept (PoC) exploits for publicly disclosed software vulnerabilities. Using a dataset of over 36 thousand PoC exploits, an accuracy of nearly 0.9 is obtained in the empirical experiment. Text mining and topic modeling are a significant factor behind this classification performance. In addition to these empirical results, the paper contributes to the research tradition of enhancing software vulnerability information with text mining, and it provides a few scholarly observations about the potential for semi-automatic classification of exploits in the existing tracking infrastructures.
    Comment: Proceedings of the 2017 28th International Workshop on Database and Expert Systems Applications (DEXA). http://ieeexplore.ieee.org/abstract/document/8049693
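    As an illustration of the general technique the paper builds on, the following is a minimal sketch of topic-model-based exploit classification with scikit-learn; the toy corpus, the label scheme, and the model sizes are assumptions for illustration, not the paper's actual pipeline.

    # Topic weights from LDA are fed to a classifier; all data are hypothetical.
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = [
        "SQL injection in login.php of the example CMS",
        "stack buffer overflow in a legacy FTP daemon",
        "stored cross-site scripting in a forum signature field",
        "format string vulnerability in a mail server",
    ]
    labels = [1, 0, 1, 0]  # 1 = web exploit, 0 = other PoC exploit

    model = make_pipeline(
        CountVectorizer(stop_words="english"),
        LatentDirichletAllocation(n_components=5, random_state=0),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    print(model.predict(["remote code execution via a crafted HTTP POST request"]))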

    An Efficient Information Extraction Mechanism with Page Ranking and a Classification Strategy based on Similarity Learning of Web Text Documents

    Users have recently gained far more access to information thanks to the growth of the World Wide Web. In this situation, search engines have become an essential tool for finding information in a vast space, and the difficulty of handling this wealth of information grows every day. Although search engines are crucial for information gathering, many of the results they offer are not what the user needs because they are ranked according to string matches with the user query. As a result, semantic disparities arise between the terms used in the user query and the key phrases in the results, and the problem of grouping relevant information into categories of related topics has not been solved. This work proposes a ranking-based similarity learning approach and an SVM-based classification framework for web text documents that estimates the semantic similarity between words to improve information extraction. The experimental results indicate an improvement, with more relevant results being retrieved.
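    One plausible reading of the proposed framework can be sketched as TF-IDF document vectors combined with an SVM over a precomputed cosine-similarity kernel, so that semantically close documents score highly even without exact string matches; the documents and categories below are invented for illustration, not taken from the paper.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    from sklearn.svm import SVC

    docs = [
        "python web scraping tutorial",
        "machine learning model training",
        "http server request handling",
        "neural network gradient descent",
    ]
    topics = ["web", "ml", "web", "ml"]  # hypothetical topic categories

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(docs)

    # SVM trained on the document-to-document cosine-similarity matrix.
    clf = SVC(kernel="precomputed")
    clf.fit(cosine_similarity(X), topics)

    # Classify a query by its similarity to the training documents.
    query = vectorizer.transform(["deep learning backpropagation"])
    print(clf.predict(cosine_similarity(query, X)))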

    A Bug Bounty Perspective on the Disclosure of Web Vulnerabilities

    Bug bounties have become increasingly popular in recent years. This paper discusses bug bounties by framing them theoretically against the so-called platform economy. Empirically, the interest is in the disclosure of web vulnerabilities through the Open Bug Bounty (OBB) platform between 2015 and late 2017. According to the empirical results, based on a dataset covering nearly 160 thousand web vulnerabilities, (i) OBB has been successful as a community-based platform for the dissemination of web vulnerabilities, and the platform has attracted many productive hackers, (ii) but there exists a large productivity gap, which likely relates to (iii) a knowledge gap and the use of automated tools for web vulnerability discovery. While the platform (iv) has been exceptionally fast to evaluate new vulnerability submissions, (v) the patching times of the web vulnerabilities disseminated have been long. With these empirical results and the accompanying theoretical discussion, the paper contributes to the small but rapidly growing body of research on bug bounties. In addition, the paper makes a practical contribution by discussing the business models behind bug bounties from the viewpoints of platforms, ecosystems, and vulnerability markets.
    Comment: 17th Annual Workshop on the Economics of Information Security, Innsbruck, https://weis2018.econinfosec.org

    Essays on software vulnerability coordination

    Software vulnerabilities are software bugs with security implications. Exposure to a security bug makes a software system behave in unexpected ways when the bug is exploited. As software vulnerabilities are thus a classical way to compromise a software system, they have long been coordinated in the global software industry in order to lessen the risks. This dissertation claims that the coordination occurs in a complex and open socio-technical system composed of decentralized software units and heterogeneous software agents, including not only software engineers but also other actors, from security specialists and software testers to attackers with malicious motives. Vulnerability disclosure is a classical example of the associated coordination; a security bug is made known to a software vendor by the discoverer of the bug, a third-party coordinator, or public media. The disclosure is then used to patch the bug. In addition to patching, the bug is typically archived to databases, cataloged and quantified for additional information, and communicated to users with a security advisory. Although commercial solutions have become increasingly important, the underlying coordination system is still governed by multiple stakeholders with vested interests. This governance has continued to result in different inefficiencies. Thus, this dissertation examines four themes: (i) disclosure of software vulnerabilities; (ii) coordination of these; (iii) evolution of these across time; and (iv) automation potential. The philosophical position is rooted in scientific realism and positivism, while regression analysis forms the kernel of the methodology. Based on these themes, the results indicate that (a) when vulnerability disclosure has worked, it has been relatively efficient; the obstacles have been social rather than technical in nature, originating from the diverging interests of stakeholders with different incentives. Furthermore, (b) this efficiency also applies to the coordination of the different identifiers and classifications for the vulnerabilities disclosed. Longitudinally, (c) the evolution of software vulnerabilities across time also reflects distinct software and vulnerability life cycle models and the incentives beneath them. Finally, (d) there is potential to improve coordination efficiency through software automation.
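    Since regression analysis forms the kernel of the methodology, a minimal sketch of the kind of model involved is given below; the variables, effect sizes, and the Poisson specification are illustrative assumptions, not results or data from the dissertation.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500
    severity = rng.uniform(0, 10, n)  # e.g., a CVSS-like severity score
    parties = rng.integers(1, 10, n)  # stakeholders involved in coordination
    # Hypothetical disclosure delay in days, a count-like outcome.
    delay = rng.poisson(np.exp(1.5 + 0.05 * parties - 0.02 * severity))

    # Count regression of delays on the explanatory metrics.
    X = sm.add_constant(np.column_stack([severity, parties]))
    result = sm.GLM(delay, X, family=sm.families.Poisson()).fit()
    print(result.summary())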

    A Case Study on Software Vulnerability Coordination

    Context: Coordination is a fundamental tenet of software engineering. Coordination is also required for identifying discovered and disclosed software vulnerabilities with Common Vulnerabilities and Exposures (CVEs). Motivated by recent practical challenges, this paper examines the coordination of CVEs for open source projects through a public mailing list. Objective: The paper observes the historical time delays between the assignment of CVEs on a mailing list and the later appearance of these in the National Vulnerability Database (NVD). Drawing from research on software engineering coordination, software vulnerabilities, and bug tracking, the delays are modeled through three dimensions: social networks and communication practices, tracking infrastructures, and the technical characteristics of the CVEs coordinated. Method: Given a period between 2008 and 2016, a sample of over five thousand CVEs is used to model the delays with nearly fifty explanatory metrics. Regression analysis is used for the modeling. Results: The results show that the CVE coordination delays are affected by different abstractions for noise and prerequisite constraints. These abstractions convey effects from the social network and infrastructure dimensions. Particularly strong effect sizes are observed for annual and monthly control metrics, a control metric for weekends, the degrees of the nodes in the CVE coordination networks, and the number of references given in NVD for the CVEs archived. Smaller but visible effects are present for metrics measuring the entropy of the emails exchanged, traces to bug tracking systems, and other related aspects. The empirical signals are weaker for the technical characteristics. Conclusion: [...]
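    To make the explanatory metrics concrete, the short sketch below computes two of the metric families the study reports as influential: node degrees in a CVE coordination network and a weekend dummy for assignment dates. The edge list and the date are hypothetical examples, not the study's data.

    import datetime
    import networkx as nx

    # Hypothetical (sender, receiver) pairs from coordination emails.
    emails = [("alice", "mitre"), ("bob", "mitre"), ("alice", "bob")]
    G = nx.Graph(emails)
    degrees = dict(G.degree())  # node degree per participant

    def weekend_dummy(date: datetime.date) -> int:
        """Return 1 if a CVE was assigned on a weekend, else 0."""
        return 1 if date.weekday() >= 5 else 0

    print(degrees, weekend_dummy(datetime.date(2016, 5, 7)))  # a Saturday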

    Towards an Improved Understanding of Software Vulnerability Assessment Using Data-Driven Approaches

    Software Vulnerabilities (SVs) can expose software systems to cyber-attacks, potentially causing enormous financial and reputational damage for organizations. There have been significant research efforts to detect these SVs so that developers can promptly fix them. However, fixing SVs is complex and time-consuming in practice, and thus developers usually do not have sufficient time and resources to fix all SVs at once. As a result, developers often need SV information, such as exploitability, impact, and overall severity, to prioritize fixing more critical SVs. Such information required for fixing planning and prioritization is typically provided in the SV assessment step of the SV lifecycle. Recently, data-driven methods have been increasingly proposed to automate SV assessment tasks. However, there are still numerous shortcomings in the existing studies on data-driven SV assessment that would hinder their application in practice. This PhD thesis aims to contribute to the growing literature in data-driven SV assessment by investigating and addressing the constant changes in SV data as well as the lacking considerations of source code and developers’ needs for SV assessment that impede the practical applicability of the field. Particularly, we have made the following five contributions in this thesis. (1) We systematize the knowledge of data-driven SV assessment to reveal the best practices of the field and the main challenges affecting its application in practice. Subsequently, we propose various solutions to tackle these challenges to better support the real-world applications of data-driven SV assessment. (2) We first demonstrate the existence of the concept drift (changing data) issue in the descriptions of SV reports that current studies have mostly used for predicting the Common Vulnerability Scoring System (CVSS) metrics. We augment report-level SV assessment models with subwords of terms extracted from SV descriptions to help the models more effectively capture the semantics of ever-increasing SVs. (3) We also identify that SV reports are usually released after SV fixing. Thus, we propose using vulnerable code to enable earlier SV assessment without waiting for SV reports. We are the first to use Machine Learning techniques to predict CVSS metrics on the function level, leveraging vulnerable statements directly causing SVs and their context in code functions. The performance of our function-level SV assessment models is promising, opening up research opportunities in this new direction. (4) To facilitate the continuous integration of software code, we present a novel deep multi-task learning model, DeepCVA, to simultaneously and efficiently predict multiple CVSS assessment metrics on the commit level, specifically using vulnerability-contributing commits. DeepCVA is the first work that enables practitioners to perform SV assessment as soon as vulnerable changes are added to a codebase, supporting just-in-time prioritization of SV fixing. (5) Besides code artifacts produced from a software project of interest, SV assessment tasks can also benefit from crowdsourced SV information on developer Question and Answer (Q&A) websites. We automatically retrieve large-scale security/SV-related posts from these Q&A websites. We then apply a topic modeling technique to these posts to distill developers’ real-world SV concerns that can be used for data-driven SV assessment.
    Overall, we believe that this thesis has provided evidence-based knowledge and useful guidelines for researchers and practitioners to automate SV assessment using data-driven approaches.
    Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
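    As a hedged sketch of contribution (2), the snippet below augments word-level features with subword (character n-gram) features for report-level severity prediction; the reports, labels, and the linear model are invented stand-ins rather than the thesis' actual corpora and architectures.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline, make_union
    from sklearn.svm import LinearSVC

    reports = [
        "heap buffer overflow allows remote code execution",
        "cross-site scripting in the admin panel",
        "use-after-free in a kernel driver leads to privilege escalation",
        "reflected XSS via an unsanitized search parameter",
    ]
    severity = ["HIGH", "MEDIUM", "HIGH", "MEDIUM"]  # hypothetical CVSS labels

    # Combine word-level terms with character n-gram subwords, so that
    # unseen terms still share subword features with the training data.
    features = make_union(
        TfidfVectorizer(),                                        # word terms
        TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),  # subwords
    )
    model = make_pipeline(features, LinearSVC())
    model.fit(reports, severity)
    print(model.predict(["stack overflow enables arbitrary code execution"]))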