
    Classifying Web Exploits with Topic Modeling

    This short empirical paper investigates how well topic modeling and database metadata characteristics can classify web and other proof-of-concept (PoC) exploits for publicly disclosed software vulnerabilities. Using a dataset of over 36 thousand PoC exploits, an accuracy near 0.9 is obtained in the empirical experiment. Text mining and topic modeling significantly boost this classification performance. In addition to these empirical results, the paper contributes to the research tradition of enhancing software vulnerability information with text mining, and offers a few scholarly observations about the potential for semi-automatic classification of exploits in existing tracking infrastructures.
    Comment: Proceedings of the 2017 28th International Workshop on Database and Expert Systems Applications (DEXA). http://ieeexplore.ieee.org/abstract/document/8049693
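
    The pipeline described above can be pictured with a minimal sketch: topic weights from latent Dirichlet allocation serve as features for a linear classifier. This is not the paper's exact pipeline; the toy corpus, labels, and parameters below are hypothetical placeholders.

    # Minimal sketch: LDA topic weights feed a linear classifier that
    # labels PoC exploit descriptions as "web" or "other".
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    descriptions = [  # fabricated examples, not the paper's dataset
        "SQL injection in login form of PHP web application",
        "cross-site scripting via unsanitized search parameter",
        "stack buffer overflow in local media player",
        "heap overflow in kernel driver ioctl handler",
    ]
    labels = ["web", "web", "other", "other"]

    clf = make_pipeline(
        CountVectorizer(stop_words="english"),
        LatentDirichletAllocation(n_components=2, random_state=0),
        LogisticRegression(),
    )
    clf.fit(descriptions, labels)
    print(clf.predict(["remote code execution through crafted HTTP cookie"]))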

    Vulnerable Open Source Dependencies: Counting Those That Matter

    BACKGROUND: Vulnerable dependencies are a known problem in today's open-source software ecosystems because OSS libraries are highly interconnected and developers do not always update their dependencies. AIMS: In this paper we present a precise methodology that combines code-based analysis of patches with information on build, test, and update dates, as well as group information, extracted from the code repository itself; it therefore caters to the needs of industrial practice for correct allocation of development and audit resources. METHOD: To understand the industrial impact of the proposed methodology, we considered the 200 most popular OSS Java libraries used by SAP in its own software. Our analysis covered 10,905 distinct GAVs (group, artifact, version) when considering all the library versions. RESULTS: We found that about 20% of the dependencies affected by a known vulnerability are not deployed, and therefore do not represent a danger to the analyzed library because they cannot be exploited in practice. Developers of the analyzed libraries are able to fix (and are actually responsible for) 82% of the deployed vulnerable dependencies. The vast majority (81%) of vulnerable dependencies can be fixed by simply updating to a new version, while 1% of the vulnerable dependencies in our sample are halted, and therefore potentially require a costly mitigation strategy. CONCLUSIONS: Our case study shows that correct counting allows software development companies to receive actionable information about their library dependencies, and therefore correctly allocate costly development and audit resources, which would otherwise be spent inefficiently on distorted measurements.
    Comment: This is a pre-print of the paper that appears, with the same title, in the proceedings of the 12th International Symposium on Empirical Software Engineering and Measurement, 201
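
    As a rough illustration of the counting idea, the triage below distinguishes non-deployed dependencies, deployed dependencies fixable by an update, and halted ones. This is a simplified sketch of the reasoning only; the field names and category strings are our own assumptions, not the paper's implementation.

    # Sketch of the triage logic for vulnerable dependencies; the
    # Dependency fields are illustrative assumptions.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Dependency:
        gav: str                      # group:artifact:version coordinate
        deployed: bool                # shipped with the product vs. test/build only
        fixed_version: Optional[str]  # None if the upstream project is halted

    def triage(dep: Dependency) -> str:
        if not dep.deployed:
            return "not deployed: not exploitable in practice"
        if dep.fixed_version is not None:
            return f"deployed: update to {dep.fixed_version}"
        return "deployed but halted upstream: plan a costly mitigation"

    deps = [  # fabricated coordinates for illustration
        Dependency("org.example:parser:1.2", deployed=False, fixed_version="1.3"),
        Dependency("org.example:http:2.0", deployed=True, fixed_version="2.1"),
        Dependency("org.example:legacy:0.9", deployed=True, fixed_version=None),
    ]
    for d in deps:
        print(d.gav, "->", triage(d))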

    Enhancing Trust – A Unified Meta-Model for Software Security Vulnerability Analysis

    Over the last decade, a globalization of the software industry has taken place, facilitating the sharing and reuse of code across existing project boundaries. At the same time, such global reuse introduces new challenges to the Software Engineering community: not only is code implementation shared across systems, but so are any vulnerabilities it is exposed to. Hence, vulnerabilities found in APIs no longer affect only individual projects but can spread across projects and even global software ecosystem borders. Tracing such vulnerabilities on a global scale becomes an inherently difficult task, with many of the resources required for the analysis not only growing at unprecedented rates but also being spread across heterogeneous sources. Software developers struggle to identify and locate the data required to take full advantage of these resources. The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources. This dissertation introduces four major contributions to address these challenges: (1) It provides a literature review of the use of software vulnerability databases (SVDBs) in the Software Engineering community. (2) Based on findings from this literature review, we present SEVONT, a Semantic Web based modeling approach to support a formal and semi-automated approach for unifying vulnerability information resources. SEVONT introduces a multi-layer knowledge model which not only provides a unified knowledge representation, but also captures software vulnerability information at different abstraction levels to allow for seamless integration, analysis, and reuse of the modeled knowledge. The modeling approach takes advantage of Formal Concept Analysis (FCA) to guide knowledge engineers in identifying reusable knowledge concepts and modeling them. (3) A Security Vulnerability Analysis Framework (SV-AF) is introduced, which is an instantiation of the SEVONT knowledge model to support evidence-based vulnerability detection. The framework integrates vulnerability ontologies (and data) with existing Software Engineering ontologies, allowing for the use of Semantic Web reasoning services to trace and assess the impact of security vulnerabilities across project boundaries. Several case studies are presented to illustrate the applicability and flexibility of our modelling approach, demonstrating that the presented knowledge modeling approach can not only unify heterogeneous vulnerability data sources but also enable new types of vulnerability analysis.
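
    The cross-project tracing that such a unified knowledge model enables can be pictured with a tiny RDF graph and a SPARQL query. The namespace and predicate names below are invented for illustration; they are not SEVONT's actual vocabulary.

    # Toy rdflib graph linking a vulnerability to projects through a shared
    # library; predicates are hypothetical, not SEVONT terms.
    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/sevont#")
    g = Graph()
    g.add((EX["CVE-2014-0050"], EX.affects, EX.commonsFileupload))
    g.add((EX.commonsFileupload, EX.usedBy, EX.projectA))
    g.add((EX.commonsFileupload, EX.usedBy, EX.projectB))

    # Which projects are exposed to the vulnerability via the library?
    query = """
    SELECT ?project WHERE {
        ?vuln ex:affects ?lib .
        ?lib  ex:usedBy  ?project .
    }"""
    for row in g.query(query, initNs={"ex": EX}):
        print(row.project)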

    Some Guidelines for Risk Assessment of Vulnerability Discovery Processes

    Software vulnerabilities can be defined as software faults that can be exploited in security attacks. Security researchers have used data from vulnerability databases to study trends in the discovery of new vulnerabilities, and to propose models for fitting discovery times and predicting when new vulnerabilities may be discovered. Estimating the discovery times of new vulnerabilities is useful both for vendors and for end-users, as it can help with resource allocation strategies over time. Among the research conducted on vulnerability modeling, only a few studies have tried to provide a guideline about which model should be used in a given situation. In other words, assuming the vulnerability data for a software system is given, the research questions are the following: Is there any feature in the vulnerability data that could be used to identify the most appropriate models for that dataset? Which models are more accurate for modeling the vulnerability discovery process? Can the total number of publicly known exploited vulnerabilities be predicted using all vulnerabilities reported for a given software system? To answer these questions, we propose to characterize the vulnerability discovery process using several common software reliability/vulnerability discovery models, also known as Software Reliability Models (SRMs)/Vulnerability Discovery Models (VDMs). We plan to consider different aspects of vulnerability modeling, including curve fitting and prediction. Some existing SRMs/VDMs lack accuracy in the prediction phase. To remedy the situation, three strategies are considered: (1) finding a new approach for analyzing vulnerability data using common models, i.e., examining the effect of data manipulation techniques (clustering, grouping) on vulnerability data and investigating whether they lead to more accurate predictions; (2) developing a new model that has better curve-fitting and prediction capabilities than current models; (3) developing a new method to predict the total number of publicly known exploited vulnerabilities using all vulnerabilities reported for a given software system. The dissertation is intended to contribute to the science of software reliability analysis and presents some guidelines for vulnerability risk assessment that could be integrated into security tools, such as Security Information and Event Management (SIEM) systems.
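
    For concreteness, one common VDM, the Alhazmi-Malaiya logistic model, can be fitted to a cumulative vulnerability count with a few lines of SciPy. The monthly counts below are fabricated, and the model choice is ours for illustration, not a recommendation drawn from the dissertation.

    # Fit the Alhazmi-Malaiya logistic VDM,
    # Omega(t) = B / (B*C*exp(-A*B*t) + 1),
    # to a fabricated cumulative vulnerability count.
    import numpy as np
    from scipy.optimize import curve_fit

    def aml(t, A, B, C):
        return B / (B * C * np.exp(-A * B * t) + 1.0)

    months = np.arange(1, 25)
    cumulative = np.array([2, 4, 7, 11, 16, 22, 30, 39, 48, 57, 65, 72,
                           78, 83, 87, 90, 93, 95, 97, 98, 99, 100, 101, 101])

    params, _ = curve_fit(aml, months, cumulative,
                          p0=[0.005, 100.0, 0.5], maxfev=10000)
    A, B, C = params
    print(f"estimated saturation level B (total vulnerabilities): {B:.1f}")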

    An automatic method for assessing the versions affected by a vulnerability

    Vulnerability data sources are used by academics to build models, and by industry and government to assess compliance. Errors in such data sources are therefore not only threats to validity in scientific studies but may also cause organizations that rely on retro versions of software to lose compliance. In this work, we propose an automated method to determine the code evidence for the presence of vulnerabilities in retro software versions. The method scans the code base of each retro version of the software for code evidence to determine whether that version is vulnerable or not. It identifies the lines of code that were changed to fix vulnerabilities: if an earlier version contains these deleted lines, it is highly likely that this version is vulnerable. To show the scalability of the method, we performed a large-scale experiment on Chrome and Firefox (spanning 7,236 vulnerable files and approximately 9,800 vulnerabilities) against the National Vulnerability Database (NVD). The elimination of spurious vulnerability claims (e.g. entries in a vulnerability database such as NVD) found by our method may change the conclusions of studies on the prevalence of foundational vulnerabilities.
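
    The core check of this method can be sketched in a few lines: collect the lines a fix deleted and test whether an older release still contains them. The sketch assumes a unified diff and ignores complications, such as refactorings or whitespace-only changes, that the actual method would need to handle.

    # Sketch of the code-evidence check: an old version that still contains
    # the lines deleted by the security fix is likely vulnerable.
    def deleted_lines(fix_diff: str) -> set:
        """Lines removed by the fix: unified-diff '-' lines, headers excluded."""
        return {
            line[1:].strip()
            for line in fix_diff.splitlines()
            if line.startswith("-") and not line.startswith("---")
        }

    def is_likely_vulnerable(old_source: str, fix_diff: str) -> bool:
        evidence = deleted_lines(fix_diff)
        old = {line.strip() for line in old_source.splitlines()}
        return bool(evidence) and evidence <= old

    fix = ("--- a/f.c\n+++ b/f.c\n"
           "-strcpy(buf, input);\n+strncpy(buf, input, sizeof buf);")
    print(is_likely_vulnerable("strcpy(buf, input);\nreturn 0;", fix))  # True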

    Dependency Management 2.0 – A Semantic Web Enabled Approach

    Software development and evolution are highly distributed processes that involve a multitude of supporting tools and resources. Application programming interfaces are commonly used by software developers to reduce development cost and complexity by reusing code developed by third parties or published by the open source community. However, these application programming interfaces have also introduced new challenges to the Software Engineering community (e.g., software vulnerabilities, API incompatibilities, and software license violations) that not only extend beyond the traditional boundaries of individual projects but also involve different software artifacts. As a result, there is a need for a technology-independent representation of software dependency semantics and the ability to seamlessly integrate this representation with knowledge from other software artifacts. The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources. This dissertation takes advantage of the Semantic Web and its enabling technology stack for knowledge modeling and integration. The thesis introduces five major contributions: (1) We present a formal Software Build System Ontology, SBSON, which captures concepts and properties of software build and dependency management systems. This formal knowledge representation allows us to take advantage of Semantic Web inference services, forming the basis for a more flexible API dependency analysis than traditional proprietary analysis approaches. (2) We conducted a user survey involving 53 open source developers to gain insights into how actual developers manage API breaking changes. (3) We introduce a novel approach which integrates our SBSON model with knowledge about source code usage and changes within the Maven ecosystem to support API consumers and producers in managing (assessing and minimizing) the impact of breaking changes. (4) A Security Vulnerability Analysis Framework (SV-AF) is introduced, which integrates build system, source code, versioning system, and vulnerability ontologies to trace and assess the impact of security vulnerabilities across project boundaries. (5) Finally, we introduce an Ontological Trustworthiness Assessment Model (OntTAM). OntTAM integrates our build, source code, vulnerability, and license ontologies to support a holistic analysis and assessment of quality attributes related to the trustworthiness of libraries and APIs in open source systems. Several case studies are presented to illustrate the applicability and flexibility of our modelling approach, demonstrating that our knowledge modeling approach can seamlessly integrate and reuse knowledge extracted from existing build and dependency management systems with other heterogeneous data sources found in the software engineering domain. As part of our case studies, we also demonstrate how this unified knowledge model can enable new types of project dependency analysis.
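
    The kind of cross-project dependency query such a build-system ontology enables can be illustrated with a SPARQL property path over a toy graph. The class and property names below are invented for illustration; they are not SBSON's actual terms.

    # Toy graph of transitive build dependencies; SPARQL's "+" property path
    # walks the whole dependency chain in a single query.
    from rdflib import Graph, Namespace

    SB = Namespace("http://example.org/sbson#")
    g = Graph()
    g.add((SB.appX, SB.dependsOn, SB.libY))
    g.add((SB.libY, SB.dependsOn, SB.libZ))
    g.add((SB.libZ, SB.hasVulnerability, SB["CVE-2021-0001"]))

    query = """
    SELECT ?vuln WHERE {
        sb:appX sb:dependsOn+ ?lib .
        ?lib sb:hasVulnerability ?vuln .
    }"""
    for row in g.query(query, initNs={"sb": SB}):
        print(row.vuln)  # appX is transitively exposed via libY -> libZ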