Classifying Web Exploits with Topic Modeling
This short empirical paper investigates how well topic modeling and database
meta-data characteristics can classify web and other proof-of-concept (PoC)
exploits for publicly disclosed software vulnerabilities. Using a dataset of
over 36 thousand PoC exploits, the empirical experiment obtains an accuracy
rate of nearly 0.9. Text mining and topic modeling contribute significantly to
this classification performance. In addition to
these empirical results, the paper contributes to the research tradition of
enhancing software vulnerability information with text mining, providing also a
few scholarly observations about the potential for semi-automatic
classification of exploits in the existing tracking infrastructures.
Comment: Proceedings of the 2017 28th International Workshop on Database and
Expert Systems Applications (DEXA).
http://ieeexplore.ieee.org/abstract/document/8049693
Vulnerable Open Source Dependencies: Counting Those That Matter
BACKGROUND: Vulnerable dependencies are a known problem in today's
open-source software ecosystems because OSS libraries are highly interconnected
and developers do not always update their dependencies. AIMS: In this paper we
aim to present a precise methodology that combines the code-based analysis of
patches with information on build, test, update dates, and group extracted from
the very code repository, and therefore caters to the needs of industrial
practice for correct allocation of development and audit resources. METHOD: To
understand the industrial impact of the proposed methodology, we considered the
200 most popular OSS Java libraries used by SAP in its own software. Our
analysis included 10905 distinct GAVs (group, artifact, version) when
considering all the library versions. RESULTS: We found that about 20% of the
dependencies affected by a known vulnerability are not deployed, and therefore,
they do not represent a danger to the analyzed library because they cannot be
exploited in practice. Developers of the analyzed libraries are able to fix
(and actually responsible for) 82% of the deployed vulnerable dependencies. The
vast majority (81%) of vulnerable dependencies may be fixed by simply updating
to a new version, while 1% of the vulnerable dependencies in our sample are
halted, and therefore, potentially require a costly mitigation strategy.
CONCLUSIONS: Our case study shows that the correct counting allows software
development companies to receive actionable information about their library
dependencies, and therefore, correctly allocate costly development and audit
resources, which are spent inefficiently in case of distorted measurements.
Comment: This is a pre-print of the paper that appears, with the same title,
in the proceedings of the 12th International Symposium on Empirical Software
Engineering and Measurement, 201
Vulnerability Prediction Capability: A Comparison between Vulnerability Discovery Models and Neural Network Models
In this paper, we introduce an approach for predicting the cumulative number of software vulnerabilities that is in most cases more accurate than vulnerability discovery models (VDMs). Our approach uses a neural network model (NNM) to model the nonlinearities associated with vulnerability disclosure. We compared the prediction capability of nine common VDMs with that of our approach. The models were applied to vulnerabilities associated with eight well-known software products (four operating systems and four web browsers) and were assessed in terms of prediction accuracy and prediction bias. Across the eight products analyzed, the NNM outperformed the VDMs in all cases in terms of prediction accuracy, and provided smaller values of absolute average bias in seven cases. This study shows that NNMs are promising for accurate prediction of software vulnerability disclosures.
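The abstract above assesses models by prediction accuracy and prediction bias but does not define either measure. The sketch below shows one common way to compute such measures, as a normalized average error and a normalized average bias over a series of predictions. The function names and the exact normalization are assumptions made for illustration, not necessarily the definitions used in the paper.

```python
from typing import Sequence


def average_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |predicted - actual| / actual (assumed accuracy measure)."""
    pairs = list(zip(actual, predicted))
    return sum(abs(p - a) / a for a, p in pairs) / len(pairs)


def average_bias(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of (predicted - actual) / actual; the sign shows over- or under-prediction."""
    pairs = list(zip(actual, predicted))
    return sum((p - a) / a for a, p in pairs) / len(pairs)


# Hypothetical cumulative vulnerability counts, for illustration only.
actual_counts = [100.0, 200.0]
predicted_counts = [110.0, 180.0]
error = average_error(actual_counts, predicted_counts)
bias = average_bias(actual_counts, predicted_counts)
```

A model can have a small bias while still having a large error, because over- and under-predictions cancel in the bias but not in the error, which is presumably why the two are reported separately.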
Predicting the Discovery Pattern of Publically Known Exploited Vulnerabilities
Vulnerabilities with publicly known exploits typically form 2-7% of all vulnerabilities reported for a given software version. Because known exploited vulnerabilities are far fewer than the total number of vulnerabilities, it is more difficult to model and predict when a vulnerability with a known exploit will be reported. In this paper, we introduce an approach for predicting the discovery pattern of publicly known exploited vulnerabilities using all publicly known vulnerabilities reported for a given software product. Eight commonly used vulnerability discovery models (VDMs) and one neural network model (NNM) were used to evaluate the prediction capability of our approach. We compared their prediction results with the scenario in which only exploited vulnerabilities were used for prediction. Our results show that, in terms of prediction accuracy, our approach led to more accurate results in seven of the eight software products we analyzed. In the remaining case, the accuracy of our approach was worse by 1.6%.
Method and Technology for Ensuring the Software Security by Identifying and Classifying the Failures and Vulnerabilities
A literature review of known methods and technologies for providing software security and for identifying software failures and vulnerabilities showed that, although the analyzed methods and technologies have great potential for the field of software engineering, none of the known solutions is intended for the identification and classification of software failures and vulnerabilities. The goal of this study is therefore to develop a method, and to design and implement a technology, for ensuring software security by identifying and classifying failures and vulnerabilities. The method developed in this paper concludes whether a failure occurred and, if so, reports its type to the user. It likewise concludes whether a feature is a vulnerability and, if so, reports its type to the user. The paper also develops a technology that provides conclusions on the presence or absence of software failures, on the presence or absence of software vulnerabilities, and on the type of failure and type of vulnerability when they are present, which makes the proposed technology useful for software users.
Enhancing Trust – A Unified Meta-Model for Software Security Vulnerability Analysis
Over the last decade, a globalization of the software industry has taken place which has facilitated the sharing and reuse of code across existing project boundaries. At the same time, such global reuse also introduces new challenges to the Software Engineering community, with not only code implementation being shared across systems but also any vulnerabilities it is exposed to as well. Hence, vulnerabilities found in APIs no longer affect only individual projects but instead might spread across projects and even global software ecosystem borders. Tracing such vulnerabilities on a global scale becomes an inherently difficult task, with many of the resources required for the analysis not only growing at unprecedented rates but also being spread across heterogeneous resources. Software developers are struggling to identify and locate the required data to take full advantage of these resources. The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources.
This dissertation introduces four major contributions to address these challenges: (1) It provides a literature review of the use of software vulnerability databases (SVDBs) in the Software Engineering community. (2) Based on findings from this literature review, we present SEVONT, a Semantic Web based modeling approach to support a formal and semi-automated approach for unifying vulnerability information resources. SEVONT introduces a multi-layer knowledge model which not only provides a unified knowledge representation, but also captures software vulnerability information at different abstraction levels to allow for seamless integration, analysis, and reuse of the modeled knowledge. The modeling approach takes advantage of Formal Concept Analysis (FCA) to guide knowledge engineers in identifying reusable knowledge concepts and modeling them. (3) A Security Vulnerability Analysis Framework (SV-AF) is introduced, which is an instantiation of the SEVONT knowledge model to support evidence-based vulnerability detection. The framework integrates vulnerability ontologies (and data) with existing Software Engineering ontologies, allowing for the use of Semantic Web reasoning services to trace and assess the impact of security vulnerabilities across project boundaries.
Several case studies are presented to illustrate the applicability and flexibility of our modelling approach, demonstrating that the presented knowledge modeling approach can not only unify heterogeneous vulnerability data sources but also enable new types of vulnerability analysis.
Some Guidelines for Risk Assessment of Vulnerability Discovery Processes
Software vulnerabilities can be defined as software faults that can be exploited in security attacks. Security researchers have used data from vulnerability databases to study trends in the discovery of new vulnerabilities, or to propose models for fitting discovery times and for predicting when new vulnerabilities may be discovered. Estimating the discovery times for new vulnerabilities is useful for both vendors and end users, as it can help with resource allocation strategies over time.
Among the research conducted on vulnerability modeling, only a few studies have tried to provide a guideline about which model should be used in a given situation. In other words, assuming the vulnerability data for a software product is given, the research questions are the following: Is there any feature in the vulnerability data that could be used for identifying the most appropriate models for that dataset? What models are more accurate for vulnerability discovery process modeling? Can the total number of publicly-known exploited vulnerabilities be predicted using all vulnerabilities reported for a given software product?
To answer these questions, we propose to characterize the vulnerability discovery process using several common software reliability/vulnerability discovery models, also known as Software Reliability Models (SRMs)/Vulnerability Discovery Models (VDMs). We plan to consider different aspects of vulnerability modeling including curve fitting and prediction.
Some existing SRMs/VDMs lack accuracy in the prediction phase. To remedy the situation, three strategies are considered: (1) Finding a new approach for analyzing vulnerability data using common models. In other words, we examine the effect of data manipulation techniques (i.e. clustering, grouping) on vulnerability data, and investigate whether it leads to more accurate predictions. (2) Developing a new model that has better curve fitting and prediction capabilities than current models. (3) Developing a new method to predict the total number of publicly-known exploited vulnerabilities using all vulnerabilities reported for a given software product.
The dissertation is intended to contribute to the science of software reliability analysis and presents some guidelines for vulnerability risk assessment that could be integrated as part of security tools, such as Security Information and Event Management (SIEM) systems.
An automatic method for assessing the versions affected by a vulnerability
Vulnerability data sources are used by academics to build models, and by industry and government to assess compliance. Errors in such data sources therefore not only are threats to validity in scientific studies, but also might cause organizations that rely on retro versions of software to lose compliance. In this work, we propose an automated method to determine the code evidence for the presence of vulnerabilities in retro software versions. The method scans the code base of each retro version of the software for the code evidence to determine whether that version is vulnerable or not. It identifies the lines of code that were changed to fix vulnerabilities. If an earlier version contains these deleted lines, it is highly likely that this version is vulnerable. To show the scalability of the method, we performed large-scale experiments on Chrome and Firefox (spanning 7,236 vulnerable files and approximately 9,800 vulnerabilities) on the National Vulnerability Database (NVD). The elimination of spurious vulnerability claims (e.g. entries to a vulnerability database such as NVD) found by our method may change the conclusions of studies on the prevalence of foundational vulnerabilities.
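The core check described in this abstract (collect the lines a fix deleted, then look for them in earlier versions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace normalization, and the rule that all deleted lines must be present are assumptions made for illustration, and the sample C-like snippets are invented.

```python
import difflib


def deleted_lines(before_fix: str, after_fix: str) -> list[str]:
    """Return the non-blank lines that the fix removed or rewrote."""
    old = before_fix.splitlines()
    new = after_fix.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(line.strip() for line in old[i1:i2] if line.strip())
    return removed


def likely_vulnerable(version_src: str, evidence: list[str]) -> bool:
    """Assumed rule: a version is flagged when it contains every deleted line."""
    present = {line.strip() for line in version_src.splitlines()}
    return bool(evidence) and all(line in present for line in evidence)


# Invented example: the fix adds a bounds check around a buffer write.
before = "int n = read();\nbuf[n] = 0;\nreturn n;"
after = "int n = read();\nif (n < SIZE) buf[n] = 0;\nreturn n;"
evidence = deleted_lines(before, after)

older_version = "int n = read();\nlog(n);\nbuf[n] = 0;\nreturn n;"
flag_older = likely_vulnerable(older_version, evidence)  # contains the deleted line
flag_fixed = likely_vulnerable(after, evidence)          # deleted line is gone
```

Exact line matching is deliberately simple here. A version that predates the vulnerable code, or one in which the same lines were merely reformatted, would need a more tolerant comparison than this sketch provides.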
Dependency Management 2.0 – A Semantic Web Enabled Approach
Software development and evolution are highly distributed processes that involve a multitude of supporting tools and resources. Application programming interfaces are commonly used by software developers to reduce development cost and complexity by reusing code developed by third-parties or published by the open source community. However, these application programming interfaces have also introduced new challenges to the Software Engineering community (e.g., software vulnerabilities, API incompatibilities, and software license violations) that not only extend beyond the traditional boundaries of individual projects but also involve different software artifacts. As a result, there is the need for a technology-independent representation of software dependency semantics and the ability to seamlessly integrate this representation with knowledge from other software artifacts.
The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources. This dissertation takes advantage of the Semantic Web and its enabling technology stack for knowledge modeling and integration. The thesis introduces five major contributions: (1) We present a formal Software Build System Ontology – SBSON, which captures concepts and properties for software build and dependency management systems. This formal knowledge representation allows us to take advantage of Semantic Web inference services, forming the basis for a more flexible API dependency analysis compared to traditional proprietary analysis approaches. (2) We conducted a user survey which involved 53 open source developers to allow us to gain insights on how actual developers manage API breaking changes. (3) We introduced a novel approach which integrates our SBSON model with knowledge about source code usage and changes within the Maven ecosystem to support API consumers and producers in managing (assessing and minimizing) the impacts of breaking changes. (4) A Security Vulnerability Analysis Framework (SV-AF) is introduced, which integrates build system, source code, versioning system, and vulnerability ontologies to trace and assess the impact of security vulnerabilities across project boundaries. (5) Finally, we introduce an Ontological Trustworthiness Assessment Model (OntTAM). OntTAM is an integration of our build, source code, vulnerability and license ontologies which supports a holistic analysis and assessment of quality attributes related to the trustworthiness of libraries and APIs in open source systems.
Several case studies are presented to illustrate the applicability and flexibility of our modelling approach, demonstrating that our knowledge modeling approach can seamlessly integrate and reuse knowledge extracted from existing build and dependency management systems with other existing heterogeneous data sources found in the software engineering domain. As part of our case studies, we also demonstrate how this unified knowledge model can enable new types of project dependency analysis.