18 research outputs found
Fault Localization Models in Debugging
Debugging is considered as a rigorous but important feature of software
engineering process. Since more than a decade, the software engineering
research community is exploring different techniques for removal of faults from
programs but it is quite difficult to overcome all the faults of software
programs. Thus, it is still remains as a real challenge for software debugging
and maintenance community. In this paper, we briefly introduced software
anomalies and faults classification and then explained different fault
localization models using theory of diagnosis. Furthermore, we compared and
contrasted between value based and dependencies based models in accordance with
different real misbehaviours and presented some insight information for the
debugging process. Moreover, we discussed the results of both models and
manifested the shortcomings as well as advantages of these models in terms of
debugging and maintenance.Comment: 58-6
The Progress, Challenges, and Perspectives of Directed Greybox Fuzzing
Most greybox fuzzing tools are coverage-guided as code coverage is strongly
correlated with bug coverage. However, since most covered codes may not contain
bugs, blindly extending code coverage is less efficient, especially for corner
cases. Unlike coverage-guided greybox fuzzers who extend code coverage in an
undirected manner, a directed greybox fuzzer spends most of its time allocation
on reaching specific targets (e.g., the bug-prone zone) without wasting
resources stressing unrelated parts. Thus, directed greybox fuzzing (DGF) is
particularly suitable for scenarios such as patch testing, bug reproduction,
and specialist bug hunting. This paper studies DGF from a broader view, which
takes into account not only the location-directed type that targets specific
code parts, but also the behaviour-directed type that aims to expose abnormal
program behaviours. Herein, the first in-depth study of DGF is made based on
the investigation of 32 state-of-the-art fuzzers (78% were published after
2019) that are closely related to DGF. A thorough assessment of the collected
tools is conducted so as to systemise recent progress in this field. Finally,
it summarises the challenges and provides perspectives for future research.Comment: 16 pages, 4 figure
Cybersecurity: Time Series Predictive Modeling of Vulnerabilities of Desktop Operating System Using Linear and Non-Linear Approach
Vulnerability forecasting models help us to predict the number of vulnerabilities that may occur in the future for a given Operating System (OS). There exist few models that focus on quantifying future vulnerabilities without consideration of trend, level, seasonality and non linear components of vulnerabilities. Unlike traditional ones, we propose a vulnerability analytic prediction model based on linear and non-linear approaches via time series analysis. We have developed the models based on Auto Regressive Moving Average (ARIMA), Artificial Neural Network (ANN), and Support Vector Machine (SVM) settings. The best model which provides the minimum error rate is selected for prediction of future vulnerabilities. Utilizing time series approach, this study has developed a predictive analytic model for three popular Desktop Operating Systems, namely, Windows 7, Mac OS X, and Linux Kernel by using their reported vulnerabilities on the National Vulnerability Database (NVD). Based on these reported vulnerabilities, we predict ahead their behavior so that the OS companies can make strategic and operational decisions like secure deployment of OS, facilitate backup provisioning, disaster recovery, diversity planning, maintenance scheduling, etc. Similarly, it also helps in assessing current security risks along with estimation of resources needed for handling potential security breaches and to foresee the future releases of security patches. The proposed non-linear analytic models produce very good prediction results in comparison to linear time series models
The Importance of Accounting for Real-World Labelling When Predicting Software Vulnerabilities
Previous work on vulnerability prediction assume that predictive models are trained with respect to perfect labelling information (includes labels from future, as yet undiscovered vulnerabilities). In this paper we present results from a comprehensive empirical study of 1,898 real-world vulnerabilities reported in 74 releases of three security-critical open source systems (Linux Kernel, OpenSSL and Wiresark). Our study investigates the effectiveness of three previously proposed vulnerability prediction approaches, in two settings: with and without the unrealistic labelling assumption. The results reveal that the unrealistic labelling assumption can profoundly mis- lead the scientific conclusions drawn; suggesting highly effective and deployable prediction results vanish when we fully account for realistically available labelling in the experimental methodology. More precisely, MCC mean values of predictive effectiveness drop from 0.77, 0.65 and 0.43 to 0.08, 0.22, 0.10 for Linux Kernel, OpenSSL and Wiresark, respectively. Similar results are also obtained for precision, recall and other assessments of predictive efficacy. The community therefore needs to upgrade experimental and empirical methodology for vulnerability prediction evaluation and development to ensure robust and actionable scientific findings
Revisiting Process versus Product Metrics: a Large Scale Analysis
Numerous methods can build predictive models from software data. However,
what methods and conclusions should we endorse as we move from analytics
in-the-small (dealing with a handful of projects) to analytics in-the-large
(dealing with hundreds of projects)?
To answer this question, we recheck prior small-scale results (about process
versus product metrics for defect prediction and the granularity of metrics)
using 722,471 commits from 700 Github projects. We find that some analytics
in-the-small conclusions still hold when scaling up to analytics in-the-large.
For example, like prior work, we see that process metrics are better predictors
for defects than product metrics (best process/product-based learners
respectively achieve recalls of 98\%/44\% and AUCs of 95\%/54\%, median
values).
That said, we warn that it is unwise to trust metric importance results from
analytics in-the-small studies since those change dramatically when moving to
analytics in-the-large. Also, when reasoning in-the-large about hundreds of
projects, it is better to use predictions from multiple models (since single
model predictions can become confused and exhibit a high variance).Comment: 36 pages, 12 figures and 5 table