18 research outputs found

    Fault Localization Models in Debugging

    Full text link
    Debugging is considered as a rigorous but important feature of software engineering process. Since more than a decade, the software engineering research community is exploring different techniques for removal of faults from programs but it is quite difficult to overcome all the faults of software programs. Thus, it is still remains as a real challenge for software debugging and maintenance community. In this paper, we briefly introduced software anomalies and faults classification and then explained different fault localization models using theory of diagnosis. Furthermore, we compared and contrasted between value based and dependencies based models in accordance with different real misbehaviours and presented some insight information for the debugging process. Moreover, we discussed the results of both models and manifested the shortcomings as well as advantages of these models in terms of debugging and maintenance.Comment: 58-6

    The Progress, Challenges, and Perspectives of Directed Greybox Fuzzing

    Full text link
    Most greybox fuzzing tools are coverage-guided as code coverage is strongly correlated with bug coverage. However, since most covered codes may not contain bugs, blindly extending code coverage is less efficient, especially for corner cases. Unlike coverage-guided greybox fuzzers who extend code coverage in an undirected manner, a directed greybox fuzzer spends most of its time allocation on reaching specific targets (e.g., the bug-prone zone) without wasting resources stressing unrelated parts. Thus, directed greybox fuzzing (DGF) is particularly suitable for scenarios such as patch testing, bug reproduction, and specialist bug hunting. This paper studies DGF from a broader view, which takes into account not only the location-directed type that targets specific code parts, but also the behaviour-directed type that aims to expose abnormal program behaviours. Herein, the first in-depth study of DGF is made based on the investigation of 32 state-of-the-art fuzzers (78% were published after 2019) that are closely related to DGF. A thorough assessment of the collected tools is conducted so as to systemise recent progress in this field. Finally, it summarises the challenges and provides perspectives for future research.Comment: 16 pages, 4 figure

    Cybersecurity: Time Series Predictive Modeling of Vulnerabilities of Desktop Operating System Using Linear and Non-Linear Approach

    Get PDF
    Vulnerability forecasting models help us to predict the number of vulnerabilities that may occur in the future for a given Operating System (OS). There exist few models that focus on quantifying future vulnerabilities without consideration of trend, level, seasonality and non linear components of vulnerabilities. Unlike traditional ones, we propose a vulnerability analytic prediction model based on linear and non-linear approaches via time series analysis. We have developed the models based on Auto Regressive Moving Average (ARIMA), Artificial Neural Network (ANN), and Support Vector Machine (SVM) settings. The best model which provides the minimum error rate is selected for prediction of future vulnerabilities. Utilizing time series approach, this study has developed a predictive analytic model for three popular Desktop Operating Systems, namely, Windows 7, Mac OS X, and Linux Kernel by using their reported vulnerabilities on the National Vulnerability Database (NVD). Based on these reported vulnerabilities, we predict ahead their behavior so that the OS companies can make strategic and operational decisions like secure deployment of OS, facilitate backup provisioning, disaster recovery, diversity planning, maintenance scheduling, etc. Similarly, it also helps in assessing current security risks along with estimation of resources needed for handling potential security breaches and to foresee the future releases of security patches. The proposed non-linear analytic models produce very good prediction results in comparison to linear time series models

    The Importance of Accounting for Real-World Labelling When Predicting Software Vulnerabilities

    Get PDF
    Previous work on vulnerability prediction assume that predictive models are trained with respect to perfect labelling information (includes labels from future, as yet undiscovered vulnerabilities). In this paper we present results from a comprehensive empirical study of 1,898 real-world vulnerabilities reported in 74 releases of three security-critical open source systems (Linux Kernel, OpenSSL and Wiresark). Our study investigates the effectiveness of three previously proposed vulnerability prediction approaches, in two settings: with and without the unrealistic labelling assumption. The results reveal that the unrealistic labelling assumption can profoundly mis- lead the scientific conclusions drawn; suggesting highly effective and deployable prediction results vanish when we fully account for realistically available labelling in the experimental methodology. More precisely, MCC mean values of predictive effectiveness drop from 0.77, 0.65 and 0.43 to 0.08, 0.22, 0.10 for Linux Kernel, OpenSSL and Wiresark, respectively. Similar results are also obtained for precision, recall and other assessments of predictive efficacy. The community therefore needs to upgrade experimental and empirical methodology for vulnerability prediction evaluation and development to ensure robust and actionable scientific findings

    Revisiting Process versus Product Metrics: a Large Scale Analysis

    Full text link
    Numerous methods can build predictive models from software data. However, what methods and conclusions should we endorse as we move from analytics in-the-small (dealing with a handful of projects) to analytics in-the-large (dealing with hundreds of projects)? To answer this question, we recheck prior small-scale results (about process versus product metrics for defect prediction and the granularity of metrics) using 722,471 commits from 700 Github projects. We find that some analytics in-the-small conclusions still hold when scaling up to analytics in-the-large. For example, like prior work, we see that process metrics are better predictors for defects than product metrics (best process/product-based learners respectively achieve recalls of 98\%/44\% and AUCs of 95\%/54\%, median values). That said, we warn that it is unwise to trust metric importance results from analytics in-the-small studies since those change dramatically when moving to analytics in-the-large. Also, when reasoning in-the-large about hundreds of projects, it is better to use predictions from multiple models (since single model predictions can become confused and exhibit a high variance).Comment: 36 pages, 12 figures and 5 table
    corecore