Built to Last or Built Too Fast? Evaluating Prediction Models for Build Times
Automated builds are integral to the Continuous Integration (CI) software
development practice. In CI, developers are encouraged to integrate early and
often. However, long build times can be an issue when integrations are
frequent. This research focuses on finding a balance between integrating often
and keeping developers productive. We propose and analyze models that can
predict the build time of a job. Such models can help developers to better
manage their time and tasks. Also, project managers can explore different
factors to determine the best setup for a build job that will keep the build
wait time to an acceptable level. Software organizations transitioning to CI
practices can use the predictive models to anticipate build times before CI is
implemented. The research community can modify our predictive models to further
understand the factors and relationships affecting build times. Comment: 4-page version published in the Proceedings of the IEEE/ACM 14th
International Conference on Mining Software Repositories (MSR) Pages 487-490.
MSR 201
EnsembleSVM: A Library for Ensemble Learning Using Support Vector Machines
EnsembleSVM is a free software package containing efficient routines to
perform ensemble learning with support vector machine (SVM) base models. It
currently offers ensemble methods based on binary SVM models. Our
implementation avoids duplicate storage and evaluation of support vectors which
are shared between constituent models. Experimental results show that using
ensemble approaches can drastically reduce training complexity while
maintaining high predictive accuracy. The EnsembleSVM software package is
freely available online at http://esat.kuleuven.be/stadius/ensemblesvm. Comment: 5 pages, 1 tabl
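The idea of storing and evaluating shared support vectors only once can be sketched as follows. This is a minimal illustration, not the EnsembleSVM API: the class name `SharedSVEnsemble`, the RBF kernel, the `gamma` value, and majority voting as the aggregation rule are all assumptions made for the example.

```python
import math

def rbf(u, v, gamma=0.5):
    """RBF kernel; the kernel choice and gamma are assumptions for this sketch."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

class SharedSVEnsemble:
    """Hypothetical ensemble of binary SVM models that share one support vector pool."""

    def __init__(self, gamma=0.5):
        self.gamma = gamma
        self.pool = []    # unique support vectors, each stored once
        self.index = {}   # support vector (as tuple) -> position in pool
        self.models = []  # per model: ([(pool position, coefficient), ...], bias)

    def add_model(self, support_vectors, coefficients, bias):
        terms = []
        for sv, coef in zip(support_vectors, coefficients):
            key = tuple(sv)
            if key not in self.index:          # store a new vector only once
                self.index[key] = len(self.pool)
                self.pool.append(key)
            terms.append((self.index[key], coef))
        self.models.append((terms, bias))

    def predict(self, x):
        # One kernel evaluation per unique support vector, reused by every model.
        k = [rbf(sv, x, self.gamma) for sv in self.pool]
        votes = 0
        for terms, bias in self.models:
            decision = sum(coef * k[pos] for pos, coef in terms) + bias
            votes += 1 if decision >= 0 else -1
        # Majority vote over the base models (ties resolved as +1).
        return 1 if votes >= 0 else -1
```

With this layout, the number of kernel evaluations per prediction grows with the number of distinct support vectors rather than with the total across all base models.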
Development and validation of computational models of cellular interaction
In this paper we take the view that computational models of biological systems should satisfy two conditions –
they should be able to predict function at a systems biology level, and robust techniques of validation against
biological models must be available. A modelling paradigm for developing a predictive computational model of
cellular interaction is described, and methods of providing robust validation against biological models are
explored, followed by a consideration of software issues
Search-Based Predictive Modelling for Software Engineering: How Far Have We Gone?
In this keynote I introduce the use of Predictive Analytics for Software Engineering (SE) and then focus on the use of search-based heuristics to tackle long-standing SE prediction problems including (but not limited to) software development effort estimation and software defect prediction. I review recent research in Search-Based Predictive Modelling for SE in order to assess the maturity of the field and point out promising research directions. I conclude my keynote by discussing best practices for a rigorous and realistic empirical evaluation of search-based predictive models, a condicio sine qua non to facilitate the adoption of prediction models in software industry practices. Keywords: Predictive analytics; Predictive modelling; Search-based software engineering; Machine learning; Software analytic
A Hybrid Multi-Filter Wrapper Feature Selection Method for Software Defect Predictors
Software Defect Prediction (SDP) is an approach for identifying defect-prone software modules or components. It helps software engineers optimally allocate limited resources to defective software modules or components in the testing or maintenance phases of the software development life cycle (SDLC). Nonetheless, the predictive performance of SDP models depends largely on the quality of the dataset used to train them. The high dimensionality of software metric features has been noted as a data quality problem that negatively affects the predictive performance of SDP models. Feature Selection (FS) is a well-known method for addressing the high-dimensionality problem and can be divided into filter-based and wrapper-based methods. Filter-based FS has low computational cost, but the predictive performance of a classification algorithm on the filtered data cannot be guaranteed. In contrast, wrapper-based FS has good predictive performance but at high computational cost and with a lack of generalizability. Therefore, this study proposes a hybrid multi-filter wrapper method for selecting relevant and non-redundant features in software defect prediction. The proposed hybrid feature selection will be developed to take advantage of filter-filter and filter-wrapper relationships to give optimal feature subsets, reduce the evaluation cycle, and subsequently improve SDP models' overall predictive performance in terms of Accuracy, Precision and Recall values.
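The general shape of a multi-filter stage followed by a wrapper stage can be sketched as below. This is only an illustration under stated assumptions, not the method proposed in the study: the two filters (absolute Pearson correlation and a scaled class-mean gap), the nearest-centroid classifier used as the wrapper objective, and every function name are hypothetical choices made for the example.

```python
from statistics import mean, pstdev

def pearson_score(col, y):
    """Filter 1 (assumed): absolute Pearson correlation of a feature with the label."""
    mx, my = mean(col), mean(y)
    sx, sy = pstdev(col), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(col, y))
    return abs(cov / (sx * sy))

def mean_gap_score(col, y):
    """Filter 2 (assumed): gap between class means, scaled by the feature's spread."""
    pos = [a for a, b in zip(col, y) if b == 1]
    neg = [a for a, b in zip(col, y) if b == 0]
    s = pstdev(col)
    if not pos or not neg or s == 0:
        return 0.0
    return abs(mean(pos) - mean(neg)) / s

def top_k(X, y, score, k):
    """Indices of the k features ranked highest by one filter."""
    n_features = len(X[0])
    scores = [score([row[j] for row in X], y) for j in range(n_features)]
    return set(sorted(range(n_features), key=lambda j: -scores[j])[:k])

def centroid_accuracy(X, y, feats):
    """Wrapper objective: leave-one-out accuracy of a nearest-centroid classifier."""
    hits = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            centroid = [mean(row[j] for row in rows) for j in feats]
            dist[label] = sum((X[i][j] - c) ** 2 for j, c in zip(feats, centroid))
        hits += min(dist, key=dist.get) == y[i]
    return hits / len(X)

def hybrid_select(X, y, k=2):
    # Filter stage: union of the top-k features from each filter.
    candidates = top_k(X, y, pearson_score, k) | top_k(X, y, mean_gap_score, k)
    # Wrapper stage: greedy forward selection restricted to the filtered candidates.
    chosen, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for j in sorted(candidates - set(chosen)):
            acc = centroid_accuracy(X, y, chosen + [j])
            if acc > best:
                best, pick, improved = acc, j, True
        if improved:
            chosen.append(pick)
    return chosen, best
```

Restricting the wrapper search to the filtered candidates is what keeps the number of classifier evaluations small; the sketch assumes binary labels coded as 0 and 1 with at least two samples per class.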
Predicting Exploitation of Disclosed Software Vulnerabilities Using Open-source Data
Each year, thousands of software vulnerabilities are discovered and reported
to the public. Unpatched known vulnerabilities are a significant security risk.
It is imperative that software vendors quickly provide patches once
vulnerabilities are known and users quickly install those patches as soon as
they are available. However, most vulnerabilities are never actually exploited.
Since writing, testing, and installing software patches can involve
considerable resources, it would be desirable to prioritize the remediation of
vulnerabilities that are likely to be exploited. Several published research
studies have reported moderate success in applying machine learning techniques
to the task of predicting whether a vulnerability will be exploited. These
approaches typically use features derived from vulnerability databases (such as
the summary text describing the vulnerability) or social media posts that
mention the vulnerability by name. However, these prior studies share multiple
methodological shortcomings that inflate the apparent predictive power of these approaches.
We replicate key portions of the prior work, compare their approaches, and show
how the selection of training and test data critically affects the estimated
performance of predictive models. The results of this study point to important
methodological considerations that should be taken into account so that results
reflect real-world utility
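One way the choice of training and test data can matter is whether the split respects time. The sketch below contrasts a random split with a split at a disclosure-date cutoff, so that a model is evaluated only on vulnerabilities disclosed after everything it was trained on. The abstract does not state which splitting schemes the study compares, so this is an assumption for illustration; the record field `disclosed` and all function names are hypothetical.

```python
import random
from datetime import date

def random_split(records, test_fraction=0.25, seed=0):
    """Shuffle and split, ignoring disclosure dates."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def temporal_split(records, cutoff):
    """Train on records disclosed before the cutoff, test on the rest."""
    train = [r for r in records if r["disclosed"] < cutoff]
    test = [r for r in records if r["disclosed"] >= cutoff]
    return train, test

def leaks_future(train, test):
    """True if any training record was disclosed after the earliest test record."""
    if not train or not test:
        return False
    earliest_test = min(r["disclosed"] for r in test)
    return any(r["disclosed"] > earliest_test for r in train)
```

A temporal split never trains on records newer than the test set, whereas a random split usually does, which can make measured performance look better than what would be achievable at prediction time.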