8,396 research outputs found

Recommended from our members

A systematic review of software development cost estimation studies

Author: Jørgensen M
Shepperd MJ
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

This paper aims to provide a basis for the improvement of software estimation research through a systematic review of previous work. The review identifies 304 software cost estimation papers in 76 journals and classifies the papers according to research topic, estimation approach, research approach, study context and data set. A web-based library of these cost estimation papers is provided to ease the identification of relevant estimation research results. The review results, combined with other knowledge, provide support for recommendations for future software cost estimation research, including: 1) increase the breadth of the search for relevant studies, 2) search manually for relevant papers within a carefully selected set of journals when completeness is essential, 3) conduct more studies on estimation methods commonly used by the software industry, and 4) increase the awareness of how properties of the data sets impact the results when evaluating estimation methods.

Brunel University Research Archive

Data quality: Some comments on the NASA software defect datasets

Author: Mair C
Shepperd M
Song Q
Sun Z
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2013

Background: Self-evidently, empirical analyses rely upon the quality of their data. Likewise, replications rely upon accurate reporting and using the same rather than similar versions of datasets. In recent years, there has been much interest in using machine learners to classify software modules into defect-prone and not defect-prone categories. The publicly available NASA datasets have been extensively used as part of this research. Objective: This short note investigates the extent to which published analyses based on the NASA defect datasets are meaningful and comparable. Method: We analyze the five studies published in the IEEE Transactions on Software Engineering since 2007 that have utilized these datasets and compare the two versions of the datasets currently in use. Results: We find important differences between the two versions of the datasets, implausible values in one dataset and generally insufficient detail documented on dataset preprocessing. Conclusions: It is recommended that researchers 1) indicate the provenance of the datasets they use, 2) report any preprocessing in sufficient detail to enable meaningful replication, and 3) invest effort in understanding the data prior to applying machine learners.

Crossref

UAL Research Online

Brunel University Research Archive

Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

Author: Brown André EX
Ch'ng Quee-Lim
Currie Michael
Grundy Laura J
Hokanson Jim
Javer Avelino
Kerr Rex
Lee Chee Wai
Li Chris
Li Kezhi
Schafer William R
Yemini Eviatar
Publication venue
Publication date: 20/02/2018

We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and they did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-criteria tests while (b) fixing the weaker regions of the training data (using SMOTUNED, which is a self-tuning version of SMOTE). This approach leads to dramatically large increases in software defect predictions. When applied in a 5*5 cross-validation study for 3,681 JAVA classes (containing over a million lines of code) from open source systems, SMOTUNED increased AUC and recall by 60% and 20% respectively. These improvements are independent of the classifier used to predict for quality. The same kind of pattern (improvement) was observed when a comparative analysis of SMOTE and SMOTUNED was done against the most recent class imbalance technique. In conclusion, for software analytic tasks like defect prediction, (1) data pre-processing can be more important than classifier choice, (2) ranking studies are incomplete without such pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing. Comment: 10 pages + 2 references. Accepted to International Conference of Software Engineering (ICSE), 201

arXiv.org e-Print Archive

ZENODO

FigShare

Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

Author: Bennin Kwabena Ebo
Chiha I.
Ghotra Baljinder
Menzies Tim
Omran M.
Pedregosa Fabian
Refaeilzadeh Payam
Tan Ming
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/02/2018

arXiv.org e-Print Archive

Crossref

Automatic test cases generation from software specifications modules

Author: Alhroob Aysh
Dahal Keshav
Hossain Alamgir
Publication venue: Wroclaw University of Technology
Publication date: 01/01/2010

A new technique is proposed in this paper to extend the Integrated Classification Tree Methodology (ICTM) developed by Chen et al. [13]. This software assists testers in constructing test cases from functional specifications. A Unified Modelling Language (UML) class diagram and Object Constraint Language (OCL) are used in this paper to represent the software specifications. Each classification and associated class in the software specification is represented by classes and attributes in the class diagram. Software specification relationships are represented by associated and hierarchical relationships in the class diagram. To ensure that relationships are consistent, an automatic methodology is proposed to capture and control the class relationships in a systematic way. This can help to reduce duplicate and illegitimate test cases, which improves the testing efficiency and minimises the time and cost of the testing. The methodology introduced in this paper extracts only the legitimate test cases, by removing the duplicate test cases and those incompatible with the software specifications. Large amounts of time would have been needed to execute all of the test cases; therefore, a methodology was proposed which aimed to select a best testing path. This path guarantees the highest coverage of system units and avoids using all generated test cases, which reduces the time and cost of the testing.

CiteSeerX

Northumbria Research Link

Adaptive multiscale methods for 3D streamer discharges in air

Author: Marskar Robert
Publication venue: 'IOP Publishing'
Publication date: 01/01/2019

We discuss spatially and temporally adaptive implicit-explicit (IMEX) methods for parallel simulations of three-dimensional fluid streamer discharges in atmospheric air. We examine strategies for advancing the fluid equations and elliptic transport equations (e.g. Poisson) with different time steps, synchronizing them on a global physical time scale which is taken to be proportional to the dielectric relaxation time. The use of a longer time step for the electric field leads to numerical errors that can be diagnosed, and we quantify the conditions where this simplification is valid. Likewise, using a three-term Helmholtz model for radiative transport, the same error diagnostics show that the radiative transport equations do not need to be resolved on time scales finer than the dielectric relaxation time. Elliptic equations are bottlenecks for most streamer simulation codes, and the results presented here potentially provide computational savings. Finally, a computational example of 3D branching streamers in a needle-plane geometry that uses up to 700 million grid cells is presented. Comment: 17 pages, 5 figures

arXiv.org e-Print Archive

SINTEF Open

NORA - Norwegian Open Research Archives