
    Making Software Cost Data Available for Meta-Analysis

    In this paper we consider the increasing need for meta-analysis within empirical software engineering. A necessary precondition for such analysis is to have both the results in an appropriate format and sufficient contextual information to avoid misleading inferences. We consider the implications for software project effort estimation and show that, for a sample of 12 seemingly similar published studies, differing reporting conventions make the results difficult to compare, let alone combine. We argue that a reporting protocol is required and make some suggestions as to what it should contain.

    Applying Absolute Residuals as Evaluation Criterion for Estimating the Development Time of Software Projects by Means of a Neuro-Fuzzy Approach

    In the software development field, practitioners expend between 30% and 40% more effort than is predicted. Accordingly, researchers have proposed new models for estimating development effort so that estimates are closer to actual values. In this study, an application based on a new neuro-fuzzy system (NFS) is analyzed, and its accuracy is compared to that of a statistical multiple linear regression (MLR) model. The usual criterion for evaluating the accuracy of estimation models has been the Magnitude of Relative Error (MRE); however, MRE was recently shown to be asymmetric, and the use of Absolute Residuals (AR) has been proposed. The accuracy results of the NFS and MLR were therefore based on AR. A statistical paired t-test showed that the accuracy of the new NFS is statistically better than that of the MLR at the 99% confidence level. It can be concluded that the new NFS could be used for predicting the effort of software development projects that have been individually developed under a disciplined process.
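    The asymmetry that motivates replacing MRE with AR can be seen with two small helper functions. This is an illustrative sketch with made-up numbers, not the study's data: for any non-negative prediction, an underestimate's MRE is capped at 1, while an overestimate's MRE can grow without bound, whereas AR treats both directions on the same absolute scale.

```python
def mre(actual, predicted):
    """Magnitude of Relative Error: |y - yhat| / y."""
    return abs(actual - predicted) / actual

def ar(actual, predicted):
    """Absolute Residual: |y - yhat|."""
    return abs(actual - predicted)

actual = 100.0
print(mre(actual, 0.0))                     # 1.0 -- the worst possible underestimate is capped at 1
print(mre(actual, 400.0))                   # 3.0 -- overestimates are unbounded
print(ar(actual, 50.0), ar(actual, 150.0))  # 50.0 50.0 -- AR is symmetric in error direction
```

    A selection criterion based on mean MRE therefore tends to favour models that systematically underestimate, which AR avoids.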

    On multi-view learning with additive models

    In many scientific settings, data can be naturally partitioned into variable groupings called views. Common examples include environmental (1st view) and genetic (2nd view) information in ecological applications, or chemical (1st view) and biological (2nd view) data in drug discovery. Multi-view data also occur in text analysis and proteomics applications, where one view consists of a graph with observations as the vertices and a weighted measure of pairwise similarity between observations as the edges. Further, in several of these applications the observations can be partitioned into two sets: one where the response is observed (labeled) and the other where it is not (unlabeled). The problem of simultaneously addressing multi-view data and incorporating unlabeled observations in training is referred to as multi-view transductive learning. In this work we introduce and study a comprehensive generalized fixed-point additive modeling framework for multi-view transductive learning, where each view is represented by a linear smoother. The problem of view selection is discussed using a generalized Akaike Information Criterion, which provides an approach for testing the contribution of each view. An efficient implementation is provided for fitting these models with both backfitting and local-scoring type algorithms, adjusted to semi-supervised graph-based learning. The proposed technique is assessed on both synthetic and real data sets and is shown to be competitive with state-of-the-art co-training and graph-based techniques. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org); doi:10.1214/08-AOAS202.
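    The backfitting idea mentioned above can be sketched for a two-view additive model in which each view's component is a linear smoother. This is a simplified illustration on synthetic data, not the paper's generalized fixed-point framework (which also handles graph-based views and unlabeled observations):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=(n, 2))  # view 1 features (e.g. environmental)
X2 = rng.normal(size=(n, 3))  # view 2 features (e.g. genetic)
y = (X1 @ np.array([1.0, -2.0])
     + X2 @ np.array([0.5, 0.0, 3.0])
     + rng.normal(scale=0.1, size=n))

def linear_smoother(X, r):
    """Least-squares smoother: fitted values of residual r regressed on X."""
    beta, *_ = np.linalg.lstsq(X, r, rcond=None)
    return X @ beta

f1, f2 = np.zeros(n), np.zeros(n)
for _ in range(20):                    # Gauss-Seidel style backfitting sweeps
    f1 = linear_smoother(X1, y - f2)   # refit view-1 component on partial residuals
    f2 = linear_smoother(X2, y - f1)   # refit view-2 component on partial residuals

rmse = float(np.sqrt(np.mean((y - f1 - f2) ** 2)))
print(rmse)  # close to the 0.1 noise level once backfitting has converged
```

    Each sweep refits one view's smoother on the partial residuals left by the other; for linear smoothers this converges to the joint least-squares fit of the additive model.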

    Effort Estimation For Object-oriented System Using Stochastic Gradient Boosting Technique

    The success of software development depends on proper prediction of the effort required to develop the software. Project managers require a sound methodology for software effort prediction, which is particularly important during the early stages of the software development life cycle. Accurate software effort estimation is a major concern in commercial software enterprises. Stochastic Gradient Boosting (SGB) is a machine learning technique that helps obtain improved estimates; it improves the accuracy of estimation models using decision trees. In this paper, the basic aim is to predict the effort required to develop various software projects using both the class point and the use case point approaches. The effort parameters are then optimized using the SGB technique to obtain better prediction accuracy. Furthermore, performance comparisons of the models obtained using the SGB technique with other machine learning techniques are presented in order to highlight the performance achieved by each method.
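    A minimal sketch of stochastic gradient boosting for effort prediction, using single-split regression trees (stumps) as the weak learners. The data, features, and hyper-parameters here are made-up stand-ins for illustration and do not come from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical size drivers (class-point-like counts) and effort values.
X = rng.uniform(0, 10, size=(120, 3))
effort = 5 * X[:, 0] + 2 * X[:, 1] ** 1.5 + rng.normal(scale=1.0, size=120)

def fit_stump(x, r):
    """Single-split regression tree minimising squared error on residuals r."""
    best = None
    for j in range(x.shape[1]):
        order = np.argsort(x[:, j])
        xs, rs = x[order, j], r[order]
        for i in range(1, len(xs)):
            left, right = rs[:i].mean(), rs[i:].mean()
            sse = ((rs[:i] - left) ** 2).sum() + ((rs[i:] - right) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, (xs[i - 1] + xs[i]) / 2, left, right)
    return best[1:]  # (feature, threshold, left value, right value)

def predict_stump(stump, x):
    j, t, left, right = stump
    return np.where(x[:, j] <= t, left, right)

shrinkage, n_trees, subsample = 0.1, 200, 0.5
pred = np.full(len(effort), effort.mean())        # start from the mean effort
for _ in range(n_trees):
    idx = rng.choice(len(effort), size=int(subsample * len(effort)), replace=False)
    residual = effort - pred                      # negative gradient of squared loss
    stump = fit_stump(X[idx], residual[idx])      # weak learner on a random subsample
    pred += shrinkage * predict_stump(stump, X)   # shrunken additive update

rmse = float(np.sqrt(np.mean((effort - pred) ** 2)))
```

    In practice a library implementation such as scikit-learn's GradientBoostingRegressor would be used; the loop above just makes the three SGB ingredients explicit: fitting a tree to the current residuals, subsampling the training set, and shrinking each tree's contribution.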

    Evaluating subset selection methods for use case points estimation

    Get PDF
    When the Use Case Points method is used for software effort estimation, users face low model accuracy, which limits its practical application. This study investigates the significance of subset selection methods for the prediction accuracy of Multiple Linear Regression models obtained by the stepwise approach. K-means, Spectral Clustering, the Gaussian Mixture Model and Moving Window are evaluated as candidate subset selection techniques. The methods were evaluated according to several criteria and then statistically tested. Evaluation was performed on two independent datasets, which differ in project types and size; both were split by the hold-out method. When clustering was used, the training sets were clustered into 3 classes, an independent regression model was created for each class, and these models were later used for prediction on the testing sets. When Moving Window was used, windows of sizes 5, 10 and 15 were tested. The results show that clustering techniques decrease prediction errors significantly when compared to the Use Case Points or Moving Window methods. Spectral Clustering was selected as the best-performing solution, because it achieves a Sum of Squared Errors reduction of 32% for the first dataset and 98% for the second dataset. The Mean Absolute Percentage Error for the second dataset is less than 1% for Spectral Clustering, 9% for Moving Window, and 27% for Use Case Points. For the first dataset, prediction errors are significantly higher: 53% for Spectral Clustering, compared with 165% for Use Case Points. It can be concluded that this study establishes subset selection as a significant method for improving the prediction ability of linear regression models used for software development effort prediction, and that the clustering method performs better than the moving window method.
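    The cluster-then-regress scheme described above can be sketched as follows. For simplicity this uses plain k-means on a single hypothetical size feature rather than the paper's Spectral Clustering or Gaussian Mixture Model, and synthetic data in place of the two project datasets:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical projects: one size driver (e.g. adjusted Use Case Points)
# whose effort relationship differs across three project groups.
sizes = np.concatenate([rng.normal(m, 2.0, 40) for m in (10, 30, 60)])
slopes = np.repeat([8.0, 12.0, 20.0], 40)
effort = slopes * sizes + rng.normal(scale=10.0, size=120)

# Plain 1-D k-means with k = 3 on the size feature.
centroids = np.array([5.0, 25.0, 55.0])
for _ in range(30):
    labels = np.argmin(np.abs(sizes[:, None] - centroids[None, :]), axis=1)
    centroids = np.array([sizes[labels == k].mean() for k in range(3)])

# Fit an independent linear regression (intercept + size) per cluster.
models = {}
for k in range(3):
    mask = labels == k
    A = np.column_stack([np.ones(mask.sum()), sizes[mask]])
    models[k], *_ = np.linalg.lstsq(A, effort[mask], rcond=None)

def predict(size):
    k = int(np.argmin(np.abs(size - centroids)))  # route to the nearest cluster
    b0, b1 = models[k]
    return b0 + b1 * size
```

    A test project is routed to the cluster with the nearest centroid and scored by that cluster's regression model; this is how per-cluster models can be applied to a hold-out set.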

    Potential and limitations of the ISBSG dataset in enhancing software engineering research: A mapping review

    Context: The International Software Benchmarking Standards Group (ISBSG) maintains a software development repository with over 6000 software projects. This dataset makes it possible to estimate a project's size, effort, duration, and cost. Objective: The aim of this study was to determine how, and to what extent, ISBSG has been used by researchers from 2000, when the first papers were published, until June of 2012. Method: A systematic mapping review was used as the research method, applied to 129 papers obtained after the filtering process. Results: The papers were published in 19 journals and 40 conferences. Thirty-five percent of the papers published between 2000 and 2011 have received at least one citation in journals, and only five papers have received six or more citations. The effort variable is the focus of 70.5% of the papers; 22.5% center their research on a variable other than effort, and 7% do not consider any target variable. Additionally, in as many as 70.5% of the papers, effort estimation is the research topic, followed by dataset properties (36.4%). The most frequent methods are Regression (61.2%), Machine Learning (35.7%), and Estimation by Analogy (22.5%). ISBSG is used as the only data source in 55% of the papers, while the remaining papers use complementary datasets. ISBSG release 10 is used most frequently, with 32 references. Finally, some benefits and drawbacks of the usage of ISBSG are highlighted. Conclusion: This work presents a snapshot of the existing usage of ISBSG in software development research. ISBSG offers a wealth of information regarding practices from a wide range of organizations, applications, and development types, which constitutes its main potential. However, a data preparation process is required before any analysis. Lastly, the potential of ISBSG to support new research is also outlined. Fernández Diego, M.; González-Ladrón-De-Guevara, F. (2014). Potential and limitations of the ISBSG dataset in enhancing software engineering research: A mapping review. Information and Software Technology 56(6):527-544. doi:10.1016/j.infsof.2014.01.003.