43,995 research outputs found
Investigating effort prediction of web-based applications using CBR on the ISBSG dataset
As web-based applications become more popular and more sophisticated, so does the requirement for early accurate estimates of the effort required to build such systems. Case-based reasoning (CBR) has been shown to be a reasonably effective estimation strategy, although it has not been widely explored in the context of web applications. This paper reports on a study carried out on a subset of the ISBSG dataset to examine the optimal number of analogies that should be used in making a prediction. The results show that it is not possible to select such a value with confidence, and that, in common with other findings in different domains, the effectiveness of CBR is hampered by other factors including the characteristics of the underlying dataset (such as the spread of data and presence of outliers) and the calculation employed to evaluate the distance function (in particular, the treatment of numeric and categorical data)
The consistency of empirical comparisons of regression and analogy-based software project cost prediction
OBJECTIVE - to determine the consistency within and between results in empirical studies of software engineering cost estimation. We focus on regression and analogy techniques as these are commonly used. METHOD â we conducted an exhaustive search using predefined inclusion and exclusion criteria and identified 67 journal papers and 104 conference papers. From this sample we identified 11 journal papers and 9 conference papers that used both methods. RESULTS â our analysis found that about 25% of studies were internally inconclusive. We also found that there is approximately equal evidence in favour of, and against analogy-based methods. CONCLUSIONS â we confirm the lack of consistency in the findings and argue that this inconsistent pattern from 20 different studies comparing regression and analogy is somewhat disturbing. It suggests that we need to ask more detailed questions than just: âWhat is the best prediction system?
Recommended from our members
A systematic review of software development cost estimation studies
This paper aims to provide a basis for the improvement of software estimation research through a systematic review of previous work. The review identifies 304 software cost estimation papers in 76 journals and classifies the papers according to research topic, estimation approach, research approach, study context and data set. A web-based library of these cost estimation papers is provided to ease the identification of relevant estimation research results. The review results combined with other knowledge provide support for recommendations for future software cost estimation research, including: 1) Increase the breadth of the search for relevant studies, 2) Search manually for relevant papers within a carefully selected set of journals when completeness is essential, 3) Conduct more studies on estimation methods commonly used by the software industry, and, 4) Increase the awareness of how properties of the data sets impact the results when evaluating estimation methods
International conference on software engineering and knowledge engineering: Session chair
The Thirtieth International Conference on Software Engineering and Knowledge Engineering (SEKE 2018) will be held at the Hotel Pullman, San Francisco Bay, USA, from July 1 to July 3, 2018. SEKE2018 will also be dedicated in memory of Professor Lofti Zadeh, a great scholar, pioneer and leader in fuzzy sets theory and soft computing.
The conference aims at bringing together experts in software engineering and knowledge engineering to discuss on relevant results in either software engineering or knowledge engineering or both. Special emphasis will be put on the transference of methods between both domains. The theme this year is soft computing in software engineering & knowledge engineering. Submission of papers and demos are both welcome
Recommended from our members
Data sets and data quality in software engineering
OBJECTIVE - to assess the extent and types of techniques used to manage quality within software engineering data sets. We consider this a particularly interesting question in the context of initiatives to promote sharing and secondary analysis of data sets.
METHOD - we perform a systematic review of available empirical software engineering studies.
RESULTS - only 23 out of the many hundreds of studies assessed, explicitly considered data quality.
CONCLUSIONS - first, the community needs to consider the quality and appropriateness of the data set being utilised; not all data sets are equal. Second, we need more research into means of identifying, and ideally repairing, noisy cases. Third, it should become routine to use sensitivity analysis to assess conclusion stability with respect to the assumptions that must be made concerning noise levels
Cross-Modal Data Programming Enables Rapid Medical Machine Learning
Labeling training datasets has become a key barrier to building medical
machine learning models. One strategy is to generate training labels
programmatically, for example by applying natural language processing pipelines
to text reports associated with imaging studies. We propose cross-modal data
programming, which generalizes this intuitive strategy in a
theoretically-grounded way that enables simpler, clinician-driven input,
reduces required labeling time, and improves with additional unlabeled data. In
this approach, clinicians generate training labels for models defined over a
target modality (e.g. images or time series) by writing rules over an auxiliary
modality (e.g. text reports). The resulting technical challenge consists of
estimating the accuracies and correlations of these rules; we extend a recent
unsupervised generative modeling technique to handle this cross-modal setting
in a provably consistent way. Across four applications in radiography, computed
tomography, and electroencephalography, and using only several hours of
clinician time, our approach matches or exceeds the efficacy of
physician-months of hand-labeling with statistical significance, demonstrating
a fundamentally faster and more flexible way of building machine learning
models in medicine
- âŚ