635 research outputs found

Recommended from our members

Data sets and data quality in software engineering

Authors: Liebchen G, Shepperd M J
Publication venue: ACM Press
Publication date: 01/01/2008

OBJECTIVE - to assess the extent and types of techniques used to manage quality within software engineering data sets. We consider this a particularly interesting question in the context of initiatives to promote sharing and secondary analysis of data sets. METHOD - we perform a systematic review of available empirical software engineering studies. RESULTS - only 23 of the many hundreds of studies assessed explicitly considered data quality. CONCLUSIONS - first, the community needs to consider the quality and appropriateness of the data set being utilised; not all data sets are equal. Second, we need more research into means of identifying, and ideally repairing, noisy cases. Third, it should become routine to use sensitivity analysis to assess conclusion stability with respect to the assumptions that must be made concerning noise levels.

Brunel University Research Archive

An evaluation of e-learning standards

Authors: Jayal A, Shepperd M J
Publication date: 01/01/2007

The aim of this investigation is to perform an independent study of the various emerging e-learning standards. This paper presents a summary of these standards in order to make them more accessible and understandable, and provides preliminary evidence as to their utility and adoption by UK higher and further education institutions. Recently there have been efforts to define standards for e-learning content and e-learning components, such as IEEE LOM, UK LOM, IMS, SCORM and OKI. Since it was not possible to cover all the standards in detail within the time available, our study focuses on eight standards. Although the results of the preliminary study suggest that these eight standards may help the interoperability, accessibility and reusability of e-learning content and components, it remains to be seen how many of them are actually followed at UK higher education institutions.

Brunel University Research Archive

Comparing software prediction techniques using simulation

Authors: Kadoda G, Shepperd M J
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2001

The need for accurate software prediction systems increases as software becomes much larger and more complex. We believe that the underlying characteristics of the data set (size, number of features, type of distribution, etc.) influence the choice of the prediction system to be used. For this reason, we would like to control the characteristics of such data sets in order to systematically explore the relationship between accuracy, choice of prediction system, and data set characteristics. It would also be useful to have a large validation data set. Our solution is to simulate data, allowing both control and the possibility of large (1000) validation cases. We compare four prediction techniques: regression, rule induction, nearest neighbor (a form of case-based reasoning), and neural nets. The results suggest that there are significant differences depending upon the characteristics of the data set. Consequently, researchers should consider prediction context when evaluating competing prediction systems. We observed that the more "messy" the data and the more complex the relationship with the dependent variable, the more variability in the results. In the more complex cases, we observed significantly different results depending upon the particular training set that has been sampled from the underlying data set. However, our most important result is that it is more fruitful to ask which is the best prediction system in a particular context rather than which is the "best" prediction system.

Crossref

Brunel University Research Archive

The consistency of empirical comparisons of regression and analogy-based software project cost prediction

Authors: Mair CM, Shepperd M J
Publication venue: IEEE Computer Society
Publication date: 01/01/2005

OBJECTIVE - to determine the consistency within and between results in empirical studies of software engineering cost estimation. We focus on regression and analogy techniques as these are commonly used. METHOD - we conducted an exhaustive search using predefined inclusion and exclusion criteria and identified 67 journal papers and 104 conference papers. From this sample we identified 11 journal papers and 9 conference papers that used both methods. RESULTS - our analysis found that about 25% of studies were internally inconclusive. We also found that there is approximately equal evidence in favour of, and against, analogy-based methods. CONCLUSIONS - we confirm the lack of consistency in the findings and argue that this inconsistent pattern from 20 different studies comparing regression and analogy is somewhat disturbing. It suggests that we need to ask more detailed questions than just: “What is the best prediction system?”

Brunel University Research Archive

The problem of labels in e-assessment of diagrams

Authors: Jayal A, Shepperd M J
Publication date: 01/01/2008

In this short paper we explore a problematic aspect of automated assessment of diagrams. Diagrams have partial and sometimes inconsistent semantics. Typically much of the meaning of a diagram resides in the labels; however, the choice of labeling is largely unrestricted. This means a correct solution may utilise labels that differ from, yet are semantically equivalent to, those of the specimen solution. With human marking this problem can be easily overcome. Unfortunately, with e-assessment this is challenging. We empirically explore the scale of the problem of synonyms by analysing 160 student solutions to a UML task. From this we find that cumulative growth of synonyms only shows a limited tendency to reduce at the margin. This finding has significant implications for the ease with which we may develop future e-assessment systems for diagrams, in that the need for better algorithms for assessing label semantic similarity becomes inescapable.

CiteSeerX

Brunel University Research Archive

The use of function points to find cost analogies

Authors: Atkinson K, Shepperd M J
Publication date: 01/01/1994

Finding effective techniques for the early estimation of project effort remains an important, and frustratingly elusive, research objective for the software development community. We have conducted an empirical study of 21 real-time projects for a major software developer. The study collected a range of counts and measures derived from specification documents, including a derivative of Function Points intended for highly constrained systems. Notwithstanding the fact that the projects were drawn from a comparatively stable environment, traditional approaches for building prediction systems (for example, regression analysis) failed to yield a useful predictive model. By contrast, estimation based upon the automated search for analogous projects produced more accurate estimates. How much this is a characteristic of this particular dataset and how much these findings might be more generally replicated is uncertain. Nevertheless, these results should act as encouragement for follow-up research on a much under-utilised estimation technique.

Brunel University Research Archive

Search Heuristics, Case-Based Reasoning and Software Project Effort Prediction

Authors: Hart J, Kirsopp C, Shepperd M J
Publication venue: Morgan Kaufmann Publishers Inc.
Publication date: 01/01/2002

This paper reports on the use of search techniques to help optimise a case-based reasoning (CBR) system for predicting software project effort. A major problem, common to ML techniques in general, has been dealing with large numbers of case features, some of which can hinder the prediction process. Unfortunately, searching for the optimal feature subset is a combinatorial problem and therefore NP-hard. This paper examines the use of random searching, hill climbing and forward sequential selection (FSS) to tackle this problem. Results from examining a set of real software project data show that even random searching was better than using all available features (average error 35.6% rather than 50.8%). Hill climbing and FSS both produced results substantially better than the random search (15.3% and 13.1% respectively), but FSS was more computationally efficient. Providing a description of the fitness landscape of a problem along with search results is a step towards the classification of search problems and their assignment to optimum search techniques. This paper attempts to describe the fitness landscape of this problem by combining the results from random searches and hill climbing, as well as using multi-dimensional scaling to aid visualisation. Amongst other findings, the visualisation results suggest that some form of heuristic-based initialisation might prove useful for this problem.

CiteSeerX

Brunel University Research Archive
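
The forward sequential selection (FSS) idea mentioned in the abstract above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the single-nearest-neighbour predictor, the squared Euclidean distance, the leave-one-out mean relative error, and the toy data are all assumptions chosen to keep the example small.

```python
from typing import List, Sequence, Tuple


def loo_error(X: Sequence[Sequence[float]], y: Sequence[float],
              features: Sequence[int]) -> float:
    """Leave-one-out mean relative error of a 1-nearest-neighbour
    (analogy-style) predictor that uses only the given feature indices."""
    total = 0.0
    for i in range(len(X)):
        best_j, best_d = -1, float("inf")
        for j in range(len(X)):
            if j == i:
                continue  # the case being predicted is held out
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_d, best_j = d, j
        # Predict the effort of case i as the effort of its nearest analogue.
        total += abs(y[best_j] - y[i]) / y[i]
    return total / len(X)


def forward_sequential_selection(
        X: Sequence[Sequence[float]],
        y: Sequence[float]) -> Tuple[List[int], float]:
    """Greedily add the feature that most reduces the error; stop as soon
    as no remaining feature improves on the current subset."""
    remaining = set(range(len(X[0])))
    selected: List[int] = []
    best_err = float("inf")
    while remaining:
        cand_err, cand = min(
            (loo_error(X, y, selected + [f]), f) for f in sorted(remaining))
        if cand_err >= best_err:
            break
        selected.append(cand)
        remaining.remove(cand)
        best_err = cand_err
    return selected, best_err


if __name__ == "__main__":
    # Made-up data: feature 0 tracks effort, feature 1 is noise.
    X = [[1, 9], [2, 1], [3, 8], [4, 2], [5, 7], [6, 3]]
    y = [10, 20, 30, 40, 50, 60]
    print(forward_sequential_selection(X, y))
```

Because each round evaluates only the features not yet selected, the search examines at most a quadratic number of subsets rather than all of them, which is in line with the abstract's remark that FSS is computationally efficient compared with broader searches.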

An empirical study of evolution of inheritance in Java OSS

Authors: Counsell S, Nasseri E, Shepperd M J
Publication venue: ACM Press
Publication date: 01/01/2006

Previous studies of Object-Oriented (OO) software have reported avoidance of the inheritance mechanism and cast doubt on the wisdom of ‘deep’ inheritance levels. From an evolutionary perspective, the picture is unclear - we still know relatively little about how, over time, changes tend to be applied by developers. Our conjecture is that an inheritance hierarchy will tend to grow ‘breadth-wise’ rather than ‘depth-wise’. This claim is made on the basis that developers will avoid extending depth in favour of breadth because of the inherent complexity of having to understand the functionality of superclasses. The goal of our study is to investigate this conjecture empirically. We conduct an empirical study of seven Java Open-Source Systems (OSSs) over a series of releases to observe the nature and location of changes within the inheritance hierarchies. Results show a strong tendency for classes to be added at levels one and two of the hierarchy (rather than anywhere else). Over 96% of classes added over the course of the versions of all systems were at level 1 or level 2. The results suggest that changes cluster in the shallow levels of a hierarchy; this is relevant for developers since it indicates where remedial activities such as refactoring should be focused.

Brunel University Research Archive

Letter from J. L. Shepperd

Author: Shepperd J. L.
Publication venue: Hosted by Utah State University Libraries
Publication date: 20/12/1905

Letter concerning recommendations for position in Domestic Science at Utah Agricultural College

DigitalCommons@USU

Integrate the GM(1,1) and Verhulst models to predict software stage effort

Authors: MacDonell S, Shen J, Shepperd M, Song Q, Wang Y
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/11/2009

This is the author's accepted manuscript. The final published article is available from the link below. Copyright © 2009 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

Software effort prediction clearly plays a crucial role in software project management. In keeping with more dynamic approaches to software development, it is not sufficient to only predict the whole-project effort at an early stage. Rather, the project manager must also dynamically predict the effort of different stages or activities during the software development process. This can assist the project manager to reestimate effort and adjust the project plan, thus avoiding effort or schedule overruns. This paper presents a method for software physical time stage-effort prediction based on grey models GM(1,1) and Verhulst. This method establishes models dynamically according to particular types of stage-effort sequences, and can adapt to particular development methodologies automatically by using a novel grey feedback mechanism. We evaluate the proposed method with a large-scale real-world software engineering dataset, and compare it with the linear regression method and the Kalman filter method, revealing that accuracy has been improved by at least 28% and 50%, respectively. The results indicate that the method can be effective and has considerable potential. We believe that stage predictions could be a useful complement to whole-project effort prediction methods.

National Natural Science Foundation of China and the Hi-Tech Research and Development Program of China

Crossref

AUT Scholarly Commons

Brunel University Research Archive
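
For readers unfamiliar with the grey model named in the last abstract, the following is a minimal sketch of a textbook GM(1,1) forecast. It is not the paper's method: the Verhulst model and the grey feedback mechanism are not shown, and the function names and the sample effort sequence are hypothetical. The sketch assumes a positive sequence of at least three values that is not constant.

```python
import math
from typing import List, Sequence, Tuple


def gm11_fit(x0: Sequence[float]) -> Tuple[float, float]:
    """Estimate the GM(1,1) parameters (a, b) by ordinary least squares
    on the grey equation x0[k] = -a * z1[k] + b, for k = 1 .. n-1."""
    # Accumulated generating sequence x1 (running sum of x0).
    x1: List[float] = []
    running = 0.0
    for v in x0:
        running += v
        x1.append(running)
    # Regressor u[k] = -z1[k], where z1 is the mean of consecutive x1 values.
    u = [-0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = list(x0[1:])
    mean_u = sum(u) / len(u)
    mean_y = sum(y) / len(y)
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    a = s_uy / s_uu           # slope of y on u
    b = mean_y - a * mean_u   # intercept
    return a, b


def gm11_forecast(x0: Sequence[float], steps: int) -> List[float]:
    """Forecast the next `steps` values of x0 from the fitted model."""
    a, b = gm11_fit(x0)

    def x1_hat(k: int) -> float:
        # Time response of the whitened equation dx1/dt + a * x1 = b.
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    n = len(x0)
    # Difference the accumulated forecasts to get back to the x0 scale.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


if __name__ == "__main__":
    # Hypothetical stage-effort sequence growing by 10% per stage.
    effort = [100 * 1.1 ** k for k in range(5)]
    print(gm11_forecast(effort, steps=2))
```

The model needs only a short history, which is one reason grey models are considered for stage-by-stage prediction where few data points are available per project.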