Estimating software project effort using analogies
Accurate project effort prediction is an important goal for the software engineering community. To date most work has focused upon building algorithmic models of effort, for example COCOMO. These can be calibrated to local environments. We describe an alternative approach to estimation based upon the use of analogies. The underlying principle is to characterise projects in terms of features (for example, the number of interfaces, the development method or the size of the functional requirements document). Completed projects are stored and then the problem becomes one of finding the most similar projects to the one for which a prediction is required. Similarity is defined as Euclidean distance in n-dimensional space where n is the number of project features. Each dimension is standardised so all dimensions have equal weight. The known effort values of the nearest neighbours to the new project are then used as the basis for the prediction. The process is automated using a PC-based tool known as ANGEL. The method is validated on nine different industrial datasets (a total of 275 projects) and in all cases analogy outperforms algorithmic models based upon stepwise regression. From this work we argue that estimation by analogy is a viable technique that, at the very least, can be used by project managers to complement current estimation techniques.
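The analogy procedure described in this abstract (standardise each feature, rank completed projects by Euclidean distance, predict from the nearest neighbours) can be sketched roughly as follows. This is an illustrative sketch, not the ANGEL tool: the function names, the min-max form of standardisation, the choice of averaging the k nearest efforts, and the sample numbers are all assumptions made for the example.

```python
import math
from statistics import mean


def standardise(rows):
    """Rescale every feature column to [0, 1] so each dimension has equal weight.

    Min-max scaling is an assumption; the abstract only says "standardised".
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid dividing by zero
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def estimate_by_analogy(completed, efforts, target, k=2):
    """Predict effort for `target` as the mean effort of its k nearest analogies."""
    # Scale the new project together with the completed ones.
    *past, new = standardise(completed + [target])
    # Rank completed projects by Euclidean distance to the new project.
    ranked = sorted(range(len(past)), key=lambda i: math.dist(past[i], new))
    return mean(efforts[i] for i in ranked[:k])


# Hypothetical data: [number of interfaces, requirements document size]
completed = [[10, 2], [12, 3], [40, 9], [42, 10]]
efforts = [100, 120, 400, 420]  # made-up effort values
prediction = estimate_by_analogy(completed, efforts, [11, 2], k=2)
```

With these made-up numbers the two closest analogies are the first two projects, so the prediction is the mean of their efforts. How many neighbours to use, and whether to average or weight them, are choices the abstract leaves open.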
Gendered behavior as a disadvantage in open source software development
Women are severely marginalized in software development, especially in open
source. In this article we argue that disadvantage is more due to gendered
behavior than to categorical discrimination: women are at a disadvantage
because of what they do, rather than because of who they are. Using data on
entire careers of users from GitHub.com, we develop a measure to capture the
gendered pattern of behavior: We use a random forest prediction of being female
(as opposed to being male) by behavioral choices in the level of activity,
specialization in programming languages, and choice of partners. We test
differences in success and survival along both categorical gender and the
gendered pattern of behavior. We find that 84.5% of women's disadvantage
(compared to men) in success and 34.8% of their disadvantage in survival are
due to the female pattern of their behavior. Men are also disadvantaged along
their interquartile range of the female pattern of their behavior, and users
who don't reveal their gender suffer an even more drastic disadvantage in
survival probability. Moreover, we do not see evidence for any reduction of
these inequalities in time. Our findings are robust to noise in gender
recognition, and to taking into account particular programming languages, or
decision tree classes of gendered behavior. Our results suggest that fighting
categorical gender discrimination will have a limited impact on gender
inequalities in open source software development, and that gender hiding is not
a viable strategy for women.
Is One Hyperparameter Optimizer Enough?
Hyperparameter tuning is the black art of automatically finding a good
combination of control parameters for a data miner. While widely applied in
empirical Software Engineering, there has not been much discussion on which
hyperparameter tuner is best for software analytics. To address this gap in the
literature, this paper applied a range of hyperparameter optimizers (grid
search, random search, differential evolution, and Bayesian optimization) to
the defect prediction problem. Surprisingly, no hyperparameter optimizer was
observed to be "best" and, for one of the two evaluation measures studied here
(F-measure), hyperparameter optimization was, in 50% of cases, no better than
using default configurations.
We conclude that hyperparameter optimization is more nuanced than previously
believed. While such optimization can certainly lead to large improvements in
the performance of classifiers used in software analytics, it remains to be
seen which specific optimizers should be applied to a new dataset.
Comment: 7 pages, 2 columns, accepted for SWAN1
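To make the comparison in this abstract concrete, here is a minimal sketch of two of the optimizers it names, grid search and random search, set against a default configuration. Everything in it is an assumption for illustration: `score` is a toy stand-in for a validation F-measure (a real study would train and evaluate a defect predictor), and the parameter names, ranges, and default values are hypothetical.

```python
import itertools
import random


def score(depth, rate):
    """Toy stand-in for a validation F-measure; peaks at depth=6, rate=0.1."""
    return 1.0 - 0.01 * (depth - 6) ** 2 - 4.0 * (rate - 0.1) ** 2


def grid_search(depths, rates):
    """Evaluate every combination on a fixed grid and keep the best one."""
    return max(itertools.product(depths, rates), key=lambda p: score(*p))


def random_search(n_trials, seed=0):
    """Sample configurations uniformly at random and keep the best one."""
    rng = random.Random(seed)
    trials = [(rng.randint(1, 12), rng.uniform(0.01, 0.5)) for _ in range(n_trials)]
    return max(trials, key=lambda p: score(*p))


default = (3, 0.3)  # hypothetical "off-the-shelf" configuration
grid_best = grid_search([2, 4, 6, 8], [0.05, 0.1, 0.2])
random_best = random_search(20)
```

On this toy surface the grid happens to contain the optimum, so tuning beats the default. The abstract's point is that on real defect data that advantage is not guaranteed, and no single optimizer consistently wins.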