2 research outputs found
On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow
Abundant data is the key to successful machine learning. However, supervised
learning requires annotated data that are often hard to obtain. In a
classification task with limited resources, Active Learning (AL) promises to
guide annotators to examples that bring the most value for a classifier. AL can
be successfully combined with self-training, i.e., extending a training set
with the unlabelled examples for which a classifier is the most certain. We
report our experiences on using AL in a systematic manner to train an SVM
classifier for Stack Overflow posts discussing performance of software
components. We show that the training examples deemed as the most valuable to
the classifier are also the most difficult for humans to annotate. Despite
carefully evolved annotation criteria, we report low inter-rater agreement, but
we also propose mitigation strategies. Finally, based on one annotator's work,
we show that self-training can improve the classification accuracy. We conclude
the paper by discussing implication for future text miners aspiring to use AL
and self-training.Comment: Preprint of paper accepted for the Proc. of the 21st International
Conference on Evaluation and Assessment in Software Engineering, 201
Towards Software Assets Origin Selection Supported by a Knowledge Repository
Software architecture is no more a mere system specification as resulting from the design phase, but it includes the process by which its specification was carried out. In this respect, design decisions in component-based software engineering play an important role: they are used to enhance the quality of the system, keep the current market level, keep partnership relationships, reduce costs, and so forth. For non trivial systems, a recurring situation is the selection of an asset origin, that is if going for in-house, outsourcing, open-source, or COTS, when in the need of a certain missing functionality. Usually, the decision making process follows a case-by-case approach, in which historical information is largely neglected. This solution avoids the overhead of keeping detailed documentation about past decisions, but hampers consistency among multiple, possibly related, decisions. The ORION project aims at developing a decision support framework in which historical decision information plays a pivotal role: it is used to analyse current decision scenarios, take well-founded decisions, and store the collected data for future exploitation. In this paper, we outline the potentials of such a knowledge repository, including the information it is intended to be stored in it, and when and how to retrieve it within a decision case.ORIO