Search CORE

2 research outputs found

On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow

Author: Allamanis M.
Chowdhury S.
Cicchetti A.
Lin Y.
Pedregosa F.
Settles B.
Settles B.
Soliman M.
Ying A.
Publication venue
Publication date: 01/01/2017
Field of study

Abundant data is the key to successful machine learning. However, supervised learning requires annotated data that are often hard to obtain. In a classification task with limited resources, Active Learning (AL) promises to guide annotators to examples that bring the most value for a classifier. AL can be successfully combined with self-training, i.e., extending a training set with the unlabelled examples for which a classifier is the most certain. We report our experiences on using AL in a systematic manner to train an SVM classifier for Stack Overflow posts discussing performance of software components. We show that the training examples deemed as the most valuable to the classifier are also the most difficult for humans to annotate. Despite carefully evolved annotation criteria, we report low inter-rater agreement, but we also propose mitigation strategies. Finally, based on one annotator's work, we show that self-training can improve the classification accuracy. We conclude the paper by discussing implication for future text miners aspiring to use AL and self-training.Comment: Preprint of paper accepted for the Proc. of the 21st International Conference on Evaluation and Assessment in Software Engineering, 201

arXiv.org e-Print Archive

Lund University Publications

Crossref

Swedish Institute of Computer Science Publications Database

Towards Software Assets Origin Selection Supported by a Knowledge Repository

Author: Borg Markus
Carlsson Jan
Cicchetti Antonio
Papatheocharous Efi
Sentilles Severine
Wnuk Krzysztof
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

Software architecture is no more a mere system specification as resulting from the design phase, but it includes the process by which its specification was carried out. In this respect, design decisions in component-based software engineering play an important role: they are used to enhance the quality of the system, keep the current market level, keep partnership relationships, reduce costs, and so forth. For non trivial systems, a recurring situation is the selection of an asset origin, that is if going for in-house, outsourcing, open-source, or COTS, when in the need of a certain missing functionality. Usually, the decision making process follows a case-by-case approach, in which historical information is largely neglected. This solution avoids the overhead of keeping detailed documentation about past decisions, but hampers consistency among multiple, possibly related, decisions. The ORION project aims at developing a decision support framework in which historical decision information plays a pivotal role: it is used to analyse current decision scenarios, take well-founded decisions, and store the collected data for future exploitation. In this paper, we outline the potentials of such a knowledge repository, including the information it is intended to be stored in it, and when and how to retrieve it within a decision case.ORIO

Crossref

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Software institutes' Online Digital Archive