2 research outputs found

    On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow

    Full text link
    Abundant data is the key to successful machine learning. However, supervised learning requires annotated data that are often hard to obtain. In a classification task with limited resources, Active Learning (AL) promises to guide annotators to examples that bring the most value for a classifier. AL can be successfully combined with self-training, i.e., extending a training set with the unlabelled examples for which a classifier is the most certain. We report our experiences on using AL in a systematic manner to train an SVM classifier for Stack Overflow posts discussing performance of software components. We show that the training examples deemed as the most valuable to the classifier are also the most difficult for humans to annotate. Despite carefully evolved annotation criteria, we report low inter-rater agreement, but we also propose mitigation strategies. Finally, based on one annotator's work, we show that self-training can improve the classification accuracy. We conclude the paper by discussing implication for future text miners aspiring to use AL and self-training.Comment: Preprint of paper accepted for the Proc. of the 21st International Conference on Evaluation and Assessment in Software Engineering, 201

    Towards Software Assets Origin Selection Supported by a Knowledge Repository

    Get PDF
    Software architecture is no more a mere system specification as resulting from the design phase, but it includes the process by which its specification was carried out. In this respect, design decisions in component-based software engineering play an important role: they are used to enhance the quality of the system, keep the current market level, keep partnership relationships, reduce costs, and so forth. For non trivial systems, a recurring situation is the selection of an asset origin, that is if going for in-house, outsourcing, open-source, or COTS, when in the need of a certain missing functionality. Usually, the decision making process follows a case-by-case approach, in which historical information is largely neglected. This solution avoids the overhead of keeping detailed documentation about past decisions, but hampers consistency among multiple, possibly related, decisions. The ORION project aims at developing a decision support framework in which historical decision information plays a pivotal role: it is used to analyse current decision scenarios, take well-founded decisions, and store the collected data for future exploitation. In this paper, we outline the potentials of such a knowledge repository, including the information it is intended to be stored in it, and when and how to retrieve it within a decision case.ORIO
    corecore