1,598 research outputs found

    Mining app reviews to support software engineering

    Get PDF
    The thesis studies how mining app reviews can support software engineering. App reviews —short user reviews of an app in app stores— provide a potentially rich source of information to help software development teams maintain and evolve their products. Exploiting this information is however difficult due to the large number of reviews and the difficulty in extracting useful actionable information from short informal texts. A variety of app review mining techniques have been proposed to classify reviews and to extract information such as feature requests, bug descriptions, and user sentiments but the usefulness of these techniques in practice is still unknown. Research in this area has grown rapidly, resulting in a large number of scientific publications (at least 182 between 2010 and 2020) but nearly no independent evaluation and description of how diverse techniques fit together to support specific software engineering tasks have been performed so far. The thesis presents a series of contributions to address these limitations. We first report the findings of a systematic literature review in app review mining exposing the breadth and limitations of research in this area. Using findings from the literature review, we then present a reference model that relates features of app review mining tools to specific software engineering tasks supporting requirements engineering, software maintenance and evolution. We then present two additional contributions extending previous evaluations of app review mining techniques. We present a novel independent evaluation of opinion mining techniques using an annotated dataset created for our experiment. Our evaluation finds lower effectiveness than initially reported by the techniques authors. A final part of the thesis, evaluates approaches in searching for app reviews pertinent to a particular feature. The findings show a general purpose search technique is more effective than the state-of-the-art purpose-built app review mining techniques; and suggest their usefulness for requirements elicitation. Overall, the thesis contributes to improving the empirical evaluation of app review mining techniques and their application in software engineering practice. Researchers and developers of future app mining tools will benefit from the novel reference model, detailed experiments designs, and publicly available datasets presented in the thesis

    Mining and searching app reviews for requirements engineering: Evaluation and replication studies

    Get PDF
    App reviews provide a rich source of feature-related information that can support requirement engineering activities. Analysing them manually to find this information, however, is challenging due to their large quantity and noisy nature. To overcome the problem, automated approaches have been proposed for ‘feature-specific analysis’. Unfortunately, the effectiveness of these approaches has been evaluated using different methods and datasets. Replicating these studies to confirm their results and to provide benchmarks of different approaches is a challenging problem. We address the problem by extending previous evaluations and performing a comparison of these approaches. In this paper, we present two empirical studies. In the first study, we evaluate opinion mining approaches; the approaches extract features discussed in app reviews and identify their associated sentiments. In the second study, we evaluate approaches searching for feature-related reviews. The approaches search for users’ feedback pertinent to a particular feature. The results of both studies show these approaches achieve lower effectiveness than reported originally, and raise an important question about their practical use

    Knowledge Elicitation Methods for Affect Modelling in Education

    Get PDF
    Research on the relationship between affect and cognition in Artificial Intelligence in Education (AIEd) brings an important dimension to our understanding of how learning occurs and how it can be facilitated. Emotions are crucial to learning, but their nature, the conditions under which they occur, and their exact impact on learning for different learners in diverse contexts still needs to be mapped out. The study of affect during learning can be challenging, because emotions are subjective, fleeting phenomena that are often difficult for learners to report accurately and for observers to perceive reliably. Context forms an integral part of learners’ affect and the study thereof. This review provides a synthesis of the current knowledge elicitation methods that are used to aid the study of learners’ affect and to inform the design of intelligent technologies for learning. Advantages and disadvantages of the specific methods are discussed along with their respective potential for enhancing research in this area, and issues related to the interpretation of data that emerges as the result of their use. References to related research are also provided together with illustrative examples of where the individual methods have been used in the past. Therefore, this review is intended as a resource for methodological decision making for those who want to study emotions and their antecedents in AIEd contexts, i.e. where the aim is to inform the design and implementation of an intelligent learning environment or to evaluate its use and educational efficacy

    Decoding the cognitive states of attention and distraction in a real-life setting using EEG.

    Get PDF
    Lapses in attention can have serious consequences in situations such as driving a car, hence there is considerable interest in tracking it using neural measures. However, as most of these studies have been done in highly controlled and artificial laboratory settings, we want to explore whether it is also possible to determine attention and distraction using electroencephalogram (EEG) data collected in a natural setting using machine/deep learning. 24 participants volunteered for the study. Data were collected from pairs of participants simultaneously while they engaged in Tibetan Monastic debate, a practice that is interesting because it is a real-life situation that generates substantial variability in attention states. We found that attention was on average associated with increased left frontal alpha, increased left parietal theta, and decreased central delta compared to distraction. In an attempt to predict attention and distraction, we found that a Long Short Term Memory model classified attention and distraction with maximum accuracy of 95.86% and 95.4% corresponding to delta and theta waves respectively. This study demonstrates that EEG data collected in a real-life setting can be used to predict attention states in participants with good accuracy, opening doors for developing Brain-Computer Interfaces that track attention in real-time using data extracted in daily life settings, rendering them much more usable

    Impact of annotation modality on label quality and model performance in the automatic assessment of laughter in-the-wild

    Full text link
    Laughter is considered one of the most overt signals of joy. Laughter is well-recognized as a multimodal phenomenon but is most commonly detected by sensing the sound of laughter. It is unclear how perception and annotation of laughter differ when annotated from other modalities like video, via the body movements of laughter. In this paper we take a first step in this direction by asking if and how well laughter can be annotated when only audio, only video (containing full body movement information) or audiovisual modalities are available to annotators. We ask whether annotations of laughter are congruent across modalities, and compare the effect that labeling modality has on machine learning model performance. We compare annotations and models for laughter detection, intensity estimation, and segmentation, three tasks common in previous studies of laughter. Our analysis of more than 4000 annotations acquired from 48 annotators revealed evidence for incongruity in the perception of laughter, and its intensity between modalities. Further analysis of annotations against consolidated audiovisual reference annotations revealed that recall was lower on average for video when compared to the audio condition, but tended to increase with the intensity of the laughter samples. Our machine learning experiments compared the performance of state-of-the-art unimodal (audio-based, video-based and acceleration-based) and multi-modal models for different combinations of input modalities, training label modality, and testing label modality. Models with video and acceleration inputs had similar performance regardless of training label modality, suggesting that it may be entirely appropriate to train models for laughter detection from body movements using video-acquired labels, despite their lower inter-rater agreement

    Dynamic Estimation of Rater Reliability using Multi-Armed Bandits

    Get PDF
    One of the critical success factors for supervised machine learning is the quality of target values, or predictions, associated with training instances. Predictions can be discrete labels (such as a binary variable specifying whether a blog post is positive or negative) or continuous ratings (for instance, how boring a video is on a 10-point scale). In some areas, predictions are readily available, while in others, the eort of human workers has to be involved. For instance, in the task of emotion recognition from speech, a large corpus of speech recordings is usually available, and humans denote which emotions are present in which recordings

    Reinforcement Learning for Machine Translation: from Simulations to Real-World Applications

    Get PDF
    If a machine translation is wrong, how we can tell the underlying model to fix it? Answering this question requires (1) a machine learning algorithm to define update rules, (2) an interface for feedback to be submitted, and (3) expertise on the side of the human who gives the feedback. This thesis investigates solutions for machine learning updates, the suitability of feedback interfaces, and the dependency on reliability and expertise for different types of feedback. We start with an interactive online learning scenario where a machine translation (MT) system receives bandit feedback (i.e. only once per source) instead of references for learning. Policy gradient algorithms for statistical and neural MT are developed to learn from absolute and pairwise judgments. Our experiments on domain adaptation with simulated online feedback show that the models can largely improve under weak feedback, with variance reduction techniques being very effective. In production environments offline learning is often preferred over online learning. We evaluate algorithms for counterfactual learning from human feedback in a study on eBay product title translations. Feedback is either collected via explicit star ratings from users, or implicitly from the user interaction with cross-lingual product search. Leveraging implicit feedback turns out to be more successful due to lower levels of noise. We compare the reliability and learnability of absolute Likert-scale ratings with pairwise preferences in a smaller user study, and find that absolute ratings are overall more effective for improvements in down-stream tasks. Furthermore, we discover that error markings provide a cheap and practical alternative to error corrections. In a generalized interactive learning framework we propose a self-regulation approach, where the learner, guided by a regulator module, decides which type of feedback to choose for each input. The regulator is reinforced to find a good trade-off between supervision effect and cost. In our experiments, it discovers strategies that are more efficient than active learning and standard fully supervised learning

    Ontologies for automatic question generation

    Get PDF
    Assessment is an important tool for formal learning, especially in higher education. At present, many universities use online assessment systems where questions are entered manually into a question bank system. This kind of system requires the instructor’s time and effort to construct questions manually. The main aim of this thesis is, therefore, to contribute to the investigation of new question generation strategies for short/long answer questions in order to allow for the development of automatic factual question generation from an ontology for educational assessment purposes. This research is guided by four research questions: (1) How well can an ontology be used for generating factual assessment questions? (2) How can questions be generated from course ontology? (3) Are the ontological question generation strategies able to generate acceptable assessment questions? and (4) Do the topic-based indexing able to improve the feasibility of AQGen. We firstly conduct ontology validation to evaluate the appropriateness of concept representation using a competency question approach. We used revision questions from the textbook to obtain keyword (in revision questions) and a concept (in the ontology) matching. The results show that only half of the ontology concepts matched the keywords. We took further investigation on the unmatched concepts and found some incorrect concept naming and later suggest a guideline for an appropriate concept naming. At the same time, we introduce validation of ontology using revision questions as competency questions to check for ontology completeness. Furthermore, we also proposed 17 short/long answer question templates for 3 question categories, namely definition, concept completion and comparison. In the subsequent part of the thesis, we develop the AQGen tool and evaluate the generated questions. Two Computer Science subjects, namely OS and CNS, are chosen to evaluate AQGen generated questions. We conduct a questionnaire survey from 17 domain experts to identify experts’ agreement on the acceptability measure of AQGen generated questions. The experts’ agreements for acceptability measure are favourable, and it is reported that three of the four QG strategies proposed can generate acceptable questions. It has generated thousands of questions from the 3 question categories. AQGen is updated with question selection to generate a feasible question set from a tremendous amount of generated questions before. We have suggested topic-based indexing with the purpose to assert knowledge about topic chapters into ontology representation for question selection. The topic indexing shows a feasible result for filtering question by topics. Finally, our results contribute to an understanding of ontology element representation for question generations and how to automatically generate questions from ontology for education assessment

    Metadata, repository and methodology in learning objects

    Full text link
    Many universities in different countries are redesigning their degree and master programmes on the basis of new academic and professional profiles incorporating a number of competences. One competence can be acquired through several learning objects. A wide variety of learning repositories that provide resources for education in the form of learning objects can be found. These resources are normally stored in learning object repositories where they are catalogued with metadata facilitating retrieval by end users. The aim of this paper is to describe the elements associated to learning objects: metadata, repositories and their related methodologies.This research has been carried out under the project of innovation and educational improvement (PIME/2014/A21) “OAICE: Learning Objects for the Innovation, Creativity and Entrepreneurship Competence” funded by the Universitat Politècnica de València and the School of Computer Science.Fernández Diego, M.; Gordo Monzó, ML.; Boza García, A.; Cuenca, L.; Ruiz Font, L.; Alemany Díaz, MDM.; Alarcón Valero, F. (2015). Metadata, repository and methodology in learning objects. En EDULEARN15: 7th International Conference on Education and New Learning Technologies. Barcelona, Spain. July 6-8, 2015. IATED. 4755-4761. http://hdl.handle.net/10251/56975S4755476
    • …
    corecore