Mining app reviews to support software engineering
The thesis studies how mining app reviews can support software engineering.
App reviews, short user reviews of an app in app stores, provide a potentially rich source of information to help software development teams maintain and evolve their products. Exploiting this information is, however, difficult due to the large number of reviews and the difficulty in extracting useful actionable information from short informal texts.
A variety of app review mining techniques have been proposed to classify reviews and to extract information such as feature requests, bug descriptions, and user sentiments, but the usefulness of these techniques in practice is still unknown. Research in this area has grown rapidly, resulting in a large number of scientific publications (at least 182 between 2010 and 2020), but almost no independent evaluations, or descriptions of how diverse techniques fit together to support specific software engineering tasks, have been carried out so far.
The thesis presents a series of contributions to address these limitations. We first report the findings of a systematic literature review in app review mining exposing the breadth and limitations of research in this area. Using findings from the literature review, we then present a reference model that relates features of app review mining tools to specific software engineering tasks supporting requirements engineering, software maintenance and evolution.
We then present two additional contributions extending previous evaluations of app review mining techniques. We present a novel independent evaluation of opinion mining techniques using an annotated dataset created for our experiment. Our evaluation finds lower effectiveness than initially reported by the techniques' authors. The final part of the thesis evaluates approaches for searching for app reviews pertinent to a particular feature. The findings show that a general-purpose search technique is more effective than the state-of-the-art purpose-built app review mining techniques, and suggest their usefulness for requirements elicitation.
Overall, the thesis contributes to improving the empirical evaluation of app review mining techniques and their application in software engineering practice. Researchers and developers of future app mining tools will benefit from the novel reference model, detailed experimental designs, and publicly available datasets presented in the thesis.
Mining and searching app reviews for requirements engineering: Evaluation and replication studies
App reviews provide a rich source of feature-related information that can support requirements engineering activities. Analysing them manually to find this information, however, is challenging due to their large quantity and noisy nature. To overcome the problem, automated approaches have been proposed for "feature-specific analysis". Unfortunately, the effectiveness of these approaches has been evaluated using different methods and datasets. Replicating these studies to confirm their results and to provide benchmarks of different approaches is a challenging problem. We address the problem by extending previous evaluations and performing a comparison of these approaches. In this paper, we present two empirical studies. In the first study, we evaluate opinion mining approaches; the approaches extract features discussed in app reviews and identify their associated sentiments. In the second study, we evaluate approaches searching for feature-related reviews. The approaches search for users' feedback pertinent to a particular feature. The results of both studies show these approaches achieve lower effectiveness than reported originally, and raise an important question about their practical use.
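To make the idea of searching for feature-related reviews concrete, the following is a minimal sketch of a generic term-weighting search (TF-IDF with cosine similarity). It is an illustrative assumption, not the approach evaluated in the studies above; the function names `tokenize` and `rank_reviews` and the sample reviews are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_reviews(query, reviews):
    """Return (score, review) pairs sorted by descending TF-IDF cosine similarity."""
    docs = [Counter(tokenize(r)) for r in reviews]
    n = len(docs)
    # Document frequency: in how many reviews each term occurs.
    df = Counter(term for d in docs for term in d)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vec(counts):
        # Terms never seen in any review are ignored.
        return {t: tf * idf[t] for t, tf in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(Counter(tokenize(query)))
    scored = [(cosine(q, vec(d)), r) for d, r in zip(docs, reviews)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Made-up reviews, for illustration only.
    sample = [
        "Dark mode looks great at night",
        "App crashes when I upload photos",
        "Please add dark mode to the settings",
    ]
    for score, review in rank_reviews("dark mode", sample):
        print(f"{score:.3f}  {review}")
```

Reviews that share no term with the query score zero and sort last; a purpose-built technique would typically add feature extraction or sentiment analysis on top of such a baseline.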
Knowledge Elicitation Methods for Affect Modelling in Education
Research on the relationship between affect and cognition in Artificial Intelligence in Education (AIEd) brings an important dimension to our understanding of how learning occurs and how it can be facilitated. Emotions are crucial to learning, but their nature, the conditions under which they occur, and their exact impact on learning for different learners in diverse contexts still need to be mapped out. The study of affect during learning can be challenging, because emotions are subjective, fleeting phenomena that are often difficult for learners to report accurately and for observers to perceive reliably. Context forms an integral part of learners' affect and the study thereof. This review provides a synthesis of the current knowledge elicitation methods that are used to aid the study of learners' affect and to inform the design of intelligent technologies for learning. Advantages and disadvantages of the specific methods are discussed along with their respective potential for enhancing research in this area, and issues related to the interpretation of data that emerges as the result of their use. References to related research are also provided together with illustrative examples of where the individual methods have been used in the past. Therefore, this review is intended as a resource for methodological decision making for those who want to study emotions and their antecedents in AIEd contexts, i.e. where the aim is to inform the design and implementation of an intelligent learning environment or to evaluate its use and educational efficacy.
Decoding the cognitive states of attention and distraction in a real-life setting using EEG.
Lapses in attention can have serious consequences in situations such as driving a car, hence there is considerable interest in tracking attention using neural measures. However, as most of these studies have been done in highly controlled and artificial laboratory settings, we want to explore whether it is also possible to determine attention and distraction using electroencephalogram (EEG) data collected in a natural setting using machine/deep learning. Twenty-four participants volunteered for the study. Data were collected from pairs of participants simultaneously while they engaged in Tibetan Monastic debate, a practice that is interesting because it is a real-life situation that generates substantial variability in attention states. We found that attention was on average associated with increased left frontal alpha, increased left parietal theta, and decreased central delta compared to distraction. In an attempt to predict attention and distraction, we found that a Long Short-Term Memory model classified attention and distraction with maximum accuracy of 95.86% and 95.4% corresponding to delta and theta waves respectively. This study demonstrates that EEG data collected in a real-life setting can be used to predict attention states in participants with good accuracy, opening doors for developing Brain-Computer Interfaces that track attention in real-time using data extracted in daily life settings, rendering them much more usable.
Impact of annotation modality on label quality and model performance in the automatic assessment of laughter in-the-wild
Laughter is considered one of the most overt signals of joy. Laughter is
well-recognized as a multimodal phenomenon but is most commonly detected by
sensing the sound of laughter. It is unclear how perception and annotation of
laughter differ when annotated from other modalities like video, via the body
movements of laughter. In this paper we take a first step in this direction by
asking if and how well laughter can be annotated when only audio, only video
(containing full body movement information) or audiovisual modalities are
available to annotators. We ask whether annotations of laughter are congruent
across modalities, and compare the effect that labeling modality has on machine
learning model performance. We compare annotations and models for laughter
detection, intensity estimation, and segmentation, three tasks common in
previous studies of laughter. Our analysis of more than 4000 annotations
acquired from 48 annotators revealed evidence for incongruity in the perception
of laughter and its intensity between modalities. Further analysis of
annotations against consolidated audiovisual reference annotations revealed
that recall was lower on average for video when compared to the audio
condition, but tended to increase with the intensity of the laughter samples.
Our machine learning experiments compared the performance of state-of-the-art
unimodal (audio-based, video-based and acceleration-based) and multi-modal
models for different combinations of input modalities, training label modality,
and testing label modality. Models with video and acceleration inputs had
similar performance regardless of training label modality, suggesting that it
may be entirely appropriate to train models for laughter detection from body
movements using video-acquired labels, despite their lower inter-rater
agreement.
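The recall comparison mentioned above can be illustrated with a small sketch. It assumes binary per-segment labels (1 = laughter, 0 = no laughter) and a consolidated reference annotation; the label lists are made up for illustration and are not data from the study, and the function name `recall` is hypothetical.

```python
def recall(reference, annotation):
    """Fraction of reference-positive segments that the annotation also marks positive.

    Returns None when the reference contains no positive segment.
    """
    hits_on_positives = [a for r, a in zip(reference, annotation) if r]
    if not hits_on_positives:
        return None
    return sum(1 for a in hits_on_positives if a) / len(hits_on_positives)


if __name__ == "__main__":
    # Hypothetical labels for six segments.
    reference_av = [1, 1, 0, 1, 0, 1]   # consolidated audiovisual reference
    audio_only = [1, 1, 0, 1, 0, 0]     # labels from an audio-only annotator
    video_only = [1, 0, 0, 0, 1, 1]     # labels from a video-only annotator
    print("audio recall:", recall(reference_av, audio_only))
    print("video recall:", recall(reference_av, video_only))
```

Computing the same quantity separately for low- and high-intensity samples would be one way to examine how recall changes with laughter intensity.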
Dynamic Estimation of Rater Reliability using Multi-Armed Bandits
One of the critical success factors for supervised machine learning is the quality of target values, or predictions, associated with training instances. Predictions can be discrete labels (such as a binary variable specifying whether a blog post is positive or negative) or continuous ratings (for instance, how boring a video is on a 10-point scale). In some areas, predictions are readily available, while in others, the effort of human workers has to be involved. For instance, in the task of emotion recognition from speech, a large corpus of speech recordings is usually available, and humans denote which emotions are present in which recordings.
Reinforcement Learning for Machine Translation: from Simulations to Real-World Applications
If a machine translation is wrong, how can we tell the underlying model to fix it? Answering this question requires (1) a machine learning algorithm to define update rules, (2) an interface for feedback to be submitted, and (3) expertise on the side of the human who gives the feedback. This thesis investigates solutions for machine learning updates, the suitability of feedback interfaces, and the dependency on reliability and expertise for different types of feedback.
We start with an interactive online learning scenario where a machine translation (MT) system receives bandit feedback (i.e. only once per source) instead of references for learning. Policy gradient algorithms for statistical and neural MT are developed to learn from absolute and pairwise judgments. Our experiments on domain adaptation with simulated online feedback show that the models can largely improve under weak feedback, with variance reduction techniques being very effective.
In production environments, offline learning is often preferred over online learning. We evaluate algorithms for counterfactual learning from human feedback in a study on eBay product title translations. Feedback is either collected via explicit star ratings from users, or implicitly from the user interaction with cross-lingual product search. Leveraging implicit feedback turns out to be more successful due to lower levels of noise. We compare the reliability and learnability of absolute Likert-scale ratings with pairwise preferences in a smaller user study, and find that absolute ratings are overall more effective for improvements in downstream tasks. Furthermore, we discover that error markings provide a cheap and practical alternative to error corrections.
In a generalized interactive learning framework we propose a self-regulation approach, where the learner, guided by a regulator module, decides which type of feedback to choose for each input. The regulator is reinforced to find a good trade-off between supervision effect and cost. In our experiments, it discovers strategies that are more efficient than active learning and standard fully supervised learning.
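The online bandit-feedback scenario described in this abstract can be sketched with a toy policy gradient learner. This is a minimal illustration under stated assumptions, not the thesis's MT system: the "outputs" are three fixed candidates with made-up reward values, the policy is a softmax over one logit per candidate, and a running-average baseline stands in for the variance reduction techniques mentioned. The function names and hyperparameters are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit_policy(rewards, steps=2000, lr=0.5, baseline_rate=0.1, seed=0):
    """REINFORCE-style updates where only the sampled output receives feedback."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0  # running average of observed rewards
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one output; feedback is observed for this output only.
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline
        # Gradient of log pi(action) with respect to logit j is 1[j == action] - probs[j].
        for j in range(len(logits)):
            indicator = 1.0 if j == action else 0.0
            logits[j] += lr * advantage * (indicator - probs[j])
        baseline += baseline_rate * (reward - baseline)
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical quality scores for three candidate outputs.
    final_probs = train_bandit_policy([0.1, 0.9, 0.3])
    print([round(p, 3) for p in final_probs])
```

Subtracting the baseline leaves the expected gradient unchanged while reducing its variance, which is why even a simple running average tends to help in this setting.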
Ontologies for automatic question generation
Assessment is an important tool for formal learning, especially in higher education. At present, many universities use online assessment systems where questions are entered manually into a question bank system. This kind of system requires the instructor's time and effort to construct questions manually. The main aim of this thesis is, therefore, to contribute to the investigation of new question generation strategies for short/long answer questions in order to allow for the development of automatic factual question generation from an ontology for educational assessment purposes. This research is guided by four research questions: (1) How well can an ontology be used for generating factual assessment questions? (2) How can questions be generated from a course ontology? (3) Are the ontological question generation strategies able to generate acceptable assessment questions? and (4) Is topic-based indexing able to improve the feasibility of AQGen?
We first conduct ontology validation to evaluate the appropriateness of concept representation using a competency question approach. We used revision questions from the textbook to match keywords (in revision questions) with concepts (in the ontology). The results show that only half of the ontology concepts matched the keywords. We further investigated the unmatched concepts, found some incorrect concept naming, and suggest a guideline for appropriate concept naming. At the same time, we introduce validation of ontology using revision questions as competency questions to check for ontology completeness. Furthermore, we also propose 17 short/long answer question templates for 3 question categories, namely definition, concept completion and comparison.
In the subsequent part of the thesis, we develop the AQGen tool and evaluate the generated questions. Two Computer Science subjects, namely OS and CNS, are chosen to evaluate AQGen generated questions. We conduct a questionnaire survey of 17 domain experts to identify experts' agreement on the acceptability measure of AQGen generated questions. The experts' agreement on the acceptability measure is favourable, and it is reported that three of the four QG strategies proposed can generate acceptable questions. AQGen has generated thousands of questions from the 3 question categories. It is therefore updated with question selection to produce a feasible question set from the very large number of generated questions. We suggest topic-based indexing in order to assert knowledge about topic chapters into the ontology representation for question selection. The topic indexing shows a feasible result for filtering questions by topic.
Finally, our results contribute to an understanding of ontology element representation for question generation and of how to automatically generate questions from an ontology for educational assessment.
Metadata, repository and methodology in learning objects
Many universities in different countries are redesigning their degree and master programmes on the
basis of new academic and professional profiles incorporating a number of competences. One
competence can be acquired through several learning objects. A wide variety of learning repositories
that provide resources for education in the form of learning objects can be found. These resources are
normally stored in learning object repositories where they are catalogued with metadata facilitating
retrieval by end users. The aim of this paper is to describe the elements associated with learning
objects: metadata, repositories and their related methodologies.
This research has been carried out under the project of innovation and educational improvement (PIME/2014/A21) "OAICE: Learning Objects for the Innovation, Creativity and Entrepreneurship Competence" funded by the Universitat Politècnica de València and the School of Computer Science.
Fernández Diego, M.; Gordo Monzó, ML.; Boza García, A.; Cuenca, L.; Ruiz Font, L.; Alemany Díaz, MDM.; Alarcón Valero, F. (2015). Metadata, repository and methodology in learning objects. In EDULEARN15: 7th International Conference on Education and New Learning Technologies. Barcelona, Spain. July 6-8, 2015. IATED. 4755-4761. http://hdl.handle.net/10251/56975