    Mining app reviews to support software engineering

    The thesis studies how mining app reviews can support software engineering. App reviews (short user reviews of an app in app stores) provide a potentially rich source of information to help software development teams maintain and evolve their products. However, exploiting this information is difficult because of the large number of reviews and the difficulty of extracting useful, actionable information from short, informal texts. A variety of app review mining techniques have been proposed to classify reviews and to extract information such as feature requests, bug descriptions, and user sentiments, but the usefulness of these techniques in practice is still unknown. Research in this area has grown rapidly, resulting in a large number of scientific publications (at least 182 between 2010 and 2020), yet almost no independent evaluations have been performed, and almost no work describes how diverse techniques fit together to support specific software engineering tasks. The thesis presents a series of contributions to address these limitations. We first report the findings of a systematic literature review of app review mining that exposes the breadth and limitations of research in this area. Using findings from the literature review, we then present a reference model that relates features of app review mining tools to specific software engineering tasks supporting requirements engineering, software maintenance, and evolution. We then present two additional contributions extending previous evaluations of app review mining techniques. We present a novel independent evaluation of opinion mining techniques using an annotated dataset created for our experiment; this evaluation finds lower effectiveness than initially reported by the techniques' authors. The final part of the thesis evaluates approaches for searching for app reviews pertinent to a particular feature.
The findings show that a general-purpose search technique is more effective than state-of-the-art purpose-built app review mining techniques, and suggest their usefulness for requirements elicitation. Overall, the thesis contributes to improving the empirical evaluation of app review mining techniques and their application in software engineering practice. Researchers and developers of future app mining tools will benefit from the novel reference model, detailed experiment designs, and publicly available datasets presented in the thesis.
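The abstract reports that a general-purpose search technique beat purpose-built tools at finding reviews pertinent to a feature, but it does not name the technique. As a minimal illustration of what general-purpose search means in this setting, the sketch below ranks reviews against a feature query using TF-IDF weighting and cosine similarity. TF-IDF is an assumption chosen for illustration, not necessarily the technique evaluated in the thesis, and `rank_reviews` and the sample reviews are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters and digits as tokens
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    n = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so terms that occur everywhere keep a small positive weight
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_reviews(query: str, reviews: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    vectors, idf = tfidf_vectors([tokenize(r) for r in reviews])
    # Query terms that never occur in the reviews carry no weight
    query_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(
        ((cosine(query_vec, v), r) for v, r in zip(vectors, reviews)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Keep only reviews that share at least one weighted term with the query
    return [(round(s, 3), r) for s, r in scored[:top_k] if s > 0]


# Hypothetical app reviews
reviews = [
    "The dark mode makes reading at night much easier",
    "App crashes when I upload a photo",
    "Please add a dark mode option",
    "Love the new photo filters",
]

if __name__ == "__main__":
    for score, review in rank_reviews("dark mode", reviews):
        print(score, review)
```

With these sample reviews, the shorter "Please add a dark mode option" ranks above the longer dark-mode review because cosine similarity normalizes for document length, and reviews sharing no terms with the query are dropped.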

    Synthesis of Attributed Feature Models From Product Descriptions: Foundations

    Feature modeling is a widely used formalism to characterize a set of products (also called configurations). Because manual elaboration is a long and arduous task, numerous techniques have been proposed to reverse engineer feature models from various kinds of artefacts. However, none of them synthesizes feature attributes (or constraints over attributes), despite the practical relevance of attributes for documenting the different values across a range of products. In this report, we develop an algorithm for synthesizing attributed feature models from a set of product descriptions. We present sound, complete, and parametrizable techniques for computing all possible hierarchies, feature groups, placements of feature attributes, domain values, and constraints. We perform a complexity analysis with respect to the number of features, attributes, configurations, and domain size. We also evaluate the scalability of our synthesis procedure using randomized configuration matrices. This report is a first step toward describing the foundations for synthesizing attributed feature models.
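A common building block in feature-model synthesis, sketched below under that assumption, is the binary implication relation between features: in a feature model a child feature implies its parent, so only pairs (a, b) where every configuration containing a also contains b are candidate child-parent edges. The same pass over the products can collect each attribute's observed domain values. The dictionary layout for product descriptions, the function names, and the sample products are hypothetical; this is an illustration of the idea, not the report's algorithm.

```python
from itertools import permutations


def implications(configs: list[dict], features: list[str]) -> set[tuple[str, str]]:
    """Pairs (a, b) such that every configuration selecting a also selects b."""
    result = set()
    for a, b in permutations(features, 2):
        with_a = [c for c in configs if a in c["features"]]
        # Skip features that never occur: they would imply everything vacuously
        if with_a and all(b in c["features"] for c in with_a):
            result.add((a, b))
    return result


def attribute_domains(configs: list[dict]) -> dict[str, set]:
    """Collect every value observed for each attribute across the products."""
    domains: dict[str, set] = {}
    for c in configs:
        for name, value in c["attrs"].items():
            domains.setdefault(name, set()).add(value)
    return domains


# Hypothetical product descriptions: selected features plus attribute values
products = [
    {"features": {"Phone", "Camera", "GPS"}, "attrs": {"resolution_mp": 12}},
    {"features": {"Phone", "Camera"}, "attrs": {"resolution_mp": 48}},
    {"features": {"Phone"}, "attrs": {}},
]

if __name__ == "__main__":
    print(sorted(implications(products, ["Phone", "Camera", "GPS"])))
    print(attribute_domains(products))
```

Here Camera and GPS each imply Phone, so Phone is the only candidate root, while GPS could sit under either Phone or Camera; enumerating such alternatives is where a hierarchy computation like the report's comes in.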

    Understanding, Analysis, and Handling of Software Architecture Erosion

    Architecture erosion occurs when a software system's implemented architecture diverges from the intended architecture over time. Studies show that erosion impacts development, maintenance, and evolution because it accumulates imperceptibly. Identifying early symptoms, such as architectural smells, enables erosion to be managed through refactoring. However, research lacks a comprehensive understanding of erosion, it remains unclear which symptoms are most common, and detection methods are lacking. This thesis establishes an erosion landscape, investigates symptoms, and proposes identification approaches. A mapping study covers erosion definitions, symptoms, causes, and consequences. Key findings: 1) "Architecture erosion" is the most used term, with four perspectives on definitions and respective symptom types. 2) Technical and non-technical reasons contribute to erosion, negatively impacting quality attributes; practitioners can advocate addressing erosion to prevent failures. 3) Detection and correction approaches are categorized, with consistency-based and evolution-based approaches mentioned most commonly. An empirical study explores practitioner perspectives through communities, surveys, and interviews. Findings reveal that associated practices such as code review, as well as tools, are used to identify symptoms, while the collected measures address erosion during implementation. Code review comments are studied to analyze erosion in practice. One study reveals that architectural violations, duplicate functionality, and cyclic dependencies are the most frequent symptoms. Symptoms decreased over time, indicating increased stability, and most were addressed after review. A second study explores violation symptoms in four projects, identifying 10 categories; refactoring and removing code address most violations, while some are disregarded. Machine learning classifiers using pre-trained word embeddings identify violation symptoms from code reviews. Key findings: 1) SVM with word2vec achieved the highest performance. 2) fastText embeddings worked well.
3) 200-dimensional embeddings outperformed 100- and 300-dimensional ones. 4) An ensemble classifier improved performance. 5) Practitioners found the results valuable, confirming the approach's potential. An automated recommendation system identifies qualified reviewers for violations using similarity detection on file paths and comments. Experiments show that common methods perform well, outperforming a baseline approach. Sampling techniques impact recommendation performance.
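The recommendation system matches violations to reviewers using similarity on file paths and comments. A minimal sketch of the file-path half follows, assuming a longest-common-prefix similarity over path components (a measure used in path-based reviewer recommendation; the thesis may rely on different similarity methods). Each past reviewer is scored by summing the similarity between the file containing the new violation and the files they reviewed. `recommend_reviewers`, the history format, and the sample paths are hypothetical.

```python
from collections import defaultdict


def path_similarity(p1: str, p2: str) -> float:
    """Number of shared leading path components, divided by the longer path's length."""
    a, b = p1.strip("/").split("/"), p2.strip("/").split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))


def recommend_reviewers(new_path: str, history: list[tuple[str, str]], top_k: int = 2) -> list[str]:
    """Rank past reviewers by summed path similarity to the file containing the violation."""
    scores: dict[str, float] = defaultdict(float)
    for path, reviewer in history:
        scores[reviewer] += path_similarity(new_path, path)
    # Highest score first; ties broken alphabetically for a deterministic order
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [reviewer for reviewer, score in ranked[:top_k] if score > 0]


# Hypothetical review history: (reviewed file path, reviewer)
history = [
    ("src/core/net/http.py", "alice"),
    ("src/core/net/socket.py", "alice"),
    ("src/ui/view.py", "bob"),
    ("docs/readme.md", "carol"),
]

if __name__ == "__main__":
    print(recommend_reviewers("src/core/net/retry.py", history))
```

Summing rewards reviewers with long histories; averaging per reviewer, or weighting recent reviews more heavily, would be an equally plausible design choice.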

    Recommender Systems for Grocery Retail - A Machine Learning Approach

    Recommender systems are present in our daily activities at different moments, such as when choosing a song to listen to or when shopping online. People routinely rely on computer systems to simplify everyday decisions. Grocery shopping is an essential and frequent part of people's lives. Although it is a common habit, each customer has unique routines, needs, and preferences regarding products and brands. This information is valuable for grocery retailers to know their customers better and to improve their marketing and operational activities. This dissertation applies machine learning algorithms to develop a recommender system capable of preparing personalized grocery shopping lists. The proposed architecture is designed to allow integration with different grocery retailers and to support distinct TensorFlow algorithms. Feature extraction from the dataset was explored, together with tuning of the model hyperparameters, to obtain better results. The recommendation engine is exposed via a distributed software architecture designed to allow retailers to integrate the recommender system with different existing solutions (e.g., websites or mobile applications). A case study to validate the implemented solution was performed by integrating it with a public dataset provided by Instacart. A comparison of different machine learning algorithms on the adopted dataset led to the choice of the gradient boosted trees algorithm. The solution developed in the case study was compared against two non-machine-learning approaches, a pattern-mining-based solution and a SQL-based heuristic, at predicting the last purchase of 360 arbitrary test customers. Different evaluation metrics (namely, average accuracy, precision, recall, and F1-score) were recorded.
The way association rules with different strengths were reflected in the predictions of the developed solution was also analyzed. The gradient boosted trees-based implementation from the case study outperformed the compared solutions on these evaluation metrics and showed a higher capability of predicting at least one correct item per customer. It also became evident that the strictest association rules were frequently found in the recommendations. The adopted solution and algorithm have shown promising results and a remarkable capability to provide meaningful predictions to different customers, evidencing their capability to add value to grocery retail. Nevertheless, there is still potential for further expansion.
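The case study reports average precision, recall, F1-score, and accuracy, and separately the ability to predict at least one correct item per customer. The sketch below shows one way to compute these for basket prediction, assuming each customer's prediction and actual last order are sets of product IDs and that metrics are averaged per customer (macro-averaged). Accuracy is omitted because the abstract does not say how it was defined for sets of items; `hit_rate` is a hypothetical name for the at-least-one-correct-item measure, and the sample orders are invented.

```python
def precision_recall_f1(predicted: set[str], actual: set[str]) -> tuple[float, float, float]:
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    # F1 is the harmonic mean; it is 0 when nothing was predicted correctly
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def evaluate(predictions: dict[str, set[str]], actuals: dict[str, set[str]]) -> dict[str, float]:
    """Macro-average the metrics over customers; both maps go customer id -> product ids."""
    if not actuals:
        raise ValueError("no customers to evaluate")
    totals = [0.0, 0.0, 0.0]
    customers_with_hit = 0
    for customer, actual in actuals.items():
        predicted = predictions.get(customer, set())
        for i, value in enumerate(precision_recall_f1(predicted, actual)):
            totals[i] += value
        # Count customers for whom at least one recommended item was actually bought
        if predicted & actual:
            customers_with_hit += 1
    n = len(actuals)
    return {
        "precision": totals[0] / n,
        "recall": totals[1] / n,
        "f1": totals[2] / n,
        "hit_rate": customers_with_hit / n,
    }


# Hypothetical last orders and predictions for two customers
actual_orders = {"u1": {"milk", "bread", "eggs"}, "u2": {"banana", "apple"}}
predicted_orders = {"u1": {"milk", "bread", "butter", "cheese"}, "u2": {"yogurt"}}

if __name__ == "__main__":
    print(evaluate(predicted_orders, actual_orders))
```

For these sample customers, average precision is 0.25, average recall is 1/3, average F1 is 2/7, and the hit rate is 0.5.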

    Designing Human-Centered Collective Intelligence

    Human-Centered Collective Intelligence (HCCI) is an emerging research area that seeks to bring together major research areas such as machine learning, statistical modeling, information retrieval, market research, and software engineering to address the challenges of deriving intelligent insights and solutions through the collaboration of multiple intelligent sensors, devices, and data sources. An archetypal contextual collective intelligence (CI) scenario might involve deriving affect-driven intelligence from multimodal emotion detection sources in order to determine whether one movie trailer is more likable than another. Meanwhile, the key tenets of designing robust and evolvable software and infrastructure architecture models that address cross-cutting quality concerns are of keen interest in today's “Cloud” age. Key quality concerns in CI scenarios include security and privacy, scalability, performance, fault tolerance, and reliability. I present recent advances in CI system design, focusing on optimal solutions for these cross-cutting concerns. I also describe a number of design challenges and a framework that I have determined to be critical to designing CI systems. Drawing inspiration from machine learning, computational advertising, ubiquitous computing, and sociable robotics, this work incorporates theories and concepts from various viewpoints to empower the collective intelligence engine, ZOEI, to discover affective state and emotional intent across multiple media. The discerned affective state is used in recommender systems, among other applications, to support content personalization. I examine the design of optimal architectures that allow humans and intelligent systems to work collectively to solve complex problems. Finally, I present an evaluation of various studies that leverage the ZOEI framework to design collective intelligence systems.

    Developers' Perception of Co-Change Patterns: An Empirical Study

    Co-change clusters are groups of classes that frequently change together. They have been proposed as an alternative modular view, which can be used to assess the traditional decomposition of systems into packages. To investigate developers' perception of co-change clusters, we report in this paper a study with experts on six systems implemented in two languages. We mine 102 co-change clusters from the version history of these systems and classify them into three patterns according to their projection onto the package structure: Encapsulated, Crosscutting, and Octopus. We then collect the perceptions of expert developers on these clusters, addressing two central questions: (a) What concerns and changes are captured by the extracted clusters? (b) Do the extracted clusters reveal design anomalies? We conclude that Encapsulated Clusters are often viewed as healthy designs and that Crosscutting Clusters tend to be associated with design anomalies. According to the interviewed developers, Octopus Clusters are normally associated with expected class distributions, which are not easy to implement in an encapsulated way.
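The study mines co-change clusters from version history before classifying them by their projection onto packages. A minimal sketch of the mining step follows, assuming commits are given as lists of fully qualified class names: it counts how often each pair of classes changes in the same commit, keeps pairs at or above a support threshold, and groups the surviving pairs into connected components. Connected components are a simplified stand-in for whatever clustering the paper uses, `min_support=2` is an arbitrary illustrative value, and all function names and sample data are hypothetical. `packages_of` reports which packages a cluster touches: a cluster confined to one package matches the Encapsulated pattern, whereas distinguishing Crosscutting from Octopus requires criteria the abstract does not give.

```python
from collections import Counter
from itertools import combinations


def co_change_pairs(commits: list[list[str]], min_support: int = 2) -> dict[tuple[str, str], int]:
    """Count how often each unordered pair of classes changes in the same commit."""
    counts: Counter = Counter()
    for changed in commits:
        for a, b in combinations(sorted(set(changed)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def clusters(pairs: dict[tuple[str, str], int]) -> list[set[str]]:
    """Connected components of the co-change graph (a simplified stand-in for clustering)."""
    adj: dict[str, set[str]] = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen: set[str] = set()
    result = []
    for start in sorted(adj):
        if start in seen:
            continue
        # Iterative depth-first search collecting one component
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        result.append(comp)
    return result


def packages_of(cluster: set[str]) -> set[str]:
    """Package = qualified class name minus its last segment."""
    return {cls.rsplit(".", 1)[0] for cls in cluster}


# Hypothetical commit history: classes changed in each commit
commits = [
    ["app.model.User", "app.model.Account"],
    ["app.model.User", "app.model.Account", "app.ui.Login"],
    ["app.ui.Login", "app.net.Auth"],
    ["app.ui.Login", "app.net.Auth"],
    ["app.util.Log"],
]

if __name__ == "__main__":
    for cluster in clusters(co_change_pairs(commits)):
        print(sorted(cluster), sorted(packages_of(cluster)))
```

In this sample, User and Account form a single-package cluster, while Login and Auth form a cluster spanning two packages.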