134 research outputs found

    Analysis and Interactive Visualization of Software Bug Reports

    Get PDF
    A software Bug report contains information about the bug in the form of problem description and comments using natural language texts. Managing reported bugs is a significant challenge for a project manager when the number of bugs for a software project is large. Prior to the assignment of a newly reported bug to an appropriate developer, the triager (e.g., manager) attempts to categorize it into existing categories and looks for duplicate bugs. The goal is to reuse existing knowledge to fix or resolve the new bug, and she often spends a lot of time in reading a number of bug reports. When fixing or resolving a bug, a developer also consults with a series of relevant bug reports from the repository in order to maximize the knowledge required for the fixation. It is also preferable that developers new to a project first familiarize themselves with the project along with the reported bugs before actually working on the project. Because of the sheer numbers and size of the bug reports, manually analyzing a collection of bug reports is time-consuming and ineffective. One of the ways to mitigate the problem is to analyze summaries of the bug reports instead of analyzing full bug reports, and there have been a number of summarization techniques proposed in the literature. Most of these techniques generate extractive summaries of bug reports. However, it is not clear how useful those generated extractive summaries are, in particular when the developers do not have prior knowledge of the bug reports. In order to better understand the usefulness of the bug report summaries, in this thesis, we first reimplement a state of the art unsupervised summarization technique and evaluate it with a user study with nine participants. Although in our study, 70% of the time participants marked our developed summaries as a reliable means of comprehending the software bugs, the study also reports a practical problem with extractive summaries. An extractive summary is often created by choosing a certain number of statements from the bug report. The statements are extracted out of their contexts, and thus often lose their consistency, which makes it hard for a manager or a developer to comprehend the reported bug from the extractive summary. Based on the findings from the user study and in order to further assist the managers as well as the developers, we thus propose an interactive visualization for the bug reports that visualizes not only the extractive summaries but also the topic evolution of the bug reports. Topic evolution refers to the evolution of technical topics discussed in the bug reports of a software system over a certain time period. Our visualization technique interactively visualizes such information which can help in different project management activities. Our proposed visualization also highlights the summary statements within their contexts in the original report for easier comprehension of the reported bug. In order to validate the applicability of our proposed visualization technique, we implement the technique as a standalone tool, and conduct both a case study with 3914 bug reports and a user study with six participants. The experiments in the case study show that our topic analysis can reveal useful keywords or other insightful information about the bug reports for aiding the managers or triagers in different management activities. The findings from the user study also show that our proposed visualization technique is highly promising for easier comprehension of the bug reports

    Automatic sentence annotation for more useful bug report summarization

    Get PDF
    Bug reports are a useful software artifact with software developers referring to them for various information needs. As bug reports can become long, users of bug reports may need to spend a lot of time reading them. Previous studies developed summarizers and the quality of summaries was determined based on human-created gold-standard summaries. We believe creating such summaries for evaluating summarizers is not a good practice. First, we have observed a high level of disagreement between the annotated summaries. Second, the number of annotators involved is lower than the established minimum for the creation of a stable annotated summary. Finally, the traditional fixed threshold of 25% of the bug report word count does not adequately serve the different information needs. Consequently, we developed an automatic sentence annotation method to identify content in bug report comments which allows bug report users to customize a view for their task-dependent information needs

    EFFECTIVE METHODS AND TOOLS FOR MINING APP STORE REVIEWS

    Get PDF
    Research on mining user reviews in mobile application (app) stores has noticeably advanced in the past few years. The main objective is to extract useful information that app developers can use to build more sustainable apps. In general, existing research on app store mining can be classified into three genres: classification of user feedback into different types of software maintenance requests (e.g., bug reports and feature requests), building practical tools that are readily available for developers to use, and proposing visions for enhanced mobile app stores that integrate multiple sources of user feedback to ensure app survivability. Despite these major advances, existing tools and techniques still suffer from several drawbacks. Specifically, the majority of techniques rely on the textual content of user reviews for classification. However, due to the inherently diverse and unstructured nature of user-generated online textual reviews, text-based review mining techniques often produce excessively complicated models that are prone to over-fitting. Furthermore, the majority of proposed techniques focus on extracting and classifying the functional requirements in mobile app reviews, providing a little or no support for extracting and synthesizing the non-functional requirements (NFRs) raised in user feedback (e.g., security, reliability, and usability). In terms of tool support, existing tools are still far from being adequate for practical applications. In general, there is a lack of off-the-shelf tools that can be used by researchers and practitioners to accurately mine user reviews. Motivated by these observations, in this dissertation, we explore several research directions aimed at addressing the current issues and shortcomings in app store review mining research. In particular, we introduce a novel semantically aware approach for mining and classifying functional requirements from app store reviews. This approach reduces the dimensionality of the data and enhances the predictive capabilities of the classifier. We then present a two-phase study aimed at automatically capturing the NFRs in user reviews. We also introduce MARC, a tool that enables developers to extract, classify, and summarize user reviews

    Interactive Natural Language Processing for Clinical Text

    Get PDF
    Free-text allows clinicians to capture rich information about patients in narratives and first-person stories. Care providers are likely to continue using free-text in Electronic Medical Records (EMRs) for the foreseeable future due to convenience and utility offered. However, this complicates information extraction tasks for big-data applications. Despite advances in Natural Language Processing (NLP) techniques, building models on clinical text is often expensive and time-consuming. Current approaches require a long collaboration between clinicians and data-scientists. Clinicians provide annotations and training data, while data-scientists build the models. With the current approaches, the domain experts - clinicians and clinical researchers - do not have provisions to inspect these models or give direct feedback. This forms a barrier to NLP adoption and limits its power and utility for real-world clinical applications. Interactive learning systems may allow clinicians without machine learning experience to build NLP models on their own. Interactive methods are particularly attractive for clinical text due to the diversity of tasks that need customized training data. Interactivity could enable end-users (clinicians) to review model outputs and provide feedback for model revisions within an closed feedback loop. This approach may make it feasible to extract understanding from unstructured text in patient records; classifying documents against clinical concepts, summarizing records and other sophisticated NLP tasks while reducing the need for prior annotations and training data upfront. In my dissertation, I demonstrate this approach by building and evaluating prototype systems for both clinical care and research applications. I built NLPReViz as an interactive tool for clinicians to train and build binary NLP models on their own for retrospective review of colonoscopy procedure notes. Next, I extended this effort to design an intelligent signout tool to identify incidental findings in a clinical care setting. I followed a two-step evaluation with clinicians as study participants: a usability evaluation to demonstrate feasibility and overall usefulness of the tool, followed by an empirical evaluation to evaluate model correctness and utility. Lessons learned from the development and evaluation of these prototypes will provide insight into the generalized design of interactive NLP systems for wider clinical applications

    Improving Automated Software Testing while re-engineering legacy systems in the absence of documentation

    Get PDF
    Legacy software systems are essential assets that contain an organizations' valuable business logic. Because of outdated technologies and methods used in these systems, they are challenging to maintain and expand. Therefore, organizations need to decide whether to redevelop or re-engineer the legacy system. Although in most cases, re-engineering is the safer and less expensive choice, it has risks such as failure to meet the expected quality and delays due to testing blockades. These risks are even more severe when the legacy system does not have adequate documentation. A comprehensive testing strategy, which includes automated tests and reliable test cases, can substantially reduce the risks. To mitigate the hazards associated with re-engineering, we have conducted three studies in this thesis to improve the testing process. Our rst study introduces a new testing model for the re-engineering process and investigates test automation solutions to detect defects in the early re-engineering stages. We implemented this model on the Cold Region Hydrological Model (CRHM) application and discovered bugs that would not likely have been found manually. Although this approach helped us discover great numbers of software defects, designing test cases is very time-consuming due to the lack of documentation, especially for large systems. Therefore, in our second study, we investigated an approach to generate test cases from user footprints automatically. To do this, we extended an existing tool to collect user actions and legacy system reactions, including database and le system changes. Then we analyzed the data based on the order of user actions and time of them and generated human-readable test cases. Our evaluation shows that this approach can detect more bugs than other existing tools. Moreover, the test cases generated using this approach contain detailed oracles that make them suitable for both black-box and white-box testing. Many scienti c legacy systems such as CRHM are data-driven; they take large amounts of data as input and produce massive data after applying mathematical models. Applying test cases and nding bugs is more demanding when we are dealing with large amounts of data. Hence in our third study, we created a comparative visualization tool (ComVis) to compare a legacy system's output after each change. Visualization helps testers to nd data issues resulting from newly introduced bugs. Twenty participants took part in a user study in which they were asked to nd data issued using ComVis and embedded CRHM visualization tool. Our user study shows that ComVis can nd 51% more data issues than embedded visualization tools in the legacy system can. Also, results from the NASA-TLX assessment and thematic analysis of open-ended questions about each task show users prefer to use ComVis over the built-in visualization tool. We believe our introduced approaches and developed systems will signi cantly reduce the risks associated with the re-engineering process. i

    Holistic recommender systems for software engineering

    Get PDF
    The knowledge possessed by developers is often not sufficient to overcome a programming problem. Short of talking to teammates, when available, developers often gather additional knowledge from development artifacts (e.g., project documentation), as well as online resources. The web has become an essential component in the modern developer’s daily life, providing a plethora of information from sources like forums, tutorials, Q&A websites, API documentation, and even video tutorials. Recommender Systems for Software Engineering (RSSE) provide developers with assistance to navigate the information space, automatically suggest useful items, and reduce the time required to locate the needed information. Current RSSEs consider development artifacts as containers of homogeneous information in form of pure text. However, text is a means to represent heterogeneous information provided by, for example, natural language, source code, interchange formats (e.g., XML, JSON), and stack traces. Interpreting the information from a pure textual point of view misses the intrinsic heterogeneity of the artifacts, thus leading to a reductionist approach. We propose the concept of Holistic Recommender Systems for Software Engineering (H-RSSE), i.e., RSSEs that go beyond the textual interpretation of the information contained in development artifacts. Our thesis is that modeling and aggregating information in a holistic fashion enables novel and advanced analyses of development artifacts. To validate our thesis we developed a framework to extract, model and analyze information contained in development artifacts in a reusable meta- information model. We show how RSSEs benefit from a meta-information model, since it enables customized and novel analyses built on top of our framework. The information can be thus reinterpreted from an holistic point of view, preserving its multi-dimensionality, and opening the path towards the concept of holistic recommender systems for software engineering

    Continuous Rationale Management

    Get PDF
    Continuous Software Engineering (CSE) is a software life cycle model open to frequent changes in requirements or technology. During CSE, software developers continuously make decisions on the requirements and design of the software or the development process. They establish essential decision knowledge, which they need to document and share so that it supports the evolution and changes of the software. The management of decision knowledge is called rationale management. Rationale management provides an opportunity to support the change process during CSE. However, rationale management is not well integrated into CSE. The overall goal of this dissertation is to provide workflows and tool support for continuous rationale management. The dissertation contributes an interview study with practitioners from the industry, which investigates rationale management problems, current practices, and features to support continuous rationale management beneficial for practitioners. Problems of rationale management in practice are threefold: First, documenting decision knowledge is intrusive in the development process and an additional effort. Second, the high amount of distributed decision knowledge documentation is difficult to access and use. Third, the documented knowledge can be of low quality, e.g., outdated, which impedes its use. The dissertation contributes a systematic mapping study on recommendation and classification approaches to treat the rationale management problems. The major contribution of this dissertation is a validated approach for continuous rationale management consisting of the ConRat life cycle model extension and the comprehensive ConDec tool support. To reduce intrusiveness and additional effort, ConRat integrates rationale management activities into existing workflows, such as requirements elicitation, development, and meetings. ConDec integrates into standard development tools instead of providing a separate tool. ConDec enables lightweight capturing and use of decision knowledge from various artifacts and reduces the developers' effort through automatic text classification, recommendation, and nudging mechanisms for rationale management. To enable access and use of distributed decision knowledge documentation, ConRat defines a knowledge model of decision knowledge and other artifacts. ConDec instantiates the model as a knowledge graph and offers interactive knowledge views with useful tailoring, e.g., transitive linking. To operationalize high quality, ConRat introduces the rationale backlog, the definition of done for knowledge documentation, and metrics for intra-rationale completeness and decision coverage of requirements and code. ConDec implements these agile concepts for rationale management and a knowledge dashboard. ConDec also supports consistent changes through change impact analysis. The dissertation shows the feasibility, effectiveness, and user acceptance of ConRat and ConDec in six case study projects in an industrial setting. Besides, it comprehensively analyses the rationale documentation created in the projects. The validation indicates that ConRat and ConDec benefit CSE projects. Based on the dissertation, continuous rationale management should become a standard part of CSE, like automated testing or continuous integration

    Natural Language Processing in-and-for Design Research

    Full text link
    We review the scholarly contributions that utilise Natural Language Processing (NLP) methods to support the design process. Using a heuristic approach, we collected 223 articles published in 32 journals and within the period 1991-present. We present state-of-the-art NLP in-and-for design research by reviewing these articles according to the type of natural language text sources: internal reports, design concepts, discourse transcripts, technical publications, consumer opinions, and others. Upon summarizing and identifying the gaps in these contributions, we utilise an existing design innovation framework to identify the applications that are currently being supported by NLP. We then propose a few methodological and theoretical directions for future NLP in-and-for design research

    Natural Language Processing: Emerging Neural Approaches and Applications

    Get PDF
    This Special Issue highlights the most recent research being carried out in the NLP field to discuss relative open issues, with a particular focus on both emerging approaches for language learning, understanding, production, and grounding interactively or autonomously from data in cognitive and neural systems, as well as on their potential or real applications in different domains
    corecore