2,202 research outputs found

    An Introduction to Programming for Bioscientists: A Python-based Primer

    Full text link
    Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in the biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. Starting with basic concepts, such as that of a 'variable', the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences.Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables, numerous exercises, and 19 pages of Supporting Information; currently in press at PLOS Computational Biolog

    Nudge: Accelerating Overdue Pull Requests Towards Completion

    Full text link
    Pull requests are a key part of the collaborative software development and code review process today. However, pull requests can also slow down the software development process when the reviewer(s) or the author do not actively engage with the pull request. In this work, we design an end-to-end service, Nudge, for accelerating overdue pull requests towards completion by reminding the author or the reviewer(s) to engage with their overdue pull requests. First, we use models based on effort estimation and machine learning to predict the completion time for a given pull request. Second, we use activity detection to reduce false positives. Lastly, we use dependency determination to understand the blocker of the pull request and nudge the appropriate actor(author or reviewer(s)). We also do a correlation analysis to understand the statistical relationship between the pull request completion times and various pull request and developer related attributes. Nudge has been deployed on 147 repositories at Microsoft since 2019. We do a large scale evaluation based on the implicit and explicit feedback we received from sending the Nudge notifications on 8,500 pull requests. We observe significant reduction in completion time, by over 60%, for pull requests which were nudged thus increasing the efficiency of the code review process and accelerating the pull request progression

    Almost Rerere: Learning to resolve conflicts in distributed projects

    Get PDF
    The concurrent development of applications requires reconciling conflicting code updates by different developers. Recent research on the nature of merge conflicts in open source projects shows that a significant fraction of merge conflicts have limited size (one or two lines of code) and are resolved with simple strategies that use code present in the merged versions. Thus the opportunity arises of supporting the resolution of merge conflicts automatically by learning the way in which developers fix them. In this paper we propose a framework for automating the resolution of merge conflicts which learns from the resolutions made by developers and encodes such knowledge into conflict resolution rules applicable to conflicts not seen before. The proposed approach is text-based, does not depend on the programming languages of the merged files and exploits a well-known and general language (search and replacement regular expressions) to encode the conflict resolution rules. Evaluation results on 14,872 conflicts from 25 projects show that the system can synthesize a resolution for 49% of the conflicts occurred during the merge process (89% if one considers conflicts that have at least one similar conflict in the data set) and can reproduce exactly the same solution that human developers have applied in 55% of the cases (62% for single line conflicts)

    Decision Support System for Pull Requests Review Using Path-based Network Portrait Divergence and Visualization

    Get PDF
    Title from PDF of title page, viewed September 8, 2022Thesis advisor: Yugyung LeeVitaIncludes bibliographical references (pages 74-79)Thesis (M.S.)--Department of Computer Science Electrical Engineering. University of Missouri--Kansas City, 2022Pull requests are widely used in open-source and industrial environments to contribute and assess contributions. Unlike the typical code review process, pull requests provide a more lightweight approach for committing, reviewing, and managing code changes. Pull request code reviews also serve multiple objectives, including detecting problems in code, giving a venue to discuss code contributions, and supporting the easy integration of external contributions by project maintainers. The code changes and tests are written for a specific work item are contained in a pull request. Previous studies have reported that pull request review is crucial for software development and that reviewers do not spend more time on test files than on code files. At the same time, code reviewers are concerned that the tests that accompany the code modifications are adequate and cover all possible paths. The purpose of this research is to determine whether the test changes that go along with the code changes match the structural changes made in the Pull Request. The structural changes are determined using recent network comparison breakthroughs in prior work with GraphEvo. We also determine whether or not the visual representation and software metrics can support the software review process. We conducted a case study of 14 Java open-source projects, analyzing thousands of lines of code quality issues in 627 pull requests. We calculated the class level metrics, including network portrait divergence for each Pull request with and without change. In addition, for each pull request, we counted the number of existing test cases that failed due to the modification. Furthermore, correlations were investigated between class-level metrics, including network portrait divergence and tests that failed in pull requests.Introduction -- Related work -- Methodology -- Results and evaluations -- Conclusio

    Duplicate Defect Detection

    Get PDF
    Discovering and fixing faults is an unavoidable process in Software Engineering. It is always a good practice to document and organize fault reports. This facilitates the effectiveness of development and maintenance process. Bug Tracking Repositories, such as Bugzilla, are designed to provide fault reporting facilities for developers, testers and users of the system. Allowing anyone to contribute finding and reporting faults has an immediate impact on software quality. However, this benefit comes with one side-effect. Users often file reports that describe the same fault. This increases the triaging time spent by the maintainers. At the same time, important information required to fix the fault is likely to be distributed across different reports.;The objective of this thesis is twofold. First, we want to understand the dynamics of bug report filing for a large, long duration open source project, Firefox. Second, we present a new approach that can reduce the number of duplicate reports. The novel element in the proposed approach is the ability to concentrate the search for duplicates on specific portions of the bug repository. This improves the performance of Information Retrieval techniques and classification runtime of our algorithm. Our system can be deployed as a search tool to help reporters query the repository or it can be adopted to help maintainers detect duplicate reports. In both cases the performance is satisfactory. When tested as a search tool our system is able to detect up to 53% of duplicate reports. The approach adapted for maintainers has a maximum recall rate of 59%
    • …
    corecore