    ARIES: Acquisition of Requirements and Incremental Evolution of Specifications

    This paper describes a requirements/specification environment specifically designed for large-scale software systems. This environment is called ARIES (Acquisition of Requirements and Incremental Evolution of Specifications). ARIES provides assistance to requirements analysts for developing operational specifications of systems. This development begins with the acquisition of informal system requirements. The requirements are then formalized and gradually elaborated (transformed) into formal and complete specifications. ARIES provides guidance to the user in validating formal requirements by translating them into natural language representations and graphical diagrams. ARIES also provides ways of analyzing the specification to ensure that it is correct, e.g., testing the specification against a running simulation of the system to be built. Another important ARIES feature, especially when developing large systems, is the sharing and reuse of requirements knowledge. This leads to much less duplication of effort. ARIES combines all of its features in a single environment that makes the process of capturing a formal specification quicker and easier.

    Mapping the Landscape of Mutation Rate Heterogeneity in the Human Genome: Approaches and Applications

    All heritable genetic variation is ultimately the result of mutations that have occurred in the past. Understanding the processes which determine the rate and spectra of new mutations is therefore fundamentally important in efforts to characterize the genetic basis of heritable disease, infer the timing and extent of past demographic events (e.g., population expansion, migration), or identify signals of natural selection. This dissertation aims to describe patterns of mutation rate heterogeneity in detail, identify factors contributing to this heterogeneity, and develop methods and tools to harness such knowledge for more effective and efficient analysis of whole-genome sequencing data. In Chapters 2 and 3, we catalog granular patterns of germline mutation rate heterogeneity throughout the human genome by analyzing extremely rare variants ascertained from large-scale whole-genome sequencing datasets. In Chapter 2, we describe how mutation rates are influenced by local sequence context and various features of the genomic landscape (e.g., histone marks, recombination rate, replication timing), providing detailed insight into the determinants of single-nucleotide mutation rate variation. We show that these estimates reflect genuine patterns of variation among de novo mutations, with broad potential for improving our understanding of the biology of underlying mutation processes and the consequences for human health and evolution. These estimated rates are publicly available at http://mutation.sph.umich.edu/. In Chapter 3, we introduce a novel statistical model to elucidate the variation in rate and spectra of multinucleotide mutations throughout the genome. We catalog two major classes of multinucleotide mutations: those resulting from error-prone translesion synthesis, and those resulting from repair of double-strand breaks. 
In addition, we identify specific hotspots for these unique mutation classes and describe the genomic features associated with their spatial variation. We show how these multinucleotide mutation processes, along with sample demography and mutation rate heterogeneity, contribute to the overall patterns of clustered variation throughout the genome, promoting a more holistic approach to interpreting the source of these patterns. In Chapter 4, we develop Helmsman, a computationally efficient software tool to infer mutational signatures in large samples of cancer genomes. By incorporating parallelization routines and efficient programming techniques, Helmsman performs this task up to 300 times faster and with a memory footprint 100 times smaller than existing mutation signature analysis software. Moreover, Helmsman is the only such program capable of directly analyzing arbitrarily large datasets. The Helmsman software can be accessed at https://github.com/carjed/helmsman. Finally, in Chapter 5, we present a new method for quality control in large-scale whole-genome sequencing datasets, using a combination of dimensionality reduction algorithms and unsupervised anomaly detection techniques. Just as the mutation spectrum can be used to infer the presence of underlying mechanisms, we show that the spectrum of rare variation is a powerful and informative indicator of sample sequencing quality. Analyzing three large-scale datasets, we demonstrate that our method is capable of identifying samples affected by a variety of technical artifacts that would otherwise go undetected by standard ad hoc filtering criteria. We have implemented this method in a software package, Doomsayer, available at https://github.com/carjed/doomsayer.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147537/1/jedidiah_1.pd

    How do Microservices Evolve? An Empirical Analysis of Changes in Open-Source Microservice Repositories

    Context. Microservice architectures are an emergent service-oriented paradigm widely used in industry to develop and deploy scalable software systems. The underlying idea is to design highly independent services that implement small units of functionality and can interact with each other through lightweight interfaces. Objective. Even though microservices are often used with success, their design and maintenance pose novel challenges to software engineers. In particular, it is questionable whether the intended independence of microservices can actually be achieved in practice. Method. So, it is important to understand how and why microservices evolve during a system’s life-cycle, for instance, to scope refactorings and improvements of a system’s architecture or to develop supporting tools. To provide insights into how microservices evolve, we report a large-scale empirical study on the (co-)evolution of microservices in 11 open-source systems, involving quantitative and qualitative analyses of 7,319 commits. Findings. Our quantitative results show that there are recurring patterns of (co-)evolution across all systems, for instance, “shotgun surgery” commits and microservices that are largely independent, evolve in tuples, or are evolved in almost all changes. We refine our results by analyzing service-evolving commits qualitatively to explore the (in-)dependence of microservices and the causes for their specific evolution. Conclusion. The contributions in this article provide an understanding for practitioners and researchers of how and in what way microservices evolve, and how microservice-based systems may be improved.
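
The kind of quantitative co-evolution analysis described in this abstract can be illustrated with a minimal Python sketch. It is not the study's method: the mapping from a file path to a service (its top-level directory), the `SHOTGUN_THRESHOLD` value, and the function names are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical cut-off: a commit touching at least this many services is
# flagged as a candidate "shotgun surgery" commit. Not taken from the paper.
SHOTGUN_THRESHOLD = 3


def services_touched(paths):
    """Map changed file paths to service names.

    Assumption: each service lives in its own top-level directory, so
    files at the repository root belong to no service and are ignored.
    """
    return {p.split("/", 1)[0] for p in paths if "/" in p}


def co_evolution(commits):
    """Count how often pairs of services change in the same commit.

    `commits` is an iterable of (commit_id, changed_paths) tuples.
    Returns (pair_counts, shotgun_commit_ids).
    """
    pair_counts = Counter()
    shotgun = []
    for commit_id, paths in commits:
        services = sorted(services_touched(paths))
        # Every unordered pair of services in this commit co-evolved once.
        pair_counts.update(combinations(services, 2))
        if len(services) >= SHOTGUN_THRESHOLD:
            shotgun.append(commit_id)
    return pair_counts, shotgun
```

Services that rarely appear in any pair would count as largely independent under this sketch, while pairs with high counts correspond to services that evolve in tuples.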

    The Evolutionary Analysis of Emerging Low Frequency HIV-1 CXCR4 Using Variants through Time—An Ultra-Deep Approach

    Large-scale parallel pyrosequencing produces unprecedented quantities of sequence data. However, when the data are generated from viral populations, current mapping software is inadequate for dealing with the high levels of variation present, resulting in the potential for biased data loss. In order to apply the 454 Life Sciences' pyrosequencing system to the study of viral populations, we have developed software for the processing of highly variable sequence data. Here we demonstrate our software by analyzing two temporally sampled HIV-1 intra-patient datasets from a clinical study of maraviroc. This drug binds the CCR5 coreceptor, thus preventing HIV-1 infection of the cell. The objective is to determine viral tropism (CCR5 versus CXCR4 usage) and track the evolution of minority CXCR4-using variants that may limit the response to a maraviroc-containing treatment regimen. Five time points (two prior to treatment) were available from each patient. We first quantify the effects of divergence on initial read k-mer mapping and demonstrate the importance of utilizing population-specific template sequences in relation to the analysis of next-generation sequence data. Then, in conjunction with coreceptor prediction algorithms that infer HIV tropism, our software was used to quantify the viral population structure pre- and post-treatment. In both cases, low frequency CXCR4-using variants (2.5–15%) were detected prior to treatment. Following phylogenetic inference, these variants were observed to exist as distinct lineages that were maintained through time. Our analysis thus confirms the role of pre-existing CXCR4-using virus in the emergence of maraviroc-insensitive HIV. The software will have utility for the study of intra-host viral diversity and evolution of other fast evolving viruses, and is available from http://www.bioinf.manchester.ac.uk/segminator/

    Prevalence of Code Smells in Reinforcement Learning Projects

    Reinforcement Learning (RL) is increasingly used to learn and adapt application behavior in many domains, including large-scale and safety-critical systems such as autonomous driving. With the advent of plug-n-play RL libraries, its applicability has further increased, enabling integration of RL algorithms by users. We note, however, that the majority of such code is not developed by RL engineers, which, as a consequence, may lead to poor program quality, yielding bugs, suboptimal performance, and maintainability and evolution problems for RL-based projects. In this paper we begin the exploration of this hypothesis, specific to code utilizing RL, analyzing different projects found in the wild to assess their quality from a software engineering perspective. Our study includes 24 popular RL-based Python projects, analyzed with standard software engineering metrics. Our results, aligned with similar analyses for ML code in general, show that popular and widely reused RL repositories contain many code smells (3.95% of the code base on average), significantly affecting the projects' maintainability. The most common code smells detected are long method and long method chain, highlighting problems in the definition and interaction of agents. Detected code smells suggest problems in responsibility separation, and the appropriateness of current abstractions for the definition of RL algorithms. Comment: Paper preprint for the 2nd International Conference on AI Engineering Software Engineering for AI CAIN202
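
The two smells named in this abstract, long method and long method chain, can be made concrete with a small Python sketch built on the standard `ast` module. It is not the tooling used in the study: the thresholds `MAX_METHOD_LINES` and `MAX_CHAIN_LENGTH` and the function names are hypothetical, since the abstract does not state how the smells were measured.

```python
import ast

# Hypothetical thresholds for illustration only; the abstract does not
# give the values used in the study.
MAX_METHOD_LINES = 20
MAX_CHAIN_LENGTH = 3


def chain_length(node):
    """Count chained method calls, e.g. a.b().c().d() has length 3."""
    length = 0
    while isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        length += 1
        node = node.func.value
    return length


def find_smells(source):
    """Return a list of (smell, location, size) tuples for Python source.

    Requires Python 3.8+ for `end_lineno`.
    """
    tree = ast.parse(source)
    # Calls that are the receiver of another method call are inner links
    # of a chain; skipping them reports each chain once, at its outermost call.
    inner = {
        id(n.func.value)
        for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Attribute)
    }
    smells = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            n_lines = node.end_lineno - node.lineno + 1
            if n_lines > MAX_METHOD_LINES:
                smells.append(("long method", node.name, n_lines))
        elif isinstance(node, ast.Call) and id(node) not in inner:
            n_links = chain_length(node)
            if n_links > MAX_CHAIN_LENGTH:
                smells.append(("long method chain", node.lineno, n_links))
    return smells
```

For a long method the location is the function name; for a long chain it is the line number of the outermost call.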

    Code Flows: Visualizing Structural Evolution of Source Code

    Understanding detailed changes done to source code is of great importance in software maintenance. We present Code Flows, a method to visualize the evolution of source code geared to the understanding of fine and mid-level scale changes across several file versions. We enhance an existing visual metaphor to depict software structure changes with techniques that emphasize both following unchanged code as well as detecting and highlighting important events such as code drift, splits, merges, insertions and deletions. The method is illustrated with the analysis of a real-world C++ code system.
