90 research outputs found

    Evolution of whole genomes through inversions: models and algorithms for duplicates, ancestors, and edit scenarios

    Advances in sequencing technology are yielding DNA sequence data at an alarming rate – a rate reminiscent of Moore's law. Biologists' abilities to analyze this data, however, have not kept pace. On the other hand, the discrete and mechanical nature of the cell life-cycle has been tantalizing to computer scientists. Thus in the 1980s, pioneers of the field now called Computational Biology began to uncover a wealth of computer science problems, some confronting modern biologists and some hidden in the annals of the biological literature. In particular, many interesting twists were introduced to classical string matching, sorting, and graph problems. One such problem, first posed in 1941 but rediscovered in the early 1980s, is that of sorting by inversions (also called reversals): given two permutations, find the minimum number of inversions required to transform one into the other, where an inversion inverts the order of a subpermutation. Indeed, many genomes have evolved mostly or only through inversions. Thus it becomes possible to trace evolutionary histories by inferring sequences of such inversions that led to today's genomes from a distant common ancestor. But unlike the classic edit distance problem, where string editing is relatively simple, editing permutations in this way has proved to be more complex. In this dissertation, we extend the theory so as to make these edit distances more broadly applicable and faster to compute, and work towards more powerful tools that can accurately infer evolutionary histories. In particular, we present work that for the first time considers genomic distances between any pair of genomes, with no limitation on the number of occurrences of a gene. Next we show that there are conditions under which an ancestral genome (or one close to the true ancestor) can be reliably reconstructed. 
Finally, we present a new methodology that computes a minimum-length sequence of inversions transforming one permutation into another in, on average, O(n log n) steps, whereas the best worst-case algorithm for computing such a sequence uses O(n√(n log n)) steps.
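
The inversion operation and the sorting-by-inversions distance defined in this abstract can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it assumes unsigned permutations without duplicate genes and uses an exhaustive breadth-first search, which is only feasible for very short permutations. The function names `invert` and `inversion_distance` are hypothetical.

```python
from collections import deque
from itertools import combinations


def invert(perm, i, j):
    """Return a copy of perm with the segment perm[i..j] (inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def inversion_distance(source, target):
    """Minimum number of inversions turning source into target.

    Brute-force breadth-first search over all permutations reachable
    by inversions; intended only for tiny, unsigned inputs.
    """
    source, target = tuple(source), tuple(target)
    if sorted(source) != sorted(target):
        raise ValueError("source and target must contain the same genes")
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        # Try every inversion of a segment of length at least two.
        for i, j in combinations(range(len(perm)), 2):
            nxt = invert(perm, i, j)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise RuntimeError("unreachable for permutations of the same genes")
```

For example, `invert((1, 2, 3, 4, 5), 1, 3)` gives `(1, 4, 3, 2, 5)`, and `inversion_distance((3, 1, 2), (1, 2, 3))` is 2, since no single inversion sorts `(3, 1, 2)`.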

    Median clouds and a fast transposition median solver

    The median problem seeks a permutation whose total distance to a given set of permutations (the base set) is minimal. This is an important problem in comparative genomics and has been studied for several distance measures, such as reversals. The transposition distance is less relevant biologically, but it has been shown to behave similarly to the most important biological distances, and can thus give important information on their properties. We have derived an algorithm that solves the transposition median problem, giving all transposition medians (the median cloud). We show that our algorithm can be modified to accept median clouds as elements of the base set, and briefly discuss the new concepts of median iterates (medians of medians) and limit medians, that is, the limits of these iterates.
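
The definition of the median cloud above can be made concrete with a small sketch. It is not the solver described in the abstract: it assumes that a transposition moves one contiguous block past an adjacent block, and it enumerates every permutation by brute force, so it only works for a handful of genes. The names `transposition_neighbours`, `distances_from`, and `median_cloud` are hypothetical.

```python
from collections import deque


def transposition_neighbours(perm):
    """Yield every permutation obtained by swapping two adjacent blocks."""
    n = len(perm)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n + 1):
                yield perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def distances_from(start):
    """Transposition distance from start to every permutation (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        perm = queue.popleft()
        for nxt in transposition_neighbours(perm):
            if nxt not in dist:
                dist[nxt] = dist[perm] + 1
                queue.append(nxt)
    return dist


def median_cloud(base_set):
    """Return (score, sorted list of all permutations with minimal total distance)."""
    tables = [distances_from(tuple(p)) for p in base_set]
    totals = {perm: sum(t[perm] for t in tables) for perm in tables[0]}
    best = min(totals.values())
    return best, sorted(p for p, s in totals.items() if s == best)
```

With the base set `[(1, 2, 3, 4), (2, 1, 3, 4)]`, whose two members are one transposition apart, the cloud consists of exactly those two permutations with total distance 1.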

    Gene order rearrangement methods for the reconstruction of phylogeny

    The study of phylogeny, i.e. the evolutionary history of species, is a central problem in biology and a key to understanding the characteristics of contemporary species. Many problems in this area can be formulated as combinatorial optimisation problems, which makes it particularly interesting for computer scientists. The reconstruction of the phylogeny of species can be based on various kinds of data, e.g. morphological properties or characteristics of the genetic information of the species. Maximum parsimony is a popular and widely used method for phylogenetic reconstruction that aims for an explanation of the observed data requiring the fewest evolutionary changes. One property of the genetic information has recently gained much interest for the reconstruction of phylogeny: the organisation of the genomes of species, i.e. the arrangement of the genes on the chromosomes. But the idea of reconstructing phylogenetic information from gene arrangements has a long history. Dobzhansky and Sturtevant (1938) already pointed out that “a comparison of the different gene arrangements in the same chromosome may, in certain cases, throw light on the historical relationships of these structures, and consequently on the history of the species as a whole”. This kind of data is promising for the study of deep evolutionary relationships because gene arrangements are believed to evolve slowly (Rokas and Holland, 2000). This seems to be the case especially for mitochondrial genomes, which are available for a wide range of species (Boore, 1999). The development of methods for the reconstruction of phylogeny from gene arrangement data has made considerable progress in recent years. Prominent examples are the computation of parsimonious evolutionary scenarios, i.e. 
a shortest sequence of rearrangements transforming one arrangement of genes into another, or the length of such a minimal scenario (Hannenhalli and Pevzner, 1995b; Sankoff, 1992; Watterson et al., 1982); the reconstruction of parsimonious phylogenetic trees from gene arrangement data (Bader et al., 2008; Bernt et al., 2007b; Bourque and Pevzner, 2002; Moret et al., 2002a); and the computation of the similarities of gene arrangements (Bergeron et al., 2008a; Heber et al., 2009). The central theme of this work is to provide efficient algorithms for modified versions of fundamental genome rearrangement problems using more plausible rearrangement models. Two types of modified rearrangement models are explored. The first type restricts the set of allowed rearrangements as follows. It can be observed that certain groups of genes are preserved during evolution. This may be caused by functional constraints that prevented their destruction (Lathe et al., 2000; Sémon and Duret, 2006; Xie et al., 2003), by certain properties of the rearrangements that shaped the gene orders (Eisen et al., 2000; Sankoff, 2002; Tillier and Collins, 2000), or simply because no destructive rearrangement has happened since the speciation of the gene orders. It can be assumed that gene groups found in all studied gene orders were not acquired independently. Accordingly, these gene groups should be preserved in plausible reconstructions of the course of evolution; in particular, the gene groups should be present in the reconstructed putative ancestral gene orders. This can be achieved by restricting the set of rearrangements allowed for the reconstruction to those that preserve the gene groups of the given gene orders. Since it is difficult to determine functionally what a gene group is, it has been proposed to consider common combinatorial structures of the gene orders as gene groups (Marcotte et al., 1999; Overbeek et al., 1999). 
The second modification of the rearrangement model considered here extends the set of allowed rearrangement types. Different types of rearrangement operations have shuffled the gene orders during evolution. The reconstruction should use the same set of rearrangement operations; otherwise, distorted or even wrong phylogenetic conclusions may be obtained in the worst case. Both possibilities have been considered for certain rearrangement problems before. Restricted sets of allowed rearrangements have been used successfully for the computation of parsimonious rearrangement scenarios consisting of inversions only, where the gene groups are identified as common intervals (Bérard et al., 2007; Figeac and Varré, 2004). Extending the set of allowed rearrangement operations is a delicate task. On the one hand, it is unknown which rearrangements have to be considered, because this is part of the phylogeny to be discovered. On the other hand, efficient exact rearrangement methods including several operations are still rare, in particular when transpositions are to be included. For example, the problem of computing shortest rearrangement scenarios including transpositions is still of unknown computational complexity. Currently, only efficient approximation algorithms are known (e.g. Bader and Ohlebusch, 2007; Elias and Hartman, 2006). Two problems have been studied with respect to one or even both of these possibilities in the scope of this work. The first one is the inversion median problem. Given the gene orders of some taxa, this problem asks for potential ancestral gene orders such that the corresponding inversion scenario is parsimonious, i.e. has minimum length. Solving this problem is an essential component of algorithms for computing phylogenetic trees from gene arrangements (Bourque and Pevzner, 2002; Moret et al., 2002a, 2001). The unconstrained inversion median problem is NP-hard (Caprara, 2003). 
In Chapter 3 the inversion median problem is studied under the additional constraint of preserving the gene groups of the input gene orders. Common intervals, i.e. sets of genes that appear consecutively in the gene orders, are used for modelling gene groups. The problem of finding such ancestral gene orders is called the preserving inversion median problem. Even the problem of finding a shortest preserving inversion scenario for two gene orders is NP-hard (Figeac and Varré, 2004). Mitochondrial gene orders are a rich source for phylogenetic investigations because they are known for more than 1,000 species. At least four rearrangement operations are reported in the literature to be relevant for the study of mitochondrial gene order evolution (Boore, 1999): inversions, transpositions, inverse transpositions, and tandem duplication random loss (TDRL). Efficient methods for a plausible reconstruction of genome rearrangements for mitochondrial gene orders using all four operations are presented in Chapter 4. An important rearrangement operation, in particular for the study of mitochondrial gene orders, is the tandem duplication random loss operation (e.g. Boore, 2000; Mauro et al., 2006). This rearrangement duplicates a part of a gene order, followed by the random loss of one of the redundant copies of each gene. The gene order is rearranged depending on which copy is lost. This rearrangement should be considered when reconstructing phylogeny from gene order data. But the properties of this rearrangement operation have rarely been studied (Bouvel and Rossin, 2009; Chaudhuri et al., 2006). The combinatorial properties of the TDRL operation are studied in Chapter 5. In particular, the enumeration and counting of sorting TDRLs, that is, TDRL operations that reduce the distance, is studied. Closed formulas for computing the number of sorting TDRLs and methods for the enumeration are presented. Furthermore, TDRLs are one of the operations considered in Chapter 4. 
An interesting property of this rearrangement, distinguishing it from other rearrangements, is its asymmetry: the effects of a single TDRL cannot (in most cases) be reversed with a single TDRL. The use of this property for phylogeny reconstruction is studied in Section 4.3. This thesis is structured as follows. The existing approaches obeying similar types of modified rearrangement models, as well as important concepts and computational methods for related problems, are reviewed in Chapter 2. The combinatorial structures of gene orders that have been proposed for identifying gene groups, in particular common intervals, as well as the computational approaches for their computation, are reviewed in Section 2.2. Approaches for computing parsimonious pairwise rearrangement scenarios are outlined in Section 2.3. Methods for the computation of genome rearrangement scenarios obeying biologically motivated constraints, as introduced above, are detailed in Section 2.4. The approaches for the inversion median problem are covered in Section 2.5. Methods for the reconstruction of phylogenetic trees from gene arrangement data are briefly outlined in Section 2.6. Chapter 3 introduces the new algorithms CIP, ECIP, and TCIP for solving the preserving inversion median problem. The efficiency of the algorithms is empirically studied for simulated as well as mitochondrial data. The description of algorithms CIP and ECIP is based on Bernt et al. (2006b). TCIP has been described in Bernt et al. (2007a, 2008b), but the theoretical foundation of TCIP is extended significantly within this work in order to allow for more than three input permutations. Gene order rearrangement methods that have been developed for the reconstruction of the phylogeny of mitochondrial gene orders are presented in Chapter 4. The presented algorithm CREx computes rearrangement scenarios for pairs of gene orders. 
CREx considers the four types of rearrangement operations that are important for mitochondrial gene orders. Based on CREx, the algorithm TreeREx for assigning rearrangement events to a given tree is developed. The quality of the CREx reconstructions is analysed in a large empirical study for simulated gene orders. The results of TreeREx are analysed for several mitochondrial data sets. Algorithms CREx and TreeREx have been published in Bernt et al. (2008a, 2007c). The analysis of the mitochondrial gene orders of Echinodermata was included in Perseke et al. (2008). Additionally, a new and simple method is presented to explore the potential of the CREx method. The new method is applied to the complete mitochondrial data set. The problem of enumerating and counting sorting TDRLs is studied in Chapter 5. The theoretical results are covered to a large extent by Bernt et al. (2009b). The missing combinatorial explanation for some of the presented formulas is given here for the first time. To this end, a new method for the enumeration and counting of sorting TDRLs has been developed (Bernt et al., 2009a).
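
The tandem duplication random loss operation and its asymmetry, as described in this abstract, can be sketched as follows. This is an illustrative model only, not the CREx or TreeREx implementation: the function names `tdrl` and `reachable_by_one_tdrl`, the half-open segment indices, and the `keep_first` parameter (the genes whose first copy survives) are assumptions, and the empty operation is counted as a TDRL for simplicity.

```python
def tdrl(order, start, end, keep_first):
    """Apply one TDRL to the segment order[start:end].

    The segment is duplicated in tandem; genes in keep_first survive in
    the first copy, all other genes of the segment survive in the second.
    """
    order = list(order)
    segment = order[start:end]
    first_copy = [g for g in segment if g in keep_first]
    second_copy = [g for g in segment if g not in keep_first]
    return order[:start] + first_copy + second_copy + order[end:]


def reachable_by_one_tdrl(source, target):
    """Check whether target can be obtained from source by a single TDRL.

    Read in source positions, the result of one TDRL consists of at most
    two increasing runs, so it has at most one descent.
    """
    position = {gene: idx for idx, gene in enumerate(source)}
    ranks = [position[gene] for gene in target]
    descents = sum(1 for a, b in zip(ranks, ranks[1:]) if a > b)
    return descents <= 1
```

For instance, `tdrl([1, 2, 3, 4, 5, 6], 0, 6, {1, 3, 5})` yields `[1, 3, 5, 2, 4, 6]`, yet `reachable_by_one_tdrl([1, 3, 5, 2, 4, 6], [1, 2, 3, 4, 5, 6])` is false: undoing this particular TDRL takes more than one TDRL.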

    Design and Optimization in Near-term Quantum Computation

    Quantum computers have come a long way since conception, and there is still a long way to go before the dream of universal, fault-tolerant computation is realized. In the near term, quantum computers will occupy a middle ground that is popularly known as the “Noisy, Intermediate-Scale Quantum” (or NISQ) regime. The NISQ era represents a transition in the nature of quantum devices from experimental to computational. There is significant interest in engineering NISQ devices and NISQ algorithms in a manner that will guide the development of quantum computation in this regime and into the era of fault-tolerant quantum computing. In this thesis, we study two aspects of near-term quantum computation. The first of these is the design of device architectures, covered in Chapters 2, 3, and 4. We examine different qubit connectivities on the basis of their graph properties, and present numerical and analytical results on the speed at which large entangled states can be created on nearest-neighbor grids and graphs with modular structure. Next, we discuss the problem of permuting qubits among the nodes of the connectivity graph using only local operations, also known as routing. Using a fast quantum primitive to reverse the qubits in a chain, we construct a hybrid, quantum/classical routing algorithm on the chain. We show via rigorous bounds that this approach is faster than any SWAP-based algorithm for the same problem. The second part, which spans the final three chapters, discusses variational algorithms, which are a class of algorithms particularly suited to near-term quantum computation. Two prototypical variational algorithms, quantum adiabatic optimization (QAO) and the quantum approximate optimization algorithm (QAOA), are studied for the difference in their control strategies. We show that on certain crafted problem instances, bang-bang control (QAOA) can be as much as exponentially faster than quasistatic control (QAO). 
Next, we demonstrate the performance of variational state preparation on an analog quantum simulator based on trapped ions. We show that, using classical heuristics that exploit structure in the variational parameter landscape, one can find circuit parameters efficiently in system size as well as circuit depth. In the experiment, we approximate the ground state of a critical Ising model with long-ranged interactions on up to 40 spins. Finally, we study the performance of Local Tensor, a classical heuristic algorithm inspired by QAOA, on benchmarking instances of the MaxCut problem, and suggest physically motivated choices for the algorithm hyperparameters that are found to perform well empirically. We also show that our implementation of Local Tensor mimics imaginary-time quantum evolution under the problem Hamiltonian.

    Modelling Italian potential output and the output gap

    The aim of the paper is to estimate a reliable quarterly time-series of potential output for the Italian economy, exploiting four alternative approaches: a Bayesian unobserved component method, a univariate time-varying autoregressive model, a production function approach and a structural VAR. Based on a wide range of evaluation criteria, all methods generate output gaps that accurately describe the Italian business cycle over the past three decades. All output gap measures are subject to non-negligible revisions when new data become available. Nonetheless they still prove to be informative about the current cyclical phase and, unlike the evidence reported in most of the literature, helpful at predicting inflation compared with simple benchmarks. We also assess the performance of output gap estimates obtained by combining the four original indicators, using either equal weights or Bayesian averaging, showing that the resulting measures (i) are less sensitive to revisions; (ii) are at least as good as the originals at tracking business cycle fluctuations; (iii) are more accurate as inflation predictors. Keywords: potential output, business cycle, Phillips curve, output gap

    Slowdown in immigration, labor shortages, and declining skill premia

    We document a steady decline in low-skilled immigration that began with the onset of the Great Recession in 2007, which was associated with labor shortages in low-skilled service occupations and a decline in the skill premium. Falling returns to high-skilled jobs coincided with a decline in the educational attainment of native-born workers. We develop and estimate a stochastic growth model with endogenous immigration and training to account for these facts and study macroeconomic performance and welfare. Lower immigration leads to higher wages for low-skilled workers and higher consumer prices. Importantly, the decline in the skill premium discourages the training of native workers, persistently reducing aggregate productivity and welfare. Stimulus policies during the COVID-19 pandemic, amid a widespread shortage of low-skilled immigrant labor, exacerbated the rise in consumer prices and reduced welfare. We show that the 2021-2023 immigration surge helped to partially alleviate existing labor shortages and restore welfare.

    A Methodology for policy analysis and spatial conflicts in transport policies

    Spatial Dependence and Heterogeneity in Empirical Analyses of Regional Labour Market Dynamics

    Are regions within a country really independent islands? Do economic relations and effects really have a homogeneous, uniform size across an entire country? These two assumptions are often imposed implicitly in empirical economic and social research. In his doctoral thesis, the author discusses how statistical methods can deviate from this unrealistic model structure by exploiting spatial patterns in both observable variables and presumed relations. Opportunities to improve our understanding of the economy, as well as chances and perils in the application of such methods, are demonstrated in a number of studies on aspects of regional labour market dynamics.

    Development, evolution and genetic analysis of C. elegans-inspired foraging algorithms under different environmental conditions

    In this work, three minimalist bio-inspired foraging algorithms based on C. elegans’ chemotaxis and foraging behaviour were developed and investigated. The main goal of the work is to apply the algorithms to robots with limited sensing capabilities. The refined versions of these algorithms were developed and optimised in 22 different environments. The results were processed using a novel set of techniques presented here, named Genotype Clustering. The results lead to two distinct conclusions, one practical and one more academic. From a practical perspective, the results suggest that, when suitably tuned, minimalist C. elegans-inspired foraging algorithms can lead to effective navigation to unknown targets even in the presence of repellents and under the influence of significant sensor noise. From an academic perspective, the work demonstrates that even simple models can serve as an interesting and informative testbed for exploring fundamental evolutionary principles. The simulated robots were grounded in real hardware parameters, aiming at future application of the foraging algorithms in real robots. Another achievement of the project was the development of a simulation framework that provides a simple yet flexible program for the development and optimisation of behavioural algorithms.

    Human Settlement Systems: Spatial Patterns and Trends

    The papers in this volume were originally presented at a conference on the analysis of human settlement systems held at IIASA. This meeting closed an IIASA research activity, started in 1975, that had the goals of identifying functional urban regions in several industrialized countries and making comparative analyses of their population and employment trends to enhance our understanding of the spatial and temporal evolution of human settlement systems. This research on human settlement systems and strategies established a wide international collaborative network and created a sizeable database for examining demographic and economic changes. This book presents the findings of some of this work.