544 research outputs found

    Analysis of game playing agents with fingerprints

    Evolutionary computation (EC) can create a vast number of strategies for playing simple games in a short time. Analysis of these strategies is typically more time-consuming than their production. As a result, analysis of strategies produced by an EC system is often lacking or restricted to the extraction of superficial summary statistics. This thesis presents a technique for extracting a functional signature, the fingerprint, from evolved agents that play games. This signature can be used as a visualization of agent behavior in games with two moves and also provides a numerical target for clustering and other forms of automatic analysis. The fingerprint can be used to induce a similarity measure on the space of game strategies. This thesis develops fingerprints in the context of the iterated prisoner's dilemma; we note that they can be computed for any two-player simultaneous game with a finite set of moves. When using a clustering algorithm, the results are strongly influenced by the choice of the measure used to find the distance between, or to compare the similarity of, the data being clustered. The Euclidean metric, for example, rates a convex polytope as the most compact type of object and builds clusters that are contained in compact polytopes. Presented here is a general method, called multi-clustering, that compensates for the intrinsic shape of a metric or similarity measure. The method is tested on synthetic data sets that are natural for the Euclidean metric and on data sets designed to defeat k-means clustering with the Euclidean metric. Multi-clustering successfully discovers the designed cluster structure of all the synthetic data sets used with a minimum of parameter tuning. We then use multi-clustering and filtration on fingerprint data. Cellular representation is the practice of evolving a set of instructions for constructing a desired structure.
This thesis presents a cellular encoding for finite state machines and specializes it to play the iterated prisoner's dilemma. The impact of the cellular representation on the character and behavior of finite state agents is investigated. For the cellular representation presented, a statistically significant drop in the level of cooperation is found. Other differences in the character of the automata generated with direct and cellular representations are reported.
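
To make the setting of this abstract concrete, the following is a minimal sketch of finite state agents playing the iterated prisoner's dilemma and of measuring the level of cooperation in a match. It does not compute the fingerprint, which the abstract does not define. The payoff values, the Moore-machine layout, and the names `FSMAgent`, `play`, `tit_for_tat`, and `always_defect` are assumptions made for illustration.

```python
from dataclasses import dataclass

C, D = "C", "D"

# Payoff to the first player for (own move, opponent move).
# These are commonly used values, assumed here; the abstract gives none.
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


@dataclass
class FSMAgent:
    """Hypothetical Moore-style finite state agent.

    actions[s] is the move played in state s, and
    transitions[s][m] is the next state after the opponent plays m.
    """
    actions: list
    transitions: list
    start: int = 0


def play(a, b, rounds):
    """Play `rounds` simultaneous rounds.

    Returns (score of a, score of b, fraction of all moves that were C).
    """
    state_a, state_b = a.start, b.start
    score_a = score_b = 0
    cooperative_moves = 0
    for _ in range(rounds):
        move_a, move_b = a.actions[state_a], b.actions[state_b]
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        cooperative_moves += (move_a == C) + (move_b == C)
        # Each agent moves to its next state based on the opponent's move.
        state_a = a.transitions[state_a][move_b]
        state_b = b.transitions[state_b][move_a]
    return score_a, score_b, cooperative_moves / (2 * rounds)


# Two classic strategies written as small state machines.
tit_for_tat = FSMAgent(actions=[C, D],
                       transitions=[{C: 0, D: 1}, {C: 0, D: 1}])
always_defect = FSMAgent(actions=[D], transitions=[{C: 0, D: 0}])

if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat, 10))
    print(play(tit_for_tat, always_defect, 10))
```

A cooperation fraction of this kind is one of the summary statistics the abstract describes as superficial; a behavioral signature such as the fingerprint is meant to capture more than this single number.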

    On the design of architecture-aware algorithms for emerging applications

    This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in the problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also identify several limitations of current system software and architectures, along with directions for improving them. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation participates in these efforts by providing benchmarks and suggestions to improve system software and architectures. Ph.D. Committee Chair: Bader, David; Committee Member: Hong, Bo; Committee Member: Riley, George; Committee Member: Vuduc, Richard; Committee Member: Wills, Scot

    Problems Related to Classical and Universal List Broadcasting

    Broadcasting is a fundamental problem in the information dissemination area. In classical broadcasting, a message must be sent from one network member to all other members as rapidly as feasible. Although it has been demonstrated that this problem is NP-Hard for arbitrary graphs, it has several applications in various fields. As a result, the universal lists model, replicating real-world restrictions like the memory limits of nodes in large networks, is introduced as a branch of this problem in the literature. In the universal lists model, each node is equipped with a fixed list and has to follow the list regardless of the originator. In this study, we focus on both classical and universal lists broadcasting. Classical broadcasting is solvable for a few families of networks, such as trees, unicyclic graphs, trees of cycles, and trees of cliques. In this study, we begin by presenting an optimal algorithm that finds the broadcast time of any vertex in a Fully Connected Tree (FCT_n) in O(|V| log log n) time. An FCT_n is formed by attaching arbitrary trees to vertices of a complete graph of size n, where |V| is the total number of vertices in the graph. Then, we replace the complete graph with a Hypercube H_k and propose a new heuristic for the Hypercube of Trees (HT_k). Not only does this heuristic have the same approximation ratio as the best-known algorithm, but our numerical results also show its superiority in most experiments. Our heuristic is able to outperform the current upper bound in up to 90% of the situations, resulting in an average speedup of 30%. Most importantly, our results illustrate that it can maintain its performance even if the network size grows, making the proposed heuristic practically useful. Afterward, we focus on broadcasting with universal lists, in which once a vertex is informed, it must follow its corresponding list, regardless of the originator and the neighbor from which it received the message.
The problem of broadcasting with universal lists could be categorized into two sub-models: non-adaptive and adaptive. In the latter model, a sender will skip the vertices on its list from which it has received the message, while those vertices will not be skipped in the first model. In this study, we will present another sub-model called fully adaptive. Not only does this model benefit from a significantly better space complexity compared to the classical model, but, as will be proved, it is faster than the two other sub-models. Since the suggested model fits real-world network architectures, we will design optimal broadcast algorithms for well-known interconnection networks such as trees, grids, and cube-connected cycles. We also present an upper bound for tori under the same model. Then we focus on designing broadcast graphs (bgs) under this model. A bg is a graph with minimum possible broadcast time from any originator. Additionally, a minimum broadcast graph (mbg) is a bg with the minimum possible number of edges. We propose mbgs on n vertices for n ≤ 10 and sparse bgs for 11 ≤ n ≤ 14 under the fully-adaptive model. Afterward, we introduce the first infinite families of bgs under this model, and we prove that hypercubes are mbgs under this model. Later, we establish the optimal broadcast time of k-ary trees and binomial trees under the non-adaptive model and provide an upper bound for complete bipartite graphs. We also improve a general upper bound for trees under the same model. We then suggest several general upper bounds for the universal lists by comparing them with the messy broadcasting model. Finally, we propose the first heuristic for this problem, namely HUB-GA: a Heuristic for Universal lists Broadcasting with Genetic Algorithm.
We undertake various numerical experiments on interconnection networks frequently used in the literature, graphs with clique-like structures, and synthetic instances in order to cover many possibilities of industrial topologies. We also compare our results with state-of-the-art methods for classical broadcasting, which is proved to be the fastest model of all. Although the universal lists model utilizes less memory than the classical model, our algorithm finds the same broadcast time as the classical model in diverse situations.
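
For the tree case that the abstract lists as solvable in the classical model, the following is a minimal sketch of the well-known greedy recurrence: root the tree at the originator, and let each vertex inform its children in decreasing order of the time their own subtrees need. It assumes that an informed vertex can inform one uninformed neighbor per round. It is not the FCT_n algorithm or any heuristic from the thesis, and the function name `broadcast_time` and the adjacency-dictionary input are assumptions.

```python
def broadcast_time(adj, originator):
    """Minimum number of rounds to inform every vertex of a tree.

    adj maps each vertex to a list of its neighbors (the graph must be a tree).
    Classical model, assumed here: in each round every informed vertex may
    inform at most one uninformed neighbor.
    """
    # Root the tree at the originator with an iterative depth-first search.
    parent = {originator: None}
    order = []
    stack = [originator]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                stack.append(w)

    # Process children before parents. A vertex calls its children in
    # decreasing order of their subtree times, so the i-th child called
    # (1-based) finishes at round i + (that child's subtree time).
    time = {}
    for v in reversed(order):
        child_times = sorted(
            (time[w] for w in adj[v] if parent[w] == v), reverse=True
        )
        time[v] = max(
            (i + t for i, t in enumerate(child_times, start=1)), default=0
        )
    return time[originator]


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(broadcast_time(path, 0), broadcast_time(path, 1))
```

On the four-vertex path, broadcasting from an end vertex takes three rounds, while broadcasting from an inner vertex takes two, which illustrates why broadcast time depends on the originator.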

    Models and Algorithms for Whole-Genome Evolution and their Use in Phylogenetic Inference

    The rapid accumulation of sequenced genomes offers the chance to resolve longstanding questions about the evolutionary histories, or phylogenies, of groups of organisms. The relatively rare occurrence of large-scale evolutionary events in a whole genome, events such as genome rearrangements, duplications and losses, enables us to extract a strong and robust phylogenetic signal from whole-genome data. The work presented in this dissertation focuses on models and algorithms for whole-genome evolution and their use in phylogenetic inference. We designed algorithms to estimate pairwise genomic distances from large-scale genomic changes. We refined the evolutionary models of whole-genome evolution. We also made use of these results to provide fast and accurate methods for phylogenetic inference that scale up, in both speed and accuracy, to modern high-resolution whole-genome data. We designed algorithms to estimate the true evolutionary distance between two genomes under genome rearrangements, and also under rearrangements plus gains and losses. We refined the evolutionary model to be the first mathematical model to preserve the structural dichotomy in genomic organization between most prokaryotes and most eukaryotes. Those models and associated distance estimators provide a basis for studying facets of possible mechanisms of evolution through simulation and application to real genomes. Phylogenetic analyses from whole-genome data have been limited to small collections of genomes and low-resolution data; they have also lacked an effective assessment of robustness. We developed an approach that combines our distance estimator, any standard distance-based reconstruction algorithm, and a novel bootstrapping method based on resampling genomic adjacencies.
The resulting tool overcomes a serious and long-standing impediment to the use of whole-genome data in phylogenetic inference and provides results comparable in accuracy and robustness to distance-based methods for sequence data. Maximum-likelihood approaches have been successfully applied to phylogenetic inference for aligned sequences, but such applications remain primitive for whole-genome data. We developed a maximum-likelihood approach to phylogenetic analysis from whole-genome data. In combination with our bootstrap scheme, this new approach yields the first reliable phylogenetic tool for the analysis of whole-genome data at the level of syntenic blocks.

    Inferring phylogenetic trees under the general Markov model via a minimum spanning tree backbone

    Phylogenetic trees are models of the evolutionary relationships among species, with species typically placed at the leaves of trees. We address the following problems regarding the calculation of phylogenetic trees. (1) Leaf-labeled phylogenetic trees may not be appropriate models of evolutionary relationships among rapidly evolving pathogens, which may contain ancestor-descendant pairs. (2) The models of gene evolution that are widely used unrealistically assume that the base composition of DNA sequences does not evolve. Regarding problem (1), we present a method for inferring generally labeled phylogenetic trees that allow sampled species to be placed at non-leaf nodes of the tree. Regarding problem (2), we present a structural expectation maximization method (SEM-GM) for inferring leaf-labeled phylogenetic trees under the general Markov model (GM), which is the most complex model of DNA substitution that allows the evolution of base composition. In order to improve the scalability of SEM-GM, we present a minimum spanning tree (MST) framework called MST-backbone. MST-backbone scales linearly with the number of leaves. However, the unrealistic location of the root as inferred on empirical data suggests that the GM model may be overtrained. MST-backbone was inspired by the topological relationship between MSTs and phylogenetic trees that was introduced by Choi et al. (2011). We discovered that the topological relationship does not necessarily hold if there is no unique MST. We propose so-called vertex-order based MSTs (VMSTs) that guarantee a topological relationship with phylogenetic trees.

    Evolution of whole genomes through inversions: models and algorithms for duplicates, ancestors, and edit scenarios

    Advances in sequencing technology are yielding DNA sequence data at an alarming rate, a rate reminiscent of Moore's law. Biologists' abilities to analyze this data, however, have not kept pace. On the other hand, the discrete and mechanical nature of the cell life-cycle has been tantalizing to computer scientists. Thus in the 1980s, pioneers of the field now called Computational Biology began to uncover a wealth of computer science problems, some confronting modern biologists and some hidden in the annals of the biological literature. In particular, many interesting twists were introduced to classical string matching, sorting, and graph problems. One such problem, first posed in 1941 but rediscovered in the early 1980s, is that of sorting by inversions (also called reversals): given two permutations, find the minimum number of inversions required to transform one into the other, where an inversion inverts the order of a subpermutation. Indeed, many genomes have evolved mostly or only through inversions. Thus it becomes possible to trace evolutionary histories by inferring sequences of such inversions that led to today's genomes from a distant common ancestor. But unlike the classic edit distance problem, where string editing was relatively simple, editing permutations in this way has proved to be more complex. In this dissertation, we extend the theory so as to make these edit distances more broadly applicable and faster to compute, and work towards more powerful tools that can accurately infer evolutionary histories. In particular, we present work that for the first time considers genomic distances between any pair of genomes, with no limitation on the number of occurrences of a gene. Next we show that there are conditions under which an ancestral genome (or one close to the true ancestor) can be reliably reconstructed.
Finally, we present a new methodology that computes a minimum-length sequence of inversions to transform one permutation into another in, on average, O(n log n) steps, whereas the best worst-case algorithm to compute such a sequence uses O(n√n log n) steps.
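
The definition of sorting by inversions given in this abstract can be illustrated with a brute-force sketch: apply an inversion to a segment of a permutation, and find the minimum number of inversions between two small permutations by breadth-first search. It treats permutations as unsigned, which is an assumption, and it takes exponential time, so it is in no way the O(n log n) method of the dissertation. The names `invert` and `inversion_distance` are hypothetical.

```python
from collections import deque


def invert(perm, i, j):
    """Return perm with the segment perm[i..j] (inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def inversion_distance(source, target):
    """Minimum number of inversions turning source into target.

    Brute-force breadth-first search over unsigned permutations;
    only practical for short permutations.
    """
    source, target = tuple(source), tuple(target)
    if sorted(source) != sorted(target):
        raise ValueError("source and target must contain the same elements")
    distance = {source: 0}
    queue = deque([source])
    while queue:
        perm = queue.popleft()
        if perm == target:
            return distance[perm]
        n = len(perm)
        # Try every inversion of a segment with at least two elements.
        for i in range(n):
            for j in range(i + 1, n):
                nxt = invert(perm, i, j)
                if nxt not in distance:
                    distance[nxt] = distance[perm] + 1
                    queue.append(nxt)
    raise AssertionError("unreachable: inversions generate every permutation")


if __name__ == "__main__":
    print(invert((1, 2, 3, 4, 5), 1, 3))
    print(inversion_distance((3, 1, 2), (1, 2, 3)))
```

The search visits up to n! permutations, which is the kind of cost that motivates the specialized algorithms discussed in the abstract.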

    Robust and Efficient Algorithms for Protein 3-D Structure Alignment and Genome Sequence Comparison

    Sequence analysis and structure analysis are two of the fundamental areas of bioinformatics research. This dissertation discusses, specifically, protein structure related problems including protein structure alignment and query, and genome sequence related problems including haplotype reconstruction and genome rearrangement. It first presents an algorithm for pairwise protein structure alignment that is tested with structures from the Protein Data Bank (PDB). In many cases it outperforms two other well-known algorithms, DaliLite and CE. The preliminary algorithm is a graph-theory based approach, which uses the concept of "stars" to reduce the complexity of clique-finding algorithms. The algorithm is then improved by introducing "double-center stars" in the graph and applying a self-learning strategy. The updated algorithm is tested with a much larger set of protein structures and shown to be an improvement in accuracy, especially in cases of weak similarity. A protein structure query algorithm is designed to search for similar structures in the PDB, using the improved alignment algorithm. It is compared with SSM and shows better performance with lower maximum and average Q-score for missing proteins. An interesting problem dealing with the calculation of the diameter of a 3-D sequence of points arose and its connection to the sublinear time computation is discussed. The diameter calculation of a 3-D sequence is approximated by a series of sublinear time deterministic, zero-error and bounded-error randomized algorithms and we have obtained a series of separations about the power of sublinear time computations. This dissertation also discusses two genome sequence related problems. A probabilistic model is proposed for reconstructing haplotypes from SNP matrices with incomplete and inconsistent errors. The experiments with simulated data show both high accuracy and speed, conforming to the theoretically provable efficiency and accuracy of the algorithm.
Finally, a genome rearrangement problem is studied. The concept of non-breaking similarity is introduced. Approximating the exemplar non-breaking similarity to within a factor of n^{1-ε} is proven to be NP-hard. Interestingly, for several practical cases, polynomial-time algorithms are presented.

    On coding labeled trees

    Trees are probably the most studied class of graphs in Computer Science. In this thesis we study bijective codes that represent labeled trees by means of strings of node labels. We contribute to the understanding of their algorithmic tractability, their properties, and their applications. The thesis is divided into two parts. In the first part we focus on two types of tree codes, namely Prüfer-like codes and Transformation codes. We study optimal encoding and decoding algorithms, both in a sequential and in a parallel setting. We propose a unified approach that works for all Prüfer-like codes and a more generic scheme based on the transformation of a tree into a functional digraph suitable for all bijective codes. Our results in this area close a variety of open problems. We also consider possible applications of tree encodings, discussing how to exploit these codes in Genetic Algorithms and in the generation of random trees. Moreover, we introduce a modified version of a known code that, in Genetic Algorithms, outperforms all the other known codes. In the second part of the thesis we focus on two possible generalizations of our work. We first take into account the classes of k-trees and k-arch graphs (both superclasses of trees): we study bijective codes for these classes of graphs and their algorithmic feasibility. Then, we shift our attention to Informative Labeling Schemes. In this context labels are no longer considered simple unique node identifiers; rather, they convey information useful to achieve efficient computations on the tree. We exploit this idea to design a concurrent data structure for the lowest common ancestor problem on dynamic trees. We also present an experimental comparison between our labeling scheme and the one proposed by Peleg for static trees.
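
As an illustration of what a bijective code for labeled trees looks like, the following is a minimal sketch of the classical Prüfer code, which maps a tree on n labeled nodes to a string of n - 2 labels and back. It uses a heap and runs in O(n log n) time, so it is not one of the optimal algorithms studied in the thesis. The function names and the convention of labeling nodes 0 to n - 1 are assumptions.

```python
import heapq


def prufer_encode(edges, n):
    """Prüfer code of a labeled tree with nodes 0..n-1 (n >= 2).

    Repeatedly remove the smallest-labeled leaf and record its neighbor.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in range(n) if len(adj[v]) == 1]
    heapq.heapify(leaves)
    code = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        neighbor = adj[leaf].pop()      # a leaf has exactly one neighbor
        adj[neighbor].remove(leaf)
        code.append(neighbor)
        if len(adj[neighbor]) == 1:     # the neighbor has become a leaf
            heapq.heappush(leaves, neighbor)
    return code


def prufer_decode(code):
    """Rebuild the edge list of the tree on len(code) + 2 nodes."""
    n = len(code) + 2
    degree = [1] * n
    for v in code:
        degree[v] += 1                  # degree = occurrences in code + 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in code:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two leaves remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


if __name__ == "__main__":
    tree = [(0, 3), (1, 3), (2, 3), (3, 4), (4, 5)]
    code = prufer_encode(tree, 6)
    print(code, prufer_decode(code))
```

Because every string of n - 2 labels decodes to exactly one tree, sampling such a string uniformly yields a uniformly random labeled tree, which is the kind of application to random tree generation that the abstract mentions.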