1,661 research outputs found

    Simultaneously Mapping and Superimposing Landmark Configurations with Parsimony as Optimality Criterion

    All methods proposed to date for mapping landmark configurations on a phylogenetic tree start from an alignment generated by methods that make no use of phylogenetic information, usually by superimposing all configurations against a consensus configuration. In order to properly interpret differences between landmark configurations along the tree as changes in shape, the metric chosen to define the ancestral assignments should also form the basis to superimpose the configurations. Thus, we present here a method that merges both steps, map and align, into a single procedure that (for the given tree) produces a multiple alignment and ancestral assignments such that the sum of the Euclidean distances between the corresponding landmarks along tree nodes is minimized. This approach is an extension of the method proposed by Catalano et al. (2010. Phylogenetic morphometrics (I): the use of landmark data in a phylogenetic framework. Cladistics. 26:539–549) for mapping landmark data with parsimony as optimality criterion. In the context of phylogenetics, this method allows maximizing the degree to which similarity in landmark positions can be accounted for by common ancestry. In the context of morphometrics, this approach guarantees (heuristics aside) that all the transformations inferred on the tree represent changes in shape. The performance of the method was evaluated on different data sets, indicating that the method produces marked improvements in tree score (up to 5% compared with generalized superimpositions, up to 11% compared with ordinary superimpositions). These empirical results stress the importance of incorporating the phylogenetic information into the alignment step.
    Affiliation: Catalano, Santiago Andres. Universidad Nacional de Tucumán. Facultad de Ciencias Naturales e Instituto Miguel Lillo; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tucumán; Argentina
    Affiliation: Goloboff, Pablo Augusto. Universidad Nacional de Tucumán. Facultad de Ciencias Naturales e Instituto Miguel Lillo; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tucumán; Argentina
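The optimality criterion described in this abstract, the sum of Euclidean distances between corresponding landmarks along the branches of the tree, can be illustrated with a minimal sketch. It only evaluates the score for a fixed alignment and fixed ancestral assignments; it does not perform the joint optimization the authors describe. The function name `tree_score`, the dictionary layout, and the toy coordinates are assumptions made for illustration and do not come from the paper.

```python
import math

def tree_score(configs, edges):
    """Sum, over all branches, the Euclidean distances between
    corresponding landmarks of the parent and child configurations."""
    total = 0.0
    for parent, child in edges:
        for (x1, y1), (x2, y2) in zip(configs[parent], configs[child]):
            total += math.hypot(x2 - x1, y2 - y1)
    return total

# Hypothetical toy data: one ancestor, two descendants, two 2D landmarks each.
configs = {
    "anc": [(0.0, 0.0), (1.0, 0.0)],
    "A": [(0.0, 0.0), (1.0, 1.0)],
    "B": [(3.0, 4.0), (1.0, 0.0)],
}
edges = [("anc", "A"), ("anc", "B")]
score = tree_score(configs, edges)  # 1.0 on branch anc-A plus 5.0 on branch anc-B
```

A lower score means that less landmark displacement has to be postulated along the tree, which is the sense in which the criterion is a parsimony criterion.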

    On Sentence Analysis Techniques for Speech Translation (音声翻訳における文解析技法について)

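    The full-text data was converted to PDF from image files produced by the National Diet Library's fiscal year 2010 (Heisei 22) digitization of doctoral dissertations. Kyoto University, 0048, new system, doctorate by dissertation, Doctor of Engineering, Otsu No. 8652, Ron-Ko-Haku No. 2893, 新制||工||968 (University Library), UT51-94-R411. Examiners: (chief examiner) Prof. Makoto Nagao (長尾 真), Prof. Shuji Doshita (堂下 修司), Prof. Katsuo Ikeda (池田 克夫). Falls under Article 4, Paragraph 2 of the Degree Regulations. Doctor of Engineering, Kyoto University. DFA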

    Indices and Applications in High-Throughput Sequencing

    Recent advances in sequencing technology make it possible to produce billions of base pairs per day in the form of reads of length 100 bp and longer, and current developments promise the personal $1,000 genome in a couple of years. The analysis of these unprecedented amounts of data demands efficient data structures and algorithms. One such data structure is the substring index, which represents all substrings, or all substrings up to a certain length, contained in a given text. In this thesis we propose three substring indices, which we extend to be applicable to millions of sequences. We devise internal and external memory construction algorithms and a uniform framework for accessing the generalized suffix tree. Additionally, we propose different index-based applications, e.g. exact and approximate pattern matching and different repeat search algorithms. Second, we present the read mapping tool RazerS, which aligns millions of single or paired-end reads of arbitrary lengths to their potential genomic origin using either Hamming or edit distance. Our tool can work either lossless or with a user-defined loss rate at higher speeds. Given the loss rate, we present a novel approach that guarantees not to lose more reads than specified. This enables the user to adapt to the problem at hand and provides a seamless tradeoff between sensitivity and running time. We compare RazerS with other state-of-the-art read mappers and show that it has the highest sensitivity and a comparable performance on various real-world datasets. Finally, we propose a general approach for frequency-based string mining, which has many applications, e.g. in contrast data mining. Our contribution is a novel and lightweight algorithm that is faster and uses less memory than the best available algorithms. We show its applicability for mining multiple databases with a variety of frequency constraints. As such, we use the notion of entropy from information theory to generalize the emerging substring mining problem to multiple databases. To demonstrate the improvement of our algorithm, we compare it with recent approaches in experiments on real-world data from various string domains, e.g. natural language, DNA, or protein sequences.
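To make the idea of a substring index concrete, here is a minimal sketch of exact pattern matching over a suffix array. The suffix array is swapped in here as a simple, well-known substring index; the abstract does not say which three indices the thesis proposes, so this is not the author's data structure. The construction below is the naive sort-based one, far too slow for sequencing-scale data, and the names `build_suffix_array` and `find_occurrences` are assumptions for illustration.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes in lexicographic order.
    Naive construction, for illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """Return all start positions of `pattern` in `text`, using two
    binary searches over the suffix array `sa`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

text = "GATTACAGATTA"  # hypothetical toy sequence
sa = build_suffix_array(text)
hits = find_occurrences(text, sa, "ATTA")
```

All suffixes that begin with the pattern occupy one contiguous interval of the suffix array, which is why two binary searches suffice to report every occurrence.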

    Working Notes from the 1992 AAAI Spring Symposium on Practical Approaches to Scheduling and Planning

    The symposium presented issues involved in the development of scheduling systems that can deal with resource and time limitations. To qualify, a system must be implemented and tested to some degree on non-trivial problems (ideally, on real-world problems). However, a system need not be fully deployed to qualify. Systems that schedule actions in terms of metric time constraints typically represent and reason about an external numeric clock or calendar and can be contrasted with those systems that represent time purely symbolically. The following topics are discussed: integrating planning and scheduling; integrating symbolic goals and numerical utilities; managing uncertainty; incremental rescheduling; managing limited computation time; anytime scheduling and planning algorithms, systems; dependency analysis and schedule reuse; management of schedule and plan execution; and incorporation of discrete event techniques

    Behavior Alignment via Reward Function Optimization

    Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outcomes and promote behaviors that are not aligned with the designer's intended goal. Although potential-based reward shaping is often suggested as a remedy, we systematically investigate settings where deploying it often significantly impairs performance. To address these issues, we introduce a new framework that uses a bi-level objective to learn behavior alignment reward functions. These functions integrate auxiliary rewards reflecting a designer's heuristics and domain knowledge with the environment's primary rewards. Our approach automatically determines the most effective way to blend these types of feedback, thereby enhancing robustness against heuristic reward misspecification. Remarkably, it can also adapt an agent's policy optimization process to mitigate suboptimalities resulting from limitations and biases inherent in the underlying RL algorithms. We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges. We investigate heuristic auxiliary rewards of varying quality, some of which are beneficial and others detrimental to the learning process. Our results show that our framework offers a robust and principled way to integrate designer-specified heuristics. It not only addresses key shortcomings of existing approaches but also consistently leads to high-performing solutions, even when given misaligned or poorly-specified auxiliary reward functions.
    Comment: (Spotlight) Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
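For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of potential-based reward shaping in its standard form, where the shaped reward is r + γΦ(s') − Φ(s). It is not the bi-level framework the paper introduces. The function name `shaped_reward`, the distance-based potential, and the 1-D toy environment are assumptions made for illustration.

```python
def shaped_reward(reward, potential, state, next_state, gamma=0.99):
    """Add the potential-based shaping term gamma * phi(s') - phi(s)
    to the environment reward."""
    return reward + gamma * potential(next_state) - potential(state)

# Hypothetical toy setting: an agent on a 1-D line with the goal at position 10.
GOAL = 10

def potential(state):
    """Heuristic potential: negative distance to the goal."""
    return -abs(GOAL - state)

# Moving from 3 to 4 brings the agent closer, so the shaping bonus is positive.
bonus_toward = shaped_reward(0.0, potential, state=3, next_state=4, gamma=1.0)
# Moving from 4 back to 3 is penalized by the same amount.
bonus_away = shaped_reward(0.0, potential, state=4, next_state=3, gamma=1.0)
```

The shaping term depends only on the potential of the two states involved, so the quality of the guidance rests entirely on the designer's choice of potential.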

    Algorithms for reconstruction of chromosomal structures


    Large-scale methods in computational genomics

    The explosive growth in biological sequence data coupled with the design and deployment of increasingly high throughput sequencing technologies has created a need for methods capable of processing large-scale sequence data in a time and cost effective manner. In this dissertation, we address this need through the development of faster algorithms, space-efficient methods, and high-performance parallel computing techniques for some key problems in computational genomics. The first problem addressed is the clustering of DNA sequences based on a measure of sequence similarity. Our clustering method: (i) guarantees linear space complexity, in contrast to the quadratic memory requirements of previously developed methods; (ii) identifies sequence pairs containing long maximal matches in the decreasing order of their maximal match lengths in run-time proportional to the sum of input and output sizes; (iii) provides heuristics to significantly reduce the number of pairs evaluated for checking sequence similarity without affecting quality; and (iv) has parallel strategies that provide linear speedup and a proportionate reduction in space per processor. Our approach has significantly enhanced the problem size reach while also drastically reducing the time to solution. The next problem we address is the de novo detection of genomic repeats called Long Terminal Repeat (LTR) retrotransposons. Our algorithm guarantees linear space complexity and produces high quality candidates for prediction in run-time proportional to the sum of input and output sizes. Validation of our approach on the yeast genome demonstrates both superior quality and performance results when compared to previously developed software. In a genome assembly project, fragments sequenced from a target genome are computationally assembled into numerous supersequences called contigs, which are then ordered and oriented into scaffolds. In this dissertation, we introduce a new problem called retroscaffolding for scaffolding contigs based on the knowledge of their LTR retrotransposon content. Through identification of sequencing gaps that span LTR retrotransposons, retroscaffolding provides a mechanism for prioritizing sequencing gaps for finishing purposes. While most of the problems addressed here have been studied previously, the main contribution in this dissertation is the development of methods that can scale to the largest available sequence collections.