
    SequenceLab: A Comprehensive Benchmark of Computational Methods for Comparing Genomic Sequences

    Computational complexity is a key limitation of genomic analyses. Thus, over the last 30 years, researchers have proposed numerous fast heuristic methods that provide computational relief. Comparing genomic sequences is one of the most fundamental computational steps in most genomic analyses. Because of its high computational complexity, optimized exact and heuristic algorithms are still being developed. We find that these methods are highly sensitive to the underlying data, its quality, and various hyperparameters. Despite their wide use, no in-depth analysis of these methods has been performed; as a result, genetic sequences may be falsely discarded from further analysis and computational costs unnecessarily inflated. We provide the first analysis and benchmark of this heterogeneity. We deliver an actionable overview of the 11 most widely used state-of-the-art methods for comparing genomic sequences, and we inform readers about their advantages and downsides through a thorough experimental evaluation on real datasets from all major sequencing manufacturers (i.e., Illumina, ONT, and PacBio). SequenceLab is publicly available at https://github.com/CMU-SAFARI/SequenceLab.
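
    For concreteness, the sketch below illustrates the kind of pairwise comparison being benchmarked: a plain dynamic-programming edit distance with an optional band. It is an illustration only, not one of the 11 evaluated methods; the function name, the example sequences, and the banding heuristic are choices made here.

```python
# Minimal sketch of comparing two genomic sequences with a dynamic-programming
# edit distance. The optional band only illustrates why heuristics reduce cost;
# this is NOT one of the 11 tools evaluated by SequenceLab.

def edit_distance(a, b, band=None):
    """Levenshtein distance between strings a and b.

    If band is given, cells farther than `band` from the diagonal are skipped,
    trading exactness for speed (the basic idea behind many heuristics).
    """
    INF = len(a) + len(b) + 1
    prev = list(range(len(b) + 1))            # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i] + [INF] * len(b)
        lo = 1 if band is None else max(1, i - band)
        hi = len(b) if band is None else min(len(b), i + band)
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match / mismatch
        prev = curr
    return prev[len(b)]

if __name__ == "__main__":
    print(edit_distance("ACGTTGCA", "ACGTGGCA"))          # 1 (exact)
    print(edit_distance("ACGTTGCA", "ACGTGGCA", band=2))  # 1 (banded, approximate)
```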

    Technology dictates algorithms: Recent developments in read alignment

    Massively parallel sequencing techniques have revolutionized biological and medical sciences by providing unprecedented insight into the genomes of humans, animals, and microbes. Modern sequencing platforms generate enormous amounts of genomic data in the form of nucleotide sequences, or reads. Aligning reads onto reference genomes enables the identification of individual-specific genetic variants and is an essential step in the majority of genomic analysis pipelines. Aligned reads are essential for answering important biological questions, such as detecting mutations that drive various human diseases and complex traits, as well as identifying species present in metagenomic samples. The read alignment problem is extremely challenging due to the large size of analyzed datasets and numerous technological limitations of sequencing platforms, and researchers have developed novel bioinformatics algorithms to tackle these difficulties. Importantly, computational algorithms have evolved and diversified in accordance with technological advances, leading to today's diverse array of bioinformatics tools. Our review surveys the algorithmic foundations and methodologies of 107 alignment methods published between 1988 and 2020, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of the underlying algorithms on their speed and efficiency. We separately discuss how longer read lengths produce unique advantages and limitations for read alignment techniques. We also discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology, including whole-transcriptome, adaptive immune repertoire, and human microbiome studies.
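
    As a rough illustration of the seed-and-extend paradigm that many of the surveyed aligners build on, the toy sketch below indexes reference k-mers, looks up seeds from a read, and counts mismatches at candidate positions. The seed length, mismatch threshold, and all names are illustrative assumptions; production aligners use far more sophisticated indexing, filtering, and gapped extension.

```python
# Toy seed-and-extend read alignment: hash every k-mer of the reference,
# use matching k-mers in the read as seeds, then verify candidate positions
# by counting mismatches. Purely illustrative; not any specific tool.

from collections import defaultdict

K = 5  # seed length (illustrative choice, not a recommended default)

def build_index(reference, k=K):
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def align_read(read, reference, index, k=K, max_mismatches=2):
    """Return (position, mismatches) of the best candidate, or None."""
    best = None
    for offset in range(len(read) - k + 1):
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset  # candidate mapping position of the whole read
            if start < 0 or start + len(read) > len(reference):
                continue
            mismatches = sum(r != g for r, g in
                             zip(read, reference[start:start + len(read)]))
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best

if __name__ == "__main__":
    ref = "AAGCTTAGGCTTACGATCGATCGGATCC"
    idx = build_index(ref)
    print(align_read("TACGATCGTTCG", ref, idx))  # (11, 1): position 11, 1 mismatch
```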

    A comparative study of automated reviewer assignment methods

    The reviewer assignment problem is the problem of determining suitable reviewers for papers submitted to journals or conferences. Automated solutions to this problem have used standard information retrieval methods such as the vector space model and latent semantic indexing. In this work we introduce two new methods. The first assigns reviewers using a compression-approximated information distance: it approximates the Kolmogorov complexity of papers by their size when compressed by a compression program, and then approximates the relatedness of papers using an information distance equation. This method performs better than the standard information retrieval methods. The second method assigns reviewers using Google Desktop, a more advanced information retrieval system: it searches for key terms from a paper needing reviewers within a set of papers written by possible reviewers and uses the search results as votes for reviewers. This method is relatively simple and very effective for assigning reviewers.
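
    A minimal sketch of the first method as described above, assuming the normalized compression distance (NCD) as the information distance equation: compressed size stands in for Kolmogorov complexity, and reviewers are ranked by how close their papers are to the submission. The thesis may use a different formula, a different compressor than zlib, and a different aggregation than "closest paper".

```python
# Hedged sketch of compression-based reviewer assignment: approximate
# Kolmogorov complexity by compressed size and compare documents with the
# normalized compression distance (NCD). zlib and the "closest paper"
# aggregation are assumptions made here, not necessarily the thesis's choices.

import zlib

def csize(text):
    """Compressed size in bytes, a crude proxy for Kolmogorov complexity."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def ncd(x, y):
    """Normalized compression distance between two documents (smaller = more related)."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def rank_reviewers(submission, reviewer_papers):
    """Rank reviewers by the smallest NCD between the submission and any of
    their papers. reviewer_papers maps a reviewer name to a list of texts."""
    scores = {
        name: min(ncd(submission, paper) for paper in papers)
        for name, papers in reviewer_papers.items()
    }
    return sorted(scores, key=scores.get)

if __name__ == "__main__":
    corpus = {
        "alice": ["suffix automata and regular language neighborhoods ..."],
        "bob": ["convolutional networks for protein structure prediction ..."],
    }
    print(rank_reviewers("additive distances between words ...", corpus))
```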

    Additive Distances and Quasi-Distances Between Words

    We study additive distances and quasi-distances between words. We show that every additive distance is finite. We then prove that every additive quasi-distance is regularity-preserving, that is, the neighborhood of any radius of a regular language with respect to an additive quasi-distance is regular. Finally, similar results are proven for context-free, computable, and computably enumerable languages.
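
    For orientation, the neighborhood notion used in this result can be written out explicitly; the notation E(L, r) below is assumed for this note and need not match the paper's.

```latex
% Neighborhood of radius r of a language L over an alphabet \Sigma with
% respect to a (quasi-)distance d on words (notation assumed here):
\[
  E(L, r) \;=\; \{\, w \in \Sigma^{*} \;:\; \exists\, x \in L \ \text{with}\ d(w, x) \le r \,\}.
\]
% In this notation, the regularity-preservation result of the abstract reads:
% if d is an additive quasi-distance and L is regular, then E(L, r) is
% regular for every radius r \ge 0.
```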