SequenceLab: A Comprehensive Benchmark of Computational Methods for Comparing Genomic Sequences
Computational complexity is a key limitation of genomic analyses. Thus, over
the last 30 years, researchers have proposed numerous fast heuristic methods
that provide computational relief. Comparing genomic sequences is one of the
most fundamental computational steps in most genomic analyses. Due to its high
computational complexity, optimized exact and heuristic algorithms are still
being developed. We find that these methods are highly sensitive to the
underlying data, its quality, and various hyperparameters. Despite their wide
use, no in-depth analysis of this sensitivity has been performed, so these
methods may falsely discard genetic sequences from further analysis and
unnecessarily inflate computational costs. We provide the first analysis and benchmark of this
heterogeneity. We deliver an actionable overview of the 11 most widely used
state-of-the-art methods for comparing genomic sequences. We also inform
readers about their advantages and downsides using thorough experimental
evaluation and different real datasets from all major manufacturers (i.e.,
Illumina, ONT, and PacBio). SequenceLab is publicly available at
https://github.com/CMU-SAFARI/SequenceLab
Technology dictates algorithms: Recent developments in read alignment
Massively parallel sequencing techniques have revolutionized biological and
medical sciences by providing unprecedented insight into the genomes of humans,
animals, and microbes. Modern sequencing platforms generate enormous amounts of
genomic data in the form of nucleotide sequences or reads. Aligning reads onto
reference genomes enables the identification of individual-specific genetic
variants and is an essential step of the majority of genomic analysis
pipelines. Aligned reads are essential for answering important biological
questions, such as detecting mutations driving various human diseases and
complex traits as well as identifying species present in metagenomic samples.
The read alignment problem is extremely challenging due to the large size of
analyzed datasets and numerous technological limitations of sequencing
platforms, and researchers have developed novel bioinformatics algorithms to
tackle these difficulties. Importantly, computational algorithms have evolved
and diversified in accordance with technological advances, leading to today's
diverse array of bioinformatics tools. Our review provides a survey of
algorithmic foundations and methodologies across 107 alignment methods
published between 1988 and 2020, for both short and long reads. We provide
rigorous experimental evaluation of 11 read aligners to demonstrate the effect
of these underlying algorithms on speed and efficiency of read aligners. We
separately discuss how longer read lengths produce unique advantages and
limitations to read alignment techniques. We also discuss how general alignment
algorithms have been tailored to the specific needs of various domains in
biology, including whole transcriptome, adaptive immune repertoire, and human
microbiome studies.
A comparative study of automated reviewer assignment methods
vii, 75 leaves : ill. ; 29 cm. Includes abstract. Includes bibliographical references (leaves 55-60).
The reviewer assignment problem is the problem of determining suitable reviewers for papers submitted to journals or conferences. Automated solutions to this problem have used standard information retrieval methods such as the vector space model and latent semantic indexing. In this work we introduce two new methods. The first method assigns reviewers using compression-approximated information distance. It approximates the Kolmogorov complexity of papers by their size when compressed by a compression program, and then approximates the relatedness of the papers using an information distance equation. This method performs better than standard information retrieval methods. The second method assigns reviewers using Google Desktop, a more advanced information retrieval system. It searches for key terms from a paper needing reviewers in a set of papers written by possible reviewers and uses the search results as votes for reviewers. This method is relatively simple and is very effective for assigning reviewers.
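The compression-based method described above can be illustrated with a minimal sketch. It assumes the normalized compression distance (NCD) as the information distance equation and zlib as the compression program; the abstract names neither, so both are stand-ins. The function names and the choice to score a reviewer by their closest paper are hypothetical, not taken from the thesis.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Approximate Kolmogorov complexity by the zlib-compressed length."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (assumed form of the distance equation).

    Values near 0 suggest closely related inputs; values near 1 suggest
    unrelated inputs.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_reviewers(submission: str, reviewer_papers: dict) -> list:
    """Rank reviewers by the distance from the submission to their closest paper.

    reviewer_papers maps a reviewer name to a non-empty list of paper texts
    (a hypothetical input format chosen for this sketch).
    """
    sub = submission.encode("utf-8")
    scores = {
        reviewer: min(ncd(sub, paper.encode("utf-8")) for paper in papers)
        for reviewer, papers in reviewer_papers.items()
    }
    # Smallest distance first: the most closely related reviewer leads the list.
    return sorted(scores.items(), key=lambda item: item[1])
```

Calling rank_reviewers with the text of a submission and a dictionary of reviewer papers returns (reviewer, distance) pairs ordered from most to least related. Real compressors have a limited window, so long papers may need truncation or a different compressor for the approximation to stay meaningful.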
Additive Distances and Quasi-Distances Between Words
We study additive distances and quasi-distances between words. We show that every additive distance is finite. We then prove that every additive quasi-distance is regularity-preserving, that is, the neighborhood of any radius of a regular language with respect to an additive quasi-distance is regular. Finally, similar results will be proven for context-free, computable and computably enumerable languages.
1.) C. S. Calude, K. Salomaa, S. Yu (eds.). Advances and Trends in Automata and Formal Languages. A Collection of Papers in Honour of the 60th Birthday of Helmut Jürgensen.