The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences.

Author: Anton
Avraam Tapinos
Bede Constantinides
Bellman
David L. Robertson
Hendriks
Jensen
Kotsakos
Matthew Cotten
Mitsa
My V. T. Phan
Mörchen
Nair
Samaneh Kouchaki
Shumway
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded accuracy comparable to that of existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
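
To make the idea of comparing compressed numerical representations concrete, the following is a minimal sketch only. The abstract does not say which transformation or nucleotide encoding the authors used, so this example assumes Piecewise Aggregate Approximation (PAA), a common time-series reduction, and a hypothetical integer encoding; the names `NUMERIC`, `to_signal`, `paa` and `euclidean` are illustrative and not taken from the paper.

```python
import math

# Hypothetical numeric encoding; the abstract does not specify one.
NUMERIC = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def to_signal(seq):
    """Turn a nucleotide string into a list of numbers."""
    return [NUMERIC[base] for base in seq.upper()]


def paa(signal, n_segments):
    """Piecewise Aggregate Approximation: the mean of each of n_segments chunks."""
    n = len(signal)
    if not 0 < n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(signal)")
    reduced = []
    for s in range(n_segments):
        start = s * n // n_segments
        end = (s + 1) * n // n_segments
        chunk = signal[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


def euclidean(a, b):
    """Euclidean distance between two equally long reduced signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


read = paa(to_signal("AACCGGTT"), 4)       # 8 values reduced to 4
reference = paa(to_signal("AACCGGTT"), 4)
print(read, euclidean(read, reference))
```

Comparing the reduced signals touches `n_segments` values per sequence instead of the full read length, which is the kind of saving the abstract attributes to compressed representations.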

Multidisciplinary Digital Publishing Institute

Crossref

LSHTM Research Online

Directory of Open Access Journals

EUR Research Repository

Oxford University Research Archive

The University of Manchester - Institutional Repository

Erasmus University Digital Repository

Enlighten

JACKIE: Fast Enumeration of Genome-Wide Single- and Multicopy CRISPR Target Sites and Their Off-Target Numbers.

Author: Cheng Albert
Zhu Jacqueline Jufen
Publication venue: The Mouseion at the JAXlibrary
Publication date: 01/08/2022

Zinc finger protein-, transcription activator like effector-, and CRISPR-based methods for genome and epigenome editing and imaging have provided powerful tools to investigate functions of genomes. Targeting sequence design is vital to the success of these experiments. Although existing design software mainly focuses on designing target sequences for specific elements, we report here the implementation of Jackie and Albert's Comprehensive K-mer Instances Enumerator (JACKIE), a suite of software for enumerating all single- and multicopy sites in the genome that can be incorporated for genome-scale designs as well as loaded onto genome browsers alongside other tracks for convenient web-based graphic-user-interface-enabled design. We also implement fast algorithms to identify sequence neighborhoods or off-target counts of targeting sequences so that designs with a low probability of off-target effects can be identified among millions of design sequences in reasonable time. We demonstrate the application of JACKIE-designed CRISPR site clusters for genome imaging.
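
The enumeration described above can be illustrated with a small sketch. This is not JACKIE's implementation or interface: it assumes a single forward strand, ignores PAM constraints, and counts off-targets by brute-force Hamming comparison instead of the fast algorithms the abstract mentions. The function names are hypothetical.

```python
from collections import defaultdict


def enumerate_kmers(genome, k):
    """Map every k-mer in the sequence to the list of positions where it occurs."""
    sites = defaultdict(list)
    for i in range(len(genome) - k + 1):
        sites[genome[i:i + k]].append(i)
    return dict(sites)


def split_by_copy_number(sites):
    """Separate single-copy k-mers from multicopy k-mers."""
    single = {kmer: pos for kmer, pos in sites.items() if len(pos) == 1}
    multi = {kmer: pos for kmer, pos in sites.items() if len(pos) > 1}
    return single, multi


def off_target_count(target, sites, max_mismatches):
    """Count genomic sites within max_mismatches of target, excluding exact matches."""
    count = 0
    for kmer, positions in sites.items():
        if kmer == target:
            continue  # exact matches are on-target sites
        mismatches = sum(a != b for a, b in zip(target, kmer))
        if mismatches <= max_mismatches:
            count += len(positions)
    return count


sites = enumerate_kmers("ACGTACGTTT", 4)
single, multi = split_by_copy_number(sites)
print(multi)                                # {'ACGT': [0, 4]}
print(off_target_count("ACGA", sites, 1))   # 2
```

The brute-force comparison scales with the number of distinct k-mers per query, so it only suits toy inputs; a genome-scale tool needs an indexed neighborhood search.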

The Jackson Laboratory: The Mouseion at the JAXlibrary

PubMed Central

Using multiple alignments to improve seeded local alignment algorithms

Author: Batzoglou Serafim
Flannick Jason
Publication venue: Oxford University Press
Publication date: 01/01/2005

Multiple alignments among genomes are becoming increasingly prevalent. This trend motivates the development of tools for efficient homology search between a query sequence and a database of multiple alignments. In this paper, we present an algorithm that uses the information implicit in a multiple alignment to dynamically build an index that is weighted most heavily towards the promising regions of the multiple alignment. We have implemented Typhon, a local alignment tool that incorporates our indexing algorithm, which our test results show to be more sensitive than algorithms that index only a sequence. This suggests that when applied on a whole-genome scale, Typhon should provide improved homology searches in time comparable to existing algorithms.

CiteSeerX

Crossref

PubMed Central

A perceptual hash function to store and retrieve large scale DNA sequences

Author: Bailly Xavier
De Herve Jocelyn De Goer
Kang Myoung-Ah
Nguifo Engelbert Mephu
Publication date: 01/01/2014

This paper proposes a novel approach for storing and retrieving massive DNA sequences. The method is based on a perceptual hash function, commonly used to determine the similarity between digital images, that we adapted for DNA sequences. The perceptual hash function presented here is based on a Discrete Cosine Transform Sign Only (DCT-SO). Each nucleotide is encoded as a fixed gray level intensity pixel and the hash is calculated from its significant frequency characteristics. This results in a drastic data reduction between the sequence and the perceptual hash. Unlike cryptographic hash functions, perceptual hashes are not affected by the "avalanche effect" and thus can be compared. The similarity distance between two hashes is estimated with the Hamming distance, which is used to retrieve DNA sequences. Experiments that we conducted show that our approach is relevant for storing massive DNA sequences and retrieving them.
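
The pipeline in this abstract (gray-level encoding, DCT, sign-only hash, Hamming comparison) can be sketched as follows. This is a simplified one-dimensional illustration under stated assumptions, not the authors' method: the gray levels in `GRAY`, the choice to skip the DC coefficient, and the hash length `n_bits` are all hypothetical, and the paper may operate on two-dimensional images.

```python
import math

# Hypothetical gray levels; the abstract does not give the actual intensities.
GRAY = {"A": 0, "C": 85, "G": 170, "T": 255}


def dct_coefficient(signal, k):
    """k-th coefficient of the unnormalized one-dimensional DCT-II."""
    n = len(signal)
    return sum(x * math.cos(math.pi * (i + 0.5) * k / n)
               for i, x in enumerate(signal))


def sign_only_hash(seq, n_bits=16):
    """Keep only the signs of the first n_bits low-frequency DCT coefficients."""
    pixels = [GRAY[base] for base in seq.upper()]
    # Assumption: skip k = 0 (the DC term), which is never negative here.
    return [1 if dct_coefficient(pixels, k) >= 0 else 0
            for k in range(1, n_bits + 1)]


def hamming(h1, h2):
    """Number of positions at which two equally long hashes differ."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(h1, h2))


a = sign_only_hash("ACGTTGCAACGTTGCAACGTAGGC", n_bits=8)
b = sign_only_hash("ACGTTGCAACGTTGCAACGTAGGA", n_bits=8)
print(a, b, hamming(a, b))
```

Because only `n_bits` signs are stored per sequence, the hash is far smaller than the sequence itself, and retrieval reduces to ranking stored hashes by Hamming distance to the query hash.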

arXiv.org e-Print Archive

HAL Clermont Université

Hal-Diderot