Search CORE

De Novo Assembly of Nucleotide Sequences in a Compressed Feature Space

Author: Robertson David L.
Tapinos Avraam
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/08/2017

Sequencing technologies allow for an in-depth analysis of biological species, but the size of the generated datasets introduces a number of analytical challenges. Recently, we demonstrated the application of numerical sequence representations and data transformations for the alignment of short reads to a reference genome. Here, we extend our approach to the de novo assembly of short reads. Our results demonstrate that highly compressed data can encapsulate the signal sufficiently to accurately assemble reads into large contigs or complete genomes.
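The abstract does not say which numerical representation or transformation is used. As a minimal sketch, assuming a purine/pyrimidine "DNA walk" as the numerical representation and Piecewise Aggregate Approximation (PAA) as the compressing transformation (both illustrative choices, not necessarily the authors'), a read can be turned into a short feature vector like this. `dna_walk` and `paa` are hypothetical names.

```python
from typing import List

# Illustrative assumption: purines (A, G) step +1, pyrimidines (C, T) step -1.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(seq: str) -> List[int]:
    """Map a nucleotide sequence to its cumulative purine/pyrimidine walk."""
    walk, pos = [], 0
    for base in seq.upper():
        pos += STEP[base]  # raises KeyError on ambiguous bases such as N
        walk.append(pos)
    return walk


def paa(signal: List[float], segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each of `segments` equal-width frames.

    For simplicity this sketch requires a non-empty signal whose length is divisible by `segments`.
    """
    if segments <= 0 or not signal or len(signal) % segments:
        raise ValueError("signal length must be a positive multiple of segments")
    width = len(signal) // segments
    return [sum(signal[i * width:(i + 1) * width]) / width for i in range(segments)]


walk = dna_walk("AGCTAAGG")
print(walk)          # [1, 2, 1, 0, 1, 2, 3, 4]
print(paa(walk, 4))  # [1.5, 0.5, 1.5, 3.5]
```

Here eight positions are compressed to four values; in a real pipeline the number of segments would be a tunable trade-off between compression and the signal retained for overlap detection.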

Available from: Crossref, Enlighten

Graph theoretic methods for the analysis of structural relationships in biological macromolecules

Author: Altschul
Artymiuk
Barnard
Baxevanis
Benning
Berman
Bernstein
Brint
Bron
Bruno
Bryant
Crandell
Dean
Diestel
Doubet
Fan
Feizi
Figueras
Flores
Gardiner
Gati
Good
Gray
Groves
Gruer
Gund
Hagadone
Harrison
Holden
Hutchinson
Jasanoff
Johnson
Kanna
Klausner
Kleywegt
Koch
Kraulis
Lengauer
Lesk
Martin
McGregor
Messmer
Mitchell
Ollis
Pickering
Ray
Raymond
Read
Salton
Samudrala
Sayle
Simon
Srere
Sussenguth
Tesmer
Tinoco
Trinajstic
Tsukada
Ullmann
van Rijsbergen
Willett
Williams
Wilson
Zhang
Publication venue: Wiley
Publication date: 01/01/2005

Subgraph isomorphism and maximum common subgraph isomorphism algorithms from graph theory provide an effective and efficient way of identifying structural relationships between biological macromolecules. They thus provide a natural complement to the pattern-matching algorithms that are used in bioinformatics to identify sequence relationships. Examples are provided of the use of graph theory to analyze proteins for which three-dimensional crystallographic or NMR structures are available, focusing on the use of the Bron-Kerbosch clique detection algorithm to identify common folding motifs and of the Ullmann subgraph isomorphism algorithm to identify patterns of amino acid residues. Our methods are also applicable to other types of biological macromolecules, such as carbohydrate and nucleic acid structures.
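The abstract names the Bron-Kerbosch algorithm but gives no code. The sketch below illustrates the usual way clique detection finds common substructures, assuming a textbook Bron-Kerbosch with pivoting run on a correspondence (modular product) graph: each vertex pairs one element of the first structure with an equally labelled element of the second, and a maximum clique corresponds to a largest common substructure. The node and edge labels, the exact-match comparison of edge labels, and the function names are illustrative assumptions; comparisons of real protein structures would typically match inter-element angles and distances within a tolerance.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, Hashable, List, Set

Node = Hashable


def bron_kerbosch(r: Set, p: Set, x: Set, adj: Dict[Node, Set], out: List[FrozenSet]) -> None:
    """Bron-Kerbosch with pivoting: append every maximal clique of `adj` to `out`."""
    if not p and not x:
        out.append(frozenset(r))
        return
    # Choose the pivot adjacent to the most candidates, so fewer branches are explored.
    pivot = max(p | x, key=lambda u: len(adj[u] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def correspondence_graph(labels1, edges1, labels2, edges2) -> Dict[Node, Set]:
    """Build the correspondence (modular product) graph of two labelled graphs.

    labels: node -> label; edges: frozenset({u, v}) -> edge label (absent pairs mean no edge).
    Two pairings are adjacent when they relate both graphs' node pairs in the same way.
    """
    nodes = [(u, v) for u, v in product(labels1, labels2) if labels1[u] == labels2[v]]
    adj: Dict[Node, Set] = {n: set() for n in nodes}
    for (u1, u2), (v1, v2) in combinations(nodes, 2):
        if u1 == v1 or u2 == v2:
            continue  # each node may be matched only once in each graph
        if edges1.get(frozenset((u1, v1))) == edges2.get(frozenset((u2, v2))):
            adj[(u1, u2)].add((v1, v2))
            adj[(v1, v2)].add((u1, u2))
    return adj


# Hypothetical toy "structures": three helices (H) joined by geometric relations x and y.
labels1 = {1: "H", 2: "H", 3: "H"}
edges1 = {frozenset((1, 2)): "x", frozenset((2, 3)): "y"}
labels2 = {"a": "H", "b": "H", "c": "H"}
edges2 = {frozenset(("a", "b")): "x", frozenset(("b", "c")): "y"}

adj = correspondence_graph(labels1, edges1, labels2, edges2)
cliques: List[FrozenSet] = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
print(sorted(max(cliques, key=len)))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Pivoting does not change which cliques are reported; it only prunes branches that cannot yield a new maximal clique, which matters for the dense correspondence graphs that arise from real structures.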

Available from: CiteSeerX, Crossref, White Rose Research Online, Sussex Research Online

Spaced seeds improve k-mer-based metagenomic classification

Author: Brinda Karel
Kucherov Gregory
Sykulski Maciej
Publication venue: Oxford University Press (OUP)
Publication date: 09/07/2015

Metagenomics is a powerful approach to studying the genetic content of environmental samples, and it has been strongly promoted by NGS technologies. To cope with the massive data involved in modern metagenomic projects, recent tools [4, 39] rely on the analysis of k-mers shared between the read to be classified and sampled reference genomes. Within this general framework, we show in this work that spaced seeds provide a significant improvement in classification accuracy over traditional contiguous k-mers. We support this thesis through a series of different computational experiments, including simulations of large-scale metagenomic projects. Scripts and programs used in this study, as well as supplementary material, are available from http://github.com/gregorykucherov/spaced-seeds-for-metagenomics.
Comment: 23 pages
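To make the contrast between contiguous k-mers and spaced seeds concrete, here is a minimal sketch, not the authors' programs: a seed pattern marks positions that must match with `1` and "don't care" positions with `0`, and a read is assigned to the reference sharing the most distinct seeds. The pattern strings, the toy sequences, and the names `spaced_kmers` and `classify` are illustrative assumptions; practical seeds are much longer.

```python
from typing import Dict, Set


def spaced_kmers(seq: str, pattern: str) -> Set[str]:
    """Return the set of spaced k-mers of `seq` under a seed `pattern`.

    '1' marks a position that must match; '0' marks a "don't care" position.
    A contiguous k-mer is the special case pattern == '1' * k.
    """
    care = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    return {"".join(seq[i + j] for j in care) for i in range(len(seq) - span + 1)}


def classify(read: str, references: Dict[str, str], pattern: str) -> str:
    """Toy classifier: pick the reference sharing the most distinct spaced k-mers with the read."""
    read_seeds = spaced_kmers(read, pattern)
    return max(references, key=lambda name: len(read_seeds & spaced_kmers(references[name], pattern)))


# A read with a single mismatch (G -> T at index 2) against its reference.
ref, read = "ACGT", "ACTT"
print(len(spaced_kmers(read, "111") & spaced_kmers(ref, "111")))    # 0: every contiguous 3-mer covers the mismatch
print(len(spaced_kmers(read, "1101") & spaced_kmers(ref, "1101")))  # 1: the mismatch falls on a "don't care" position
print(classify(read, {"g1": ref, "g2": "GGGG"}, "1101"))            # g1
```

Both patterns have weight 3 (three matching positions), so the comparison is fair: the spaced seed tolerates a substitution at its `0` position, whereas every contiguous 3-mer overlapping the substitution is lost.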

Available from: arXiv.org e-Print Archive, HAL Descartes, Hal-Diderot, HAL-Ecole des Ponts ParisTech, HAL - UPEC / UPEM

Highly Scalable Algorithms for Robust String Barcoding

Author: DasGupta Bhaskar
Konwar Kishori M.
Mandoiu Ion I.
Shvartsman Alex A.
Publication date: 01/01/2005

String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further extend the applicability range to thousands of bacterial-size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome-based selection results in a number of distinguishers nearly matching the information-theoretic lower bounds for the problem.
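The abstract does not spell out the selection procedure. As a minimal sketch, assuming distinguisher selection is treated as a greedy set-cover heuristic over pairs of genomes, with "robust" interpreted as requiring every pair to be separated by at least `redundancy` chosen distinguishers (both assumptions; `greedy_barcode` is a hypothetical name), each genome's barcode is its presence/absence vector over the chosen strings. The information-theoretic lower bound for n genomes with no redundancy is ceil(log2 n) distinguishers, since n distinct binary barcodes need at least that many bits.

```python
from itertools import combinations
from math import ceil, log2
from typing import Dict, List


def greedy_barcode(genomes: Dict[str, str], candidates: List[str], redundancy: int = 1) -> List[str]:
    """Greedily pick distinguisher strings until every pair of genomes is told apart
    by at least `redundancy` chosen distinguishers (a hypothetical robustness criterion)."""
    names = sorted(genomes)
    need = {pair: redundancy for pair in combinations(names, 2)}
    present = {c: {g for g in names if c in genomes[g]} for c in candidates}

    def separates(c: str, a: str, b: str) -> bool:
        # A distinguisher separates a pair if it occurs in exactly one of the two genomes.
        return (a in present[c]) != (b in present[c])

    def gain(c: str) -> int:
        return sum(1 for (a, b), left in need.items() if left > 0 and separates(c, a, b))

    chosen: List[str] = []
    remaining = list(candidates)
    while any(left > 0 for left in need.values()):
        best = max(remaining, key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("candidates cannot satisfy the requested redundancy")
        chosen.append(best)
        remaining.remove(best)
        for a, b in need:
            if separates(best, a, b):
                need[(a, b)] = max(0, need[(a, b)] - 1)
    return chosen


genomes = {"g1": "AAAA", "g2": "AACC", "g3": "CCCC", "g4": "ACGT"}
chosen = greedy_barcode(genomes, ["AA", "CC", "GT"])
print(chosen)                     # ['AA', 'CC']
print(ceil(log2(len(genomes))))  # 2: lower bound for 4 genomes
print({g: tuple(int(d in s) for d in chosen) for g, s in genomes.items()})
# {'g1': (1, 0), 'g2': (1, 1), 'g3': (0, 1), 'g4': (0, 0)}
```

In this toy case the greedy choice meets the lower bound exactly. Asking for `redundancy=2` with the same candidates raises `ValueError`, because only `AA` separates `g2` from `g3`.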

Available from: arXiv.org e-Print Archive, CiteSeerX, Crossref