301 research outputs found

Highly Scalable Algorithms for Robust String Barcoding

Author: DasGupta Bhaskar
Konwar Kishori M.
Mandoiu Ion I.
Shvartsman Alex A.
Publication date: 01/01/2005

String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further extend the applicability range to thousands of bacterial-size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome-based selection results in a number of distinguishers nearly matching the information-theoretic lower bounds for the problem.
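The idea of distinguisher selection in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it is a generic greedy heuristic, assuming that a distinguisher is a fixed-length substring (k-mer) and that two genomes are told apart when the k-mer occurs in exactly one of them. The function names, the fixed k, and the toy sequences are all hypothetical, and the robustness (redundancy) requirement named in the title is not modeled. The lower bound used is the standard counting argument: n genomes need at least ceil(log2 n) presence/absence bits to receive distinct barcodes.

```python
from itertools import combinations
from math import ceil, log2


def substrings(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_distinguishers(genomes, k):
    """Greedily pick k-mers until every pair of genomes is told apart.

    genomes maps a name to a sequence string. A k-mer distinguishes a pair
    when it occurs in exactly one of the two genomes. Raises ValueError if
    some pair cannot be separated by any k-mer of this length.
    """
    names = sorted(genomes)
    kmers = {name: substrings(genomes[name], k) for name in names}
    candidates = sorted(set().union(*kmers.values()))
    undistinguished = set(combinations(names, 2))
    chosen = []
    while undistinguished:
        def gain(kmer):
            # Number of still-unresolved pairs that this k-mer separates.
            return sum((kmer in kmers[a]) != (kmer in kmers[b])
                       for a, b in undistinguished)

        best = max(candidates, key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("some genomes share identical k-mer sets")
        chosen.append(best)
        undistinguished = {(a, b) for a, b in undistinguished
                           if (best in kmers[a]) == (best in kmers[b])}
    return chosen


def barcode(seq, distinguishers):
    """Presence/absence fingerprint of seq over the chosen distinguishers."""
    return tuple(d in seq for d in distinguishers)


def lower_bound(n):
    """Minimum number of binary distinguishers needed to separate n genomes."""
    return ceil(log2(n))


# Toy input (hypothetical sequences, far shorter than real genomes).
toy = {"g1": "ACGTAC", "g2": "ACGTTT", "g3": "TTTTAC", "g4": "GGGGGG"}
picked = greedy_distinguishers(toy, 3)
for name in sorted(toy):
    print(name, barcode(toy[name], picked))
print("selected:", len(picked), "lower bound:", lower_bound(len(toy)))
```

Comparing the number of selected k-mers with `lower_bound(n)` mirrors the comparison the abstract reports, although a greedy choice like this one carries no guarantee of meeting the bound.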

arXiv.org e-Print Archive

CiteSeerX

Crossref

Highly Scalable Algorithms for Robust String Barcoding

Author: C. Linhart
D.S. Johnson
F.B. Dean
J. Borneman
L. Lovász
P. Berman
V. Chvátal
V.G. Cheung
V.V. Vazirani
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

Crossref

Efficient alignment-free DNA barcode analytics

Author: A Zhang
B Holmes
B Schölkopf
BJ Frey
C Leslie
CS Leslie
D Steinke
E Wong
EL Allwein
G Saunders
I Kononenko
J Robins
KF Armstrong
M Linares
M Stoeckle
ML Sogin
MV Matz
MW Chase
P Kuksa
P Smith
Pavel Kuksa
PDN Hebert
R Barrett
R Kuang
R Nielsen
RD Ward
S Menchetti
T Jaakkola
V Vapnik
Vladimir Pavlovic
W Kress
WJ Kress
WM Rand
Z Abdo
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

Robust and scalable barcoding for massively parallel long‑read sequencing

Author: Arranz Silvia Eda
Bulacio Pilar
Ezpeleta Joaquín
Krsticevic Flavia
Labari Ignacio Garcia
Lavista Llanos Sofía
Posner Victoria María
Tapia Elizabeth
Villanova Gabriela Vanina
Publication venue: Springer Science and Business Media LLC
Publication date: 10/05/2022

Nucleic-acid barcoding is an enabling technique for many applications, but its use remains limited in emerging long-read sequencing technologies with intrinsically low raw accuracy. Here, we apply so-called NS-watermark barcodes, whose error correction capability was previously validated in silico, in a proof of concept where we synthesize 3840 NS-watermark barcodes and use them to asymmetrically tag and simultaneously sequence amplicons from two evolutionarily distant species (namely Bordetella pertussis and Drosophila mojavensis) on the ONT MinION platform. To our knowledge, this is the largest number of distinct, non-random tags ever sequenced in parallel and the first report of microarray-based synthesis as a source for large oligonucleotide pools for barcoding. We recovered the identity of more than 86% of the barcodes, with a crosstalk rate of 0.17% (i.e., one misassignment every 584 reads). This falls in the range of the index hopping rate of established, high-accuracy Illumina sequencing, despite the increased number of tags and the relatively low accuracy of both microarray-based synthesis and long-read sequencing. The robustness of NS-watermark barcodes, together with their scalable design and compatibility with low-cost massive synthesis, makes them promising for present and future sequencing applications requiring massive labeling, such as long-read single-cell RNA-Seq.

Affiliation: Ezpeleta, Joaquín. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina.
Affiliation: Labari, Ignacio Garcia. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina.
Affiliation: Bulacio, Pilar. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina.
Affiliation: Tapia, Elizabeth. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina.
Affiliation: Ezpeleta, Joaquín. Universidad Nacional de Rosario. Facultad de Ciencias Exactas, Ingeniería y Agrimensura; Argentina.
Affiliation: Bulacio, Pilar. Universidad Nacional de Rosario. Facultad de Ciencias Exactas, Ingeniería y Agrimensura; Argentina.
Affiliation: Tapia, Elizabeth. Universidad Nacional de Rosario. Facultad de Ciencias Exactas, Ingeniería y Agrimensura; Argentina.
Affiliation: Villanova, Gabriela Vanina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina.
Affiliation: Lavista Llanos, Sofía. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina.
Affiliation: Villanova, Gabriela Vanina. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Laboratorio Mixto de Biotecnología Acuática. Centro Científico Tecnológico y Educativo Acuario del Río Paraná; Argentina.
Affiliation: Posner, Victoria María. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Laboratorio Mixto de Biotecnología Acuática. Centro Científico Tecnológico y Educativo Acuario del Río Paraná; Argentina.
Affiliation: Arranz, Silvia Eda. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Laboratorio Mixto de Biotecnología Acuática. Centro Científico Tecnológico y Educativo Acuario del Río Paraná; Argentina.
Affiliation: Krsticevic, Flavia. The Hebrew University of Jerusalem. Robert H Smith Faculty of Agriculture, Food and Environment; Israel.
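The two crosstalk figures quoted in the abstract above (0.17% and one misassignment every 584 reads) are the same quantity expressed two ways. A minimal arithmetic sketch of the conversion follows; the function names are hypothetical, and the only inputs are the two numbers stated in the abstract.

```python
def crosstalk_percent(reads_per_misassignment):
    """Crosstalk rate in percent, given the average reads per misassigned read."""
    return 100.0 / reads_per_misassignment


def reads_per_misassignment(crosstalk_fraction):
    """Average number of reads per misassigned read, given the rate as a fraction."""
    return 1.0 / crosstalk_fraction


# One misassignment every 584 reads, expressed as a percentage.
print(round(crosstalk_percent(584), 2))
# The rounded rate of 0.17% converted back to reads per misassignment.
print(reads_per_misassignment(0.0017))
```

Converting the rounded 0.17% back gives a value slightly above 584, which suggests the abstract's read count was derived from the unrounded rate.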

PubMed Central

Repositorio Hipermedial de la Universidad Nacional de Rosario

An efficient and accurate framework for large-scale sequences of DNA barcodes

Author: Neto Luís Manuel Pacheco
Publication date: 02/12/2021

Integrated master's dissertation in Informatics Engineering.

DNA barcodes are short sequences of pre-defined gene regions that contain a sufficient amount of intra- and inter-species genetic information. High-throughput sequencing techniques are currently used to identify large sequences of DNA barcodes in a species' genome in a relatively short time. Domain experts require adequate self-contained tools to accurately and efficiently process DNA barcode data in a reasonable time, taking advantage of current parallel and heterogeneous computing systems. They also expect to use these tools on different computing platforms, from laptops to high-performance servers, without requiring a broad knowledge of software engineering to develop efficient computational applications. The main goal of this project was to develop a framework and associated user-friendly tools for domain experts to efficiently support DNA barcoding studies, providing an abstraction of the performance issues. 4SpecID is the key outcome of this work: an application that integrates a semi-automated auditing and annotation tool for reference libraries, to ensure the quality standards of the compiled data, aiming to enable a grounded decision when identifying species from DNA barcodes. Its graphical interface helps the end user specify operations and also simplifies data filtering and remote file handling. The C++ version (ported from MATLAB) was fully tested and is more robust than the original version. Architecture features common to laptops and compute servers were exploited, namely parallel programming techniques and memory models. The presented validation and performance results show significant improvements in execution times, not only in the sequential version, but also when using the available parallel capabilities of the underlying computing platforms.

Portuguese abstract (translated): DNA barcodes are short sequences from predefined gene regions that contain a sufficient amount of intra- and inter-species genetic information. High-throughput sequencing techniques are used to identify large sequences of DNA barcodes in a species' genome. However, adequate tools must be developed so that domain experts can process DNA barcode data accurately and within a feasible time, using existing parallel and heterogeneous computing systems. These tools are expected to run on different computing platforms, from laptops to high-performance servers, without requiring broad software engineering knowledge either to use them or to build other tools on them. The main goal of this project is to develop a framework that abstracts away potential performance challenges and gives domain experts a computationally efficient way to carry out a DNA barcoding study. The project produced a tool, 4SpecID, that aims to enable a grounded decision when identifying species from DNA barcodes: a semi-automated auditing and annotation tool for reference libraries that ensures the quality standards of the compiled data. The project also exploited the advantages of the most common compute-server and laptop architectures, such as parallel programming techniques and memory models. The validation and performance results presented show that better execution times can be obtained by using the available features of the underlying platforms.

Universidade do Minho: RepositoriUM

High-Throughput SNP Genotyping by SBE/SBH

Author: Mandoiu Ion I.
Prajescu Claudia
Publication date: 01/01/2005

Despite much progress over the past decade, current Single Nucleotide Polymorphism (SNP) genotyping technologies still offer an insufficient degree of multiplexing when required to handle user-selected sets of SNPs. In this paper we propose a new genotyping assay architecture combining multiplexed solution-phase single-base extension (SBE) reactions with sequencing by hybridization (SBH) using universal DNA arrays such as all k-mer arrays. In addition to PCR amplification of genomic DNA, SNP genotyping using SBE/SBH assays involves the following steps: (1) Synthesizing primers complementing the genomic sequence immediately preceding SNPs of interest; (2) Hybridizing these primers with the genomic DNA; (3) Extending each primer by a single base using polymerase enzyme and dideoxynucleotides labeled with 4 different fluorescent dyes; and finally (4) Hybridizing extended primers to a universal DNA array and determining the identity of the bases that extend each primer by hybridization pattern analysis. Our contributions include a study of multiplexing algorithms for SBE/SBH genotyping assays and preliminary experimental results showing the achievable tradeoffs between the number of array probes and primer length on one hand and the number of SNPs that can be assayed simultaneously on the other. Simulation results on datasets both randomly generated and extracted from the NCBI dbSNP database suggest that the SBE/SBH architecture provides a flexible and cost-effective alternative to genotyping assays currently used in the industry, enabling genotyping of up to hundreds of thousands of user-specified SNPs per assay.

Comment: 19 pages

arXiv.org e-Print Archive

CiteSeerX

DNA Barcoding in the Cycadales: Testing the Potential of Proposed Barcoding Markers for Species Identification of Cycads

Author: B DasGupta
Brian Dilkes
C Moritz
Chelsea D. Specht
Chodon Sass
CP Meyer
D Gonzalez
D Rubinoff
Damon P. Little
Dennis Wm. Stevenson
DL Jones
DP Little
EJ Hermsen
F Maggini
GG Presting
HE Driscoll
J Lidholm
JD Palmer
JD Thompson
JJ Doyle
JN Timmis
JS Donaldson
K Hill
K Will
KD Hill
KW Will
LM Whitelock
M Blaxter
M Chase
M Stoeckle
MC Ebach
MW Chase
PDN Hebert
PG Wolf
R Meier
RD Ward
S Ratnasingham
SF Altschul
SM Chaw
SS Jakob
WJ Kress
Y Cho
Z Gompert
Publication venue: Public Library of Science
Publication date: 07/11/2007

Barcodes are short segments of DNA that can be used to uniquely identify an unknown specimen to species, particularly when diagnostic morphological features are absent. These sequences could offer a new forensic tool in plant and animal conservation, especially for endangered species such as members of the Cycadales. Ideally, barcodes could be used to positively identify illegally obtained material even in cases where diagnostic features have been purposefully removed, or to release confiscated organisms into the proper breeding population. In order to be useful, a DNA barcode sequence must not only amplify easily by PCR with universal or near-universal reaction conditions and primers, but also contain enough variation to generate unique identifiers at either the species or population level. Chloroplast regions suggested by the Plant Working Group of the Consortium for the Barcode of Life (CBoL), and two alternatives, the chloroplast psbA-trnH intergenic spacer and the nuclear ribosomal internal transcribed spacer (nrITS), were tested for their utility in generating unique identifiers for members of the Cycadales. Ease of amplification and sequence generation with universal primers and reaction conditions was determined for each of the seven proposed markers. While none of the proposed markers provided unique identifiers for all species tested, nrITS showed the most promise in terms of variability, although sequencing difficulties remain a drawback. We suggest a workflow for DNA barcoding, including database generation and management, which will ultimately be necessary if we are to succeed in establishing a universal DNA barcode for plants.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Inference of single-cell phylogenies from lineage tracing data using Cassiopeia.

Author: Chan Michelle M
Hussmann Jeffrey A
Jones Matthew G
Khodaverdian Alex
Quinn Jeffrey J
Wang Robert
Weissman Jonathan S
Xu Chenling
Yosef Nir
Publication venue: eScholarship, University of California
Publication date: 01/04/2020

The pairing of CRISPR/Cas9-based gene editing with massively parallel single-cell readouts now enables large-scale lineage tracing. However, the rapid growth in complexity of data from these assays has outpaced our ability to accurately infer phylogenetic relationships. First, we introduce Cassiopeia, a suite of scalable maximum parsimony approaches for tree reconstruction. Second, we provide a simulation framework for evaluating algorithms and exploring lineage tracer design principles. Finally, we generate the most complex experimental lineage tracing dataset to date, 34,557 human cells continuously traced over 15 generations, and use it for benchmarking phylogenetic inference approaches. We show that Cassiopeia outperforms traditional methods by several metrics and under a wide variety of parameter regimes, and provide insight into the principles for the design of improved Cas9-enabled recorders. Together, these should broadly enable large-scale mammalian lineage tracing efforts. Cassiopeia and its benchmarking resources are publicly available at www.github.com/YosefLab/Cassiopeia

eScholarship - University of California

DNA Barcode Sequence Identification Incorporating Taxonomic Hierarchy and within Taxon Variability

Author: A Kocyan
A Sanchez
A Sogo
A Stamatakis
AB Zhang
AG Kluge
AJ Conger
AJ Fazekas
AN Muellner
B DasGupta
B Larget
C van den Berg
D Hao
D Savir
DA Morrison
Damon P. Little
DE Soltis
DH Goldman
DH Les
DJ Funk
DL Erickson
DP Little
DS Gernandt
EB Wilson
FE Harrell Jr
G Landan
G Petersen
GA Salazar
GE Fox
GG Lopez
GM Plunkett
GT Chandler
I Meusnier
IN Sarkar
J Felsenstein
J McNeal
J Treutlein
J Yokoyama
JH Zar
JJ Doyle
JL Fleiss
JP Der
JS Farris
JV Freudenstein
K Hayashi
K Munch
KC Nixon
KF Müller
KH Wolfe
KM Cameron
L Andersson
L Drábková
LB Zhang
LI Cabrera
M Gamer
M Kato
M Nakazawa
M Tamura
MA Larkin
MD Crisp
MD Pirie
MG Harrington
ML Hollingsworth
MP Simmons
MV Matz
N Tanaka
NP Tippery
P Goldblatt
P Goloboff
P Rice
P Wilkin
PA Gadek
PA Goloboff
PDN Hebert
R Floyd
R Lahaye
R Meier
R Nyffeler
R Vidal-Russell
RC Edgar
S Ratnasingham
S Renner
SB Needleman
SE Bartlett
Sergios-Orestis Kolokotronis
SF Altschul
SG Newmaster
SJ Wagstaff
SK Osaloo
T Ohi-Toma
T Tokuoka
TF Smith
V Malécot
W Gish
WJ Kress
WR Pearson
XR Wang
Y Bouchenak-Khelladi
Y Kita
YCF Su
Publication venue: Public Library of Science
Publication date: 01/01/2011

For DNA barcoding to succeed as a scientific endeavor, an accurate and expeditious query sequence identification method is needed. Although a global multiple-sequence alignment can be generated for some barcoding markers (e.g. COI, rbcL), not all barcoding markers are as structurally conserved (e.g. matK). Thus, algorithms that depend on global multiple-sequence alignments are not universally applicable. Some sequence identification methods that use local pairwise alignments (e.g. BLAST) are unable to accurately differentiate between highly similar sequences and are not designed to cope with hierarchic phylogenetic relationships or within-taxon variability. Here, I present a novel alignment-free sequence identification algorithm, BRONX, that accounts for observed within-taxon variability and hierarchic relationships among taxa. BRONX identifies short variable segments and corresponding invariant flanking regions in reference sequences. These flanking regions are used to score variable regions in the query sequence without the production of a global multiple-sequence alignment. By incorporating observed within-taxon variability into the scoring procedure, misidentifications arising from shared alleles/haplotypes are minimized. An explicit treatment of more inclusive terminals allows for separate identifications to be made for each taxonomic level and/or for user-defined terminals. BRONX performs better than all other methods when there is imperfect overlap between query and reference sequences (e.g. mini-barcode queries against a full-length barcode database). BRONX consistently produced better identifications at the genus level for all query types.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central