3 research outputs found

Microbiome profiling by Illumina sequencing of combinatorial sequence-tagged PCR products

Author: A Chao
A Mortazavi
A Nocker
AF Andersson
Andrew D. Fernandes
B Efron
C Camacho
C Quince
CL Lauber
DA Benson
DJG Lahr
DN Frank
DR Bentley
DR Smith
EP Smith
Frank R. DeLeo
Gregor Reid
Gregory B. Gloor
J Oksanen
J Pawlowski
J Ravel
J Reeder
J Schellenberg
Jean M. Macklaim
JF Petrosino
JG Caporaso
JR Cole
LA Amaral-Zettler
M Hamady
N Whiteford
PJA Cock
PN Polymenakou
R Colwell
R Hummelen
Roderick MacPhee
Ruben Hummelen
Russell J. Dickson
S Hurlbert
S Rodrigue
S Srinivasan
SF Altschul
SM Huse
V Laurikari
WR Engels
Y Shi
Y Wang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 28/07/2010

We developed a low-cost, high-throughput microbiome profiling method that uses combinatorial sequence tags attached to PCR primers that amplify the rRNA V6 region. Amplified PCR products are sequenced using an Illumina paired-end protocol to generate millions of overlapping reads. Combinatorial sequence tagging can be used to examine hundreds of samples with far fewer primers than are required when sequence tags are incorporated at only a single end. The number of reads generated permitted saturating or near-saturating analysis of samples of the vaginal microbiome. The large number of reads allowed an in-depth analysis of errors, and we found that PCR-induced errors composed the vast majority of non-organism-derived species variants, an observation that has significant implications for sequence clustering of similar high-throughput data. We show that the short reads are sufficient to assign organisms to the genus or species level in most cases. We suggest that this method will be useful for the deep sequencing of any short nucleotide region that is taxonomically informative; these include the V3 and V5 regions of the bacterial 16S rRNA genes and the eukaryotic V9 region that is gaining popularity for sampling protist diversity. Comment: 28 pages, 13 figures
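The primer savings claimed in the abstract can be illustrated with simple arithmetic. The sketch below is not taken from the paper: it assumes that single-end tagging needs one tagged primer per sample plus one shared untagged primer, and that combinatorial tagging pairs every forward tag with every reverse tag. The function names are hypothetical.

```python
import math


def primers_single_end(samples: int) -> int:
    """Assumed model: one tagged primer per sample plus one shared untagged primer."""
    return samples + 1


def primers_combinatorial(samples: int) -> int:
    """Assumed model: f forward tags and r reverse tags label up to f * r samples.

    Uses a near-square split so that f * r >= samples.
    """
    f = math.isqrt(samples)
    if f * f < samples:
        f += 1                      # round the square root up
    r = -(-samples // f)            # integer ceiling of samples / f
    return f + r


if __name__ == "__main__":
    for n in (96, 200, 400):
        print(n, primers_single_end(n), primers_combinatorial(n))
```

Under these assumptions, 200 samples need 201 primers with single-end tags but only 29 (15 forward and 14 reverse) with combinatorial tags.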

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Scholarship@Western

Crossref

Directory of Open Access Journals

PubMed Central

Erasmus University Digital Repository

Type inference for unique pattern matching

Author: Abiteboul S.
Book R.
Elgaard J.
Frisch A.
Laurikari V.
Levin M. Y.
Murata M.
Neumann A.
Neven F.
Stijn Vansummeren
Tabuchi N.
Vianu V.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Cleaning inconsistencies in information extraction via prioritized repairs

Author: Ajmera J.
Benson E.
Chiticariu L.
Cunningham H.
DeJong G.
Holzer M.
Lafferty J. D.
Laurikari V.
Leek T. R.
McCallum A.
Poon H.
Riloff E.
Shen W.
Soderland S.
Xu H.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

The population of a predefined relational schema from textual content, commonly known as Information Extraction (IE), is a pervasive task in contemporary computational challenges associated with Big Data. Since the textual content varies widely in nature and structure (from machine logs to informal natural language), it is notoriously difficult to write IE programs that extract the sought information without any inconsistencies (e.g. a substring should not be annotated as both an address and a person name). Dealing with inconsistencies is hence of crucial importance in IE systems. Industrial-strength IE systems like GATE and IBM SystemT therefore provide a built-in collection of cleaning operations to remove inconsistencies from extracted relations. These operations, however, are collected in an ad-hoc fashion through use cases. Ideally, we would like to allow IE developers to declare their own policies. But existing cleaning operations are defined in an algorithmic way and, hence, it is not clear how to extend the built-in operations without requiring low-level coding of internal or external functions. We embark on the establishment of a framework for declarative cleaning of inconsistencies in IE, through principles of database theory. Specifically, building upon the formalism of document spanners for IE, we adopt the concept of prioritized repairs, which has been recently proposed as an extension of the traditional database repairs to incorporate priorities among conflicting facts. We show that our framework captures the popular cleaning policies, as well as the POSIX semantics for extraction through regular expressions. We explore the problem of determining whether a cleaning declaration is unambiguous (i.e. always results in a single repair), and whether it increases the expressive power of the extraction language. We give both positive and negative results, some of which are general, and some of which apply to policies used in practice.
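To make the kind of inconsistency mentioned in the abstract concrete, here is a minimal sketch of one hard-coded cleaning policy. It is not the paper's document-spanner or prioritized-repair framework. It assumes a hypothetical policy in which overlapping spans conflict and the longer span has priority (ties broken by earlier start), resolved greedily. The `Span` type and `clean` function are illustrative names only.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str


def overlaps(a: Span, b: Span) -> bool:
    """Two spans conflict when their character ranges intersect."""
    return a.start < b.end and b.start < a.end


def clean(spans: List[Span]) -> List[Span]:
    """Greedy resolution under the assumed policy: longer spans win."""
    ranked = sorted(spans, key=lambda s: (-(s.end - s.start), s.start))
    kept: List[Span] = []
    for s in ranked:
        # Keep a span only if it conflicts with nothing already kept.
        if all(not overlaps(s, k) for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.start)


if __name__ == "__main__":
    extracted = [
        Span(0, 5, "person"),     # conflicts with the address span below
        Span(0, 12, "address"),
        Span(20, 25, "person"),
    ]
    print(clean(extracted))
```

A policy like this is fixed in code, which is the limitation the abstract points to: changing the priority rule means editing the function rather than declaring a new policy.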

CiteSeerX

Crossref

DI-fusion