eScholarship - University of California

Caltech Authors

PIQA: pipeline for Illumina G1 genome analyzer data quality assessment

Author: A. Martinez-Alcantara
Bentley
C. Feng
Church
Dolan
E. Ballesteros
H. Koshinsky
Holt
M. Rojas
P. Havlak
Srivatsan
V. Y. Fofanov
Y. Fofanov
Publication venue: Oxford University Press
Publication date: 01/01/2009

Summary: PIQA is a quality analysis pipeline designed to examine genomic reads produced by Next Generation Sequencing technology (Illumina G1 Genome Analyzer). A short statistical summary, together with tile-by-tile and cycle-by-cycle graphical representations of cluster density, quality scores and nucleotide frequencies, allows easy identification of various technical problems, including defective tiles, mistakes in sample/library preparation and abnormalities in the frequencies of appearance of sequenced genomic reads. PIQA is written in the R statistical programming language and is compatible with the bustard, fastq and scarf Illumina G1 Genome Analyzer data formats.
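
The cycle-by-cycle statistics mentioned in the summary can be illustrated with a minimal sketch. This is not PIQA's code (PIQA is written in R); it is a Python illustration that assumes four-line FASTQ records and a Phred offset of 33. The function name `per_cycle_stats` and the `phred_offset` parameter are hypothetical, and older Illumina files may need an offset of 64.

```python
from collections import Counter
from io import StringIO


def per_cycle_stats(fastq_handle, phred_offset=33):
    """Return (mean quality per cycle, base frequencies per cycle).

    Assumes plain four-line FASTQ records; phred_offset is an assumption
    and must match the encoding of the input file.
    """
    qual_sums, base_counts, n_reads = [], [], []
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        seq = fastq_handle.readline().strip()
        fastq_handle.readline()  # '+' separator line
        qual = fastq_handle.readline().strip()
        for cycle, (base, q) in enumerate(zip(seq, qual)):
            if cycle == len(qual_sums):  # first read reaching this cycle
                qual_sums.append(0)
                base_counts.append(Counter())
                n_reads.append(0)
            qual_sums[cycle] += ord(q) - phred_offset
            base_counts[cycle][base] += 1
            n_reads[cycle] += 1
    mean_q = [s / n for s, n in zip(qual_sums, n_reads)]
    freqs = [{b: c / n for b, c in cnt.items()}
             for cnt, n in zip(base_counts, n_reads)]
    return mean_q, freqs


# Toy input: two reads of four cycles each.
demo = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nII5I\n")
mean_q, freqs = per_cycle_stats(demo)
print(mean_q)
print(freqs[3])
```

A drop in mean quality at one cycle, or a skewed base composition at one position, is the kind of signal such a per-cycle table makes visible.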

CiteSeerX

Red Mexicana de Repositorios Institucionales

MicroRNA enrichment among short ‘ultraconserved’ sequences in insects

Author: Ambros
Ambros
AMBROS
Bartel
Berezikov
Boffelli
Bray
Brown
Brudno
Brudno
Cullen
Drysdale
Elnitski
Frazer
Grad
Griffiths-Jones
Hatfield
Havlak
Hillier
J. Miller
Karolchik
Karolchik
Kolbe
Lai
Lee
Lewis
Lu
Mattick
Mattick
Mattick
Miller
Ning
P. Havlak
Pasquinelli
Pasquinelli
Peng
Reinhart
Sandelin
Stajich
Stone
T. Tran
Thomas
Voss
Weber
Zdobnov
Publication venue: Oxford University Press
Publication date: 01/01/2006

MicroRNAs are short (∼22 nt) regulatory RNA molecules that play key roles in metazoan development and have been implicated in human disease. First discovered in Caenorhabditis elegans, over 2500 microRNAs have been isolated in metazoans and plants; it has been estimated that there may be more than a thousand microRNA genes in the human genome alone. Motivated by the experimental observation of strong conservation of the microRNA let-7 among nearly all metazoans, we developed a novel methodology to characterize the class of such strongly conserved sequences: we identified a non-redundant set of all sequences 20 to 29 bases in length that are shared among three insects: fly, bee and mosquito. Among the few hundred sequences greater than 20 bases in length are close to 40% of the 78 confirmed fly microRNAs, along with other non-coding RNAs and coding sequence.
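
The idea of collecting sequences shared by several genomes can be sketched as a set intersection over k-mers. This is only an illustration, not the authors' method: the strand-independent ("canonical") k-mer representation, the function names and the toy sequences are all assumptions, and the demo uses k = 5 rather than the 20 to 29 bases of the abstract to keep the example small.

```python
def revcomp(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def canonical_kmers(genome, k):
    """Set of k-mers, each stored as the smaller of itself and its reverse complement."""
    kmers = set()
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            kmers.add(min(kmer, revcomp(kmer)))
    return kmers


def shared_kmers(genomes, k):
    """K-mers present, on either strand, in every genome."""
    return set.intersection(*(canonical_kmers(g, k) for g in genomes))


# Toy "genomes": the second carries the shared word on the opposite strand.
toy = ["TTACGGTAA", "GGACCGTCC", "CACGGTC"]
print(shared_kmers(toy, 5))  # → {'ACCGT'}
```

For real genomes the k-mer sets are large, so a practical version would need disk-backed or sorted-array storage rather than in-memory Python sets.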

CiteSeerX

Public Library of Science (PLOS)

Re-Assembly of the Genome of Francisella tularensis Subsp. holarctica OSU18

Author: A Johansson
AM Phillippy
AM Phillippy
D Gordon
Daniela Puiu
DT Dennis
EW Myers
JF Petrosino
JR White
L Rohmer
M Enserink
M Pop
Matthew W. Hahn
MC Schatz
P Havlak
S Kurtz
SL Salzberg
SL Salzberg
Steven L. Salzberg
Publication venue: Public Library of Science
Publication date: 17/10/2008

Francisella tularensis is a highly infectious human intracellular pathogen that is the causative agent of tularemia. It occurs in several major subtypes, including the live vaccine strain holarctica (type B). F. tularensis is classified as a category A biodefense agent in part because a relatively small number of organisms can cause severe illness. Three complete genomes of subspecies holarctica have been sequenced and deposited in public archives, of which OSU18 was the first and the only strain for which a scientific publication has appeared [1]. We re-assembled the OSU18 strain using both de novo and comparative assembly techniques, and found that the published sequence has two large inversion mis-assemblies. We generated a corrected assembly of the entire genome along with detailed information on the placement of individual reads within the assembly. This assembly will provide a more accurate basis for future comparative studies of this pathogen.

Linkage mapping bovine EST-based SNP

Author: A Braun
A Everts-van der Wind
BT Page
C Li
E Casas
EE Connor
GD Schuler
JL Williams
LB Rowe
MD Bishop
ML Clawson
MS Ashwell
N Ihara
P Green
P Havlak
RT Stone
RT Stone
SF Altschul
SM Kappes
SP Wilder
T Schiex
TD Thue
TP Smith
W Barendse
WJ Kent
WM Snelling
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Existing linkage maps of the bovine genome primarily contain anonymous microsatellite markers. These maps have proved valuable for mapping quantitative trait loci (QTL) to broad regions of the genome, but more closely spaced markers are needed to fine-map QTL, and markers associated with genes and annotated sequence are needed to identify genes and sequence variation that may explain QTL. RESULTS: Bovine expressed sequence tag (EST) and bacterial artificial chromosome (BAC) sequence data were used to develop 918 single nucleotide polymorphism (SNP) markers to map genes on the bovine linkage map. DNA of sires from the MARC reference population was used to detect SNPs, and progeny and mates of heterozygous sires were genotyped. Chromosome assignments for 861 SNPs were determined by two-point analysis, and positions for 735 SNPs were established by multipoint analyses. Linkage maps of bovine autosomes with these SNPs represent 4585 markers in 2475 positions spanning 3058 cM. Markers include 3612 microsatellites, 913 SNPs and 60 other markers. Mean separation between marker positions is 1.2 cM. New SNP markers appear in 511 positions, with mean separation of 4.7 cM. Multi-allelic markers, mostly microsatellites, had a mean (maximum) of 216 (366) informative meioses, and a mean 3-lod confidence interval of 3.6 cM. Bi-allelic markers, including SNP and other marker types, had a mean (maximum) of 55 (191) informative meioses, and were placed within a mean 8.5 cM 3-lod confidence interval. Homologous human sequences were identified for 1159 markers, including 582 newly developed and mapped SNP. CONCLUSION: Addition of these EST- and BAC-based SNPs to the bovine linkage map not only increases marker density, but provides connections to gene-rich physical maps, including annotated human sequence. The map provides a resource for fine-mapping quantitative trait loci and identification of positional candidate genes, and can be integrated with other data to guide and refine assembly of bovine genome sequence. Even after the bovine genome is completely sequenced, the map will continue to be a useful tool to link observable phenotypes and animal genotypes to underlying genes and molecular mechanisms influencing economically important beef and dairy traits.
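
The headline map figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above. Dividing the map length by the number of positions is a simplifying assumption, since the abstract does not say how intervals were counted across chromosomes.

```python
# Marker counts quoted in the abstract.
microsatellites, snps, other = 3612, 913, 60
total_markers = microsatellites + snps + other
print(total_markers)  # → 4585, matching the stated marker total

# Rough mean spacing: map length divided by the number of marker positions.
map_length_cm, positions = 3058, 2475
mean_spacing = map_length_cm / positions
print(round(mean_spacing, 1))  # → 1.2, consistent with the stated 1.2 cM
```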

DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI

Author: B Langmead
B Schmidt
Bertil Schmidt
BH Bloom
Douglas L Maskell
DR Zerbino
E Lindholm
EW Myers
H Shi
H Shi
J Butler
J Nickolls
J Schröder
JC Dohm
JT Simpson
L Fan
L Salmela
MJ Chaisson
P Havlak
PA Pevzner
R Li
RL Warren
S Batzoglou
WR Jeck
X Huang
Y Liu
Y Liu
Y Liu
Y Liu
Yongchao Liu
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND: Next-generation sequencing technologies have led to the high-throughput production of sequence data (reads) at low cost. However, these reads are significantly shorter and more error-prone than conventional Sanger shotgun reads. This poses a challenge for de novo assembly in terms of assembly quality and scalability for large-scale short read datasets. RESULTS: We present DecGPU, the first parallel and distributed error correction algorithm for high-throughput short reads (HTSRs) using a hybrid combination of CUDA and MPI parallel programming models. DecGPU provides CPU-based and GPU-based versions, where the CPU-based version employs coarse-grained and fine-grained parallelism using the MPI and OpenMP parallel programming models, and the GPU-based version takes advantage of the CUDA and MPI parallel programming models and employs a hybrid CPU+GPU computing model to maximize the performance by overlapping the CPU and GPU computation. The distributed feature of our algorithm makes it feasible and flexible for the error correction of large-scale HTSR datasets. Using simulated and real datasets, our algorithm demonstrates superior performance, in terms of error correction quality and execution speed, to the existing error correction algorithms. Furthermore, when combined with Velvet and ABySS, the resulting DecGPU-Velvet and DecGPU-ABySS assemblers demonstrate the potential of our algorithm to improve de novo assembly quality for de-Bruijn-graph-based assemblers. CONCLUSIONS: DecGPU is publicly available open-source software, written in CUDA C++ and MPI. The experimental results suggest that DecGPU is an effective and feasible error correction algorithm to tackle the flood of short reads produced by next-generation sequencing technologies.
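
The abstract does not describe the correction algorithm itself, so the following is a generic, single-threaded k-mer spectrum sketch of what read error correction can look like. It is not DecGPU's implementation and involves no CUDA or MPI. The function names, the `min_count` threshold and the toy reads are assumptions made for illustration.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def correct_read(read, counts, k, min_count=2):
    """Apply the single substitution that most reduces the number of rare k-mers.

    A k-mer seen fewer than min_count times is treated as likely erroneous.
    The read is returned unchanged if it has no rare k-mers or no
    substitution improves it.
    """
    def n_weak(r):
        return sum(1 for i in range(len(r) - k + 1)
                   if counts[r[i:i + k]] < min_count)

    best, best_weak = read, n_weak(read)
    if best_weak == 0:
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            weak = n_weak(candidate)
            if weak < best_weak:
                best, best_weak = candidate, weak
    return best


# Five correct copies of a read plus one copy with a C->G error at index 5.
reads = ["ACGTACGTGG"] * 5 + ["ACGTAGGTGG"]
spectrum = kmer_spectrum(reads, 4)
print(correct_read("ACGTAGGTGG", spectrum, 4))  # → ACGTACGTGG
```

Each read is corrected independently once the spectrum is built, which is what makes this family of methods a natural fit for the parallel and distributed execution the abstract emphasizes.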

MirZ: an integrated microRNA expression atlas and target prediction resource

Author: Ambros
Aravin
Bard
Bennett
Berninger
Brennecke
Brennecke
C. Rodak
Castillo-Davis
Chen
Clark
Eglen
Gaidatzis
Gibbs
Goldston
Griffiths-Jones
Grimson
Havlak
He
Huang
J. Hausser
Kertesz
Krek
Krützfeldt
Lai
Landgraf
Lewis
Lewis
Lim
Lindblad-Toh
Lu
M. Zavolan
Majoros
Nollmann
P. Berninger
Pasquinelli
Poy
Pruitt
Rajewsky
REHMSMEIER
Ruby
S. Wirth
Schomburg
Stark
Tafer
The Bovine Genome Sequencing and Analysis Consorti
Waterston
Wightman
Xiao
Xie
Xu
Y. Jantscher
Zhao
Publication venue: Oxford University Press
Publication date: 01/01/2009

MicroRNAs (miRNAs) are short RNAs that act as guides for the degradation and translational repression of protein-coding mRNAs. A large body of work has shown that miRNAs are involved in the regulation of a broad range of biological functions, from development to cardiac and immune system function, to metabolism, to cancer. The functions of most of the more than 500 miRNAs encoded in the human genome remain to be uncovered. Identifying miRNAs whose expression changes between cell types or between normal and pathological conditions is an important step towards characterizing their function, as is the prediction of mRNAs that could be targeted by these miRNAs. To let the community interactively explore miRNA expression patterns and the candidate targets of miRNAs in an integrated environment, we developed the MirZ web server, which is accessible at www.mirz.unibas.ch. The server provides experimental and computational biologists with statistical analysis and data mining tools operating on up-to-date databases of sequencing-based miRNA expression profiles and of predicted miRNA target sites in species ranging from Caenorhabditis elegans to Homo sapiens.
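
To give a sense of what a predicted miRNA target site is, here is a minimal seed-match sketch. It is a textbook simplification and not the prediction model behind MirZ, which the abstract does not describe. It assumes the seed is nucleotides 2 to 8 of the miRNA and looks for perfect reverse-complement matches. The function name `seed_sites` and the toy UTR string are hypothetical, and the let-7a sequence is the commonly cited mature sequence.

```python
def seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based UTR positions that perfectly match the miRNA seed.

    The seed is taken as miRNA nucleotides 2-8 (slice 1:8 by default);
    the site on the mRNA is the reverse complement of that seed.
    """
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    seed = mirna[seed_start:seed_end]
    site = "".join(complement[b] for b in reversed(seed))
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"     # mature let-7a, 22 nt
toy_utr = "AAACUACCUCAAGGCUACCUCA"   # made-up UTR with two CUACCUC sites
print(seed_sites(let7a, toy_utr))    # → [3, 14]
```

Real predictors typically add further evidence, such as evolutionary conservation of the site, on top of a seed match like this one.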

edoc

Computational Biology Methods and Their Application to the Comparative Genomics of Endocellular Symbiotic Bacteria of Insects

Author: A. C. Darling
A. C. Tzika
A. E. Douglas
A. E. Hirsh
A. Edwards
A. Romualdi
A. Tauch
B. Ewing
B. Ewing
B. J. Haas
B. R. Graveley
C. M. Fraser
C. Potera
C. S. Riesenfeld
C. Toft
Christina Toft
D. Bartels
D. D. Sommer
D. Gordon
D. Gordon
D. L. Wheeler
D. P. Leader
D. P. Wall
D. Walther
E. Arner
E. Branscomb
E. Camon
E. F. Kirkness
E. R. Tillier
E. Selkov
E. V. Koonin
E. V. Koonin
E. W. Myers
F. Bensadia
F. Sanger
G. Sutton
G. W. Tyson
H. J. Muller
H. Mi
H. Mi
H. Peltola
H. Shizuya
I. Dubchak
I. K. Jordan
J. C. Mullikin
J. C. Venter
J. D. Peterson
J. R. Grant
J. Yang
Jennifer Commins
K. Chen
K. Choi
K. M. Oliver
L. A. Pennacchio
L. B. Koski
L. Stein
M. D. Prickett
M. G. Reese
M. Hohl
M. M. Lee
M. Pop
M. Pop
M. Pop
M. S. Poptsova
M. S. Rappe
M. T. Tammi
M. T. Tammi
Mario A. Fares
N. A. Moran
O. Kaiser
P. A. Pevzner
P. Buchner
P. D. Thomas
P. D. Thomas
P. Green
P. H. Degnan
P. Havlak
P. Hugenholtz
R. A. Holt
R. Fleischmann
R. Ghai
R. J. Mural
S. Batzoglou
S. Celamkoti
S. F. Altschul
S. F. Altschul
S. G. Andersson
S. G. Tringe
S. Istrail
S. M. D. Goldberg
S. Y. Gerdes
T. Chen
T. Chen
T. F. Deluca
T. J. Treangen
T. M. Lowe
T. Wicker
T. Xie
V. Perez-Brocal
W. M. Fitch
W. M. Fitch
X. Huang
Publication venue: BioMed Central
Publication date: 01/01/2009

Comparative genomics has become a tantalizing challenge in the postgenomic era, one magnified by the large number of new genomes becoming available on a daily basis. The overwhelming list of genomes to compare has pushed bioinformatics and computational biology toward the design and development of methods capable of identifying patterns in a sea of noisy data. Despite many advances, persistent exceptions to the general patterns continue to make it difficult to generalize methods for comparative genomics. In this review, we discuss the different tools devised to undertake the challenge of comparative genomics and some of the exceptions that compromise the generality of such methods. We focus on endosymbiotic bacteria of insects because their genome dynamics are peculiar when compared with those of free-living organisms.

Subtle genetic changes enhance virulence of methicillin resistant and sensitive Staphylococcus aureus

Author: A Lukashin
AC Darling
Akif Uzman
AL Delcher
Alicia C Hawes
AM Hanssen
AM Mishaan
Ana Maria Cardenas
AS Bayer
B Andersson
BA Diep
BE Gonzalez
BE Gonzalez
BE Gonzalez
BE Gonzalez
C Abreu-Goodger
CE Bocchini
Christian J Buhay
Christie L Kovar
CJ Chen
DA Robinson
Donna M Muzny
DW Dietrich
Edward O Mason
EM Zdobnov
ES Pan
George E Fox
George M Weinstock
GJ Moran
GW Coombs
Huaiyang Jiang
Huyen H Dinh
IB Dodd
J Jose
J Kaneko
J Sambrook
JA Lindsay
JD Bendtsen
JD Thompson
Jianling Zhou
JL Gardy
Joseph Petrosino
JS Francis
K Tamura
Kristina G Hultén
L Ferrero
LG Miller
Lisa Hemphill
LK McDougal
Lynne V Nazareth
M Margulies
Madhan Tirumalai
MC Enright
MG Bowden
Michael Holder
MP McLeod
MS Francis
ND Rawlings
Okezie Igboeli
P Bengert
P Havlak
PC Appelbaum
Peter R Blyth
Qiaoyan Wang
RP Novick
Régine M Fortunov
S Deresinski
S Narita
S Sreedharan
Sandra L Lee
Sarah K Highlander
Shailaja Yerrapragada
Shannon Dugan
Sheldon L Kaplan
SL Kaplan
SL Kaplan
SR Gill
SV Kazakova
T Hawkins
Tiffany M Williams
Wen Liu
WJ van Wamel
Xiang Qin
XX Ma
Yamei Liu
Yan Ding
Yue Shang
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Community-acquired (CA) methicillin-resistant Staphylococcus aureus (MRSA) increasingly causes disease worldwide. USA300 has emerged as the predominant clone causing superficial and invasive infections in children and adults in the USA. Epidemiological studies suggest that USA300 is more virulent than other CA-MRSA. The genetic determinants that confer virulence and dominance on USA300 remain unclear. RESULTS: We sequenced the genomes of two pediatric USA300 isolates: one CA-MRSA and one CA-methicillin-susceptible (MSSA), isolated at Texas Children's Hospital in Houston. DNA sequencing was performed by Sanger dideoxy whole genome shotgun (WGS) and 454 Life Sciences pyrosequencing strategies. The sequence of the USA300 MRSA strain was rigorously annotated. In USA300-MRSA, 2658 chromosomal open reading frames were predicted and 3.1 and 27 kilobase (kb) plasmids were identified. USA300-MSSA contained a 20 kb plasmid with some homology to the 27 kb plasmid found in USA300-MRSA. Two regions found in USA300-MRSA were absent in USA300-MSSA. One of these carried the arginine deiminase operon that appears to have been acquired from S. epidermidis. The USA300 sequence was aligned with other sequenced S. aureus genomes and regions unique to USA300 MRSA were identified. CONCLUSION: USA300-MRSA is highly similar to other MRSA strains based on whole genome alignments and gene content, indicating that the differences in pathogenesis are due to subtle changes rather than to large-scale acquisition of virulence factor genes. The USA300 Houston isolate differs from another sequenced USA300 isolate, derived from a patient in San Francisco, in plasmid content and a number of sequence polymorphisms. Such differences will provide new insights into the evolution of pathogens.