Search CORE

82 research outputs found

An efficient pseudomedian filter for tiling microarrays

Author: Carriero Nicholas J
Gerstein Mark B
Royce Thomas E
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract

Background: Tiling microarrays are becoming an essential technology in the functional genomics toolbox. They have been applied to novel transcript identification, elucidation of transcription factor binding sites, detection of methylated DNA and several other applications in several model organisms. These experiments are being conducted at ever finer resolution as microarray feature densities increase, and the increased densities naturally raise data analysis requirements. Specifically, the most widely employed algorithm for tiling array analysis smooths observed signals by computing pseudomedians within sliding windows, an O(n^2 log n) calculation in each window. This poor time complexity is an issue for tiling array analysis and could become a real bottleneck as tiling microarray experiments grow in scope and resolution.

Results: We therefore implemented Monahan's HLQEST algorithm, which reduces the runtime complexity of computing the pseudomedian of n numbers from O(n^2 log n) to O(n log n). For a representative tiling microarray dataset, this modification reduced the smoothing procedure's runtime by nearly 90%. We then exploited the fact that most elements of a sliding window remain in the next, overlapping window (as one slides across genomic space) to reduce computation by an additional 43%. This was achieved by using skip lists to maintain a sorted list of values from window to window; the sorted list can be updated with simple O(log n) inserts and deletes. We illustrate the favorable scaling properties of our algorithms with both time complexity analysis and benchmarking on synthetic datasets.
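To make the quoted cost concrete, here is a minimal Python sketch of the naive baseline. It assumes the pseudomedian is the Hodges-Lehmann estimator (the median of all pairwise Walsh averages (x_i + x_j)/2 with i <= j) and computes it by brute force inside each window. The function names `pseudomedian` and `smooth` are hypothetical, windows are counted in probes rather than genomic base pairs for simplicity, and Monahan's HLQEST algorithm itself is not reproduced.

```python
import statistics
from typing import List, Sequence


def pseudomedian(values: Sequence[float]) -> float:
    """Naive Hodges-Lehmann pseudomedian: the median of all Walsh averages
    (x_i + x_j) / 2 with i <= j. It builds n(n+1)/2 averages and sorts them,
    so each call costs O(n^2 log n)."""
    n = len(values)
    if n == 0:
        raise ValueError("pseudomedian of an empty window")
    walsh = [(values[i] + values[j]) / 2.0 for i in range(n) for j in range(i, n)]
    return statistics.median(walsh)  # sorts the O(n^2) averages


def smooth(signal: Sequence[float], half_width: int) -> List[float]:
    """Replace each probe value by the pseudomedian of the probes within
    half_width positions of it (windows are truncated at the ends)."""
    out: List[float] = []
    for k in range(len(signal)):
        lo = max(0, k - half_width)
        hi = min(len(signal), k + half_width + 1)
        out.append(pseudomedian(signal[lo:hi]))
    return out
```

For example, `pseudomedian([1, 2, 3, 100])` returns 2.75, while the arithmetic mean of the same values is 26.5, which illustrates why the pseudomedian is attractive as a robust smoother.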
Conclusion: Tiling microarray analyses that rely upon a sliding-window pseudomedian calculation can require many hours of computation. We have eased this requirement significantly by implementing efficient algorithms that scale well with genomic feature density. This not only speeds up current standard analyses but also makes feasible analyses that require many iterations of the filter, such as bootstrapping or parameter estimation. Source code and executables are available at http://tiling.gersteinlab.org/pseudomedian/.
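The window-to-window update described in the Results can be sketched with a minimal, non-indexable skip list. This is an illustrative Python version, not the authors' code: the class and function names, the promotion probability of 0.5 and the cap of 16 levels are assumptions. Each slide inserts the value entering the window and removes the value leaving it, each in expected O(log n) time, so the window never has to be re-sorted from scratch.

```python
import random
from typing import Iterator, List, Optional, Sequence


class _Node:
    __slots__ = ("value", "next")

    def __init__(self, value: float, level: int) -> None:
        self.value = value
        # next[i] is the following node in the level-i linked list
        self.next: List[Optional["_Node"]] = [None] * level


class SkipList:
    """Minimal sorted multiset with expected O(log n) insert and remove."""

    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node one level up

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        self._head = _Node(float("-inf"), self.MAX_LEVEL)  # sentinel
        self._level = 1  # number of levels currently in use

    def _random_level(self) -> int:
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def _predecessors(self, value: float) -> List[_Node]:
        # update[i] is the last node on level i whose value is < value
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.next[i] is not None and node.next[i].value < value:
                node = node.next[i]
            update[i] = node
        return update

    def insert(self, value: float) -> None:
        update = self._predecessors(value)
        level = self._random_level()
        self._level = max(self._level, level)  # new top levels start at the head
        node = _Node(value, level)
        for i in range(level):
            node.next[i] = update[i].next[i]
            update[i].next[i] = node

    def remove(self, value: float) -> None:
        update = self._predecessors(value)
        target = update[0].next[0]
        if target is None or target.value != value:
            raise KeyError(value)
        for i in range(len(target.next)):
            if update[i].next[i] is target:
                update[i].next[i] = target.next[i]
        while self._level > 1 and self._head.next[self._level - 1] is None:
            self._level -= 1  # drop levels that became empty

    def __iter__(self) -> Iterator[float]:
        node = self._head.next[0]
        while node is not None:
            yield node.value
            node = node.next[0]


def sliding_sorted_windows(values: Sequence[float], width: int) -> Iterator[List[float]]:
    """Yield the sorted contents of every full window of `width` values,
    updating one skip list instead of re-sorting each window."""
    if width < 1:
        raise ValueError("width must be at least 1")
    window = SkipList(seed=0)
    for k, v in enumerate(values):
        window.insert(v)  # value entering the window
        if k >= width:
            window.remove(values[k - width])  # value leaving the window
        if k >= width - 1:
            yield list(window)
```

Copying the sorted window into a list, as the generator does for clarity, costs O(n) per window; a production filter would more likely hand the sorted values straight to the pseudomedian computation.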

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Whole-genome association studies on alcoholism comparing different phenotypes using single-nucleotide polymorphisms and microsatellites

Author: Carriero Nicholas J
Chen Liang
Liu Nianjun
Oh Cheongeun
Wang Shuang
Zhao Hongyu
Publication venue: BioMed Central
Publication date: 01/01/2005

Alcoholism is a complex disease. As with other common diseases, the genetic variants underlying alcoholism have been elusive, possibly because of the small effect of each individual susceptibility variant, gene × environment and gene × gene interactions, and complications in phenotype definition. We applied two association tests, the family-based association test (FBAT) and the backward haplotype transmission association (BHTA) test, to the Collaborative Study of the Genetics of Alcoholism (COGA) data provided by Genetic Analysis Workshop (GAW) 14. Efron's local false discovery rate method was applied to control the proportion of false discoveries. For FBAT, we compared the results based on different types of genetic markers (single-nucleotide polymorphisms (SNPs) versus microsatellites) and different phenotype definitions (clinical diagnoses versus electrophysiological phenotypes). Significant associations were found only between SNPs and clinical diagnoses. In contrast, significant associations were found only between microsatellites and electrophysiological phenotypes. In addition, we obtained association results for SNPs and microsatellites with BHTA, using the COGA diagnosis as the phenotype. In this case, the results for SNPs and microsatellites were more consistent. Compared with FBAT, BHTA detected more significant markers.
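Efron's local false discovery rate is defined as fdr(z) = pi0 * f0(z) / f(z), the posterior probability that a test with z-score z is null. The Python sketch below rests on simplifying assumptions that are not taken from the study: the theoretical N(0, 1) null for f0, pi0 fixed at its conservative upper bound of 1, and a Gaussian kernel density with a rule-of-thumb bandwidth for the mixture density f. Efron's own method estimates f (and, optionally, an empirical null) differently, and the function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def _std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def local_fdr(z_scores: Sequence[float], pi0: float = 1.0) -> List[float]:
    """Local false discovery rate fdr(z) = pi0 * f0(z) / f(z) for each z-score.

    f0 is the theoretical N(0, 1) null; f is estimated with a Gaussian kernel
    density (rule-of-thumb bandwidth, an assumption of this sketch). Values
    are capped at 1 because fdr is a probability."""
    n = len(z_scores)
    if n < 2:
        raise ValueError("need at least two z-scores")
    h = 1.06 * statistics.stdev(z_scores) * n ** (-0.2)
    if h == 0:
        raise ValueError("z-scores are all identical")
    result = []
    for z in z_scores:
        f_hat = sum(_std_normal_pdf((z - zi) / h) for zi in z_scores) / (n * h)
        result.append(min(1.0, pi0 * _std_normal_pdf(z) / f_hat))
    return result
```

Markers whose fdr falls below a chosen threshold (0.2 is a common convention) would then be reported as discoveries.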

Crossref

Springer - Publisher Connector

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central

Hinge Atlas: relating protein sequence to sites of structural flexibility

Author: Carriero Nicholas
Flores Samuel C
Gerstein Mark B
Lu Long J
Yang Julie
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract

Background: Relating features of protein sequences to structural hinges is important for identifying domain boundaries, understanding structure-function relationships, and designing flexibility into proteins. Efforts in this field have been hampered by the lack of a proper dataset for studying the characteristics of hinges.

Results: Using the Molecular Motions Database, we created a Hinge Atlas of manually annotated hinges and a statistical formalism for calculating the enrichment of various types of residues in these hinges.

Conclusion: We found various correlations between hinges and sequence features. Some of these are expected; for instance, hinges tend to occur on the surface and in coils and turns, and to be enriched with small and hydrophilic residues. Others are less obvious. In particular, hinges tend to coincide with active sites, but unlike active sites they are not at all conserved in evolution. We evaluate the potential for hinge prediction based on sequence.

Motions play an important role in catalysis and protein-ligand interactions, and hinge bending motions comprise the largest class of known motions. It is therefore important to relate hinge location to sequence features such as residue type, physicochemical class, secondary structure, solvent exposure, evolutionary conservation, and proximity to active sites. To do this, we first generated the Hinge Atlas, a set of protein motions with manually annotated hinge locations, and then studied how these features coincide with hinge locations. We found that all of the features have a bearing on hinge location. Most interestingly, hinges tend to occur at or near active sites and yet, unlike active sites, are not conserved. Less surprisingly, hinge residues tend to be small, not hydrophobic or aliphatic, and to occur in turns and random coils on the surface. A functional sequence-based hinge predictor was built using some of the data generated in this study. The Hinge Atlas is made available to the community for further flexibility studies.
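The abstract does not spell out the enrichment formalism, so the following Python sketch only illustrates the generic idea of a residue enrichment ratio: the frequency of each residue type at annotated hinge positions divided by its frequency across the whole sequence. The function name and the single-sequence input are assumptions; a dataset-level analysis would pool counts over all proteins in the atlas and attach a significance test.

```python
from collections import Counter
from typing import Dict, Iterable


def residue_enrichment(sequence: str, hinge_positions: Iterable[int]) -> Dict[str, float]:
    """Ratio of each residue's frequency at hinge positions (0-based indices)
    to its frequency in the whole sequence: > 1 means enriched in hinges,
    < 1 means depleted."""
    hinge = set(hinge_positions)
    if not hinge:
        raise ValueError("no hinge positions given")
    background = Counter(sequence)
    in_hinge = Counter(sequence[i] for i in hinge)
    n_seq, n_hinge = len(sequence), len(hinge)
    return {
        residue: (in_hinge[residue] / n_hinge) / (count / n_seq)
        for residue, count in background.items()
    }
```

For a toy sequence "GGSALLLL" with hinges at positions 0 to 3, glycine, serine and alanine each come out at 2.0 and leucine at 0.0.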

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Comprehensive analysis of the pseudogenes of glycolytic enzymes in vertebrates: the anomalously high number of GAPDH pseudogenes highlights a recent burst of retrotranspositional activity

Author: Balasubramanian Suganthi
Carriero Nicholas
Gerstein Mark B
Khurana Ekta
Liu Yuen-Jong
Robilotto Rebecca
Zheng Deyou
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract

Background: Pseudogenes provide a record of the molecular evolution of genes. As glycolysis is such a highly conserved and fundamental metabolic pathway, the pseudogenes of glycolytic enzymes comprise a standardized genomic measuring stick and an ideal platform for studying molecular evolution. One of the glycolytic enzymes, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), has already been noted to have one of the largest numbers of associated pseudogenes among all proteins.

Results: We assembled the first comprehensive catalog of the processed and duplicated pseudogenes of glycolytic enzymes in many vertebrate model-organism genomes, including human, chimpanzee, mouse, rat, chicken, zebrafish, pufferfish, fruitfly, and worm (available at http://pseudogene.org/glycolysis/). We found that glycolytic pseudogenes are predominantly processed, i.e., retrotransposed from the mRNA of their parent genes. Although each glycolytic enzyme plays a unique role, GAPDH has by far the most pseudogenes, perhaps reflecting its large number of non-glycolytic functions or its possession of a particularly retrotranspositionally active sub-sequence. Furthermore, the number of GAPDH pseudogenes varies significantly among the genomes we studied: none in zebrafish, pufferfish, fruitfly, and worm, 1 in chicken, 50 in chimpanzee, 62 in human, 331 in mouse, and 364 in rat. Next, we developed a simple method of identifying conserved syntenic blocks (consistently applicable to the wide range of organisms in the study) by using orthologous genes as anchors delimiting a conserved block between a pair of genomes. This approach showed that few glycolytic pseudogenes are shared between primate and rodent lineages. Finally, by estimating pseudogene ages using Kimura's two-parameter model of nucleotide substitution, we found evidence for bursts of retrotranspositional activity approximately 42, 36, and 26 million years ago in the human, mouse, and rat lineages, respectively.
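Kimura's two-parameter model separates transitions (A/G, C/T), with proportion P, from transversions, with proportion Q, and estimates substitutions per site as K = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). A minimal Python sketch, assuming an already aligned pseudogene/parent pair; `pseudogene_age_years` is a hypothetical helper that divides K by a neutral substitution rate the user must supply, not a value from the study.

```python
import math

_PURINES = {"A", "G"}
_PYRIMIDINES = {"C", "T"}


def kimura_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.
    Only columns where both sequences have A, C, G or T are counted."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from one alignment (equal length)")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= _PURINES or pair <= _PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def pseudogene_age_years(k: float, rate_per_site_per_year: float) -> float:
    """Hypothetical helper: age ~ K / r, assuming the parent gene is nearly
    static and the pseudogene accumulates substitutions at a neutral rate r."""
    return k / rate_per_site_per_year
```

Dividing by a neutral rate treats all divergence as accumulated in the pseudogene since its creation, which is the usual simplification for processed pseudogenes of slowly evolving parents.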
Conclusion: Overall, we performed a consistent analysis of one group of pseudogenes across multiple genomes, finding evidence that most of them were created within the last 50 million years, subsequent to the divergence of rodent and primate lineages.

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data

Author: Abyzov Alexej
Carriero Nicholas
Cayting Philip
Gerstein Mark B
Korbel Jan O
Mu Xinmeng Jasmine
Snyder Michael
Zhang Zhengdong
Publication venue: BioMed Central
Publication date: 23/02/2009

Paired-End Mapper (PEMer) enables mapping of genomic structural variants with considerably greater sensitivity, specificity and resolution than previous approaches.
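PEMer's internals are not described in this summary. As a hedged illustration of the general paired-end idea it builds on, the Python sketch below labels one read pair by comparing its mapped span and strand orientation with the library's expected insert size. The `ReadPair` record, the label names, the 3-standard-deviation cutoff and the '+'/'-' concordant orientation are all assumptions; real pipelines also cluster many supporting pairs before calling a variant and, as the title notes, PEMer adds simulation-based error models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, mean_span: float, sd_span: float,
                  k: float = 3.0) -> Optional[str]:
    """Label a discordant pair with the structural-variant class it suggests,
    or return None for a concordant pair. Assumes a library whose concordant
    pairs map with the left read on '+' and the right read on '-'."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if left_strand == right_strand:
        return "inversion"  # both ends on the same strand
    if left_strand == "-":
        return "everted"  # '-' then '+', e.g. a tandem duplication signature
    span = right_pos - left_pos
    if span > mean_span + k * sd_span:
        return "deletion"  # ends map farther apart than the library allows
    if span < mean_span - k * sd_span:
        return "insertion"  # ends map closer together than expected
    return None
```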

Springer - Publisher Connector

PubMed Central

Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes

Author: Balasubramanian Suganthi
Carriero Nicholas
Cayting Philip
Fang Gang
Frankish Adam
Gerstein Mark
Liu Yuen-Jong
Robilotto Rebecca
Zheng Deyou
Publication venue: BioMed Central
Publication date: 01/01/2009

An analysis of ribosomal protein pseudogenes in the four mammalian genomes reveals no correlation between the number of pseudogenes and mRNA abundance.
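The summary does not say which correlation statistic was used. One common choice for comparing pseudogene counts with mRNA abundance is Spearman's rank correlation, which depends only on ranks and so tolerates skewed count distributions; the following Python sketch computes it with average ranks for ties. The function names are hypothetical and no data from the study are included.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean rank of the tie block
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with >= 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A value near 0, judged against a permutation or t-based test, is what "no correlation" would correspond to under this choice of statistic.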

Springer - Publisher Connector

PubMed Central

Latest Achievements on Climate Change and Forest Interactions in a Polluted Environment

Author: Boscaleri Fabio
Calfapietra Carlo
Carriero Giulia
Clarke Nicholas
Cudlin Pavel
Feng Zhaozhong
Fischer Richard
Matteucci Giorgio
Matyssek Rainer
Mikkelsen Teis Nørgaard
Serengil Yusuf
Tuovinen Juha-Pekka
Wieser Gerhard
Publication venue: Scientific Research Publishing, Inc.
Publication date: 01/01/2014

Crossref

Online Research Database In Technology

Human neural stem cell transplantation in ALS: initial results from a phase I trial

We report the initial results from a phase I clinical trial for ALS. We transplanted GMP-grade fetal human neural stem cells (hNSCs), obtained from natural in utero death, into the anterior horns of the spinal cord to test the safety of both the cells and the neurosurgical procedures in these patients. The trial was approved by the Istituto Superiore di Sanità and the competent Ethics Committees and was monitored by an external Safety Board.

Crossref

Springer - Publisher Connector

PubMed Central

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Archivio istituzionale della ricerca - Università di Padova

Pseudofam: the pseudogene families database

Author: Ekta Khurana
Gang Fang
Hugo Y. K. Lam
Kei-Hoi Cheung
Mark B. Gerstein
Nicholas Carriero
Philip Cayting
Publication venue: Oxford University Press
Publication date: 01/01/2009

Pseudofam (http://pseudofam.pseudogene.org) is a database of pseudogene families based on the protein families from the Pfam database. It provides resources for analyzing the family structure of pseudogenes, including query tools, statistical summaries and sequence alignments. The current version of Pseudofam contains more than 125 000 pseudogenes identified from 10 eukaryotic genomes and aligned within nearly 3000 families (approximately one-third of the total families in Pfam-A). Pseudofam uses a large-scale parallelized homology search algorithm (implemented as an extension of the PseudoPipe pipeline) to identify pseudogenes. Each identified pseudogene is assigned to its parent protein family, and the pseudogenes are then aligned to one another by transferring the parent domain alignments from the Pfam family. Pseudogenes are also given additional annotation based on an ontology, reflecting their mode of creation and subsequent history. In particular, our annotation highlights the association of pseudogene families with genomic features, such as segmental duplications. In addition, pseudogene families are associated with key statistics, which identify outlier families with an unusual degree of pseudogenization. The statistics also show how the numbers of genes and pseudogenes in families correlate across different species. Overall, they highlight the fact that housekeeping families tend to be enriched with a large number of pseudogenes.
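Transferring a parent's domain alignment to its pseudogenes can be pictured as follows: each pseudogene residue aligned to a parent residue is placed in that parent residue's column of the family alignment. The Python sketch below is a simplified illustration, not Pseudofam's code; it assumes '-' is the only gap character, uses hypothetical function names, and simply drops pseudogene insertions relative to the parent, which a real pipeline might instead record as insert states.

```python
from typing import List


def parent_to_pseudo(parent_aln: str, pseudo_aln: str) -> List[str]:
    """From a pairwise parent/pseudogene alignment (equal-length gapped
    strings), return, for each parent residue in order, the pseudogene
    character aligned to it ('-' where the pseudogene has a deletion)."""
    if len(parent_aln) != len(pseudo_aln):
        raise ValueError("pairwise alignment rows differ in length")
    # Columns with a gap in the parent are pseudogene insertions; they have
    # no column in the family alignment and are dropped in this sketch.
    return [q for p, q in zip(parent_aln, pseudo_aln) if p != "-"]


def transfer_row(parent_msa_row: str, parent_aln: str, pseudo_aln: str) -> str:
    """Project the pseudogene onto the family alignment through its parent's row."""
    if parent_msa_row.replace("-", "") != parent_aln.replace("-", ""):
        raise ValueError("parent sequence differs between the two alignments")
    mapped = iter(parent_to_pseudo(parent_aln, pseudo_aln))
    # Family-alignment gaps stay gaps; each parent residue is replaced by
    # the pseudogene character aligned to it.
    return "".join("-" if c == "-" else next(mapped) for c in parent_msa_row)
```

Because every pseudogene row is expressed in the family's own columns, pseudogenes of the same family become aligned to one another without any pseudogene-to-pseudogene comparison.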

CiteSeerX

Crossref

PubMed Central

Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation

Author: Deyou Zheng
John E. Karro
Mark Gerstein
Nicholas Carriero
Paul Harrison
Philip Cayting
Yangpan Yan
Zhaolei Zhang
Publication venue: Oxford University Press
Publication date: 11/11/2006

The Pseudogene.org knowledgebase serves as a comprehensive repository for pseudogene annotation. The definition of a pseudogene varies within the literature, resulting in significantly different approaches to the problem of identification. Consequently, it is difficult to maintain a consistent collection of pseudogenes in the detail necessary for their effective use. Our database is designed to address this issue. It integrates a variety of heterogeneous resources and supports a subset structure that highlights specific groups of pseudogenes that are of interest to the research community. Tools are provided for the comparison of sets and the creation of layered set unions, enabling researchers to derive a current ‘consensus’ set of pseudogenes. Additional features include versatile search, the capacity for robust interaction with other databases, the ability to reconstruct older versions of the database (accounting for changing genome builds) and an underlying object-oriented interface designed for researchers with minimal knowledge of programming. At present, the database contains more than 100 000 pseudogenes spanning 64 prokaryote and 11 eukaryote genomes, including a collection of human annotations compiled from 16 sources.
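One plausible reading of a ‘layered set union’ is a priority merge: the first annotation set is kept whole, and each subsequent set contributes only entries that do not overlap anything already accepted. The Python sketch below implements that reading for pseudogenes represented as half-open (chromosome, start, end) intervals; both the interpretation and the interval representation are assumptions rather than a description of the Pseudogene.org interface.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def layered_union(layers: Sequence[Sequence[Interval]]) -> List[Interval]:
    """Merge annotation sets in priority order: every entry of the first
    layer is kept, and each later layer contributes only entries that
    overlap nothing already accepted."""
    result: List[Interval] = []
    for layer in layers:
        additions = [iv for iv in layer
                     if not any(overlaps(iv, kept) for kept in result)]
        result.extend(additions)
    return result
```

The pairwise overlap check is quadratic and meant only to show the semantics; a genome-scale version would sort intervals or use an interval tree.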

CiteSeerX

Crossref

PubMed Central