Genome comparison using Gene Ontology (GO) with statistical testing

Author: Cai Zhaotao
Li Songgang
Mao Xizeng
Wei Liping
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Automated comparison of complete sets of genes encoded in two genomes can provide insight into the genetic basis of differences in biological traits between species. Gene Ontology (GO) is used as a common vocabulary to annotate genes for comparison. Current approaches calculate unweighted or weighted fold differences between two species at high-level GO functional categories. However, to ensure the reliability of the differences detected, it is important to evaluate their statistical significance. It is also useful to search for differences at all levels of GO. RESULTS: We propose a statistical approach to find reliable differences between the complete sets of genes encoded in two genomes at all levels of GO. The genes are first assigned GO terms from BLAST searches against genes with known GO assignments, and for each GO term the abundance of genes in the two genomes is compared using a chi-squared test followed by false discovery rate (FDR) correction. We applied this method to find statistically significant differences between two cyanobacteria, Synechocystis sp. PCC6803 and Anabaena sp. PCC7120. We then studied how the set of identified differences varies when different BLAST cutoffs are used. We also studied how the results vary when only subsets of the genes are used in the comparisons of human vs. mouse and of Saccharomyces cerevisiae vs. Schizosaccharomyces pombe. CONCLUSION: There is a surprising lack of statistical approaches for comparing complete genomes at all levels of GO. With the rapid increase in the number of sequenced genomes, we hope that the approach we proposed and tested can make a valuable contribution to comparative genomics.
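As a rough illustration of the per-term comparison described above, the sketch below builds, for each GO term, a 2×2 table (genome × annotated/not annotated), applies a Pearson chi-squared test, and then adjusts the p-values with the Benjamini-Hochberg procedure. It assumes per-term gene counts are already available (the BLAST-based GO assignment step and any propagation of terms up the GO hierarchy are not shown), omits a continuity correction, and uses Benjamini-Hochberg as one common FDR method; the paper's exact choices may differ. The function names and the example counts are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for the table
    [[a, b], [c, d]]: rows are genomes, columns are 'has term' / 'lacks term'.
    Returns (statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 df, P(X >= stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def compare_go_terms(
    counts1: Dict[str, int], total1: int, counts2: Dict[str, int], total2: int
) -> Dict[str, Tuple[float, float]]:
    """Map each GO term to (raw p-value, BH-adjusted p-value).

    counts1/counts2: genes annotated with each term in genome 1/2;
    total1/total2: total genes considered in each genome."""
    terms = sorted(set(counts1) | set(counts2))
    pvals = []
    for term in terms:
        a, c = counts1.get(term, 0), counts2.get(term, 0)
        _, p = chi2_2x2(a, total1 - a, c, total2 - c)
        pvals.append(p)
    return {t: (p, q) for t, p, q in zip(terms, pvals, benjamini_hochberg(pvals))}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    genome1 = {"GO:0015979": 120, "GO:0006412": 150}
    genome2 = {"GO:0015979": 200, "GO:0006412": 160}
    for term, (p, q) in compare_go_terms(genome1, 3000, genome2, 5000).items():
        print(term, f"p={p:.3g}", f"q={q:.3g}")
```

Terms would then be reported as significant when their adjusted value falls below a chosen FDR level.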

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Multiple tests of association with biological annotation metadata

Author: Mark J. Van Der Laan
Sandrine Dudoit
Sunduz Keles
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2008

We propose a general and formal statistical framework for multiple tests of association between known fixed features of a genome and unknown parameters of the distribution of variable features of this genome in a population of interest. The known gene-annotation profiles, corresponding to the fixed features of the genome, may concern Gene Ontology (GO) annotation, pathway membership, regulation by particular transcription factors, nucleotide sequences, or protein sequences. The unknown gene-parameter profiles, corresponding to the variable features of the genome, may be, for example, regression coefficients relating possibly censored biological and clinical outcomes to genome-wide transcript levels, DNA copy numbers, and other covariates. A generic question of great interest in current genomic research regards the detection of associations between biological annotation metadata and genome-wide expression measures. This biological question may be translated as the test of multiple hypotheses concerning association measures between gene-annotation profiles and gene-parameter profiles. A general and rigorous formulation of the statistical inference question allows us to apply the multiple hypothesis testing methodology developed in [Multiple Testing Procedures with Applications to Genomics (2008) Springer, New York] and related articles, to control a broad class of Type I error rates, defined as generalized tail probabilities and expected values for arbitrary functions of the numbers of Type I errors and rejected hypotheses. 
The resampling-based single-step and stepwise multiple testing procedures of [Multiple Testing Procedures with Applications to Genomics (2008) Springer, New York] take into account the joint distribution of the test statistics and provide Type I error control in testing problems involving general data generating distributions (with arbitrary dependence structures among variables), null hypotheses, and test statistics. Comment: Published in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/193940307000000446.
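To make the idea of a resampling-based single-step procedure more concrete, the sketch below computes single-step maxT adjusted p-values. Each observed statistic is compared with the resampled distribution of the maximum statistic over all hypotheses, which is how the joint distribution of the test statistics enters. This is a simplified stand-in, not the authors' procedure: it assumes a two-group comparison and uses Welch t-statistics with a label-permutation null distribution, whereas the cited framework is built on bootstrap-based null distributions and covers much more general error rates and association parameters. The function names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Welch two-sample t-statistic (0.0 if both groups are constant)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se if se > 0 else 0.0


def abs_stats(data: List[List[float]], labels: Sequence[int]) -> List[float]:
    """|t| for every hypothesis (one row of `data` each) under 0/1 labels."""
    out = []
    for row in data:
        x = [v for v, lab in zip(row, labels) if lab == 1]
        y = [v for v, lab in zip(row, labels) if lab == 0]
        out.append(abs(welch_t(x, y)))
    return out


def maxt_adjusted_pvalues(
    data: List[List[float]], labels: Sequence[int], n_perm: int = 1000, seed: int = 0
) -> List[float]:
    """Single-step maxT adjusted p-values estimated from label permutations.

    Each group needs at least two samples so that variances are defined."""
    rng = random.Random(seed)
    observed = abs_stats(data, labels)
    exceed = [0] * len(data)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)
        max_null = max(abs_stats(data, perm))  # maximum over ALL hypotheses
        for j, t_obs in enumerate(observed):
            if max_null >= t_obs:
                exceed[j] += 1
    # The add-one correction keeps estimated p-values strictly positive.
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

Because every hypothesis is judged against the same maximum, the adjusted p-values control the family-wise error rate while still reflecting correlation among the statistics.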

arXiv.org e-Print Archive

CiteSeerX

Crossref

Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis

Author: Ahlquist Paul
Boon Johan A. den
Newton Michael A.
Quintana Fernando A.
Sengupta Srikumar
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 31/08/2007

A prespecified set of genes may be enriched, to varying degrees, for genes that have altered expression levels relative to two or more states of a cell. Knowing the enrichment of gene sets defined by functional categories, such as gene ontology (GO) annotations, is valuable for analyzing the biological signals in microarray expression data. A common approach to measuring enrichment is by cross-classifying genes according to membership in a functional category and membership on a selected list of significantly altered genes. A small Fisher's exact test $p$-value, for example, in this $2\times2$ table is indicative of enrichment. Other category analysis methods retain the quantitative gene-level scores and measure significance by referring a category-level statistic to a permutation distribution associated with the original differential expression problem. We describe a class of random-set scoring methods that measure distinct components of the enrichment signal. The class includes Fisher's test based on selected genes and also tests that average gene-level evidence across the category. Averaging and selection methods are compared empirically using Affymetrix data on expression in nasopharyngeal cancer tissue, and theoretically using a location model of differential expression. We find that each method has a domain of superiority in the state space of enrichment problems, and that both methods have benefits in practice. Our analysis also addresses two problems related to multiple-category inference, namely, that equally enriched categories are not detected with equal probability if they are of different sizes, and also that there is dependence among category statistics owing to shared genes. Random-set enrichment calculations do not require Monte Carlo for implementation. They are made available in the R package allez. Comment: Published at http://dx.doi.org/10.1214/07-AOAS104 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
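The two ends of the class described above can be sketched in a few lines. The first function is a one-sided Fisher's exact test, which is the hypergeometric upper tail of the 2×2 table for a selected gene list. The second is an averaging statistic that standardizes a category's mean gene-level score using the exact mean and variance of the mean of a random subset of the same size, drawn without replacement. This is a minimal illustration under those assumptions, not code from the allez package, and the function names are hypothetical. With 0/1 scores (1 = selected gene), the averaging statistic reduces to the standardized overlap count of the Fisher table. That reduction is one way to see how the selection-based test belongs to the same random-set family.

```python
import math
from typing import Sequence, Set


def fisher_enrichment_p(n_total: int, n_selected: int, n_category: int, n_overlap: int) -> float:
    """One-sided Fisher's exact test for enrichment.

    Returns P(X >= n_overlap), where X counts category genes among n_selected
    genes drawn without replacement from n_total genes, n_category of which
    belong to the category (the hypergeometric upper tail)."""
    upper = min(n_selected, n_category)
    tail = sum(
        math.comb(n_category, k) * math.comb(n_total - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / math.comb(n_total, n_selected)


def random_set_z(scores: Sequence[float], category: Set[int]) -> float:
    """Standardized category mean of gene-level scores under a random-set model.

    The category is treated as a simple random sample of size m from all N
    genes, so its mean score has expectation mu and variance
    (sigma^2 / m) * (N - m) / (N - 1), where sigma^2 is the population variance."""
    n_genes, m = len(scores), len(category)
    if not 0 < m < n_genes:
        raise ValueError("category must be a non-empty proper subset of the genes")
    mu = sum(scores) / n_genes
    sigma2 = sum((s - mu) ** 2 for s in scores) / n_genes
    var_mean = sigma2 / m * (n_genes - m) / (n_genes - 1)
    cat_mean = sum(scores[i] for i in category) / m
    return (cat_mean - mu) / math.sqrt(var_mean)
```

Both quantities are computed exactly, without Monte Carlo sampling, which is consistent with the abstract's remark about random-set calculations.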

arXiv.org e-Print Archive

Crossref

Recommended from our members

Common CHD8 Genomic Targets Contrast With Model-Specific Transcriptional Impacts of CHD8 Haploinsufficiency.

Author: Catta-Preta Rinaldo
Lim Kenneth
Nord Alex S
Wade A Ayanna
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

The packaging of DNA into chromatin determines the transcriptional potential of cells and is central to eukaryotic gene regulation. Case sequencing studies have revealed mutations to proteins that regulate chromatin state, known as chromatin remodeling factors, with causal roles in neurodevelopmental disorders. Chromodomain helicase DNA binding protein 8 (CHD8) encodes a chromatin remodeling factor with among the highest de novo loss-of-function mutation rates in patients with autism spectrum disorder (ASD). However, mechanisms associated with CHD8 pathology have yet to be elucidated. We analyzed published transcriptomic data across CHD8 in vitro and in vivo knockdown and knockout models and CHD8 binding across published ChIP-seq datasets to identify convergent mechanisms of gene regulation by CHD8. Differentially expressed genes (DEGs) across models varied, but overlap was observed between downregulated genes involved in neuronal development and function, cell cycle, chromatin dynamics, and RNA processing, and between upregulated genes involved in metabolism and immune response. Considering the variability in transcriptional changes and the cells and tissues represented across ChIP-seq analysis, we found a surprisingly consistent set of high-affinity CHD8 genomic interactions. CHD8 was enriched near promoters of genes involved in basic cell functions and gene regulation. Overlap between high-affinity CHD8 targets and DEGs shows that reduced dosage of CHD8 directly relates to decreased expression of cell cycle, chromatin organization, and RNA processing genes, but only in a subset of studies. This meta-analysis verifies CHD8 as a master regulator of gene expression and reveals a consistent set of high-affinity CHD8 targets across human, mouse, and rat in vivo and in vitro studies. These conserved regulatory targets include many genes that are also implicated in ASD. 
Our findings suggest a model where perturbation to dosage-sensitive CHD8 genomic interactions with a highly conserved set of regulatory targets leads to model-specific downstream transcriptional impacts.

eScholarship - University of California

FigShare

Sex differences in DNA methylation assessed by 450 K BeadChip in newborns.

Author: Barcellos Lisa
Davé Veronica
Eskenazi Brenda
Holland Nina
Huen Karen
Yousefi Paul
Publication venue: eScholarship, University of California
Publication date: 01/11/2015

Background: DNA methylation is an important epigenetic mark that can potentially link early life exposures to adverse health outcomes later in life. Host factors like sex and age strongly influence biological variation of DNA methylation, but characterization of these relationships is still limited, particularly in young children. Methods: In a sample of 111 Mexican-American subjects (58 girls, 53 boys), we interrogated DNA methylation differences by sex at birth using the 450 K BeadChip in umbilical cord blood specimens, adjusting for cell composition. Results: We observed that ~3% of CpG sites were differentially methylated between girls and boys at birth (FDR P < 0.05). Of those CpGs, 3031 were located on autosomes, and 82.8% of those were hypermethylated in girls compared to boys. Beyond individual CpGs, we found 3604 sex-associated differentially methylated regions (DMRs) where the majority (75.8%) had higher methylation in girls. Using pathway analysis, we found that sex-associated autosomal CpGs were significantly enriched for gene ontology terms related to nervous system development and behavior. Among hits in our study, 35.9% had been previously reported as sex-associated CpG sites in other published human studies. Further, for replicated hits, the direction of the association with methylation was highly concordant (98.5-100%) with previous studies. Conclusions: To our knowledge, this is the first reported epigenome-wide analysis by sex at birth that examined DMRs and adjusted for confounding by cell composition. We confirmed previously reported trends that methylation profiles are sex-specific even in autosomal genes, and also identified novel sex-associated CpGs in our methylome-wide analysis immediately after birth, a critical yet relatively unstudied developmental window.

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Yeast Features: Identifying Significant Features Shared Among Yeast Proteins for Functional Genomics

Author: Ashkan Golshani
Frank Dehne
James J. Cheetham
James R. Green
Md Alamgir
Michel Dumontier
Myron L. Smith
Nadereh Mir-Rashed
Veronika Eroukova
Publication date: 18/09/2008

Background
High throughput yeast functional genomics experiments are revealing associations among tens to hundreds of genes using numerous experimental conditions. To fully understand how the identified genes might be involved in the observed system, it is essential to consider the widest range of biological annotation possible. Biologists often start their search by collating the annotation provided for each protein within databases such as the Saccharomyces Genome Database, manually comparing them for similar features, and empirically assessing their significance. Such tasks can be automated, and more precise calculations of the significance can be determined using established probability measures. 
Results
We developed Yeast Features, an intuitive online tool to help establish the significance of finding a diverse set of shared features among a collection of yeast proteins. A total of 18,786 features from the Saccharomyces Genome Database are considered, including annotation based on the Gene Ontology’s molecular function, biological process and cellular compartment, as well as conserved domains, protein-protein and genetic interactions, complexes, metabolic pathways, phenotypes and publications. The significance of shared features is estimated using a hypergeometric probability, but novel options exist to improve the significance by adding background knowledge of the experimental system. For instance, increased statistical significance is achieved in gene deletion experiments because interactions with essential genes will never be observed. We further demonstrate the utility by suggesting the functional roles of the indirect targets of an aminoglycoside with a known mechanism of action, and also the targets of an herbal extract with a previously unknown mode of action. The identification of shared functional features may also be used to propose novel roles for proteins of unknown function, including a role in protein synthesis for YKL075C.
Conclusions
Yeast Features (YF) is an easy-to-use web-based application (http://software.dumontierlab.com/yeastfeatures/) that can identify and prioritize features shared among a set of yeast proteins. This approach is shown to be valuable in the analysis of complex data sets, in which the extracted associations revealed significant functional relationships among the gene products.
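Below is a minimal sketch of the significance calculation described in this abstract. It assumes each feature is represented as the set of genes annotated with it. For every feature shared by the query proteins, it computes the hypergeometric probability of seeing at least that many annotated proteins in a random query of the same size. Genes that could never be observed, such as essential genes in a deletion screen, can optionally be removed from the background first. The names, the default minimum overlap of two, and the way the background is restricted are assumptions for illustration, not the Yeast Features implementation. Whether restricting the background raises or lowers a particular p-value depends on how that feature's genes are distributed.

```python
import math
from typing import Dict, Hashable, Iterable, Set


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n draws)."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / math.comb(N, n)


def shared_feature_pvalues(
    query: Iterable[Hashable],
    features: Dict[str, Set[Hashable]],
    universe: Iterable[Hashable],
    excluded: Iterable[Hashable] = (),
    min_shared: int = 2,
) -> Dict[str, float]:
    """Hypergeometric p-value for each feature shared by >= min_shared query genes.

    `excluded` genes (hypothetical background knowledge, e.g. genes that cannot
    appear in the experiment) are dropped from the background, the query and
    every feature before testing."""
    background = set(universe) - set(excluded)
    q = set(query) & background
    results = {}
    for name, members in features.items():
        m = set(members) & background
        k = len(q & m)
        if k >= min_shared:
            results[name] = hypergeom_upper_tail(len(background), len(m), len(q), k)
    return results
```

Features would typically be ranked by these p-values, ideally with a multiple-testing correction because many features are tested at once.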

Nature Precedings

GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data

Author: Gordon SV
Hernández B
MacHugh DE
Magee DA
McGettigan PA
Nalpas NC
Parnell AC
Rue-Albrecht K
Publication venue: BioMed Central
Publication date: 25/02/2016

Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. 
The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
Funders: Department of Agriculture, Food and the Marine; European Commission Seventh Framework Programme (FP7); Science Foundation Ireland; University College Dublin.

Research Repository UCD

ZENODO

Springer - Publisher Connector

Irish Universities

PubMed Central

Spiral - Imperial College Digital Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Adaptations in energy metabolism and gene family expansions revealed by comparative transcriptomics of three Chagas disease triatomine vectors

Author: Beliera Melina Daniela
Godoy Lozano Ernestina
Lavore Andres Esteban
Martínez Barnetche Jesús
Palacio Victorio Gabriel
Rivera Pomar Rolando
Rodríguez Mario Henry
Téllez Sosa Juan
Zumaya Estrada Federico A.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2018

Background: Chagas disease is a parasitic infection caused by Trypanosoma cruzi. It is an important public health problem affecting around seven to eight million people in the Americas. A large number of hematophagous triatomine insect species, occupying diverse natural and human-modified ecological niches, transmit this disease. Triatomines are long-living hemipterans that have evolved to exploit different habitats to associate with their vertebrate hosts. Understanding the molecular basis of their tolerance of extreme physiological conditions, including starvation and longevity, could provide insights for developing novel control strategies. We describe the normalized cDNA, full-body transcriptome analysis of three main vectors in North, Central and South America: Triatoma pallidipennis, T. dimidiata and T. infestans. Results: Two-thirds of the de novo assembled transcriptomes map to the Rhodnius prolixus genome and proteome. Expansions in Triatoma of the calycin family and of two types of protease inhibitors, pacifastins and cystatins, were identified. A high number of transcriptionally active class I transposable elements was documented in T. infestans, compared with T. dimidiata and T. pallidipennis. Sequence identity in Triatoma-R. prolixus 1:1 orthologs revealed high sequence divergence in four enzymes participating in gluconeogenesis, glycogen synthesis and the pentose phosphate pathway, indicating high evolutionary rates of these genes. Also, molecular evidence suggesting positive selection was found for several genes of oxidative phosphorylation complexes I, III and V. Conclusions: Protease inhibitor and calycin-coding gene expansions provide insights into rapidly evolving processes of protease regulation and hematophagy.
Higher evolutionary rates in enzymes that exert metabolic flux control towards anabolism, together with evidence for positive selection in oxidative phosphorylation complexes, might represent genetic adaptations, possibly related to prolonged starvation, oxidative stress tolerance, longevity, hematophagy and flight reduction. Overall, this work generated novel hypotheses related to biological adaptations to extreme physiological conditions and diverse ecological niches that sustain Chagas disease transmission.
Affiliations: Martínez Barnetche, Jesús; Téllez Sosa, Juan; Zumaya Estrada, Federico A.; Godoy Lozano, Ernestina; Rodríguez, Mario Henry: Instituto Nacional de Salud Pública; México. Lavore, Andres Esteban; Beliera, Melina Daniela; Palacio, Victorio Gabriel; Rivera Pomar, Rolando: Consejo Nacional de Investigaciones Científicas y Técnicas. Centro de Investigaciones y Transferencia del Noroeste de la Provincia de Buenos Aires. Universidad Nacional del Noroeste de la Provincia de Buenos Aires; Argentina. Universidad Nacional del Noroeste de la Provincia de Buenos Aires. Centro de Bioinvestigaciones (Sede Pergamino); Argentina.

CONICET Digital

Directory of Open Access Journals

FigShare