Inferring gene regulatory networks using ensembles of feature selection techniques

Author: Demeester Piet
Dhaene Tom
Geurts Pierre
Huynh-thu Vân anh
Ruyssinck Joeri
Saeys Yvan
Publication date: 01/01/2012

Predictive response-relevant clustering of expression data provides insights into disease processes

Author: Abe
Amanda K. Sampson
Anna F. Dominiczak
Bach
Bae
Benjamini
Bennett
Bishop
Breitling
Bunger
Clark
de Snoo
Delyth Graham
Doi
Dudoit
Golub
Gore
Graham
Graham Young
Hanczar
Harris
Hoffbrand
Hubert
Huffman
Irizarry
Jeffs
John D. McClure
Kearney
Keith J. Harris
Lee
Lee
Lisa E. M. Hopcroft
Mark A. Girolami
Martin W. McBride
McBride
Mohri
Park
Stein
Tessa L. Holyoake
Tibshirani
Vinh
Weinberger
Woon
Ziino
Zuber
Publication venue: 'Oxford University Press (OUP)'
Publication date: 22/06/2010

This article describes and illustrates a novel method of microarray data analysis that couples model-based clustering and binary classification to form clusters of 'response-relevant' genes; that is, genes that are informative when discriminating between the different values of the response. Predictions are subsequently made using an appropriate statistical summary of each gene cluster, which we call the 'meta-covariate' representation of the cluster, in a probit regression model. We first illustrate this method by analysing a leukaemia expression dataset, before focusing closely on the meta-covariate analysis of a renal gene expression dataset in a rat model of salt-sensitive hypertension. We explore the biological insights provided by our analysis of these data. In particular, we identify a highly influential cluster of 13 genes, including three transcription factors (Arntl, Bhlhe41 and Npas2), that is implicated as being protective against hypertension in response to increased dietary sodium. Functional and canonical pathway analysis of this cluster using Ingenuity Pathway Analysis implicated transcriptional activation and circadian rhythm signalling, respectively. Although we illustrate our method using only expression data, the method is applicable to any high-dimensional datasets.
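The prediction step described in the abstract above can be pictured with a minimal sketch. It assumes the cluster summary is a plain mean and that the clusters and probit weights are already known; the abstract only says "an appropriate statistical summary", so the mean, the function names and all values used here are illustrative assumptions, not the authors' method.

```python
from math import erf, sqrt


def meta_covariates(expression, clusters):
    """Summarise each gene cluster by its mean expression (assumed summary).

    expression: dict mapping gene name -> expression value for one sample.
    clusters:   list of lists of gene names (hypothetical cluster assignment).
    """
    return [sum(expression[g] for g in genes) / len(genes) for genes in clusters]


def probit_probability(meta, weights, intercept=0.0):
    """Probit link: P(response = 1) = Phi(intercept + weights . meta)."""
    z = intercept + sum(w * m for w, m in zip(weights, meta))
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))
```

With two hypothetical clusters, `meta_covariates` returns one number per cluster, and `probit_probability` maps those numbers to a class probability; fitting the weights and forming the clusters are outside this sketch.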

Crossref

PubMed Central

Enlighten

White Rose Research Online

CUED - Cambridge University Engineering Department

A hybrid algorithm for Bayesian network structure learning with application to multi-label learning

Author: Aussem Alex
Elghazel Haytham
Gasse Maxime
Publication venue: 'Elsevier BV'
Publication date: 01/11/2014

We present a novel hybrid algorithm for Bayesian network structure learning, called H2PC. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm is based on divide-and-conquer constraint-based subroutines to learn the local structure around a target variable. We conduct two series of experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning. First, we use eight well-known Bayesian network benchmarks with various data sizes to assess the quality of the learned structure returned by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in terms of goodness of fit to new data and quality of the network structure with respect to the true dependence structure of the data. Second, we investigate H2PC's ability to solve the multi-label learning problem. We provide theoretical results to characterize and identify graphically the so-called minimal label powersets that appear as irreducible factors in the joint distribution under the faithfulness condition. The multi-label learning problem is then decomposed into a series of multi-class classification problems, where each multi-class variable encodes a label powerset. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy over ten multi-label data sets covering different application domains. Overall, our experiments support the conclusions that local structural learning with H2PC in the form of local neighborhood induction is a theoretically well-motivated and empirically effective learning framework that is well suited to multi-label learning. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available. Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other author
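The reduction from multi-label to multi-class learning mentioned in this abstract can be illustrated with a small sketch. It only shows how one chosen subset of labels is encoded as a single multi-class variable; how the minimal label powersets are identified is the paper's contribution and is not reproduced here. The function name and the `label_indices` argument are hypothetical.

```python
def encode_label_powerset(label_rows, label_indices):
    """Encode the joint value of selected labels as one multi-class variable.

    label_rows:    sequence of label vectors, one per example (e.g. 0/1 tuples).
    label_indices: positions of the labels forming one powerset (assumed given).
    Returns the class id per example and the combination -> class id mapping.
    """
    class_ids = {}
    encoded = []
    for row in label_rows:
        combo = tuple(row[i] for i in label_indices)
        # Assign the next free class id the first time a combination appears.
        encoded.append(class_ids.setdefault(combo, len(class_ids)))
    return encoded, class_ids
```

Each distinct combination of the selected labels becomes one class, so an ordinary multi-class classifier can be trained on the encoded targets and its predictions decoded back through the inverse of the mapping.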

arXiv.org e-Print Archive

HAL

Hal-Diderot

Consensus and meta-analysis regulatory networks for combining multiple microarray gene expression datasets

Author: Akaike
Allan Tucker
Beissbarth
Conlon
Courcelle
DerSimonian
Eisen
Emma Steele
Faith
Friedman
Gasch
Grigull
Hanley
Hartemink
Jarvinen
Khil
Kuo
Matzkevich
Ng
Pearl
Pearl
Pennock
Pe’er
Pe’er
Quillardet
Salgado
Sangurdekar
Smyth
Soinov
Spellman
Stoica
Sutton
Teixeira
Wang
Yauk
Publication venue: 'Elsevier BV'
Publication date: 01/12/2008

Microarray data is a key source of experimental data for modelling gene regulatory interactions from expression levels. With the rapid increase of publicly available microarray data comes the opportunity to produce regulatory network models based on multiple datasets. Such models are potentially more robust with greater confidence, and place less reliance on a single dataset. However, combining datasets directly can be difficult as experiments are often conducted on different microarray platforms, and in different laboratories, leading to inherent biases in the data that are not always removed through pre-processing such as normalisation. In this paper we compare two frameworks for combining microarray datasets to model regulatory networks: pre- and post-learning aggregation. In pre-learning approaches, such as using simple scale-normalisation prior to the concatenation of datasets, a model is learnt from a combined dataset, whilst in post-learning aggregation individual models are learnt from each dataset and the models are combined. We present two novel approaches for post-learning aggregation, each based on aggregating high-level features of Bayesian network models that have been generated from different microarray expression datasets. Meta-analysis Bayesian networks are based on combining statistical confidences attached to network edges whilst Consensus Bayesian networks identify consistent network features across all datasets. We apply both approaches to multiple datasets from synthetic and real (Escherichia coli and yeast) networks and demonstrate that both methods can improve on networks learnt from a single dataset or an aggregated dataset formed using a standard scale-normalisation.
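The two post-learning aggregation ideas in this abstract can be sketched roughly as follows. The sketch assumes each dataset has already produced a confidence score in [0, 1] per directed edge; the threshold rule and the simple averaging are stand-in assumptions chosen for illustration, not the combination rules used in the paper.

```python
def consensus_edges(edge_conf_per_dataset, threshold=0.5):
    """Keep only edges whose confidence reaches the threshold in every dataset."""
    edge_sets = [
        {edge for edge, conf in confs.items() if conf >= threshold}
        for confs in edge_conf_per_dataset
    ]
    return set.intersection(*edge_sets) if edge_sets else set()


def pooled_edge_confidence(edge_conf_per_dataset):
    """Average each edge's confidence over datasets (a missing edge counts as 0.0)."""
    all_edges = set().union(*(confs.keys() for confs in edge_conf_per_dataset))
    n = len(edge_conf_per_dataset)
    return {
        edge: sum(confs.get(edge, 0.0) for confs in edge_conf_per_dataset) / n
        for edge in all_edges
    }
```

The first function mirrors the consensus idea (features consistent across all datasets), while the second mirrors the meta-analysis idea (pooling per-edge confidences); an edge that is weak in one dataset is dropped by the former but can still receive a moderate pooled score from the latter.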

Elsevier - Publisher Connector

Crossref

Brunel University Research Archive

Characterization of the Mycobiome of the Seagrass, Zostera marina, Reveals Putative Associations With Marine Chytrids.

Author: Eisen Jonathan A
Ettinger Cassandra L
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Seagrasses are globally distributed marine flowering plants that are foundation species in coastal ecosystems. Seagrass beds play essential roles as habitats and hatcheries, in nutrient cycling, and in protecting the coastline from erosion. Although many studies have focused on seagrass ecology, only a limited number have investigated their associated fungi. In terrestrial systems, fungi can have beneficial and detrimental effects on plant fitness. However, not much is known about marine fungi and even less is known about seagrass associated fungi. Here we used culture-independent sequencing of the ribosomal internal transcribed spacer (ITS) region to characterize the taxonomic diversity of fungi associated with the seagrass, Zostera marina. We sampled from two Z. marina beds in Bodega Bay over three time points to investigate fungal diversity within and between plants. Our results indicate that there are many fungal taxa for which a taxonomic assignment cannot be made living on and inside Z. marina leaves, roots and rhizomes and that these plant tissues harbor distinct fungal communities. We also identified differences in the abundances of the orders, Glomerellales, Agaricales and Malasseziales, between seagrass tissues. The most prevalent ITS amplicon sequence variants (ASVs) associated with Z. marina tissues could not initially be confidently assigned to a fungal phylum, but shared significant sequence similarity with Chytridiomycota and Aphelidomycota. To obtain a more definitive taxonomic classification of the most abundant ASV associated with Z. marina leaves, we used PCR with one primer targeting a unique region of this ASV's ITS2 and a second primer targeting fungal 28S rRNA genes to amplify part of the 28S rRNA gene region corresponding to this ASV. 
Sequencing and phylogenetic analysis of the resulting partial 28S rRNA gene revealed that the organism that this ASV comes from is a member of Novel Clade SW-I in the order Lobulomycetales in the phylum Chytridiomycota. This clade includes known parasites of freshwater diatoms and algae, and it is possible this chytrid is directly infecting Z. marina leaf tissues. This work highlights a need for further studies focusing on marine fungi and the potential importance of these understudied communities to the larger seagrass ecosystem.

eScholarship - University of California

Bioinformatics tools in predictive ecology: Applications to fisheries

Author: Allan Tucker
Anvar Y.
Bishop C. M.
Bundy A.
Choi J. S.
Daniel Duplisea
Ghahramani Z.
Hand D. J.
Hartemink A. J.
Imoto S.
Langley P.
Liang S.
Pe'er D.
Pe'er D.
Pearl J.
Spirtes P.
Steele E.
Publication venue: 'The Royal Society'
Publication date: 19/01/2012

This article is made available through the Brunel Open Access Publishing Fund - Copyright © 2012 Tucker et al. There has been a huge effort in the advancement of analytical techniques for molecular biological data over the past decade. This has led to many novel algorithms that are specialized to deal with data associated with biological phenomena, such as gene expression and protein interactions. In contrast, ecological data analysis has remained focused to some degree on off-the-shelf statistical techniques, though this is starting to change with the adoption of state-of-the-art methods, where few assumptions can be made about the data and a more explorative approach is required, for example, through the use of Bayesian networks. In this paper, some novel bioinformatics tools for microarray data are discussed along with their 'crossover potential' with an application to fisheries data. In particular, a focus is made on the development of models that identify functionally equivalent species in different fish communities with the aim of predicting functional collapse.

Crossref

PubMed Central

Brunel University Research Archive

Bayesian meta-analysis for identifying periodically expressed genes in fission yeast cell cycle

Author: Fan Xiaodan
Liu Jun S.
Pyne Saumyadipta
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 09/11/2010

The effort to identify genes with periodic expression during the cell cycle from genome-wide microarray time series data has been ongoing for a decade. However, the lack of rigorous modeling of periodic expression as well as the lack of a comprehensive model for integrating information across genes and experiments has impaired the effort for the accurate identification of periodically expressed genes. To address the problem, we introduce a Bayesian model to integrate multiple independent microarray data sets from three recent genome-wide cell cycle studies on fission yeast. A hierarchical model was used for data integration. In order to facilitate an efficient Monte Carlo sampling from the joint posterior distribution, we develop a novel Metropolis-Hastings group move. A surprising finding from our integrated analysis is that more than 40% of the genes in fission yeast are significantly periodically expressed, greatly enhancing the reported 10-15% of the genes in the current literature. It calls for a reconsideration of the periodically expressed gene detection problem. Comment: Published at http://dx.doi.org/10.1214/09-AOAS300 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

arXiv.org e-Print Archive

Crossref