71 research outputs found

A new unsupervised gene clustering algorithm based on the integration of biological knowledge into expression data

Author: Jérôme Pagès
Marie Verbanck
Sébastien Lê
Publication venue: Springer Nature
Publication date: 01/01/2013

BACKGROUND: Gene clustering algorithms are widely used by biologists when analysing omics data. Classical gene clustering strategies rely on expression data only, either directly, as in heatmaps, or indirectly, as in clustering based on coexpression networks. However, these classical strategies may not be sufficient to bring out all potential relationships amongst genes. RESULTS: We propose a new unsupervised gene clustering algorithm based on the integration of external biological knowledge, such as Gene Ontology annotations, into expression data. We introduce a new distance between genes that integrates biological knowledge into the analysis of expression data. Under this distance, two genes are close if they have both similar expression profiles and similar functional profiles. A classical algorithm (e.g. K-means) is then used to obtain gene clusters. In addition, we propose an automatic procedure for evaluating gene clusters. This procedure is based on two indicators that measure the global coexpression and the biological homogeneity of gene clusters. Both are associated with hypothesis tests, so that each indicator is complemented with a p-value. Our clustering algorithm is compared with heatmap clustering and with clustering based on a gene coexpression network, on both simulated and real data. In both cases, it outperforms the other methodologies, as it provides the highest proportion of significantly coexpressed and biologically homogeneous gene clusters, which are good candidates for interpretation. CONCLUSION: Our new clustering algorithm provides a higher proportion of good candidates for interpretation. We therefore expect the interpretation of these clusters to help biologists formulate new hypotheses on the relationships amongst genes.
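The abstract does not spell out how the distance is built, so the following is only a minimal sketch of the general idea: two genes are close when both their expression profiles and their functional annotations are similar. The particular choices here (a correlation-based expression distance, a Jaccard distance on annotation sets, a `weight` parameter blending the two, and the toy gene and term names) are illustrative assumptions, not the authors' definition.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two profiles (range 0..2).

    Assumes neither profile is constant (otherwise the variance is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def jaccard_distance(terms_a, terms_b):
    """Return 1 - |A & B| / |A | B| for two sets of annotation terms."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return 1.0 - len(terms_a & terms_b) / len(union)


def combined_distance(gene_a, gene_b, expression, annotations, weight=0.5):
    """Blend expression and functional distances (hypothetical weighting)."""
    # Rescale the correlation distance from 0..2 to 0..1 so both parts are comparable.
    d_expr = pearson_distance(expression[gene_a], expression[gene_b]) / 2.0
    d_func = jaccard_distance(annotations[gene_a], annotations[gene_b])
    return weight * d_expr + (1.0 - weight) * d_func


# Toy data: gene names and annotation terms are made up for illustration.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],   # co-expressed with g1
    "g3": [1.1, 2.1, 2.9, 4.2],   # co-expressed with g1
    "g4": [4.0, 3.0, 2.0, 1.0],   # anti-correlated with g1
}
annotations = {
    "g1": {"term_a", "term_b"},
    "g2": {"term_a", "term_b"},   # same functions as g1
    "g3": {"term_c"},             # different function
    "g4": {"term_a", "term_b"},   # same functions as g1
}

d12 = combined_distance("g1", "g2", expression, annotations)
d13 = combined_distance("g1", "g3", expression, annotations)
d14 = combined_distance("g1", "g4", expression, annotations)
```

In this toy example only `g2` is close to `g1`, because it matches on both criteria; `g3` matches on expression only and `g4` on annotation only. A matrix of such pairwise distances could then be handed to any clustering routine that accepts precomputed distances.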

Springer - Publisher Connector

PubMed Central

Impaired emotional facial expression recognition in alcoholics, opiate dependence subjects, methadone maintained subjects and mixed alcohol-opiate antecedents subjects compared with normal controls

Author: Charles Kornreich
Marie-Line Foisy
Pierre Philippot
Bernard Dan
Juan Tecco
Xavier Noël
Ursula Hess
Isidore Pelc
Paul Verbanck
Publication venue: Elsevier BV

Crossref

Exploiting the GTEx resources to decipher the mechanisms at GWAS loci.

Author: Aguet François
Ardlie Kristin
Barbeira Alvaro N
Bastarache Lisa
Bonazzola Rodrigo
Brown Christopher D
Do Ron
Gamazon Eric R
GTEx Consortium
GTEx GWAS Working Group
Hamel Andrew R
Hormozdiari Farhad
Im Hae Kyung
Jiang Zhuoxun
Jordan Daniel M
Kim-Hellmuth Sarah
Lappalainen Tuuli
Liang Yanyu
Liu Boxiang
McCarthy Mark
Montgomery Stephen B
Park YoSon
Pividori Milton D
Rao Abhiram
Segrè Ayellet V
Stephens Matthew
Verbanck Marie
Wang Gao
Wen Xiaoquan
Zhou Dan
Publication venue: Genome Biol
Publication date: 01/01/2021

The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of loci discovered by genome-wide association studies (GWAS). Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find that no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.

Columbia University Academic Commons

Apollo (Cambridge)

MPG.PuRe

Deep Blue Documents at the University of Michigan

Integrating biological knowledge related to coexpression when analysing Xomic data

Author: Lê Sébastien
Verbanck Marie
Publication venue: HAL CCSD
Publication date: 22/08/2010

Interpreting results provided by multivariate exploratory methods (such as Principal Component Analysis) applied to genomic data is almost impossible at the gene level because of the number of genes. Integrative approaches that incorporate biological knowledge have become unavoidable. De Tayrac et al. (2009) proposed a strategy that uses a priori information, such as Gene Ontology (GO) or KEGG terms, to enhance their results. The idea consists in forming modules of genes according to the a priori information and using those modules as supplementary information in order to interpret results on the basis of the genes' functions. However, the composition of those modules may be disconnected from the structure of the genomic data under study, and it does not consider the different degrees of specificity of the terms, which convey the existence of different levels of regulation. Hence the natural idea of improving the way modules are formed. The aim of this talk is to propose a new approach combining Canonical Correspondence Analysis with Hierarchical Multiple Factor Analysis (Francoa et al., 2009) to obtain modules that have two main features: 1) they consist of genes that belong to the same biological processes; 2) they consist of genes that are co-expressed with respect to the data set of interest. The interpretation of the biological processes is thus facilitated by the co-expression of the genes within a group, while the method highlights a few key genes whose functions can easily be taken into account to go deeper into the interpretation. Applying this method to a chicken microarray data set brought out the well-known mechanisms triggered in response to fasting and suggested new leads.

HAL-Rennes 1

Regularised PCA to denoise and visualise data

Author: Husson François
Josse Julie
Verbanck Marie
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Principal component analysis (PCA) is a well-established method commonly used to explore and visualise data. A classical PCA model is the fixed effect model, in which data are generated as a fixed structure of low rank corrupted by noise. Under this model, PCA does not provide the best recovery of the underlying signal in terms of mean squared error. Following the same principle as in ridge regression, we propose a regularised version of PCA that boils down to thresholding the singular values. Each singular value is multiplied by a term that can be seen as the ratio of the signal variance to the total variance of the associated dimension. The regularisation term is derived analytically using asymptotic results and can also be justified from a Bayesian treatment of the model. Regularised PCA provides promising results in terms of the recovery of the true signal and of the graphical outputs, in comparison with classical PCA and with a soft thresholding estimation strategy. The gap between PCA and regularised PCA is all the more important when the data are noisy.
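The shrinkage described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the estimator derived in the paper: the rank is supplied by the caller, the total variance of a dimension is taken to be its squared singular value, and the noise variance is crudely estimated as the mean of the discarded squared singular values. The function name `regularised_pca` and the simulated data are hypothetical.

```python
import numpy as np


def regularised_pca(X, rank):
    """Low-rank reconstruction of X with shrunken singular values (sketch)."""
    X = np.asarray(X, dtype=float)
    col_means = X.mean(axis=0)
    # SVD of the column-centred data; singular values come back in descending order.
    U, d, Vt = np.linalg.svd(X - col_means, full_matrices=False)
    if not 0 < rank < len(d):
        raise ValueError("rank must be between 1 and min(n, p) - 1")

    total_var = d[:rank] ** 2
    # Assumption: noise variance per dimension is the mean of the discarded
    # squared singular values (a simplification of the paper's estimate).
    noise_var = np.mean(d[rank:] ** 2)
    # Ratio of signal variance to total variance for each retained dimension.
    shrink = np.clip((total_var - noise_var) / total_var, 0.0, 1.0)

    # Rebuild the data from the retained dimensions with shrunken singular values.
    fitted = (U[:, :rank] * (shrink * d[:rank])) @ Vt[:rank] + col_means
    return fitted, shrink


# Simulated example: a rank-2 signal corrupted by Gaussian noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 8))
noisy = signal + rng.normal(scale=0.5, size=signal.shape)
fitted, shrink = regularised_pca(noisy, rank=2)
```

Each entry of `shrink` lies between 0 and 1, and dimensions with smaller singular values are shrunk more strongly, so the noisier the data, the further the result departs from a plain truncated PCA.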

HAL-Rennes 1