Clustering gene expression data with a penalized graph-based metric

Author: A Baya
A Ben-Hur
A Fred
A Karatzoglou
A Ng
A Richards
A Soukas
A Thalamuthu
AA Alizadeh
AI Su
AK Jain
Ariel E Bayá
B Fischer
B King
B Tjaden
BJ Frey
EJ Yeoh
EP Xing
EY Kim
G McLachlan
G Milligan
J McQueen
J Risinger
J Shawe-Taylor
J Shi
J Tenenbaum
JP Brunet
K Yeung
L Dyrskjot
L Heyer
L Kaufman
L Li
L Liu
M Belkin
M Brito
M de Souto
M Dettling
M Filippone
M Polito
MB Eisen
N Mekuz
P Arabie
P Franti
P Marttinen
Pablo M Granitto
PHA Sneath
R Shai
R Tibshirani
R Waite
R Xu
S Calza
S Michele Leone
S Monti
S Pomeroy
S Ramaswamy
S Roweis
TH Cormen
TR Golub
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The search for cluster structure in microarray datasets is a basic problem for the so-called "-omic sciences". A difficult problem in clustering is how to handle data with a manifold structure, i.e. data that do not form compact clouds of points but instead trace arbitrary shapes or paths embedded in a high-dimensional space, as may be the case for some gene expression datasets. Results: In this work we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based metric, a new tool for evaluating distances in such cases. The new metric can be used in combination with most clustering algorithms. The PKNNG metric is based on a two-step procedure: first it constructs the k-Nearest-Neighbor-Graph of the dataset of interest using a low k-value, and then it adds edges with a highly penalized weight to connect the subgraphs produced by the first step. We discuss several possible schemes for connecting the different subgraphs, as well as several penalization functions. We show clustering results on several public gene expression datasets and on simulated artificial problems to evaluate the behavior of the new metric. Conclusions: In all cases the PKNNG metric shows promising clustering results. The use of the PKNNG metric can improve the performance of commonly used pairwise-distance based clustering methods to the level of more advanced algorithms. A great advantage of the new procedure is that researchers do not need to learn a new method: they can simply compute distances with the PKNNG metric and then, for example, use hierarchical clustering to produce an accurate and highly interpretable dendrogram of their high-dimensional data.
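The two-step procedure described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `pknng_distances`, the connection scheme (one edge between the closest pair of points for every pair of subgraphs), the exponential penalty `d * exp(d / mu)`, and the default scale `mu` are all assumed here, since the abstract only says that several schemes and penalization functions are discussed. The sketch also assumes that the data points are distinct.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, shortest_path
from scipy.spatial.distance import cdist


def pknng_distances(X, k=3, mu=None):
    """Graph-based distances on a penalized kNN graph (illustrative sketch)."""
    n = X.shape[0]
    D = cdist(X, X)  # plain Euclidean distances

    # Step 1: symmetric k-nearest-neighbor graph with a low k.
    W = np.zeros((n, n))
    nn = np.argsort(D, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    W[rows, nn.ravel()] = D[rows, nn.ravel()]
    W = np.maximum(W, W.T)

    # Step 2: connect the resulting subgraphs with penalized edges.
    n_comp, labels = connected_components(csr_matrix(W), directed=False)
    if mu is None:
        mu = W[W > 0].mean()  # assumed scale: mean kNN edge length
    for a in range(n_comp):
        ia = np.flatnonzero(labels == a)
        for b in range(a + 1, n_comp):
            ib = np.flatnonzero(labels == b)
            sub = D[np.ix_(ia, ib)]
            i, j = np.unravel_index(np.argmin(sub), sub.shape)
            d = sub[i, j]
            # Assumed penalty: grows exponentially with the gap length.
            W[ia[i], ib[j]] = W[ib[j], ia[i]] = d * np.exp(d / mu)

    # The metric is the shortest-path length in the penalized graph.
    return shortest_path(csr_matrix(W), method="D", directed=False)
```

The returned matrix can be passed to any clustering method that accepts precomputed pairwise distances, for example hierarchical clustering, which is the use the abstract suggests.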

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

CONICET Digital

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Repositorio Hipermedial de la Universidad Nacional de Rosario

Defining a robust biological prior from Pathway Analysis to drive Network Inference

Author: Ambroise Christophe
Guedj Mickael
Jeanmougin Marine
Publication date: 01/01/2011

Inferring genetic networks from gene expression data is one of the most challenging tasks of the post-genomic era, partly because of the vast space of possible networks and the relatively small amount of data available. In this field, the Gaussian Graphical Model (GGM) provides a convenient framework for the discovery of biological networks. In this paper, we propose an original approach for inferring gene regulation networks that uses a robust biological prior on their structure in order to limit the set of candidate networks. Pathways, which represent biological knowledge about regulatory networks, are used as informative prior knowledge to drive Network Inference. This approach is based on the selection of a relevant set of genes, called the "molecular signature", associated with a condition of interest (for instance, the genes involved in disease development). In this context, differential expression analysis is a well-established strategy. However, the resulting signatures are often inconsistent and show little overlap between studies. We therefore dedicate the first part of our work to improving the standard process of biomarker identification, so as to guarantee the robustness and reproducibility of the molecular signature. Our approach makes it possible to compare the networks inferred under two conditions of interest (for instance, case and control networks) and supports the biological interpretation of the results. It thus allows the identification of differential regulations that occur between these conditions. We illustrate the proposed approach by applying our method to a study of breast cancer's response to treatment.
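As background on the GGM framework mentioned in this abstract: in a Gaussian graphical model, an edge between two genes corresponds to a nonzero partial correlation, which can be read off the inverse covariance (precision) matrix. The sketch below shows only that standard relationship. It is not the authors' method and does not include their pathway-based prior; the function name `partial_correlations` and the small `ridge` term (added so the covariance stays invertible when samples are scarce) are assumptions made for illustration.

```python
import numpy as np


def partial_correlations(X, ridge=1e-3):
    """Partial correlation matrix from an (n_samples, n_genes) array."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (X.shape[0] - 1)  # empirical covariance
    # Assumed regularization: a small ridge keeps S invertible.
    theta = np.linalg.inv(S + ridge * np.eye(S.shape[0]))
    d = np.sqrt(np.diag(theta))
    P = -theta / np.outer(d, d)  # rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
    np.fill_diagonal(P, 1.0)
    return P
```

Entries of the result that are close to zero suggest conditional independence between the two genes given all the others, so thresholding the matrix gives a crude candidate network.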

arXiv.org e-Print Archive

Numérisation de Documents Anciens Mathématiques

SUBIC: A Supervised Bi-Clustering Approach for Precision Medicine

Author: Levy Phillip
Nezhad Milad Zafar
Sadati Najibesadat
Yang Kai
Zhu Dongxiao
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 26/09/2017

Traditional medicine typically applies a one-size-fits-all treatment to the entire patient population, whereas precision medicine develops tailored treatment schemes for different patient subgroups. The fact that some factors may be more significant for a specific patient subgroup motivates clinicians and medical researchers to develop new approaches to subgroup detection and analysis, which is an effective strategy for personalizing treatment. In this study, we propose a novel patient subgroup detection method, called Supervised Biclustering (SUBIC), based on convex optimization, and apply it to detect patient subgroups and prioritize risk factors for hypertension (HTN) in a vulnerable demographic subgroup (African-American). Our approach not only finds patient subgroups under the guidance of a clinically relevant target variable but also identifies and prioritizes risk factors by pursuing sparsity of the input variables and encouraging similarity among the input variables and between the input and target variables.

arXiv.org e-Print Archive

Crossref

Nonlinear Dimension Reduction for Micro-array Data (Small n and Large p)

Author: Shontz Suzanne
Publication date: 01/01/2006

A Cluster Elastic Net for Multivariate Regression

Author: Price Bradley S.
Sherwood Ben
Publication date: 27/03/2018

We propose a method for estimating coefficients in multivariate regression when there is a clustering structure to the response variables. The proposed method includes a fusion penalty, to shrink the difference in fitted values from responses in the same cluster, and an L1 penalty for simultaneous variable selection and estimation. The method can be used whether the grouping structure of the response variables is known or unknown. When the clustering structure is unknown, the method simultaneously estimates the clusters of the responses and the regression coefficients. Theoretical results are presented for the penalized least squares case, including asymptotic results allowing for p >> n. We extend our method to the setting where the responses are binomial variables. We propose a coordinate descent algorithm for both the normal and binomial likelihoods, which can easily be extended to other generalized linear model (GLM) settings. Simulations and data examples from business operations and genomics are presented to show the merits of both the least squares and binomial methods. Comment: 37 pages, 11 figures.
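To make the two penalties in this abstract concrete, the sketch below evaluates a least squares objective with an L1 term on the coefficients and a fusion term on the differences in fitted values between responses assigned to the same cluster. It is an illustration only: the function name, the argument names (`lam`, `gamma`, `clusters`), and the exact scaling of each term are assumptions and may differ from the paper's formulation. It covers only the case where the clusters are known and does not implement the coordinate descent algorithm.

```python
import numpy as np


def cluster_elastic_net_objective(X, Y, B, clusters, lam, gamma):
    """Penalized least squares objective (illustrative scaling).

    X: (n, p) predictors, Y: (n, q) responses, B: (p, q) coefficients,
    clusters: length-q array of cluster labels for the responses.
    """
    clusters = np.asarray(clusters)
    n = X.shape[0]
    fitted = X @ B
    loss = np.sum((Y - fitted) ** 2) / (2 * n)
    l1 = lam * np.abs(B).sum()  # L1 penalty: variable selection

    # Fusion penalty: squared differences in fitted values for every
    # pair of responses that share a cluster label.
    fusion = 0.0
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                diff = fitted[:, idx[a]] - fitted[:, idx[b]]
                fusion += diff @ diff
    return loss + l1 + 0.5 * gamma * fusion
```

With this form, responses in the same cluster that already have identical fitted values contribute nothing to the fusion term, so increasing `gamma` only penalizes within-cluster disagreement.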

arXiv.org e-Print Archive

KU ScholarWorks