153 research outputs found

Unsupervised Algorithms for Microarray Sample Stratification

Author: Cattelani Luca
Federico Antonio
Fratello Michele
Greco Dario
Pavel Alisa
Scala Giovanni
Serra Angela
Publication venue: Springer, UK
Publication date: 01/01/2022

The amount of data made available by microarrays gives researchers the opportunity to delve into the complexity of biological systems. However, the noisy and extremely high-dimensional nature of this kind of data poses significant challenges. Microarrays allow for the parallel measurement of thousands of molecular objects spanning different layers of interactions. To discover hidden patterns, a wide range of analytical techniques has been proposed. Here, we describe the basic methodologies to approach the analysis of microarray datasets that focus on the task of (sub)group discovery. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto

Trepo - Institutional Repository of Tampere University

Biclustering analysis of transcriptome big data identifies condition-specific microRNA targets

Author: Chi Sang-Mun
Jo Woobeen
Kim Jinhwan
Kim Seon-Young
Nam Dougu
Nguyen HCT
Park Jiyoung
Yoon Sora
Publication venue: Oxford University Press (OUP)
Publication date: 01/05/2019

We present a novel approach to identify human microRNA (miRNA) regulatory modules (mRNA targets and relevant cell conditions) by biclustering a large collection of mRNA fold-change data for sequence-specific targets. Bicluster targets were assessed using validated messenger RNA (mRNA) targets and exhibited on average a 17.0% (median 19.4%) improved gain in certainty (sensitivity + specificity). The net gain was further increased up to 32.0% (median 33.4%) by incorporating functional networks of targets. We analyzed cancer-specific biclusters and found that the PI3K/Akt signaling pathway is strongly enriched with targets of a few miRNAs in breast cancer and diffuse large B-cell lymphoma. Indeed, five independent prognostic miRNAs were identified, and repression of bicluster targets and pathway activity by miR-29 was experimentally validated. In total, 29 898 biclusters for 459 human miRNAs were collected in the BiMIR database, where biclusters are searchable by miRNA, tissue, disease, keyword and target gene.
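
The abstract above scores bicluster targets by sensitivity plus specificity against validated targets. As a minimal sketch of those two standard quantities only (the abstract does not say how the reported gain is derived from them, so that step is not shown), assuming hypothetical gene identifiers and a hypothetical helper name:

```python
def sensitivity_specificity(predicted, validated, universe):
    """Return (sensitivity, specificity) for a predicted target set.

    predicted: genes called targets by some method (hypothetical input)
    validated: genes known to be true targets
    universe:  all genes considered; non-validated genes count as negatives
    """
    universe = set(universe)
    predicted = set(predicted) & universe
    validated = set(validated) & universe
    negatives = universe - validated

    tp = len(predicted & validated)   # true targets that were predicted
    fn = len(validated - predicted)   # true targets that were missed
    tn = len(negatives - predicted)   # non-targets correctly left out
    fp = len(negatives & predicted)   # non-targets wrongly predicted

    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


# Illustrative data only; gene names are placeholders, not from the paper.
universe = ["g1", "g2", "g3", "g4", "g5", "g6"]
validated = ["g1", "g2", "g3"]
predicted = ["g1", "g2", "g4"]

sens, spec = sensitivity_specificity(predicted, validated, universe)
print(sens + spec)
```

Comparing this sum for two predicted sets (for example, sequence-only targets versus bicluster targets) is one plausible way to read the comparison described in the abstract.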

ScholarWorks@UNIST

A multi-objective genetic algorithm for biclustering of gene expression data with probabilistic encoding and overlapping control

Author: Marcozzi Michaël
Publication venue
Publication date: 29/09/2010

Repository of the University of Namur

BROCCOLI: overlapping and outlier-robust biclustering through proximal stochastic gradient descent

Author: Ceci M.
Hess S.
Hochstenbach M.
Pio G.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

Matrix tri-factorization subject to binary constraints is a versatile and powerful framework for the simultaneous clustering of observations and features, also known as biclustering. Applications for biclustering encompass the clustering of high-dimensional data and explorative data mining, where the selection of the most important features is relevant. Unfortunately, due to the lack of suitable methods for the optimization subject to binary constraints, the powerful framework of biclustering is typically constrained to clusterings which partition the set of observations or features. As a result, overlap between clusters cannot be modelled and every item, even an outlier in the data, has to be assigned to exactly one cluster. In this paper we propose Broccoli, an optimization scheme for matrix factorization subject to binary constraints, which is based on the theoretically well-founded optimization scheme of proximal stochastic gradient descent. Thereby, we do not impose any restrictions on the obtained clusters. Our experimental evaluation, performed on both synthetic and real-world data, and against 6 competitor algorithms, shows reliable and competitive performance, even in the presence of a high amount of noise in the data. Moreover, a qualitative analysis of the identified clusters shows that Broccoli may provide meaningful and interpretable clustering structures.
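
To make the model in the abstract above concrete, the following sketch shows what a binary tri-factorization looks like when rows may overlap or stay unassigned. It assumes the common form X ≈ Y C Zᵀ with binary indicator matrices Y (rows) and Z (columns) and a real-valued core C; the symbol names and the toy values are assumptions, and the sketch only evaluates a given factorization. It does not implement Broccoli's proximal stochastic gradient descent.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def reconstruct(Y, C, Z):
    """Return Y @ C @ Z^T, the approximation implied by the factors."""
    return matmul(matmul(Y, C), transpose(Z))


def squared_error(X, X_hat):
    """Sum of squared entry-wise differences between X and X_hat."""
    return sum((x - h) ** 2
               for row_x, row_h in zip(X, X_hat)
               for x, h in zip(row_x, row_h))


# Toy factors (assumed values). Y is binary but NOT a partition:
Y = [[1, 0],   # observation 0 belongs to row cluster 0
     [1, 1],   # observation 1 overlaps both row clusters
     [0, 1],   # observation 2 belongs to row cluster 1
     [0, 0]]   # observation 3 is an outlier, assigned to no cluster
Z = [[1, 0],   # feature 0 belongs to column cluster 0
     [0, 1],   # features 1 and 2 belong to column cluster 1
     [0, 1]]
C = [[2.0, 0.0],
     [0.0, 3.0]]  # core matrix: one value per (row cluster, column cluster)

X_hat = reconstruct(Y, C, Z)
for row in X_hat:
    print(row)
```

Under a partitioning constraint every row of Y would contain exactly one 1; relaxing that constraint is what allows the overlapping row and the all-zero outlier row in this example.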

Archivio istituzionale della ricerca - Università di Bari

A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer

Author: A Subramanian
C Desmedt
Christine Staiger
D Hanahan
E Lee
F Reyal
G Abraham
GR Mishra
Gunnar W. Klau
HY Chuang
I Ulitsky
IW Taylor
Joaquín Dopazo
K Chin
KR Brown
L Ein-Dor
L Tian
LD Miller
LFA Wessels
LJ van’t Veer
Lodewyk F. A. Wessels
M Kanehisa
Marcus Dittrich
MH van Vliet
MJ van de Vijver
ML Gatza
MT Dittrich
P Dao
Raul Kooter
S Loi
S Ma
SA Chowdhury
Sidney Cadot
Tobias Müller
TSK Prasad
V Popovici
Y Pawitan
Y Wang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/10/2011

Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes. The secondary data sources are employed to guide this aggregation. Although many studies claim that these approaches improve classification performance over single gene classifiers, the gain in performance is difficult to assess. This stems mainly from the fact that different breast cancer data sets and validation procedures are employed to assess the performance. Here we address these issues by employing a large cohort of six breast cancer data sets as a benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite feature classifiers do not outperform simple single gene classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set and (4) the heterogeneity of the data set on the performance of composite feature and single gene classifiers. Strikingly, we find that randomization of secondary data sources, which destroys all biological information in these sources, does not result in a deterioration in performance of composite feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single gene sets is similar to the stability of composite feature sets. Based on these results there is currently no reason to prefer prognostic classifiers based on composite features over single gene classifiers for predicting outcome in breast cancer.
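
The composite features described in the abstract above are built by aggregating the expression of several genes, with a secondary source deciding which genes are grouped. A minimal sketch of that idea follows, assuming the aggregation is a plain per-sample mean (the methods compared in the paper may aggregate differently) and using hypothetical gene and module names:

```python
from statistics import mean


def composite_features(expression, gene_sets):
    """Average expression per sample over each gene set.

    expression: dict mapping gene -> list of expression values, one per sample
    gene_sets:  dict mapping feature name -> list of genes (for example,
                a pathway or a network module from a secondary data source)
    Genes missing from `expression` are skipped; empty sets are dropped.
    """
    features = {}
    for name, genes in gene_sets.items():
        rows = [expression[g] for g in genes if g in expression]
        if rows:
            # zip(*rows) walks the samples; mean() aggregates across genes
            features[name] = [mean(values) for values in zip(*rows)]
    return features


# Illustrative data only: three genes measured in two samples.
expression = {
    "G1": [1.0, 2.0],
    "G2": [3.0, 4.0],
    "G3": [5.0, 6.0],
}
gene_sets = {
    "moduleA": ["G1", "G2"],  # hypothetical network module
    "moduleB": ["G3"],        # a single-gene set reduces to that gene itself
}

composite = composite_features(expression, gene_sets)
print(composite)
```

Shuffling which genes are assigned to which set before calling the function gives a simple stand-in for the randomized secondary sources that the abstract reports as performing no worse.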

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

VU Research Portal

CWI's Institutional Repository

Directory of Open Access Journals

PubMed Central

Online-Publikations-Server der Universität Würzburg

FigShare