20,047 research outputs found

GenClust: A genetic algorithm for clustering gene expression data

Author: Di Gesú Vito
Giancarlo Raffaele
Lo Bosco Giosué
Raimondi Alessandra
Scaturro Davide
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. RESULTS: GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. CONCLUSION: Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology
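
The abstract names the FOM (figure of merit) methodology without defining it. As a rough sketch, assuming the commonly described leave-one-condition-out formulation, the code below clusters the genes with one condition hidden, then measures how far each gene's hidden value lies from its cluster mean; lower totals suggest more predictive clusters. The `kmeans` helper, its evenly spaced seeding, and the function names are hypothetical stand-ins, not GenClust or the authors' code.

```python
import math

def kmeans(rows, k, iters=20):
    """Tiny deterministic k-means (hypothetical helper); seeds are evenly spaced rows."""
    centroids = [list(rows[i * len(rows) // k]) for i in range(k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign every row to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centroids[c])))
            for r in rows
        ]
        # Move each centroid to the mean of its members; empty clusters keep their centroid.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def fom(data, k, cluster=kmeans):
    """Aggregate figure of merit for k clusters; data is a list of per-gene lists."""
    n_genes, n_cond = len(data), len(data[0])
    total = 0.0
    for e in range(n_cond):
        # Cluster on all conditions except e.
        reduced = [row[:e] + row[e + 1:] for row in data]
        labels = cluster(reduced, k)
        # Sum squared deviations of the held-out condition from each cluster mean.
        sq = 0.0
        for c in set(labels):
            held = [data[g][e] for g in range(n_genes) if labels[g] == c]
            mean = sum(held) / len(held)
            sq += sum((v - mean) ** 2 for v in held)
        total += math.sqrt(sq / n_genes)
    return total
```

Calling `fom(data, k)` for a range of `k` values and comparing the totals is one way such a measure could be used to compare clusterings; any clustering function with the same `(rows, k)` signature can be passed as `cluster`.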

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Palermo

Proteogenomic integration reveals therapeutic targets in breast cancer xenografts

Author: Cao Song
Davies Sherri R
Ding Li
Erdmann-Gilmore Petra
et al
Guo Zhanfang
Held Jason M
Hoog Jeremy
Huang Kuan-lin
Li Shunqiang
Ma Cynthia
McLellan Michael D
Niu Beifang
Sanati Souzan
Scott Adam
Snider Jacqueline E
Sun Sam Qiancheng
Townsend R. Reid
Wendl Michael C
Wyczalkowski Matthew A
Ye Kai
Yoon Christopher
Publication venue: Digital Commons@Becker
Publication date: 01/01/2017

Digital Commons@Becker

clValid: An R Package for Cluster Validation

Author: Guy Brock
Somnath Datta
Susmita Datta
Vasyl Pihur

The R package clValid contains functions for validating the results of a clustering analysis. There are three main types of cluster validation measures available, "internal", "stability", and "biological". The user can choose from nine clustering algorithms in existing R packages, including hierarchical, K-means, self-organizing maps (SOM), and model-based clustering. In addition, we provide a function to perform the self-organizing tree algorithm (SOTA) method of clustering. Any combination of validation measures and clustering methods can be requested in a single function call. This allows the user to simultaneously evaluate several clustering algorithms while varying the number of clusters, to help determine the most appropriate method and number of clusters for the dataset of interest. Additionally, the package can automatically make use of the biological information contained in the Gene Ontology (GO) database to calculate the biological validation measures, via the annotation packages available in Bioconductor. The function returns an object of S4 class "clValid", which has summary, plot, print, and additional methods which allow the user to display the optimal validation scores and extract clustering results.
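
To illustrate what an "internal" validation measure computes, here is a minimal Python sketch of the Dunn index, one of the internal measures clValid reports. This is not the clValid API (which is an R package); the function name and the plain-list data layout are assumptions made for illustration.

```python
import math
from itertools import combinations

def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest within-cluster distance."""
    min_between, max_within = math.inf, 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        if lp == lq:
            max_within = max(max_within, d)
        else:
            min_between = min(min_between, d)
    # Degenerate case: every cluster is a single point or identical points.
    if max_within == 0.0:
        return math.inf
    return min_between / max_within
```

Higher values indicate compact, well-separated clusters, so scoring several candidate partitions of the same data and keeping the highest-scoring one mirrors, in miniature, the comparison across methods and cluster counts that the abstract describes.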

Research Papers in Economics

Clustering of Cases from Different Subtypes of Breast Cancer Using a Hopfield Network Built from Multi-omic Data

Author: Calderón-Achío Olger Kitchion
Publication venue: 'Instituto Tecnologico de Costa Rica'
Publication date: 01/01/2018

Graduation thesis (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Computación, 2018. Despite scientific advances, breast cancer remains a major cause of death among women worldwide. Given the great heterogeneity between cases, distinct classification schemes have emerged. The intrinsic molecular subtype classification (luminal A, luminal B, HER2-enriched and basal-like) accounts for the molecular characteristics and prognosis of tumors, which provides valuable input for choosing optimal treatment. Also, recent advances in molecular biology have provided scientists with high-quality, diverse omic-like data, opening up the possibility of creating computational models for improving and validating current subtyping systems. In this study, a Hopfield Network model for breast cancer subtyping and characterization was created using data from The Cancer Genome Atlas repository. Novel aspects include the use of the network as a clustering mechanism and the integrated use of several molecular data types (gene mRNA expression, miRNA expression and copy number variation). The results showed clustering capabilities for the network, but even so, deriving a biological model from a Hopfield Network might be difficult given the mirror attractor phenomenon (every cluster might end up with an opposite). As a methodological aspect, the Hopfield Network was compared with the k-means and OPTICS clustering algorithms. The latter, surprisingly, hints at the possibility of creating a high-precision model that differentiates between luminal, HER2-enriched and basal samples using only 10 genes. The normalization procedure of dividing gene expression values by their corresponding gene copy number appears to have contributed to the results. This opens up the possibility of exploring these kinds of prediction models for implementing diagnostic tests at a lower cost.
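
The mirror attractor phenomenon mentioned in the abstract can be reproduced with a toy Hopfield network. The sketch below assumes binary ±1 patterns, Hebbian weights and asynchronous updates; how the thesis actually encodes multi-omic data is not stated here, so every name and the data layout are hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian weight matrix for a list of +1/-1 patterns (no self-connections)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, state, sweeps=10):
    """Update units one at a time until the state stops changing (an attractor)."""
    state = list(state)
    for _ in range(sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w[i][j] * state[j] for j in range(len(state)))
            new = 1 if field >= 0 else -1
            if new != state[i]:
                state[i], changed = new, True
        if not changed:
            break
    return state

def cluster_by_attractor(w, samples):
    """Group sample indices by the attractor each sample converges to."""
    clusters = {}
    for idx, sample in enumerate(samples):
        clusters.setdefault(tuple(recall(w, sample)), []).append(idx)
    return clusters
```

With a single stored pattern `p`, a noisy copy of `p` settles on `p`, while a noisy copy of its sign-flipped version settles on the flipped pattern, which the network was never explicitly trained on. In this toy setting, that is why each cluster found this way can come with an opposite.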

Repositorio Institucional del Instituto Tecnologico de Costa Rica

Noise resistant generalized parametric validity index of clustering for gene expression data

Author: Fa R
Nandi AK
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2014

This article has been made available through the Brunel Open Access Publishing Fund. Validity indices have been investigated for decades. However, since there is no study of noise-resistance performance of these indices in the literature, there is no guideline for determining the best clustering in noisy data sets, especially microarray data sets. In this paper, we propose a generalized parametric validity (GPV) index which employs two tunable parameters α and β to control the proportions of objects being considered to calculate the dissimilarities. The greatest advantage of the proposed GPV index is its noise-resistance ability, which results from the flexibility of tuning the parameters. Several rules are set to guide the selection of parameter values. To illustrate the noise-resistance performance of the proposed index, we evaluate the GPV index for assessing five clustering algorithms in two gene expression data simulation models with different noise levels and compare the ability of determining the number of clusters with eight existing indices. We also test the GPV in three groups of real gene expression data sets. The experimental results suggest that the proposed GPV index has superior noise-resistance ability and provides fairly accurate judgements.

Crossref

Brunel University Research Archive

Paradigm of tunable clustering using binarization of consensus partition matrices (Bi-CoPaM) for gene discovery

Author: A Strehl
A Weingessel
Asoke K. Nandi
B Abu-Jamous
B Fischer
Basel Abu-Jamous
D Greene
D Liu
D Stuart
David J. Roberts
E Dimitriadou
FD Gibbons
HG Ayad
JM Pena
K Tumer
KY Yeung
LP Zhao
MBH Rhouma
N Slonim
O Nwamadi
PT Spellman
R Avogadri
R Babuška
R Baumgartner
R Fa
R Nilsson
RJ Cho
Rui Fa
S Dudoit
S Haykin
S Vega-Pons
SA Salem
Shyamal D. Peddada
T Pramila
TE Kohonen
X Zhou
Z Yu
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Copyright @ 2013 Abu-Jamous et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed. In this paradigm, all three eventualities of a gene being exclusively assigned to a single cluster, being assigned to multiple clusters, and being not assigned to any cluster are possible. These possibilities are realised through the primary novelty of the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet requirements of different gene discovery studies. National Institute for Health Researc
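
The aggregate-then-binarize idea can be sketched in a few lines. The code below assumes the individual partitions have already been relabelled so that cluster rows correspond across experiments, and it uses a single membership threshold as the tunable binarization rule; both are illustrative assumptions rather than the specific techniques of the paper, and the function names are hypothetical.

```python
def consensus_partition_matrix(partitions):
    """Average aligned binary partition matrices (clusters x genes) into fuzzy memberships."""
    runs = len(partitions)
    n_clusters, n_genes = len(partitions[0]), len(partitions[0][0])
    return [
        [sum(p[c][g] for p in partitions) / runs for g in range(n_genes)]
        for c in range(n_clusters)
    ]

def binarize(copam, threshold):
    """Assign a gene to every cluster whose fuzzy membership reaches the threshold."""
    return [[1 if m >= threshold else 0 for m in row] for row in copam]

# Four clustering runs over four genes and two clusters; gene 1 is split evenly.
partitions = [
    [[1, 1, 0, 0], [0, 0, 1, 1]],
    [[1, 1, 0, 0], [0, 0, 1, 1]],
    [[1, 0, 0, 0], [0, 1, 1, 1]],
    [[1, 0, 0, 0], [0, 1, 1, 1]],
]
copam = consensus_partition_matrix(partitions)
wide = binarize(copam, 0.5)    # gene 1 lands in both clusters (overlap)
tight = binarize(copam, 0.75)  # gene 1 lands in neither cluster (unassigned)
```

Lowering the threshold widens clusters and allows overlap, while raising it tightens them and leaves ambiguous genes unassigned, which matches the three eventualities described in the abstract.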

Public Library of Science (PLOS)

CiteSeerX

Crossref

Jyväskylä University Digital Archive

Directory of Open Access Journals

UCL Discovery

PubMed Central

Brunel University Research Archive

FigShare

Tubular cell and keratinocyte single-cell transcriptomics applied to lupus nephritis reveal type I IFN and fibrosis relevant pathways.

Author: Accelerating Medicines Partnership Rheumatoid Arthritis and Systemic Lupus Erythematosus (AMP RA/SLE) Consortium
Belmont H Michael
Bornkamp Nicole
Buyon Jill
Clancy Robert
Der Evan
Goilav Beatrice
Graham Jay A
Guthridge Joel
Izmirly Peter
James Judith
Jordan Nicole
Koenigsberg Mordecai
Kustagi Manjunath
Mokrzycki Michele
Morozov Pavel
Pullman James
Putterman Chaim
Ranabothu Saritha
Raychaudhuri Soumya
Rocca Juan P
Rominieki Helen
Schulte Emma
Slowikowski Kamil
Suryawanshi Hemant
Tuschl Thomas
Wu Ming
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

The molecular and cellular processes that lead to renal damage and to the heterogeneity of lupus nephritis (LN) are not well understood. We applied single-cell RNA sequencing (scRNA-seq) to renal biopsies from patients with LN and evaluated skin biopsies as a potential source of diagnostic and prognostic markers of renal disease. Type I interferon (IFN)-response signatures in tubular cells and keratinocytes distinguished patients with LN from healthy control subjects. Moreover, a high IFN-response signature and fibrotic signature in tubular cells were each associated with failure to respond to treatment. Analysis of tubular cells from patients with proliferative, membranous and mixed LN indicated pathways relevant to inflammation and fibrosis, which offer insight into their histologic differences. In summary, we applied scRNA-seq to LN to deconstruct its heterogeneity and identify novel targets for personalized approaches to therapy

PubMed Central

eScholarship - University of California

Reverse-engineering transcriptional modules from gene expression data

Author: De Smet Riet
Joshi Anagha
Marchal Kathleen
Michoel Tom
Van de Peer Yves
Publication venue: 'Wiley'
Publication date: 01/01/2009

"Module networks" are a framework to learn gene regulatory networks from expression data using a probabilistic model in which coregulated genes share the same parameters and conditional distributions. We present a method to infer ensembles of such networks and an averaging procedure to extract the statistically most significant modules and their regulators. We show that the inferred probabilistic models extend beyond the data set used to learn the models. Comment: 5 pages REVTeX, 4 figure

arXiv.org e-Print Archive

Ghent University Academic Bibliography