1,052 research outputs found

Clustering by soft-constraint affinity propagation: Applications to gene-expression data

Author: Alizadeh
Blatt
Braunstein
Golub
M. Leone
M. Weigt
Pomeroy
Sumedha
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2007

Motivation: Similarity-measure based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP), based on message-passing techniques, was proposed by Frey and Dueck (2007). In AP, each cluster is identified by a common exemplar that all other data points of the same cluster refer to, and exemplars have to refer to themselves. Despite its proven power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters and leads to suboptimal performance, e.g., in analyzing gene expression data. Results: This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints relative to the aim of maximizing the overall similarity, and makes it possible to interpolate between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) is more informative and accurate, and leads to more stable clustering. Even though a new a priori free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Furthermore, it allows sparse gene expression signatures to be extracted for each cluster. Comment: 11 pages, supplementary material: http://isiosf.isi.it/~weigt/scap_supplement.pd
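The abstract above mentions a simple limiting case in which each data point selects its closest neighbor as an exemplar. A minimal sketch of that limit only (not of AP or SCAP themselves) might look as follows. The function name, the one-dimensional toy data, and the choice of negative squared distance as the similarity are assumptions made for illustration; they are not taken from the paper.

```python
def nearest_neighbor_exemplars(points):
    """For each point, return the index of the most similar *other* point.

    Assumption: similarity is the negative squared distance between
    one-dimensional points; the abstract does not fix a similarity measure.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to choose a neighbor")
    exemplars = []
    for i in range(n):
        # Pick the index k != i that maximizes the similarity s(i, k).
        best = max(
            (k for k in range(n) if k != i),
            key=lambda k: -(points[i] - points[k]) ** 2,
        )
        exemplars.append(best)
    return exemplars


# Hypothetical toy data: two well-separated groups on a line.
points = [0.0, 1.0, 1.5, 10.0, 10.2]
exemplars = nearest_neighbor_exemplars(points)
print(exemplars)  # [1, 2, 1, 4, 3]
```

In this limit no point refers to itself, so the result is a set of nearest-neighbor links rather than a set of self-consistent exemplars; according to the abstract, the new parameter interpolates between this case and the original AP.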

arXiv.org e-Print Archive

CiteSeerX

Crossref

Message Passing Clustering with Stochastic Merging Based on Kernel Functions

Author: Ali Hesham H.
Deng Xutao
Geng Huimin
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2005

In this paper, we propose a new Stochastic Message Passing Clustering (SMPC) algorithm for clustering biological data based on the Message Passing Clustering (MPC) algorithm, which we introduced in earlier work. MPC has shown its advantage when applied to describing parallel and spontaneous biological processes. SMPC, as a generalized version of MPC, extends the clustering algorithm from a deterministic process to a stochastic process, adding three major advantages. First, in deciding which cluster pair to merge, the influences of all clusters are quantified by probabilities, estimated by kernel functions based on their relative distances. Second, the proposed algorithm properly resolves the “tie” problem, which often occurs for integer distances, as in the case of protein interaction data. Third, clustering can be undone to improve the clustering performance: when the algorithm detects objects that do not have good probabilities inside a cluster, it moves them outside. The test results on colon cancer gene-expression data show that SMPC performs better than the deterministic MPC.
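The first two points of the abstract above (merge probabilities estimated by kernel functions of relative distances, and stochastic resolution of ties) can be illustrated with a small sketch. This is not the authors' SMPC implementation: the Gaussian kernel, the `bandwidth` parameter, the function names, and the toy distances are all assumptions chosen for illustration.

```python
import math
import random


def merge_probabilities(distances, bandwidth=1.0):
    """Map each candidate cluster pair to a merge probability.

    distances: dict mapping a pair (a, b) to the distance between clusters.
    Assumption: a Gaussian kernel exp(-(d / bandwidth)^2); the abstract
    only says "kernel functions", so this particular choice is hypothetical.
    """
    weights = {
        pair: math.exp(-((d / bandwidth) ** 2)) for pair, d in distances.items()
    }
    total = sum(weights.values())
    # Normalize kernel weights so they sum to one.
    return {pair: w / total for pair, w in weights.items()}


def sample_merge_pair(distances, bandwidth=1.0, rng=random):
    """Draw one cluster pair to merge according to its kernel probability."""
    probs = merge_probabilities(distances, bandwidth)
    pairs = list(probs)
    return rng.choices(pairs, weights=[probs[p] for p in pairs], k=1)[0]


# Hypothetical integer distances with a tie between (A, B) and (A, C).
distances = {("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 3}
probs = merge_probabilities(distances, bandwidth=2.0)
pair = sample_merge_pair(distances, bandwidth=2.0, rng=random.Random(0))
```

Here the two tied pairs receive equal probability, so the tie is broken by the random draw rather than by an arbitrary ordering, while the more distant pair is still possible but less likely.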

AIS Electronic Library (AISeL)

Estimating sample-specific regulatory networks

Author: Glass Kimberly
Kuijjer Marieke Lydia
Quackenbush John
Tung Matthew
Yuan GuoCheng
Publication date: 28/06/2018

Biological systems are driven by intricate interactions among the complex array of molecules that comprise the cell. Many methods have been developed to reconstruct network models of those interactions. These methods often draw on large numbers of samples with measured gene expression profiles to infer connections between genes (or gene products). The result is an aggregate network model representing a single estimate for the likelihood of each interaction, or "edge," in the network. While informative, aggregate models fail to capture the heterogeneity that is represented in any population. Here we propose a method to reverse engineer sample-specific networks from aggregate network models. We demonstrate the accuracy and applicability of our approach in several data sets, including simulated data, microarray expression data from synchronized yeast cells, and RNA-seq data collected from human lymphoblastoid cell lines. We show that these sample-specific networks can be used to study changes in network topology across time and to characterize shifts in gene regulation that may not be apparent in expression data. We believe the ability to generate sample-specific networks will greatly facilitate the application of network methods to the increasingly large, complex, and heterogeneous multi-omic data sets that are currently being generated, and ultimately support the emerging field of precision network medicine

arXiv.org e-Print Archive

Directory of Open Access Journals

NORA - Norwegian Open Research Archives

Systems biology via redescription and ontologies (I): finding phase changes with applications to malaria temporal data

Author: B Zeeberg
BR Zeeberg
Bud Mishra
J Ernst
Kevin Casey
M Antoniotti
M Ashburner
N Friedman
N Slonim
PT Spellman
R Cilibrasi
Samantha Kleinberg
TM Cover
W Clark
Z Bar-Joseph
Publication venue: Springer Netherlands
Publication date: 01/12/2007

Biological systems are complex and often composed of many subtly interacting components. Furthermore, such systems evolve through time and, as the underlying biology executes its genetic program, the relationships between components change and undergo dynamic reorganization. Characterizing these relationships precisely is a challenging task, but one that must be undertaken if we are to understand these systems in sufficient detail. One set of tools that may prove useful is the formal principles of model building and checking, which could allow the biologist to frame these inherently temporal questions in a sufficiently rigorous framework. In response to these challenges, GOALIE (Gene ontology algorithmic logic and information extractor) was developed and has been successfully employed in the analysis of high-throughput biological data (e.g. time-course gene-expression microarray data and neural spike train recordings). The method has applications to a wide variety of temporal data, indeed any data for which there exist ontological descriptions. This paper describes the algorithms behind GOALIE and its use in the study of the Intraerythrocytic Developmental Cycle (IDC) of Plasmodium falciparum, the parasite responsible for a deadly form of chloroquine-resistant malaria. We focus in particular on the problem of finding phase changes, times of reorganization of transcriptional control.

CiteSeerX

Crossref

Springer - Publisher Connector

PubMed Central

A novel computational framework for fast, distributed computing and knowledge integration for microarray gene expression data analysis

Author: Sethi Prerna
Publication venue: Louisiana Tech Digital Commons
Publication date: 01/04/2006

The healthcare burden and suffering due to life-threatening diseases such as cancer would be significantly reduced by the design and refinement of computational interpretation of micro-molecular data collected by bioinformaticians. Rapid technological advancements in the field of microarray analysis, an important component in the design of in-silico molecular medicine methods, have generated enormous amounts of such data, a trend that has been increasing exponentially over the last few years. However, the analysis and handling of these data have become one of the major bottlenecks in the utilization of the technology. The rate of collection of these data has far surpassed our ability to analyze the data for novel, non-trivial, and important knowledge. High-performance computing platforms, and algorithms that utilize their embedded computing capacity, have emerged as a leading technology that can handle such data-intensive knowledge discovery applications. In this dissertation, we present a novel framework to achieve fast, robust, and accurate (biologically significant) multi-class classification of gene expression data using distributed knowledge discovery and integration computational routines, specifically for cancer genomics applications. The research presents a unique computational paradigm for the rapid, accurate, and efficient selection of relevant marker genes, while providing parametric controls to ensure flexibility of its application.
The proposed paradigm consists of the following key computational steps: (a) preprocess and normalize the gene expression data; (b) discretize the data for knowledge mining application; (c) partition the data using two proposed methods: partitioning with overlapped windows and adaptive selection; (d) perform knowledge discovery on the partitioned data-spaces for association rule discovery; (e) integrate association rules from partitioned data and knowledge spaces on distributed processor nodes using a novel knowledge integration algorithm; and (f) perform post-analysis and functional elucidation of the discovered gene rule sets. The framework is implemented on a shared-memory multiprocessor supercomputing environment, and several experimental results are demonstrated to evaluate the algorithms. We conclude with a functional interpretation of the computational discovery routines for enhanced biological physiological discovery from cancer genomics datasets, while suggesting some directions for future research.
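Step (c) above names "partitioning with overlapped windows" without giving details. One possible reading, sketched below under stated assumptions, is to slide a fixed-size window over an ordered list of genes so that consecutive partitions share some entries. The function name, the `size` and `overlap` parameters, and the toy gene labels are hypothetical and are not taken from the dissertation.

```python
def overlapped_windows(items, size, overlap):
    """Split `items` into consecutive windows that share `overlap` entries.

    Assumption: fixed window size and fixed overlap; the final window may
    be shorter if the list length does not line up with the step.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not items:
        return []
    step = size - overlap  # how far the window advances each time
    windows = []
    start = 0
    while True:
        windows.append(items[start:start + size])
        if start + size >= len(items):
            break  # this window already reaches the end of the list
        start += step
    return windows


# Hypothetical gene labels g0..g9, windows of 4 genes sharing 2 genes.
genes = [f"g{i}" for i in range(10)]
parts = overlapped_windows(genes, size=4, overlap=2)
for part in parts:
    print(part)
```

Each partition could then be handed to a separate worker for rule discovery, with the shared entries giving adjacent partitions some common context when their results are later integrated.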

Louisiana Tech Digital Commons

Development of computations in bioscience and bioinformatics and its application: review of the Symposium of Computations in Bioinformatics and Bioscience (SCBB06)

Author: Alexander Statnikov
C Zhang
Chaoyang Zhang
DA Hosack
EJ Perkins
FX Wu
G Lu
G Lu
H Chen
Jun Ni
K Ning
K Yang
K Yang
L Mao
L Qin
L Yin
LR Liang
M Pirooznia
M Stepanova
Q Ling
Q Zhang
R Azuma
S Datta
W Shi
X Huang
XL Li
Y Chen
Y Chen
Y Wang
Youping Deng
ZH Duan
Publication venue: BioMed Central
Publication date: 01/01/2006

The first Symposium of Computations in Bioinformatics and Bioscience (SCBB06) was held in Hangzhou, China on June 21–22, 2006. Twenty-six peer-reviewed papers were selected for publication in this special issue of BMC Bioinformatics. These papers cover a broad range of topics including bioinformatics theories, algorithms, applications and tool development. The main technical topics include gene expression analysis, sequence analysis, genome analysis, phylogenetic analysis, gene function prediction, molecular interaction and systems biology, genetics and population study, immune strategy, protein structure prediction and proteomics.

Aquila Digital Community

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central