
Out-of-core solution of eigenproblems for macromolecular simulations

Author: CH Bischof
G Quintana-Ortí
GH Golub
GS Ayton
I Bahar
IS Dhillon
J Aliaga
JR Lopez-Blanco
JR López-Blanco
L Skjaerven
P Bientinesi
S Toledo
Y Nakatsukasa
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

We consider the solution of large-scale eigenvalue problems that arise in the motion simulation of complex macromolecules on desktop platforms. To tackle the dimension of the matrices involved in these problems, we formulate out-of-core (OOC) variants of two selected eigensolvers, which essentially decouple the performance of the solver from the storage capacity. Furthermore, we contend with the high computational complexity of the solvers by off-loading the arithmetically intensive parts of the algorithms to a hardware graphics accelerator.

Full-text Institutional Repository of the Ruđer Bošković Institute
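The out-of-core idea can be illustrated with a small sketch, assuming the matrix sits on disk as a NumPy `.npy` file and is read back through a memory map one block of rows at a time. To keep it short, plain power iteration stands in for the eigensolver; it is not either of the two solvers studied in the paper, and there is no GPU off-loading. The file name `A.npy`, the function names, and the block size are illustrative assumptions.

```python
import os
import tempfile

import numpy as np


def ooc_matvec(path, x, block_rows=256):
    """Compute A @ x for a matrix stored in a .npy file, one row block at a time."""
    A = np.load(path, mmap_mode="r")  # memory-mapped: rows are paged in only when touched
    y = np.empty(A.shape[0])
    for start in range(0, A.shape[0], block_rows):
        stop = min(start + block_rows, A.shape[0])
        y[start:stop] = np.asarray(A[start:stop]) @ x  # only this block is in memory
    return y


def ooc_power_iteration(path, iters=100, block_rows=256, seed=0):
    """Dominant eigenpair of a symmetric on-disk matrix via power iteration."""
    n = np.load(path, mmap_mode="r").shape[0]
    x = np.random.default_rng(seed).standard_normal(n)
    x /= np.linalg.norm(x)
    lam = 0.0
    for _ in range(iters):
        y = ooc_matvec(path, x, block_rows)
        lam = float(x @ y)  # Rayleigh quotient estimate of the eigenvalue
        x = y / np.linalg.norm(y)
    return lam, x


# Demo: a 300 x 300 symmetric matrix with a known dominant eigenvalue of 10.
rng = np.random.default_rng(1)
n = 300
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.concatenate(([10.0], np.linspace(1.0, 5.0, n - 1)))
A = (Q * eigs) @ Q.T  # Q diag(eigs) Q^T
A = (A + A.T) / 2  # symmetrize away rounding error

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "A.npy")
    np.save(path, A)
    lam, v = ooc_power_iteration(path, iters=100, block_rows=64)
```

Only one row block is resident per step inside `ooc_matvec`; the demo itself builds `A` in RAM purely to create a test file with a known spectrum.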

UNCLES: Method for the identification of genes differentially consistently co-expressed in a specific subset of datasets

Author: A Huber
A Prelić
AA Shabalin
AP Gasch
Asoke K. Nandi
B Abu-Jamous
Basel Abu-Jamous
C Koch
CH Wade
CT Harbison
D Dikicioglu
D Liu
DA Orlando
David J. Roberts
IS Dhillon
J Bahler
J Yang
JK Choi
JK Limb
JM Pena
JM Stuart
KC Li
KY Yeung
L Lazzeroni
LP Zhao
MB Eisen
P Cahan
P Grandi
PC Roberts
PT Spellman
R Fa
R Lletı́a
R Nilsson
RJ Cho
RM Piro
Rui Fa
S Chu
S Fujii
S Sharma
S Vega-Pons
T Hayata
T Murali
T Pramila
TC Fleischer
VA Gennarino
X Liu
Y Cheng
Y Kluger
Z Tao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/06/2015

Background: Collective analysis of the increasing number of emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is common practice to test methods by applying them to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets. Results: Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method can identify subsets of genes that are consistently co-expressed in one subset of datasets while being poorly co-expressed in another subset, as well as subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to automatically set the parameters of UNCLES, such as the number of clusters. Additionally, we propose an approach for the synthesis of gene expression datasets from real data profiles that combines the ground-truth knowledge of synthetic data with the realistic expression values of real data, and therefore overcomes the problem of faithfulness in synthetic expression data modelling. By applying UNCLES to those datasets, we validate it and compare it with other conventional clustering methods and, of particular relevance, biclustering methods. We further validate UNCLES by applying it to a set of 14 real genome-wide yeast datasets, where it produces focused clusters that conform well to known biological facts. 
Furthermore, we draw in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters. Conclusions: The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies.

The National Institute for Health Research (NIHR) under its Programme Grants for Applied Research Programme (Grant Reference Number RP-PG-0310-1004)

Jyväskylä University Digital Archive

Brunel University Research Archive

Mixture of experts models to exploit global sequence similarity on biomolecular sequence labeling

Author: A Paccanaro
AP Dempster
AY Ng
C Caragea
C Yan
Cornelia Caragea
Drena Dobbs
H Berman
IS Dhillon
J Allers
J Davis
J Shi
JH Kim
Jivko Sinapov
M Terribilini
MI Jordan
N Qian
P Baldi
R Duda
S Russell
TG Dietterich
TG Diettrich
TM Mitchell
Vasant Honavar
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Identification of functionally important sites in biomolecular sequences has broad applications ranging from rational drug design to the analysis of metabolic and signal transduction networks. Experimental determination of such sites lags far behind the number of known biomolecular sequences. Hence, there is a need to develop reliable computational methods for identifying functionally important sites from biomolecular sequences. Results: We present a mixture of experts approach to biomolecular sequence labeling that takes into account the global similarity between biomolecular sequences. Our approach combines unsupervised and supervised learning techniques. Given a set of sequences and a similarity measure defined on pairs of sequences, we learn a mixture of experts model by using spectral clustering to learn the hierarchical structure of the model and by using Bayesian techniques to combine the predictions of the experts. We evaluate our approach on two biomolecular sequence labeling problems: RNA-protein and DNA-protein interface prediction. The results of our experiments show that global sequence similarity can be exploited to improve the performance of classifiers trained to label biomolecular sequence data. Conclusion: The mixture of experts model helps improve the performance of machine learning methods for identifying functionally important sites in biomolecular sequences.

This is a proceedings paper from the IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 10 (2009): S4, doi: 10.1186/1471-2105-10-S4-S4. Posted with permission.

Digital Repository @ Iowa State University (ISU)

Differential co-expression framework to quantify goodness of biclusters and compare biclustering algorithms

Author: A Ben-Dor
A Prelic
A Tanay
A Wille
AA Alizadeh
AP Gasch
B Mirkin
Burton Kuan Hui Chia
D Kostka
Golub
I Ulitsky
IS Dhillon
J Ihmels
J Yang
P Broët
R Krishna Murthy Karuturi
S Barkow
SC Madiera
W Ayadi
X Chen
Y Cheng
Y Kluger
Y Wang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Biclustering is an important analysis procedure for understanding biological mechanisms from microarray gene expression data. Several algorithms have been proposed to identify biclusters, but very little effort has been made to compare the performance of different algorithms on real datasets and to combine the resulting biclusters into one unified ranking. Results: In this paper we propose a differential co-expression framework and a differential co-expression scoring function to objectively quantify the quality, or goodness, of a bicluster of genes, based on the observation that genes in a bicluster are co-expressed in the conditions belonging to the bicluster and not co-expressed in the other conditions. Furthermore, we propose a scoring function to stratify biclusters into three types of co-expression. We used the proposed scoring functions to understand the performance and behavior of four well-established biclustering algorithms on six real datasets from different domains by combining their output into one unified ranking. Conclusions: The differential co-expression framework is useful for providing a quantitative and objective assessment of the goodness of biclusters of co-expressed genes and of the performance of biclustering algorithms in identifying co-expression biclusters. It also helps to combine the biclusters output by different algorithms into one unified ranking, i.e., meta-biclustering.

Directory of Open Access Journals

arXiv.org e-Print Archive
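The observation behind the differential co-expression framework (genes co-expressed inside the bicluster's conditions, not co-expressed outside them) can be turned into a toy score. The function below is a hypothetical simplification, the difference of mean absolute Pearson correlations inside and outside the bicluster's conditions; it is not the paper's scoring function, and the synthetic data are made up for illustration.

```python
import numpy as np


def mean_abs_corr(X):
    """Mean absolute Pearson correlation over all pairs of rows of X."""
    C = np.corrcoef(X)
    iu = np.triu_indices(len(X), k=1)  # each unordered pair of rows once
    return float(np.mean(np.abs(C[iu])))


def differential_coexpression(data, genes, conds):
    """Hypothetical score: co-expression over the bicluster's conditions
    minus co-expression of the same genes over the remaining conditions."""
    out = np.setdiff1d(np.arange(data.shape[1]), conds)
    inside = mean_abs_corr(data[np.ix_(genes, conds)])
    outside = mean_abs_corr(data[np.ix_(genes, out)])
    return inside - outside


rng = np.random.default_rng(0)
data = rng.standard_normal((6, 15))  # 6 genes x 15 conditions of noise
pattern = np.linspace(-1.0, 1.0, 5)
data[:3, :5] = pattern + 0.01 * rng.standard_normal((3, 5))  # genes 0-2 co-expressed in conds 0-4
score = differential_coexpression(data, genes=[0, 1, 2], conds=[0, 1, 2, 3, 4])
```

A positive score means the genes move together mainly within the bicluster's own conditions, which is the property the framework sets out to quantify.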

Metrics matter in community detection

Author: A Clauset
A Decelle
A Lancichinetti
AJ Gates
D Horta
DH Wolpert
DW Matula
F Radicchi
F Shahrokhi
IS Dhillon
J Reichardt
J Zhang
JFC Kingman
JG Young
L Danon
L Hubert
L Peel
M Rosvall
Marina Meilă
MEJ Newman
P Zhang
Pascal Pons
S Romano
UN Raghavan
VD Blondel
Z Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/01/2019

We present a critical evaluation of normalized mutual information (NMI) as an evaluation metric for community detection. NMI exaggerates the leximin method's performance on weak communities: Does leximin, in finding the trivial singletons clustering, truly outperform eight other community detection methods? Three NMI improvements from the literature are AMI, rrNMI, and cNMI. We show equivalences under relevant random models, and for evaluating community detection, we advise one-sided AMI under the $\mathbb{M}_{\mathrm{all}}$ model (all partitions of $n$ nodes). This work seeks (1) to start a conversation on robust measurements, and (2) to advocate evaluations which do not give "free lunch".
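The singletons effect described above can be reproduced with a minimal NMI sketch, assuming the arithmetic-mean normalization 2 I(A;B) / (H(A) + H(B)), which is one of several NMI variants in use. The partitions are hypothetical: 100 nodes in 50 two-node communities, scored against a prediction that puts every node in its own community.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a partition given as a label list."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two partitions of the same nodes."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization: 2 I(A;B) / (H(A) + H(B))."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:  # both partitions are a single community
        return 1.0
    return 2 * mutual_information(a, b) / (h_a + h_b)


# Ground truth: 50 weak communities of 2 nodes each; prediction: every node alone.
truth = [i // 2 for i in range(100)]
singletons = list(range(100))
score = nmi(truth, singletons)  # about 0.92, despite detecting no structure
```

Here the score is 2 log 50 / (log 50 + log 100), roughly 0.92, even though the singletons clustering groups no nodes together; chance-corrected measures such as AMI are designed to remove this kind of inflation.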

Systematic gene function prediction from gene expression data by using a fuzzy nearest-cluster method

Author: A Clare
A Mateos
AV Lukashin
D Dembele
D Horn
D-W Kim
HW Mewes
IS Dhillon
JL DeRisi
MB Eisen
MG Walker
MP Brown
O Troyanskaya
P Tamayo
PT Spellman
R Sharan
R Steuer
RO Duda
S Chu
S Dudoit
S Tavazoie
See-Kiong Ng
VN Vapnik
Xiao-Li Li
Y Xu
Yin-Chet Tan
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Quantitative simultaneous monitoring of the expression levels of thousands of genes under various experimental conditions is now possible using microarray experiments. However, there are still gaps in the whole-genome functional annotation of genes using gene expression data. RESULTS: In this paper, we propose a novel technique called Fuzzy Nearest Clusters for genome-wide functional annotation of unclassified genes. The technique consists of two steps: an initial hierarchical clustering step to detect homogeneous co-expressed gene subgroups, or clusters, in each possibly heterogeneous functional class, followed by a classification step to predict the functional roles of the unclassified genes based on their similarities to the detected functional clusters. CONCLUSION: Our experimental results with yeast gene expression data showed that the proposed method can accurately predict the genes' functions, even for genes with multiple functional roles, and that its prediction performance is less dependent on the underlying heterogeneity of the complex functional classes than that of other conventional gene function prediction approaches.

ScholarBank@NUS
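A sketch of the classification step only, assuming the hierarchical clustering step has already produced one or more centroid profiles per functional class. The distance (1 minus Pearson correlation), the fuzzy c-means style membership formula, the fuzzifier m = 2, and the class names are our assumptions, not details taken from the paper.

```python
import numpy as np


def corr_distance(x, c):
    """1 - Pearson correlation between an expression profile and a centroid."""
    return 1.0 - float(np.corrcoef(x, c)[0, 1])


def fuzzy_class_memberships(profile, class_centroids, m=2.0, eps=1e-12):
    """class_centroids: {class_name: [centroid, ...]} from the clustering step.

    Each class is scored by its nearest cluster; distances are turned into
    fuzzy memberships as in fuzzy c-means: u_k proportional to d_k^(-2/(m-1)).
    """
    names = list(class_centroids)
    d = np.array([min(corr_distance(profile, c) for c in class_centroids[k])
                  for k in names])
    w = (d + eps) ** (-2.0 / (m - 1.0))  # eps guards against a zero distance
    return dict(zip(names, w / w.sum()))  # memberships sum to 1


t = np.linspace(0, 2 * np.pi, 12)
centroids = {  # hypothetical classes and centroid profiles
    "ribosome": [np.sin(t), np.sin(t + 0.3)],  # a heterogeneous class: two clusters
    "cell_cycle": [np.cos(2 * t)],
}
gene = np.sin(t) + 0.05 * np.cos(5 * t)  # an "unclassified" profile
u = fuzzy_class_memberships(gene, centroids)
```

Scoring a class by its nearest cluster, rather than by a single class-wide centroid, is what lets a heterogeneous class with several distinct expression patterns still claim a gene that matches only one of them.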

Kernels on Graphs as Proximity Measures

Author: C Lenart
D Boley
D Liben-Nowell
E Estrada
F Chung
F Fouss
F Sommer
I Kivimäki
IJ Schoenberg
IS Dhillon
J Shawe-Taylor
K Avrachenkov
K-R Müller
L Backstrom
L Katz
O Chapelle
P Chebotarev
PY Chebotarev
RA Horn
SJ Kirkland
SVN Vishwanathan
U Luxburg von
V Ivashkin
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 15/06/2017

Kernels and, broadly speaking, similarity measures on graphs are extensively used in graph-based unsupervised and semi-supervised learning algorithms as well as in the link prediction problem. We analytically study proximity and distance properties of various kernels and similarity measures on graphs. This can potentially be useful for recommending the adoption of one or another similarity measure in a machine learning method. We also numerically compare various similarity measures in the context of spectral clustering and observe that normalized heat-type similarity measures with log modification generally perform best.

INRIA a CCSD electronic archive server
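A minimal sketch of one heat-type measure, assuming the normalized Laplacian L = I - D^(-1/2) A D^(-1/2) of a connected, undirected graph, the heat kernel exp(-tL), and an elementwise logarithm as our reading of the "log modification". The parameter t = 1 and the 4-node path graph are illustrative choices. The matrix exponential is computed from the eigendecomposition of the symmetric L, so only NumPy is needed.

```python
import numpy as np


def normalized_heat_kernel(A, t=1.0):
    """exp(-t L) for the normalized Laplacian of an undirected graph without isolated nodes."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)  # L is symmetric: L = V diag(vals) V^T
    return (vecs * np.exp(-t * vals)) @ vecs.T  # V diag(exp(-t vals)) V^T


def log_heat_similarity(A, t=1.0):
    """Elementwise log of the heat kernel; entries are positive on a connected graph."""
    return np.log(normalized_heat_kernel(A, t))


# Path graph 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = log_heat_similarity(A, t=1.0)
```

Because the logarithm is monotone, it preserves the ranking of similarities from each node (node 0 stays closer to node 1 than to node 3); what it changes is the spacing between values, which is what the proximity and distance properties studied in the paper are sensitive to.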

Recipes for sparse LDA of horizontal data

Author: A Marshall
A Montanari
A Rencher
B Flury
BG Osborne
C Hage
D Bragoli
DG Calò
DM Witten
GH Golub
H Shin
IS Dhillon
IT Jolliffe
J Duchene
J Duintjer Tebbens
J Fan
JC Gower
L Clemmensen
M Ng
M Vichi
M Zou
ME Timmerman
N Boumal
N Hao
NA Campbell
NT Trendafilov
P Bickel
P-A Absil
R Tibshirani
RA Fisher
S Mussard
T Cai
T Hastie
TP Conrads
W Gander
WJ Krzanowski
Z Wen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Many important modern applications require analyzing data with more variables than observations, called horizontal data for short. In such a situation, the classical Fisher’s linear discriminant analysis (LDA) does not have a solution because the within-group scatter matrix is singular. Moreover, the number of variables is usually huge, and the classical type of solutions (discriminant functions) are difficult to interpret because they involve all available variables. Nowadays, the aim is to develop fast and reliable algorithms for sparse LDA of horizontal data. The resulting discriminant functions depend on very few original variables, which facilitates their interpretation. The main theoretical and numerical challenge is how to cope with the singularity of the within-group scatter matrix. This work aims to classify the existing approaches according to the way they tackle this singularity issue, and suggests new ones.

Open Research Online
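The singularity problem is easy to see numerically. The sketch below, assuming random Gaussian data with 20 observations, 50 variables, and two equal groups (all hypothetical), builds the within-group scatter matrix, whose rank cannot exceed n minus the number of groups. The ridge term at the end is a standard textbook remedy shown only for contrast; it does not produce a sparse solution and is not one of the approaches proposed in the paper.

```python
import numpy as np


def within_group_scatter(X, y):
    """Within-group scatter W: sum over groups of centered cross-products."""
    W = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(y):
        D = X[y == g] - X[y == g].mean(axis=0)  # center each group on its own mean
        W += D.T @ D
    return W


rng = np.random.default_rng(0)
n, p, groups = 20, 50, 2  # horizontal data: p variables > n observations
X = rng.standard_normal((n, p))
y = np.repeat(np.arange(groups), n // groups)
W = within_group_scatter(X, y)
rank_W = np.linalg.matrix_rank(W)  # at most n - groups = 18 < p, so W is singular

# One classical (non-sparse) workaround, shown only for contrast: ridge the diagonal.
W_ridge = W + 1e-2 * np.eye(p)
rank_ridge = np.linalg.matrix_rank(W_ridge)  # full rank p, so W_ridge is invertible
```

Each group's centered block has rows summing to zero, which costs one rank per group; that is why the bound is n minus the number of groups rather than n.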

A biclustering algorithm based on a Bicluster Enumeration Tree: application to DNA microarray data

Author: A Ben-Dor
A Dharan
A Prelic
A Schliep
A Tanay
A Yip
B Pontes
C Cano
C Gallo
DD Lewis
EL Lehmann
F Angiulli
F Divina
GF Berriz
H Turner
H Wang
IS Dhillon
J Liu
J Yang
JA Hartigan
Jin-Kao Hao
JS Aguilar-Ruiz
K Bryan
K Cheng
L Lazzeroni
L Teng
Mourad Elloumi
R Agrawal
R Balasubramaniyan
S Barkow
S Bergmann
S Bleuler
S Mitra
S Tavazoie
SC Madeira
SD Peddada
T Hofmann
U Maulik
W Gaul
Wassim Ayadi
X Liu
Y Cheng
Y Christinat
Y Luan
Y Okada
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: In a number of domains, such as DNA microarray data analysis, we need to cluster the rows (genes) and columns (conditions) of a data matrix simultaneously to identify groups of rows coherent with groups of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. Methods: We introduce BiMine, a new enumeration algorithm for biclustering of DNA microarray data. The proposed algorithm is based on three original features. First, BiMine relies on a new evaluation function called Average Spearman's rho (ASR). Second, BiMine uses a new tree structure, called the Bicluster Enumeration Tree (BET), to represent the different biclusters discovered during the enumeration process. Third, to avoid the combinatorial explosion of the search tree, BiMine introduces a parametric rule that allows the enumeration process to cut tree branches that cannot lead to good biclusters. Results: The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data. The experimental results show that BiMine competes well with several other biclustering methods. Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevant biclusters. The software is available upon request from the authors to academic users.

Directory of Open Access Journals
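A sketch of Average Spearman's rho, based on our reading of the definition: for a bicluster with gene set I and condition set J, ASR takes the larger of the mean pairwise Spearman correlation between its genes (computed across J) and between its conditions (computed across I). The tie-free ranking shortcut and the example matrices are assumptions; check the BiMine paper for the exact formula before relying on it.

```python
import numpy as np


def spearman_rho(x, y):
    """Spearman correlation as Pearson correlation of ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])


def average_spearman_rho(sub):
    """ASR of a bicluster given as its genes x conditions submatrix.

    Our reading of the definition: the larger of the mean pairwise Spearman
    rho between genes (rows) and between conditions (columns).
    """
    n_genes, n_conds = sub.shape
    gene_rhos = [spearman_rho(sub[i], sub[k])
                 for i in range(n_genes) for k in range(i + 1, n_genes)]
    cond_rhos = [spearman_rho(sub[:, l], sub[:, m])
                 for l in range(n_conds) for m in range(l + 1, n_conds)]
    return max(float(np.mean(gene_rhos)), float(np.mean(cond_rhos)))


coherent = np.array([[1.0, 2.0, 3.0, 4.0],
                     [2.0, 4.0, 6.0, 8.0],
                     [10.0, 20.0, 30.0, 40.0]])
asr = average_spearman_rho(coherent)  # approximately 1.0: every pair ranks the same way
```

Because Spearman's rho depends only on rank order, rows that differ by scale (as in `coherent`) still count as perfectly coherent, which suits expression profiles measured on different scales.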