
43 research outputs found

GIBA: a clustering tool for detecting protein complexes

Author: AC Gavin
AD King
AH Tong
AJ Enright
B Snel
Charalampos N Moschopoulos
CN Moschopoulos
D Stoll
E Hartuv
E Sprinzak
GD Bader
Georgios A Pavlopoulos
I Xenarios
M Koyuturk
NJ Krogan
O Puig
P Shannon
Reinhard Schneider
RP Sear
S Brohee
SH Yook
Sophia Kossida
Spiridon D Likothanassis
T Ito
V Spirin
WG Willats
X-L Li
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: In recent years, high-throughput experimental methods have been developed that generate large datasets of protein-protein interactions (PPIs). However, owing to the experimental methodologies, these datasets contain errors, mainly false positives, which reduce the quality of any derived information. Typically these datasets can be modeled as graphs, where vertices represent proteins and edges represent the pairwise PPIs, making it easy to apply automated clustering methods to detect protein complexes or other biologically significant functional groupings. Methods: In this paper, a clustering tool called GIBA (named after the first characters of its developers' nicknames) is presented. GIBA applies a two-step procedure to a given dataset of protein-protein interaction data. First, a clustering algorithm is applied to the interaction data; a filtering step then generates the final candidate list of predicted complexes. Results: The efficiency of GIBA is demonstrated through the analysis of six different yeast protein interaction datasets in comparison with four other available algorithms. We compared the results of the different methods using five different performance metrics. Moreover, the parameters of the methods that constitute the filter were examined for their effect on the final results. Conclusion: GIBA is an effective and easy-to-use tool for detecting protein complexes in experimentally measured protein-protein interaction networks. The results show that GIBA achieves higher prediction accuracy than previously published methods.

Crossref

Springer

Springer - Publisher Connector

PubMed Central

Open Repository and Bibliography - Luxembourg

Discovery and Expansion of Gene Modules by Seeking Isolated Groups in a Random Graph Process

Author: A Beyer
C Schluter
E Hartuv
EA Winzeler
Elizabeth Conibear
Eshel Ben-Jacob
G Giaever
G Milligan
GD Bader
Jennifer Bryan
Jochen Brumm
L Kiemer
MA Wong
O Rinner
P Shannon
R Tibshirani
RF Ling
RO Duda
S Brohee
S van Dongen
SR Collins
TI Lee
W Huber
W Lee
W Stuetzle
Wyeth W. Wasserman
Publication venue: Public Library of Science
Publication date: 09/10/2008

BACKGROUND: A central problem in systems biology research is the identification and extension of biological modules: groups of genes or proteins participating in a common cellular process or physical complex. As a result, there is a persistent need for practical, principled methods to infer the modular organization of genes from genome-scale data. RESULTS: We introduce a novel approach for the identification of modules based on the persistence of isolated gene groups within an evolving graph process. First, the underlying genomic data are summarized in the form of ranked gene-gene relationships, thereby accommodating studies that quantify the relevant biological relationship directly or indirectly. Then, the observed gene-gene relationship ranks are viewed as the outcome of a random graph process, and candidate modules are given by the identifiable subgraphs that arise during this process. An isolation index is computed for each module, which quantifies the statistical significance of its survival time. CONCLUSIONS: The Miso (module isolation) method predicts gene modules from genomic data, and the associated isolation index provides a module-specific measure of confidence. Improving on existing alternatives, such as graph clustering and the global pruning of dendrograms, this index offers two intuitively appealing features: (1) the score is module-specific; and (2) different choices of threshold correlate logically with the resulting performance, i.e. a stringent cutoff yields high-quality predictions but low sensitivity. Through the analysis of yeast phenotype data, the Miso method is shown to outperform existing alternatives in terms of the specificity and sensitivity of its predictions.
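
The graph process described in this abstract can be illustrated with a minimal Python sketch. It adds gene-gene edges in rank order and records how long each group survives before it merges into a larger one. This is not the Miso implementation: connected components stand in for the paper's isolated groups, raw step counts stand in for the isolation index (whose significance calculation the abstract does not specify), and the name `group_lifetimes` and the `min_size` parameter are hypothetical.

```python
def group_lifetimes(ranked_edges, min_size=2):
    """Add edges in rank order and report (members, birth_step, death_step).

    Assumption: a "group" is a connected component of the growing graph.
    death_step is None for groups still alive when the edges run out.
    """
    parent = {}    # union-find parent pointers
    members = {}   # root -> set of genes currently in that group
    birth = {}     # root -> step at which the group was formed
    history = []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for step, (u, v) in enumerate(ranked_edges, start=1):
        for gene in (u, v):
            if gene not in parent:
                parent[gene] = gene
                members[gene] = {gene}
                birth[gene] = step
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge falls inside an existing group
        # Both old groups stop existing when they merge at this step.
        for root in (ru, rv):
            if len(members[root]) >= min_size:
                history.append((frozenset(members[root]), birth[root], step))
        parent[ru] = rv
        members[rv] |= members.pop(ru)
        del birth[ru]
        birth[rv] = step

    for root, group in members.items():
        if len(group) >= min_size:
            history.append((frozenset(group), birth[root], None))
    return history


if __name__ == "__main__":
    # Hypothetical ranked relationships, strongest first.
    for group, born, died in group_lifetimes([("a", "b"), ("c", "d"), ("a", "c")]):
        print(sorted(group), born, died)
```

A group that is born early and dies late has a long survival time; the method in the abstract goes further and attaches a statistical significance to that time.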

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data

Author: AK Jain
AP Gasch
D Dembele
D Hand
E Hartuv
Enzo Medico
EP Xing
I Yanai
J Handl
JC Bezdek
JD Hughes
JT Chi
KY Yeung
Limin Fu
MB Eisen
N Belacel
NR Garge
P Tamayo
PT Spellman
Q Sheng
RD Pascual-Marqui
S Tavazioie
ST Roweis
T Kohonen
V Di Gesu
W Zhang
Y Qu
YD Chen
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this end, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point, and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process. RESULTS: The clustering algorithm is named Fuzzy clustering by Local Approximation of MEmbership (FLAME). Distinctive elements of FLAME are: (i) definition of the neighborhood of each object (gene or sample) and identification of objects with "archetypal" features, named Cluster Supporting Objects, around which to construct the clusters; (ii) assignment to each object of a fuzzy membership vector approximated from the memberships of its neighboring objects, by an iterative converging process in which membership spreads from the Cluster Supporting Objects through their neighbors. Comparative analysis with K-means, hierarchical, fuzzy C-means and fuzzy self-organizing maps (SOM) showed that data partitions generated by FLAME are not superimposable on those of other methods and, although different types of datasets are better partitioned by different algorithms, FLAME displays the best overall performance. FLAME is implemented, together with all the above-mentioned algorithms, in a C++ program with a graphical interface for Linux and Windows, capable of handling very large datasets, named Gene Expression Data Analysis Studio (GEDAS), freely available under the GNU General Public License.
CONCLUSION: The FLAME algorithm has intrinsic advantages, such as the ability to capture non-linear relationships and non-globular clusters, the automated definition of the number of clusters, and the identification of cluster outliers, i.e. genes that are not assigned to any cluster. As a result, clusters are more internally homogeneous and more distinct from each other, and provide better partitioning of biological functions. The clustering algorithm can be easily extended to applications other than gene expression analysis.
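
Step (ii) of the abstract, in which membership spreads from the Cluster Supporting Objects through their neighbors, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the FLAME or GEDAS code: the neighbor lists and supporting objects are taken as given, neighbors are weighted equally, and the outlier class is omitted. The function name `propagate_memberships` and its parameters are hypothetical.

```python
def propagate_memberships(neighbors, supporting, max_iter=1000, tol=1e-9):
    """Iteratively average fuzzy memberships over each object's neighbors.

    neighbors[i] lists the neighbor indices of object i; every object that
    is not a supporting object must have at least one neighbor.
    supporting lists the indices of the cluster supporting objects; each
    one defines a cluster and keeps a fixed one-hot membership vector.
    Assumption: equal neighbor weights and no outlier class.
    """
    k = len(supporting)
    n = len(neighbors)
    fixed = {obj: cluster for cluster, obj in enumerate(supporting)}

    memberships = []
    for i in range(n):
        if i in fixed:
            memberships.append([1.0 if c == fixed[i] else 0.0 for c in range(k)])
        else:
            memberships.append([1.0 / k] * k)  # start undecided

    for _ in range(max_iter):
        updated = []
        change = 0.0
        for i in range(n):
            if i in fixed:
                updated.append(memberships[i])
                continue
            nbrs = neighbors[i]
            new = [sum(memberships[j][c] for j in nbrs) / len(nbrs)
                   for c in range(k)]
            change = max(change,
                         max(abs(a - b) for a, b in zip(new, memberships[i])))
            updated.append(new)
        memberships = updated
        if change < tol:
            break
    return memberships


if __name__ == "__main__":
    # Five objects on a line; the two ends act as supporting objects.
    chain = [[1], [0, 2], [1, 3], [2, 4], [3]]
    for vector in propagate_memberships(chain, supporting=[0, 4]):
        print([round(x, 3) for x in vector])
```

In this toy chain, objects closer to a supporting object end up with a larger share of membership in its cluster, which is the qualitative behavior the abstract describes.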

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Computational cluster validation for microarray data analysis: experimental assessment of Clest, Consensus Clustering, Figure of Merit, Gap Statistics and Model Explorer

Author: A Alizadeh
A Ben-Hur
A Jain
A Kapp
AD Gordon
AK Jain
B Everitt
B Mirkin
CV Rijsbergen
Davide Scaturro
E Fowlkes
E Hartuv
Filippo Utro
GJ McLachlan
GW Milligan
I Priness
J Handl
JA Hartigan
JA Rice
JN Breckenridge
KY Yeung
L Hubert
L Kaufman
M Yan
P Hansen
PT Spellman
R Shamir
R Tibshirani
Raffaele Giancarlo
S Datta
S Dudoit
S Monti
T Hastie
V Di Gesú
W Krzanowski
X Wen
Publication venue: BioMed Central
Publication date: 01/01/2008

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licens

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Palermo

Medical record linkage in health information systems by approximate string matching and clustering

Author: A Baxter
A Ben-Dor
AE Monge
AK McCallum
Antoine Buemi
AP Dempster
B Everitt
C Quantin
E Hartuv
EH Porter
Erik A Sauleau
G Navarro
H Kawaji
HB Newcombe
I Fellegi
J Hartigan
JA Hylthon
Jean-Philippe Paumier
M Fortini
M Hernandez
M Pavan
MA Jaro
P Eades
P Sellers
R Baeza-Yates
R Sharan
T Fruchterman
T Kamada
T Vintsyuk
TF Smith
TR Belin
V Levenhstein
W Cohen
WE Winkler
WE Yancey
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Multiplication of data sources within heterogeneous healthcare information systems always results in redundant information, split among multiple databases. Our objective is to detect exact and approximate duplicates within identity records, in order to attain a better quality of information and to permit cross-linkage among stand-alone and clustered databases. Furthermore, we need to assist human decision making by computing a value reflecting identity proximity. METHODS: The proposed method has three steps. The first step is to standardise and index elementary identity fields, using blocking variables, in order to speed up information analysis. The second is to match similar record pairs, relying on a global similarity value taken from the Porter-Jaro-Winkler algorithm. The third is to create clusters of coherent related records, using graph drawing, agglomerative clustering methods and partitioning methods. RESULTS: The batch analysis of 300,000 "supposedly" distinct identities isolates 240,000 true unique records, 24,000 duplicates (clusters composed of 2 records) and 3,000 clusters whose size is greater than or equal to 3 records. CONCLUSION: Duplicate-free databases, used in conjunction with relevant indexes and similarity values, allow immediate (i.e., real-time) proximity detection when inserting a new identity.
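
The three steps in this abstract (blocking, pairwise similarity, clustering) can be sketched roughly as follows. This is a simplified stand-in, not the authors' pipeline: it uses the standard Jaro-Winkler similarity rather than the Porter-Jaro-Winkler variant named above, it groups linked records by connected components instead of the graph drawing and agglomerative methods mentioned, and the record fields (`name`, `birth_year`), the blocking key and the 0.9 threshold are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaro(s1, s2):
    """Standard Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def link_records(records, threshold=0.9):
    """Block on birth_year, link similar names, return clusters of indices."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[rec["birth_year"]].append(idx)  # hypothetical blocking variable

    parent = list(range(len(records)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Only records in the same block are compared, which is the speed-up.
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if jaro_winkler(records[a]["name"], records[b]["name"]) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())
```

Blocking trades recall for speed: two records of the same person with different blocking values are never compared, so the choice of blocking variable matters as much as the similarity threshold.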

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The relative vertex clustering value - a new criterion for the fast discovery of functional modules in protein interaction networks

Author: A Clauset
Alioune Ngom
C Pizzuti
CV Mering
D Huang
E Becker
E Hartuv
F Brucker
F Luo
F Radicchi
G Dennis
HN Chua
I Xenarios
JP Bagrow
LH Hartwell
M Girvan
M Li
MS Rahman
N Zaki
P Pei
R-S Wang
S Chen
S Fortunato
S Yook
TV Laarhoven
V Spirin
VD Blondel
XL Li
Zina M Ibrahim
Publication venue: Springer Science and Business Media LLC

Crossref

Interlog protein network: an evolutionary benchmark of protein interaction networks for the evaluation of clustering algorithms

Author: A Vazquez
A Vespignani
A Zhang
AD King
AG Ngounou Wetie
AJ Enright
AL Barabási
AR Carvunis
AW Rives
BH Junker
C Luonan
C Mering von
C Pizzuti
CH Wu
D Croft
D Dotan-Cohen
D Jiang
D Luciani
D Szklarczyk
D Yu
E Hartuv
EY Chen
GD Bader
H Zhou
HTT Phan
HW Mewes
J Hou
JB Pereira-Leal
JL Sevilla
K Maciag
L Chen
L Plessis du
LH Hartwell
M Altaf-Ul-Amin
M Ashburner
M Blatt
M Jafari
M Kanehisa
ME Futschik
Mehdi Mirzaie
Mehdi Sadeghi
Mohieddin Jafari
MP Samanta
P Braun
P Khatri
P Tieri
PW Lord
R Dunn
R Guimerà
R Jansen
R Lambiotte
R Sharan
R Srivas
RJGB Campello
S Bandyopadhyay
S Brohée
T Dobzhansky
V Arnau
V Spirin
VD Blondel
YR Cho
Z Wang
Publication venue: Springer Science and Business Media LLC

Crossref

Spreading Activation Model for Connectivity Based Clustering

Author: C.N. Ziegler
E. Hartuv
L. Ramaswamy
M.R. Quillian
R.L. Atkinson
Z. Wu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

Crossref

Partitioning Biological Networks into Highly Connected Clusters with Maximum Edge Coverage

Author: B.J. Parker
C. Stark
D. Aloise
D. Jiang
E. Hartuv
E.I. Boyle
G. Chartrand
H. Liu
J.Z. Wang
M. Koyutürk
P. Ronhovde
R. Shamir
T.Z. Berardini
Publication date: 01/01/2013

Abstract. We introduce the combinatorial optimization problem Highly Connected Deletion, which asks for removing as few edges as possible from a graph such that the resulting graph consists of highly connected components. We show that Highly Connected Deletion is NP-hard and provide a fixed-parameter algorithm and a kernelization. We propose exact and heuristic solution strategies, based on polynomial-time data reduction rules and integer linear programming with column generation. The data reduction typically identifies 85 % of the edges that need to be deleted for an optimal solution; the column generation method can then optimally solve protein interaction networks with up to 5 000 vertices and 12 000 edges.
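
To make the objective concrete, the following Python sketch evaluates a candidate solution: it counts the edges that must be deleted to separate a given set of clusters and checks whether each cluster is highly connected. The abstract does not define "highly connected"; the sketch assumes the usual definition from the highly-connected-subgraph literature (edge connectivity greater than half the number of vertices), tested through the equivalent minimum-degree condition. Treating single vertices as valid clusters is a further assumption, and the function names are hypothetical. It is a checker only, not the fixed-parameter or column-generation algorithm of the paper.

```python
def deletion_cost(edges, clusters):
    """Number of edges whose endpoints lie in different clusters."""
    cluster_of = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    return sum(1 for u, v in edges if cluster_of[u] != cluster_of[v])


def is_highly_connected(edges, cluster):
    """True if every vertex has more than n/2 neighbors inside the cluster.

    Assumption: "highly connected" means edge connectivity > n/2, which
    holds exactly when the minimum degree within the cluster exceeds n/2.
    Assumption: a single vertex counts as trivially highly connected.
    """
    members = set(cluster)
    n = len(members)
    if n == 1:
        return True
    degree = dict.fromkeys(members, 0)
    for u, v in edges:
        if u in members and v in members:
            degree[u] += 1
            degree[v] += 1
    return all(2 * d > n for d in degree.values())


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    clusters = [[0, 1, 2], [3, 4, 5]]
    print(deletion_cost(edges, clusters))
    print([is_highly_connected(edges, c) for c in clusters])
    print(is_highly_connected(edges, [0, 1, 2, 3, 4, 5]))
```

In the example, splitting at the bridge costs one deletion and leaves two highly connected triangles, whereas the undivided six-vertex graph fails the degree condition.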

CiteSeerX

Crossref

Experiments on graph clustering algorithms

Author: A.K. Jain
C. Zahn
D. Harel
D. Wagner
E. Hartuv
F.R.K. Chung
G. Ausiello
M.R. Garey
Publication venue: Springer-Verlag
Publication date: 01/01/2003

A promising approach to graph clustering is based on the intuitive notion of intra-cluster density vs. inter-cluster sparsity. While both formalizations and algorithms focusing on particular aspects of this rather vague concept have been proposed, no conclusive argument on their appropriateness has been given. As a first step towards understanding the consequences of particular conceptions, we conducted an experimental evaluation of graph clustering approaches. By combining proven techniques from graph partitioning and geometric clustering, we also introduce a new approach that compares favorably.
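
One simple way to quantify the notion of intra-cluster density vs. inter-cluster sparsity is to compare the fraction of possible edges present inside clusters with the fraction present between them. The Python sketch below does this for an undirected simple graph. It is one of several possible formalizations and is not taken from the paper; the function name `intra_inter_density` is hypothetical.

```python
def intra_inter_density(edges, clusters):
    """Return (intra_density, inter_density) for a clustering.

    intra_density: share of within-cluster vertex pairs joined by an edge.
    inter_density: share of between-cluster vertex pairs joined by an edge.
    Assumes an undirected simple graph and that the clusters partition
    the vertices.
    """
    cluster_of = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    intra_edges = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    inter_edges = len(edges) - intra_edges

    n = len(cluster_of)
    intra_pairs = sum(len(c) * (len(c) - 1) // 2 for c in clusters)
    inter_pairs = n * (n - 1) // 2 - intra_pairs

    intra_density = intra_edges / intra_pairs if intra_pairs else 0.0
    inter_density = inter_edges / inter_pairs if inter_pairs else 0.0
    return intra_density, inter_density


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(intra_inter_density(edges, [[0, 1, 2], [3, 4, 5]]))
    print(intra_inter_density(edges, [[0, 1], [2, 3], [4, 5]]))
```

A good clustering in this sense has an intra-cluster density close to 1 and an inter-cluster density close to 0; the second call shows how a poorer partition of the same graph raises the latter.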

KOPS - The Institutional Repository of the University of Konstanz

CiteSeerX

Crossref