
39 research outputs found

GIBA: a clustering tool for detecting protein complexes

Author: AC Gavin
AD King
AH Tong
AJ Enright
B Snel
Charalampos N Moschopoulos
CN Moschopoulos
D Stoll
E Hartuv
E Sprinzak
GD Bader
Georgios A Pavlopoulos
I Xenarios
M Koyuturk
NJ Krogan
O Puig
P Shannon
Reinhard Schneider
RP Sear
S Brohee
SH Yook
Sophia Kossida
Spiridon D Likothanassis
T Ito
V Spirin
WG Willats
X-L Li
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: In recent years, high-throughput experimental methods have been developed that generate large datasets of protein-protein interactions (PPIs). However, because of the experimental methodologies, these datasets contain errors, mainly false positives, which reduce the quality of any derived information. Typically these datasets can be modeled as graphs, where vertices represent proteins and edges represent pairwise PPIs, making it easy to apply automated clustering methods to detect protein complexes or other biologically significant functional groupings. Methods: In this paper, a clustering tool called GIBA (named after the first characters of its developers' nicknames) is presented. GIBA applies a two-step procedure to a given protein-protein interaction dataset. First, a clustering algorithm is applied to the interaction data; this is followed by a filtering step that generates the final candidate list of predicted complexes. Results: The efficiency of GIBA is demonstrated through the analysis of 6 different yeast protein interaction datasets in comparison with four other available algorithms. We compared the results of the different methods by applying five different performance metrics. Moreover, we examined how the parameters of the methods that constitute the filter affect the final results. Conclusion: GIBA is an effective and easy-to-use tool for detecting protein complexes in experimentally measured protein-protein interaction networks. The results show that GIBA achieves higher prediction accuracy than previously published methods.
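The abstract describes GIBA only at the level of "cluster first, then filter". The Python sketch below illustrates that two-step shape under stated assumptions; it is not GIBA's implementation. The clustering step is plain connected components (a placeholder, not GIBA's clustering algorithm), and the filter criteria `min_size` and `min_density`, like the function names, are hypothetical choices made for illustration.

```python
from collections import defaultdict


def connected_components(edges):
    """Placeholder clustering step: connected components of the PPI graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(comp)
    return clusters, adj


def density(cluster, adj):
    """Fraction of the possible internal edges that are present."""
    n = len(cluster)
    if n < 2:
        return 0.0
    internal = sum(len(adj[u] & cluster) for u in cluster) // 2
    return internal / (n * (n - 1) / 2)


def predict_complexes(edges, min_size=3, min_density=0.5):
    """Step 1: cluster. Step 2: keep clusters that pass the hypothetical filters."""
    clusters, adj = connected_components(edges)
    return [c for c in clusters
            if len(c) >= min_size and density(c, adj) >= min_density]


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"),
         ("P", "Q"), ("Q", "R"), ("R", "S"), ("S", "T")]
print(predict_complexes(edges))
```

With these example edges, the triangle A-B-C passes both filters, the isolated pair D-E fails the size filter, and the five-protein chain P-T (4 of 10 possible edges, density 0.4) fails the density filter.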

Crossref

Springer

Springer - Publisher Connector

PubMed Central

Open Repository and Bibliography - Luxembourg

Discovery and Expansion of Gene Modules by Seeking Isolated Groups in a Random Graph Process

Author: A Beyer
C Schluter
E Hartuv
EA Winzeler
Elizabeth Conibear
Eshel Ben-Jacob
G Giaever
G Milligan
GD Bader
Jennifer Bryan
Jochen Brumm
L Kiemer
MA Wong
O Rinner
P Shannon
R Tibshirani
RF Ling
RO Duda
S Brohee
S van Dongen
SR Collins
TI Lee
W Huber
W Lee
W Stuetzle
Wyeth W. Wasserman
Publication venue: Public Library of Science
Publication date: 09/10/2008

BACKGROUND: A central problem in systems biology research is the identification and extension of biological modules, that is, groups of genes or proteins participating in a common cellular process or physical complex. As a result, there is a persistent need for practical, principled methods to infer the modular organization of genes from genome-scale data. RESULTS: We introduce a novel approach for the identification of modules based on the persistence of isolated gene groups within an evolving graph process. First, the underlying genomic data is summarized in the form of ranked gene-gene relationships, thereby accommodating studies that quantify the relevant biological relationship directly or indirectly. Then, the observed gene-gene relationship ranks are viewed as the outcome of a random graph process, and candidate modules are given by the identifiable subgraphs that arise during this process. An isolation index is computed for each module, which quantifies the statistical significance of its survival time. CONCLUSIONS: The Miso (module isolation) method predicts gene modules from genomic data, and the associated isolation index provides a module-specific measure of confidence. Improving on existing alternatives, such as graph clustering and the global pruning of dendrograms, this index offers two intuitively appealing features: (1) the score is module-specific; and (2) different choices of threshold correlate logically with the resulting performance, i.e. a stringent cutoff yields high-quality predictions but low sensitivity. Through the analysis of yeast phenotype data, the Miso method is shown to outperform existing alternatives in terms of the specificity and sensitivity of its predictions.
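The "random graph process" view can be illustrated by adding gene-gene edges one at a time in rank order and recording when each multi-gene group first appears and when it is absorbed into a larger one. The sketch below performs only that bookkeeping with a union-find structure. Assumptions: the function name `component_lifetimes` is hypothetical, candidate modules are taken to be connected components, and the Miso isolation index itself (the statistical significance of a survival time) is not computed.

```python
def component_lifetimes(nodes, ranked_edges):
    """Add edges in rank order (strongest first) and return
    (members, birth_step, death_step) for every multi-gene component.
    Components still present at the end get death_step = len(ranked_edges) + 1."""
    parent = {v: v for v in nodes}
    members = {v: frozenset([v]) for v in nodes}  # keyed by current root
    born = {v: 0 for v in nodes}
    lifetimes = []

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for step, (u, v) in enumerate(ranked_edges, start=1):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge falls inside an existing component
        for r in (ru, rv):
            if len(members[r]) > 1:  # a non-trivial group dies by merging
                lifetimes.append((members[r], born[r], step))
        parent[rv] = ru
        members[ru] = members[ru] | members[rv]
        born[ru] = step
        del members[rv], born[rv]

    end = len(ranked_edges) + 1
    for r, group in members.items():
        if len(group) > 1:
            lifetimes.append((group, born[r], end))
    return lifetimes


for group, birth, death in component_lifetimes(
        "ABCD", [("A", "B"), ("C", "D"), ("A", "C")]):
    print(sorted(group), "survived", death - birth, "steps")
```

For genes A to D with edges ranked (A,B), (C,D), (A,C), the pair {A, B} appears at step 1 and is absorbed at step 3, so it persists longer than {C, D}, which appears at step 2. A significance score such as the isolation index would then be computed from survival times like these.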

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Computational cluster validation for microarray data analysis: experimental assessment of Clest, Consensus Clustering, Figure of Merit, Gap Statistics and Model Explorer

Author: A Alizadeh
A Ben-Hur
A Jain
A Kapp
AD Gordon
AK Jain
B Everitt
B Mirkin
CV Rijsbergen
Davide Scaturro
E Fowlkes
E Hartuv
Filippo Utro
GJ McLachlan
GW Milligan
I Priness
J Handl
JA Hartigan
JA Rice
JN Breckenridge
KY Yeung
L Hubert
L Kaufman
M Yan
P Hansen
PT Spellman
R Shamir
R Tibshirani
Raffaele Giancarlo
S Datta
S Dudoit
S Monti
T Hastie
V Di Gesú
W Krzanowski
X Wen
Publication venue: BioMed Central
Publication date: 01/01/2008

This is an Open Access article distributed under the terms of the Creative Commons Attribution License.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Palermo

Medical record linkage in health information systems by approximate string matching and clustering

Author: A Baxter
A Ben-Dor
AE Monge
AK McCallum
Antoine Buemi
AP Dempster
B Everitt
C Quantin
E Hartuv
EH Porter
Erik A Sauleau
G Navarro
H Kawaji
HB Newcombe
I Fellegi
J Hartigan
JA Hylthon
Jean-Philippe Paumier
M Fortini
M Hernandez
M Pavan
MA Jaro
P Eades
P Sellers
R Baeza-Yates
R Sharan
T Fruchterman
T Kamada
T Vintsyuk
TF Smith
TR Belin
V Levenshtein
W Cohen
WE Winkler
WE Yancey
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: The multiplication of data sources within heterogeneous healthcare information systems always results in redundant information, split among multiple databases. Our objective is to detect exact and approximate duplicates within identity records, in order to attain better information quality and to permit cross-linkage among stand-alone and clustered databases. Furthermore, we need to assist human decision making by computing a value reflecting identity proximity. METHODS: The proposed method has three steps. The first is to standardise and index elementary identity fields, using blocking variables, in order to speed up information analysis. The second is to match similar record pairs, relying on a global similarity value taken from the Porter-Jaro-Winkler algorithm. The third is to create clusters of coherent related records, using graph drawing, agglomerative clustering methods and partitioning methods. RESULTS: The batch analysis of 300,000 "supposedly" distinct identities isolates 240,000 true unique records, 24,000 duplicates (clusters composed of 2 records) and 3,000 clusters whose size is greater than or equal to 3 records. CONCLUSION: Duplicate-free databases, used in conjunction with relevant indexes and similarity values, allow immediate (i.e., real-time) proximity detection when inserting a new identity.
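The three steps (blocking, pairwise similarity, clustering) can be sketched in Python as follows. This is an illustration under assumptions, not the authors' method. It uses the standard Jaro-Winkler similarity (scaling factor 0.1, common prefix of at most 4 characters) rather than the Porter-Jaro-Winkler variant named in the abstract. The blocking key (first letter of the name plus birth year), the 0.9 threshold and the function names are hypothetical. Clustering is simple single linkage via union-find over pairs above the threshold, standing in for the agglomerative and partitioning methods the paper uses.

```python
from collections import defaultdict
from itertools import combinations


def jaro(s1, s2):
    """Standard Jaro similarity between two strings (0.0 to 1.0)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # a character matches if it occurs in s2 within the search window
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # matched characters appearing in a different order count as half-transpositions
    half_transpositions, j = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                half_transpositions += 1
            j += 1
    t = half_transpositions // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def link_records(records, threshold=0.9):
    """records: (record_id, name, birth_year) tuples; returns clusters of ids."""
    # Step 1, blocking: only records sharing a (hypothetical) key are compared
    blocks = defaultdict(list)
    for rec_id, name, year in records:
        blocks[(name[:1].upper(), year)].append((rec_id, name.upper()))
    parent = {rec_id: rec_id for rec_id, _, _ in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Step 2, pairwise matching; step 3, single-linkage clustering via union-find
    for block in blocks.values():
        for (id1, n1), (id2, n2) in combinations(block, 2):
            if jaro_winkler(n1, n2) >= threshold:
                parent[find(id1)] = find(id2)
    clusters = defaultdict(list)
    for rec_id in parent:
        clusters[find(rec_id)].append(rec_id)
    return sorted(sorted(c) for c in clusters.values())


records = [(1, "Martha", 1970), (2, "Marhta", 1970),
           (3, "Martin", 1985), (4, "Dwayne", 1960)]
print(link_records(records))  # → [[1, 2], [3], [4]]
```

Blocking is what keeps this tractable: "Martin" is never compared with "Martha" because their birth years put them in different blocks, so only pairs inside each block pay the cost of a string comparison.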

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The relative vertex clustering value - a new criterion for the fast discovery of functional modules in protein interaction networks

Author: A Clauset
Alioune Ngom
C Pizzuti
CV Mering
D Huang
E Becker
E Hartuv
F Brucker
F Luo
F Radicchi
G Dennis
HN Chua
I Xenarios
JP Bagrow
LH Hartwell
M Girvan
M Li
MS Rahman
N Zaki
P Pei
R-S Wang
S Chen
S Fortunato
S Yook
TV Laarhoven
V Spirin
VD Blondel
XL Li
Zina M Ibrahim
Publication venue: Springer Science and Business Media LLC

Crossref

Interlog protein network: an evolutionary benchmark of protein interaction networks for the evaluation of clustering algorithms

Author: A Vazquez
A Vespignani
A Zhang
AD King
AG Ngounou Wetie
AJ Enright
AL Barabási
AR Carvunis
AW Rives
BH Junker
C Luonan
C Mering von
C Pizzuti
CH Wu
D Croft
D Dotan-Cohen
D Jiang
D Luciani
D Szklarczyk
D Yu
E Hartuv
EY Chen
GD Bader
H Zhou
HTT Phan
HW Mewes
J Hou
JB Pereira-Leal
JL Sevilla
K Maciag
L Chen
L Plessis du
LH Hartwell
M Altaf-Ul-Amin
M Ashburner
M Blatt
M Jafari
M Kanehisa
ME Futschik
Mehdi Mirzaie
Mehdi Sadeghi
Mohieddin Jafari
MP Samanta
P Braun
P Khatri
P Tieri
PW Lord
R Dunn
R Guimerà
R Jansen
R Lambiotte
R Sharan
R Srivas
RJGB Campello
S Bandyopadhyay
S Brohée
T Dobzhansky
V Arnau
V Spirin
VD Blondel
YR Cho
Z Wang
Publication venue: Springer Science and Business Media LLC

Crossref

Partitioning Biological Networks into Highly Connected Clusters with Maximum Edge Coverage

Author: B.J. Parker
C. Stark
D. Aloise
D. Jiang
E. Hartuv
E.I. Boyle
G. Chartrand
H. Liu
J.Z. Wang
M. Koyutürk
P. Ronhovde
R. Shamir
T.Z. Berardini
Publication date: 01/01/2013

We introduce the combinatorial optimization problem Highly Connected Deletion, which asks for removing as few edges as possible from a graph such that the resulting graph consists of highly connected components. We show that Highly Connected Deletion is NP-hard and provide a fixed-parameter algorithm and a kernelization. We propose exact and heuristic solution strategies, based on polynomial-time data reduction rules and integer linear programming with column generation. The data reduction typically identifies 85% of the edges that need to be deleted for an optimal solution; the column generation method can then optimally solve protein interaction networks with up to 5,000 vertices and 12,000 edges.
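To make the problem statement concrete, the sketch below scores a given partition against the Highly Connected Deletion objective, using the standard definition from the HCS clustering literature: a graph on n vertices is highly connected when its edge connectivity exceeds n/2. It only checks and scores a candidate solution; finding the optimal partition is the NP-hard part and is not attempted. Edge connectivity is computed with the Stoer-Wagner minimum-cut algorithm. Treating single vertices as trivially highly connected, and the function names, are assumptions made for illustration.

```python
from itertools import combinations


def min_cut_value(vertices, edges):
    """Global minimum edge cut of an undirected graph (Stoer-Wagner)."""
    w = {v: {} for v in vertices}  # edge weights between super-vertices
    for u, v in edges:
        w[u][v] = w[u].get(v, 0) + 1
        w[v][u] = w[v].get(u, 0) + 1
    active = list(vertices)
    best = float("inf")
    while len(active) > 1:
        # one phase: build a maximum-adjacency ordering starting at active[0]
        start = active[0]
        order = [start]
        conn = {v: w[start].get(v, 0) for v in active[1:]}
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            order.append(nxt)
            for v in conn:
                conn[v] += w[nxt].get(v, 0)
        best = min(best, cut_of_phase)
        # merge the last-added vertex t into the second-to-last vertex s
        s, t = order[-2], order[-1]
        for v, wt in w.pop(t).items():
            w[v].pop(t)
            if v != s:
                w[s][v] = w[s].get(v, 0) + wt
                w[v][s] = w[v].get(s, 0) + wt
        active.remove(t)
    return best


def is_highly_connected(vertices, edges):
    n = len(vertices)
    if n <= 1:
        return True  # assumption: a single vertex counts as trivially highly connected
    return min_cut_value(vertices, edges) > n / 2


def hcd_cost(edges, clusters):
    """Number of deleted (inter-cluster) edges, or None if some cluster is not
    highly connected. `clusters` must partition all vertices."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    for c in clusters:
        inner = [(u, v) for u, v in edges if u in c and v in c]
        if not is_highly_connected(list(c), inner):
            return None
    return sum(1 for u, v in edges if label[u] != label[v])


# two 4-cliques joined by the single edge d-e
edges = list(combinations("abcd", 2)) + list(combinations("efgh", 2)) + [("d", "e")]
print(is_highly_connected(list("abcdefgh"), edges))  # → False
print(hcd_cost(edges, [set("abcd"), set("efgh")]))  # → 1
```

In this example each 4-clique has edge connectivity 3 > 2, so splitting the cliques is a feasible solution that deletes one edge, whereas the whole 8-vertex graph (minimum cut 1, not above 4) is not highly connected.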

CiteSeerX

Crossref

Experiments on graph clustering algorithms

Author: A.K. Jain
C. Zahn
D. Harel
D. Wagner
E. Hartuv
F.R.K. Chung
G. Ausiello
M.R. Garey
Publication venue: Springer-Verlag
Publication date: 01/01/2003

A promising approach to graph clustering is based on the intuitive notion of intra-cluster density vs. inter-cluster sparsity. While both formalizations and algorithms focusing on particular aspects of this rather vague concept have been proposed, no conclusive argument on their appropriateness has been given. As a first step towards understanding the consequences of particular conceptions, we conducted an experimental evaluation of graph clustering approaches. By combining proven techniques from graph partitioning and geometric clustering, we also introduce a new approach that compares favorably.
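One common way to turn "intra-cluster density vs. inter-cluster sparsity" into numbers is with two simple indices: coverage (the fraction of edges that fall inside clusters) and performance (the fraction of vertex pairs that are either intra-cluster edges or inter-cluster non-edges). The sketch below computes both for a given clustering. These are standard indices offered as an illustration; we do not claim they are exactly the measures, or the new approach, evaluated in the paper, and the function name is hypothetical.

```python
from itertools import combinations


def coverage_and_performance(vertices, edges, clusters):
    """Coverage: share of edges inside clusters.
    Performance: share of vertex pairs that are intra-cluster edges
    or inter-cluster non-edges."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    edge_set = {frozenset(e) for e in edges}
    intra = sum(1 for e in edge_set if len({label[v] for v in e}) == 1)
    coverage = intra / len(edge_set)
    correct = pairs = 0
    for u, v in combinations(vertices, 2):
        pairs += 1
        same_cluster = label[u] == label[v]
        adjacent = frozenset((u, v)) in edge_set
        if same_cluster == adjacent:  # dense inside, sparse between
            correct += 1
    return coverage, correct / pairs


edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
cov, perf = coverage_and_performance(list("abcdef"), edges,
                                     [{"a", "b", "c"}, {"d", "e", "f"}])
print(cov, perf)
```

For two triangles joined by one edge and clustered as the two triangles, 6 of the 7 edges are intra-cluster (coverage 6/7) and 14 of the 15 vertex pairs are consistent with the clustering (performance 14/15); the only inconsistency is the bridging edge c-d.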

KOPS - The Institutional Repository of the University of Konstanz

CiteSeerX

Crossref

Communities in Graphs

Author: D.W. Matula
E. Hartuv
H. Nagamochi
Q. Feng
R. Albert
R. Kumar
S. Chakrabarti
Publication venue: Springer
Publication date: 01/01/2002

Many applications, like the retrieval of information from the WWW, require or are improved by the detection of sets of closely related vertices in graphs. Depending on the application, many approaches are possible. In this paper we present a purely graph-theoretical approach, independent of the represented data. Based on the edge-connectivity of subgraphs, a tree of subgraphs is constructed, such that the children of a node are pairwise disjoint and contained in their parent. We describe a polynomial algorithm for the construction of the tree and present two heuristics, constructing the correct result in significantly decreased time. Furthermore we give a short description of possible applications in the fields of information retrieval, clustering and graph drawing.
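The nested structure described above can be illustrated with maximal k-edge-connected subgraphs: for each k they are pairwise disjoint, and each lies inside a maximal (k-1)-edge-connected subgraph, so together they form a tree. The sketch below builds such a tree by repeatedly splitting along a global minimum cut (Stoer-Wagner) whenever that cut has fewer than k edges. It is a straightforward illustration under these assumptions, not the paper's polynomial algorithm or its two heuristics. The function names are hypothetical, single vertices are dropped, and a vertex set that is unchanged from one level to the next appears again as its own child rather than being collapsed.

```python
from itertools import combinations


def min_cut(vertices, edges):
    """Stoer-Wagner: return (cut value, one side of a minimum cut)."""
    w = {v: {} for v in vertices}
    for u, v in edges:
        w[u][v] = w[u].get(v, 0) + 1
        w[v][u] = w[v].get(u, 0) + 1
    groups = {v: {v} for v in vertices}  # original vertices inside each super-vertex
    active = list(vertices)
    best, best_side = float("inf"), set()
    while len(active) > 1:
        start = active[0]
        order = [start]
        conn = {v: w[start].get(v, 0) for v in active[1:]}
        while conn:  # maximum-adjacency ordering
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            order.append(nxt)
            for v in conn:
                conn[v] += w[nxt].get(v, 0)
        s, t = order[-2], order[-1]
        if cut_of_phase < best:
            best, best_side = cut_of_phase, set(groups[t])
        for v, wt in w.pop(t).items():  # merge t into s
            w[v].pop(t)
            if v != s:
                w[s][v] = w[s].get(v, 0) + wt
                w[v][s] = w[v].get(s, 0) + wt
        groups[s] |= groups.pop(t)
        active.remove(t)
    return best, best_side


def k_edge_connected_parts(vertices, edges, k):
    """Maximal k-edge-connected vertex sets (size >= 2) inside `vertices`."""
    vs = set(vertices)
    if len(vs) < 2:
        return []
    inner = [(u, v) for u, v in edges if u in vs and v in vs]
    value, side = min_cut(list(vs), inner)
    if value >= k:
        return [vs]
    # a cut with fewer than k edges cannot pass through any k-edge-connected subgraph
    return (k_edge_connected_parts(side, inner, k)
            + k_edge_connected_parts(vs - side, inner, k))


def connectivity_tree(vertices, edges, k=1):
    """Nested tree: the children of a node are its maximal (k+1)-edge-connected parts."""
    return [{"k": k, "vertices": part,
             "children": connectivity_tree(part, edges, k + 1)}
            for part in k_edge_connected_parts(vertices, edges, k)]


# two 4-cliques joined by the single edge d-e
edges = list(combinations("abcd", 2)) + list(combinations("efgh", 2)) + [("d", "e")]
tree = connectivity_tree(list("abcdefgh"), edges)
```

Here the root (k = 1) is the whole connected graph, its children at k = 2 are the two 4-cliques (the bridge d-e is a cut of size 1), each clique reappears at k = 3 because its edge connectivity is 3, and nothing survives at k = 4.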

CiteSeerX

Crossref