Local Guarantees in Graph Cuts and Clustering
Correlation Clustering is an elegant model that captures fundamental graph
cut problems such as Min s-t Cut, Multiway Cut, and Multicut, extensively
studied in combinatorial optimization. Here, we are given a graph with edges
labeled + or − and the goal is to produce a clustering that agrees with the
labels as much as possible: + edges within clusters and − edges across
clusters. The classical approach towards Correlation Clustering (and other
graph cut problems) is to optimize a global objective. We depart from this and
study local objectives: minimizing the maximum number of disagreements for
edges incident on a single node, and the analogous max-min agreements
objective. This naturally gives rise to a family of basic min-max graph cut
problems. A prototypical representative is Min Max s-t Cut: find an s-t cut
minimizing the largest number of cut edges incident on any node. We present the
following results: an O(√n)-approximation for the problem of
minimizing the maximum total weight of disagreement edges incident on any node
(thus providing the first known approximation for the above family of min-max
graph cut problems), a remarkably simple 7-approximation for minimizing
local disagreements in complete graphs (improving upon the previous best known
approximation of 48), and a 1/(2+ε)-approximation for
maximizing the minimum total weight of agreement edges incident on any node,
hence improving upon the 1/(4+ε)-approximation that follows from
the study of approximate pure Nash equilibria in cut and party affiliation
games.
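To make the local objective concrete, here is a minimal sketch (not from the paper) that evaluates the min-max disagreements objective for a given clustering of a graph with + / − labeled edges; the function and variable names are illustrative.

```python
# Sketch: the min-max ("local") disagreements objective in Correlation
# Clustering. Input: an edge list with labels +1 (similar) / -1 (dissimilar)
# and a clustering given as node -> cluster id. Names are illustrative.
from collections import defaultdict

def max_local_disagreements(edges, clustering):
    """Return the largest number of disagreement edges incident on any node.

    edges: iterable of (u, v, label) with label in {+1, -1}
    clustering: dict mapping node -> cluster id
    """
    disagreements = defaultdict(int)
    for u, v, label in edges:
        same_cluster = clustering[u] == clustering[v]
        # A "+" edge disagrees if it is cut; a "-" edge disagrees if it is not.
        if (label == +1 and not same_cluster) or (label == -1 and same_cluster):
            disagreements[u] += 1
            disagreements[v] += 1
    return max(disagreements.values(), default=0)

# Toy example: a triangle with one "-" edge, split into two clusters.
edges = [("a", "b", +1), ("b", "c", +1), ("a", "c", -1)]
clustering = {"a": 0, "b": 0, "c": 1}
print(max_local_disagreements(edges, clustering))  # the "+" edge b-c is cut -> 1
```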
Model-based probabilistic frequent itemset mining
Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle large amounts of imprecise information, uncertain databases have recently been developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose a novel method that captures the itemset mining process as a probability distribution function, taking two models into account: the Poisson distribution and the normal distribution. These model-based approaches extract frequent itemsets with a high degree of accuracy and support large databases. We apply our techniques to improve the performance of the algorithms for (1) finding itemsets whose frequentness probabilities are larger than some threshold and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both tuple and attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate and four orders of magnitude faster than previous approaches. In further theoretical and experimental studies, we give an intuition about which model-based approach best fits different types of data sets.
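As a rough illustration of the model-based idea, the sketch below approximates an itemset's frequentness probability with a Poisson and a normal model, assuming tuple-level (per-transaction) existence probabilities for the itemset; the parameter values and function names are illustrative, not the paper's implementation.

```python
# Sketch: approximating the "frequentness probability" of an itemset in an
# uncertain database. Under possible-world semantics the support count is a
# sum of independent Bernoulli variables; the model-based idea replaces that
# distribution with a Poisson or a normal distribution. Illustrative only.
from scipy.stats import poisson, norm

def frequentness_poisson(probs, minsup):
    """P(support >= minsup) under a Poisson approximation.
    probs: per-transaction probabilities that the itemset is present."""
    lam = sum(probs)
    return 1.0 - poisson.cdf(minsup - 1, lam)

def frequentness_normal(probs, minsup):
    """P(support >= minsup) under a normal approximation (continuity-corrected)."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    return 1.0 - norm.cdf(minsup - 0.5, loc=mu, scale=max(var, 1e-12) ** 0.5)

probs = [0.9, 0.5, 0.7, 0.2, 0.8]  # itemset presence probability per transaction
print(frequentness_poisson(probs, minsup=3))
print(frequentness_normal(probs, minsup=3))
```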
Considerations about multistep community detection
The problem and implications of community detection in networks have attracted
huge attention, due to its important applications in both the natural and social
sciences. A number of algorithms have been developed to solve this problem,
addressing either speed optimization or the quality of the computed partitions.
In this paper we propose a multi-step procedure bridging the
fastest, but less accurate, algorithms (coarse clustering) with the slowest,
most effective ones (refinement). By adopting a heuristic ranking of the nodes
and classifying a fraction of them as 'critical', the refinement step can be
restricted to this subset of the network, thus saving computational time.
Preliminary numerical results are discussed, showing improvement of the final
partition.
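A minimal sketch of such a two-step pass is given below, using NetworkX label propagation as the coarse step; the boundary-based ranking heuristic and the greedy local moves are illustrative stand-ins for the procedure described above, not the authors' exact algorithm.

```python
# Sketch of a two-step (coarse + local refinement) community detection pass.
# The "critical node" heuristic here is the fraction of a node's neighbours
# lying in other communities; the refinement is a single greedy pass.
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()

# 1) Coarse, fast clustering.
coarse = [set(c) for c in community.label_propagation_communities(G)]
label = {v: i for i, c in enumerate(coarse) for v in c}

# 2) Rank nodes: "critical" = many neighbours in other communities.
def boundary_score(v):
    nbrs = list(G.neighbors(v))
    return sum(label[u] != label[v] for u in nbrs) / max(len(nbrs), 1)

critical = sorted(G.nodes, key=boundary_score, reverse=True)[: len(G) // 4]

# 3) Refine only the critical nodes: move each to the neighbouring
#    community with which it shares the most edges.
for v in critical:
    counts = {}
    for u in G.neighbors(v):
        counts[label[u]] = counts.get(label[u], 0) + 1
    if counts:
        label[v] = max(counts, key=counts.get)

refined = {}
for v, c in label.items():
    refined.setdefault(c, set()).add(v)
print(community.modularity(G, refined.values()))  # quality of the refined partition
```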
Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction
It is difficult to find the optimal sparse solution of a manifold learning
based dimensionality reduction algorithm. The lasso or elastic net penalized
manifold learning based dimensionality reduction is not directly a lasso
penalized least squares problem, and thus the least angle regression (LARS)
algorithm (Efron et al.), one of the most popular algorithms in sparse
learning, cannot be applied. Therefore, most current approaches take indirect
routes or impose strict settings, which can be inconvenient for applications. In
this paper, we propose the manifold elastic net (MEN for short). MEN
incorporates the merits of both manifold learning based and sparse learning
based dimensionality reduction. By using a series of equivalent
transformations, we show that MEN is equivalent to a lasso penalized least
squares problem, and thus LARS can be adopted to obtain the optimal sparse
solution of MEN. In particular, MEN has the following advantages for
subsequent classification: 1) the local geometry of samples is well preserved
in the low dimensional data representation, 2) both margin maximization and
classification error minimization are considered in computing the sparse
projection, 3) the projection matrix of MEN improves parsimony in
computation, 4) the elastic net penalty reduces over-fitting, and
5) the projection matrix of MEN can be interpreted psychologically and
physiologically. Experimental evidence on face recognition over various popular
datasets suggests that MEN is superior to top-level dimensionality reduction
algorithms.
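To illustrate the kind of reduction involved, the following sketch uses the standard augmentation that turns an elastic-net-style objective into a pure lasso problem solvable by LARS (via scikit-learn's LassoLars); this is the textbook trick, not MEN's actual sequence of transformations, and all data and parameter values are illustrative.

```python
# Sketch: the ridge term of an elastic-net objective is absorbed into extra
# "rows" of the design matrix, leaving a pure lasso problem that LARS solves.
import numpy as np
from sklearn.linear_model import LassoLars

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta = np.zeros(20)
beta[:3] = [2.0, -1.5, 1.0]                            # sparse ground truth
y = X @ beta + 0.1 * rng.standard_normal(100)

lam1, lam2 = 0.05, 1.0                                 # l1 and l2 penalty weights

# Augmented problem: min ||y* - X* b||^2 + lam1 ||b||_1  with
# X* = [X; sqrt(lam2) I], y* = [y; 0]  -- the ridge penalty is now data,
# since ||y* - X* b||^2 = ||y - X b||^2 + lam2 ||b||^2.
X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(20)])
y_aug = np.concatenate([y, np.zeros(20)])

# LassoLars minimises (1/(2n))||y - Xb||^2 + alpha*||b||_1, so rescale lam1.
model = LassoLars(alpha=lam1 / (2 * len(y_aug)), fit_intercept=False)
model.fit(X_aug, y_aug)
print(np.nonzero(model.coef_)[0])                      # recovered sparse support
```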
Ontology of core data mining entities
In this article, we present OntoDM-core, an ontology of core data mining
entities. OntoDM-core defines the most essential data mining entities in a three-layered
ontological structure comprising a specification, an implementation, and an application
layer. It provides a representational framework for the description of mining
layer. It provides a representational framework for the description of mining
structured data, and in addition provides taxonomies of datasets, data mining tasks,
generalizations, data mining algorithms and constraints, based on the type of data.
OntoDM-core is designed to support a wide range of applications/use cases, such as
semantic annotation of data mining algorithms, datasets and results; annotation of
QSAR studies in the context of drug discovery investigations; and disambiguation of
terms in text mining. The ontology has been thoroughly assessed following the practices
in ontology engineering, is fully interoperable with many domain resources and
is easy to extend.
The detection and location estimation of disasters using Twitter and the identification of Non-Governmental Organisations using crowdsourcing
From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification
Background: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discriminating bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms to FAME data or to 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results: In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data: 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned from FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it is better at distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, phylogenetic learning allows us to situate and evaluate FAME-based bacterial species classification in a more informative context.
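As a rough illustration of learning the splits of a given taxonomy, the sketch below trains one Random Forest per internal node of a toy binary tree over class labels and routes samples from the root to a leaf; the tree, the synthetic features, and all names are illustrative, not the FAME/16S data or the authors' pipeline.

```python
# Sketch: one Random Forest per internal node of a (given) binary tree over
# labels, each forest deciding which child branch a sample follows.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy binary tree over 4 classes: ((A, B), (C, D)).
tree = {
    "root": (["A", "B"], ["C", "D"]),
    "left": (["A"], ["B"]),
    "right": (["C"], ["D"]),
}
children = {"root": ("left", "right"), "left": ("A", "B"), "right": ("C", "D")}

rng = np.random.default_rng(1)
# Synthetic "FAME-like" features: one Gaussian blob per class.
X = np.vstack([rng.normal(loc=i, scale=0.5, size=(30, 5)) for i in range(4)])
y = np.repeat(["A", "B", "C", "D"], 30)

# Train one forest per internal node on the samples under that node.
models = {}
for node, (left_labels, right_labels) in tree.items():
    mask = np.isin(y, left_labels + right_labels)
    side = np.isin(y[mask], right_labels).astype(int)   # 0 = left child, 1 = right child
    models[node] = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[mask], side)

def predict_one(x):
    node = "root"
    while node in models:                               # descend until a leaf label
        side = models[node].predict(x.reshape(1, -1))[0]
        node = children[node][side]
    return node

preds = np.array([predict_one(x) for x in X])
print((preds == y).mean())                              # training accuracy of the cascade
```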
Using diffusion distances for flexible molecular shape comparison
Background: Many molecules are flexible and undergo significant shape deformation as part of their function, yet most existing molecular shape comparison (MSC) methods treat them as rigid bodies, which may lead to incorrect shape recognition. Results: In this paper, we present a new shape descriptor, named Diffusion Distance Shape Descriptor (DDSD), for comparing 3D shapes of flexible molecules. The diffusion distance in our work is considered as an average length of paths connecting two landmark points on the molecular shape, in the sense of inner distances. The diffusion distance is robust to flexible shape deformation, in particular to topological changes, and it reflects the molecular structure and deformation well without explicit decomposition. Our DDSD is stored as a histogram, which is a probability distribution of diffusion distances between all sample point pairs on the molecular surface. Finally, the problem of flexible MSC is reduced to a comparison of DDSD histograms. Conclusions: We illustrate that DDSD is insensitive to shape deformation of flexible molecules and more effective at capturing molecular structures than traditional shape descriptors. The presented algorithm is robust and does not require any prior knowledge of the flexible regions.
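A minimal sketch of a diffusion-distance histogram descriptor in this spirit is shown below, using a Gaussian-kernel diffusion map over sampled surface points; the kernel width, diffusion time, and bin count are illustrative choices, not the parameters of DDSD.

```python
# Sketch: pairwise diffusion distances from a Gaussian-kernel diffusion map,
# summarised as a normalised histogram used as a shape signature.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.linalg import eigh

def diffusion_distance_histogram(points, t=2, sigma=0.3, n_bins=32):
    d2 = squareform(pdist(points, "sqeuclidean"))
    W = np.exp(-d2 / (2 * sigma**2))                   # Gaussian affinities
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))                # symmetric normalisation
    vals, vecs = eigh(S)                               # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]             # descending
    psi = vecs / np.sqrt(deg)[:, None]                 # random-walk eigenvectors
    emb = psi[:, 1:] * (vals[1:] ** t)                 # drop the trivial component
    dist = pdist(emb)                                  # pairwise diffusion distances
    hist, _ = np.histogram(dist, bins=n_bins)
    return hist / hist.sum()

# Toy "molecule": points sampled on a noisy sphere.
rng = np.random.default_rng(0)
pts = rng.standard_normal((200, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
pts += 0.02 * rng.standard_normal(pts.shape)
print(diffusion_distance_histogram(pts)[:5])
```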