Search CORE

3,206 research outputs found

Dissimilarity Clustering by Hierarchical Multi-Level Refinement

Author: Conan-Guez Brieuc
Rossi Fabrice
Publication date: 01/01/2012

We introduce in this paper a new way of optimizing the natural extension of the quantization error used in k-means clustering to dissimilarity data. The proposed method is based on hierarchical clustering analysis combined with multi-level heuristic refinement. The method is computationally efficient and achieves better quantization errors than the…

Comment: 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2012), Bruges, Belgium (2012)

arXiv.org e-Print Archive

CiteSeerX

HAL-Paris1
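
The abstract does not spell out the criterion. A common way to extend the k-means quantization error to a dissimilarity matrix d (assumed here, with d holding squared-distance-like dissimilarities) is E = sum over clusters C of (1 / (2|C|)) * sum over i, j in C of d(i, j); for squared Euclidean distances this equals the usual within-cluster sum of squares. The Python sketch below computes E and runs one greedy pass that moves single objects between clusters. The relocation step is a generic local search shown for illustration only, not the authors' hierarchical multi-level refinement, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantization_error(d: Sequence[Sequence[float]], labels: List[int], k: int) -> float:
    """Sum over clusters C of (1 / (2|C|)) * sum of d[i][j] for i, j in C (assumed criterion)."""
    total = 0.0
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        s = sum(d[i][j] for i in members for j in members)
        total += s / (2 * len(members))
    return total


def relocate_once(d: Sequence[Sequence[float]], labels: List[int], k: int) -> Tuple[List[int], float]:
    """One greedy pass: move each object to the cluster that most lowers the error."""
    labels = list(labels)
    best = quantization_error(d, labels, k)
    for i in range(len(labels)):
        original = labels[i]
        chosen = original
        for c in range(k):
            if c == original:
                continue
            labels[i] = c  # tentatively move object i to cluster c
            err = quantization_error(d, labels, k)
            if err < best - 1e-12:
                best, chosen = err, c
        labels[i] = chosen  # keep the best cluster found for object i
    return labels, best
```

For example, with points 0, 1, 10 and 11 on a line and d their squared differences, one pass starting from the interleaved labelling [0, 1, 0, 1] reaches the natural two-cluster split, with E = 1.0.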

Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs

Author: BENTLEY J.L.
BUCHBERGER B.
David Eppstein
DURAN B. S.
GOTOH O.
MATIAS Y.
SUPOWIT K.J.
YIANILOS P.N.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/1998

We develop data structures for dynamic closest pair problems with arbitrary distance functions that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an n-object set, maintaining the closest pair, in O(n log^2 n) time per update and O(n) space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound, O(n) per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Groebner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics.

Comment: 20 pages, 9 figures. A preliminary version of this paper appeared at the 9th ACM-SIAM Symp. on Discrete Algorithms, San Francisco, 1998, pp. 619-628. For source code and experimental results, see http://www.ics.uci.edu/~eppstein/projects/pairs

arXiv.org e-Print Archive

CiteSeerX

Crossref
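
The abstract does not describe the data structures themselves. As a point of comparison, the Python sketch below shows the simplest dynamic closest-pair scheme for an arbitrary symmetric distance function: every object keeps a pointer to its current nearest neighbour, and the closest pair is the minimum over those pointers. It is not the paper's O(n log^2 n) or quadtree-like structure (an insert costs O(n) distance evaluations and a delete can cost O(n^2) in the worst case), and the class name is hypothetical.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple


class NeighborPointerClosestPair:
    """Baseline dynamic closest pair: each object stores its nearest neighbour."""

    def __init__(self, dist: Callable[[Hashable, Hashable], float]):
        self.dist = dist  # any symmetric distance, no geometry required
        self.nn: Dict[Hashable, Tuple[float, Optional[Hashable]]] = {}

    def _recompute(self, x: Hashable) -> None:
        """Rescan all other objects to find x's nearest neighbour."""
        best: Tuple[float, Optional[Hashable]] = (float("inf"), None)
        for y in self.nn:
            if y != x:
                d = self.dist(x, y)
                if d < best[0]:
                    best = (d, y)
        self.nn[x] = best

    def insert(self, z: Hashable) -> None:
        """Add a new object z (assumed not already present)."""
        best: Tuple[float, Optional[Hashable]] = (float("inf"), None)
        for x, (dx, _) in list(self.nn.items()):
            d = self.dist(x, z)
            if d < dx:
                self.nn[x] = (d, z)  # z becomes x's new nearest neighbour
            if d < best[0]:
                best = (d, x)
        self.nn[z] = best

    def delete(self, z: Hashable) -> None:
        """Remove z and repair every pointer that referred to it."""
        del self.nn[z]
        for x, (_, y) in list(self.nn.items()):
            if y == z:
                self._recompute(x)

    def closest_pair(self) -> Optional[Tuple[float, Hashable, Hashable]]:
        """Return (distance, a, b) for the current closest pair, or None."""
        if len(self.nn) < 2:
            return None
        x, (d, y) = min(self.nn.items(), key=lambda item: item[1][0])
        return d, x, y
```

Repeatedly calling closest_pair, deleting both members and inserting a merged cluster object is how such a structure would drive greedy agglomerative clustering; the merged object's distance function is left to the caller.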

A new hierarchical clustering algorithm to identify non-overlapping like-minded communities

Author: Adhya Hindol
Deepak Talasila Sai
Gullapalli Bhanuteja
Kejriwal Shyamal
Shannigrahi Saswata
Publication venue: Association for Computing Machinery (ACM)
Publication date: 24/02/2016

A network has a non-overlapping community structure if its nodes can be partitioned into disjoint sets such that each node in a set is densely connected to other nodes inside the set and sparsely connected to nodes outside it. There are many metrics to validate the efficacy of such a structure, such as clustering coefficient, betweenness, centrality, modularity and like-mindedness. Many methods have been proposed to optimize some of these metrics, but none of them works well on the recently introduced like-mindedness metric. To solve this problem, we propose an algorithm based on behavioral properties to identify communities that optimize the like-mindedness metric, and we compare its performance on this metric with other methodologies based on behavioral data as well as with community detection methods that rely only on structural data. We execute these algorithms on real-life datasets from Filmtipset and Twitter and show that our algorithm performs better than the existing algorithms with respect to the like-mindedness metric.

arXiv.org e-Print Archive

Crossref
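
The listing does not define like-mindedness, so the sketch below illustrates a different metric named in the abstract instead: Newman-Girvan modularity, Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where m is the number of edges, L_c the number of edges inside c and d_c the total degree of the nodes in c. It assumes an undirected, unweighted graph without self-loops or repeated edges; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, int]) -> float:
    """Modularity of a node partition of an undirected, unweighted simple graph."""
    edges = list(edges)
    m = len(edges)
    inside: Dict[int, int] = defaultdict(int)  # L_c: edges with both ends in c
    degree: Dict[int, int] = defaultdict(int)  # d_c: summed degree of c's nodes
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single edge, splitting at that edge gives Q = 5/14 (about 0.357), while putting all six nodes in one community gives Q = 0.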

Identifying Overlapping and Hierarchical Thematic Structures in Networks of Scholarly Papers: A Comparison of Three Approaches

Author: A Clauset
A Friggeri
A Lancichinetti
A Van Raan
Alexander Struck
B Ball
C Lee
D Sullivan
F Havemann
F Janssens
F Radicchi
Frank Havemann
G Tibély
H Small
IV Marshakova
J Baumes
J Gläser
J Xie
Jochen Gläser
M Rosvall
M Sales-Pardo
M Zitt
Michael Heinz
O Amsterdamska
O Mitesser
R Klavans
Renaud Lambiotte
S Fortunato
S Ghosh
S Gregory
T Evans
V Blondel
W Zachary
X Wang
Y Ahn
Y Kim
Publication venue: Public Library of Science (PLoS)
Publication date: 26/07/2011

We implemented three recently proposed approaches to the identification of overlapping and hierarchical substructures in graphs and applied the corresponding algorithms to a network of 492 information-science papers coupled via their cited sources. The thematic substructures obtained and overlaps produced by the three hierarchical cluster algorithms were compared to a content-based categorisation, which we based on the interpretation of titles and keywords. We defined sets of papers dealing with three topics located on different levels of aggregation: h-index, webometrics, and bibliometrics. We identified these topics with branches in the dendrograms produced by the three cluster algorithms and compared the overlapping topics they detected with one another and with the three pre-defined paper sets. We discuss the advantages and drawbacks of applying the three approaches to paper networks in research fields.

Comment: 18 pages, 9 figures

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare
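
The abstract says only that the papers are "coupled via their cited sources". The Python sketch below builds such a bibliographic coupling network under the assumption that an edge's weight is the raw number of shared cited sources; the study may have used a normalized weight instead, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def coupling_network(refs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Link two papers when they cite at least one common source.

    refs maps a paper id to the set of sources it cites; the weight of an
    edge is the number of sources both papers cite (no normalization).
    """
    weights: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            weights[(a, b)] = shared
    return weights
```

The resulting weighted graph is the kind of input that overlapping or hierarchical community detection algorithms then partition into thematic substructures.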

Steganographer Identification

Author: Breunig
Chen
Cortes
Erdogmus
Filler
Fridrich
Gretton
Guo
Hetzl
Holub
Ker
Kodovsky
Li
Liu
Muandet
Pearson
Pevny
Pevný
Rokach
Sahu
Sallee
Scholkopf
Shi
Song
Westfeld
Wu
Publication date: 16/04/2019

Conventional steganalysis detects the presence of steganography within single objects. In the real world, we may face a more complex scenario in which one or more of multiple users, called actors, are guilty of using steganography; this is typically defined as the Steganographer Identification Problem (SIP). One might use conventional steganalysis algorithms to separate stego objects from cover objects and then identify the guilty actors. However, the guilty actors may be missed because of a number of false alarms. To deal with the SIP, most state-of-the-art methods use approaches based on unsupervised learning. In these solutions, each actor holds multiple digital objects, from which a set of feature vectors can be extracted. Well-defined distances between these feature sets are computed to measure the similarity between the corresponding actors. By applying clustering or outlier detection, the most suspicious actor(s) are judged to be the steganographer(s). Although the SIP needs further study, existing works identify the steganographer(s) well when non-adaptive steganographic embedding is applied. In this chapter, we present foundational concepts and review advanced methodologies in SIP. The chapter is self-contained and intended as a tutorial introducing the SIP in the context of media steganography.

Comment: A tutorial with 30 pages

arXiv.org e-Print Archive

Crossref
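
The pipeline in the abstract (a feature set per actor, a distance between feature sets, then outlier detection) can be sketched in Python as below. The choices are assumptions for illustration: the distance is a biased estimate of the squared maximum mean discrepancy (MMD) with a Gaussian kernel of hypothetical bandwidth parameter gamma, one distance between feature sets discussed in this literature, and the outlier rule simply picks the actor with the largest mean distance to all others. Feature extraction is out of scope, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, gamma: float = 1.0) -> float:
    """k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def mmd2(X: List[Vector], Y: List[Vector], gamma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two feature sets."""
    kxx = sum(gaussian_kernel(a, b, gamma) for a in X for b in X) / len(X) ** 2
    kyy = sum(gaussian_kernel(a, b, gamma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(gaussian_kernel(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy


def most_suspicious(actors: Dict[str, List[Vector]], gamma: float = 1.0) -> str:
    """Return the actor whose feature set is, on average, farthest from the others."""
    names = list(actors)

    def score(n: str) -> float:
        others = [m for m in names if m != n]
        return sum(mmd2(actors[n], actors[m], gamma) for m in others) / len(others)

    return max(names, key=score)
```

This uses the simplest possible outlier rule; the abstract mentions clustering or outlier detection more generally, and a practical detector would also need a principled choice of kernel bandwidth.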

Delete or merge regressors for linear model selection

Author: Maj-Kańska Aleksandra
Pokarowski Piotr
Prochenka Agnieszka
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2015

We consider the problem of linear model selection in the presence of both continuous and categorical predictors. Feasible models consist of subsets of numerical variables and partitions of levels of factors. A new algorithm called delete or merge regressors (DMR) is presented, which is a stepwise backward procedure that ranks the predictors according to squared t-statistics and chooses the final model by minimizing BIC. In the article we prove the consistency of DMR when the number of predictors tends to infinity with the sample size, and we describe a simulation study using a corresponding R package. The results indicate a significant advantage of our algorithm over Lasso-based methods described in the literature, in both time complexity and selection accuracy. Moreover, a version of DMR for generalized linear models is proposed.

arXiv.org e-Print Archive

Crossref
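
The Python sketch below illustrates only the "delete" half of the procedure described in the abstract, under simplifying assumptions: all predictors are continuous (merging of factor levels is omitted), predictors are ranked once by squared t-statistic in the full model, and the nested models obtained by deleting the weakest predictor first are compared by BIC = n log(RSS/n) + k log(n). It is not the authors' R package; the helper names are hypothetical, and the small matrix inverse is only meant for a handful of columns.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(X: Matrix, y: Sequence[float]) -> Tuple[List[float], float, Matrix]:
    """Least squares: coefficients, residual sum of squares and (X'X)^-1."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    inv = inverse(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    return beta, rss, inv


def backward_bic(X: Matrix, y: Sequence[float]) -> List[int]:
    """Delete predictors weakest-first by t^2 and keep the nested model with lowest BIC.

    Column 0 of X is assumed to be the intercept and is never deleted.
    """
    n, p = len(X), len(X[0])
    beta, rss, inv = ols(X, y)
    sigma2 = rss / (n - p)
    t2 = {j: beta[j] ** 2 / (sigma2 * inv[j][j]) for j in range(1, p)}
    order = sorted(t2, key=t2.get)  # weakest predictor first
    best_cols: List[int] = [0]
    best_bic = math.inf
    for k in range(len(order) + 1):
        cols = [0] + sorted(order[k:])  # drop the k weakest predictors
        _, rss_k, _ = ols([[r[c] for c in cols] for r in X], y)
        bic = n * math.log(rss_k / n) + len(cols) * math.log(n)
        if bic < best_bic:
            best_cols, best_bic = cols, bic
    return best_cols
```

Ranking once and scanning a single nested path is what keeps this kind of procedure cheap: only p + 1 models are fitted instead of searching over all subsets.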

A cost function for similarity-based hierarchical clustering

Author: Eldridge J.
Jardine N.
McDiarmid C.
Neal R.
Sokal R.
Publication date: 16/10/2015

The development of algorithms for hierarchical clustering has been hampered by a shortage of precise objective functions. To help address this situation, we introduce a simple cost function on hierarchies over a set of points, given pairwise similarities between those points. We show that this criterion behaves sensibly in canonical instances and that it admits a top-down construction procedure with a provably good approximation ratio.

arXiv.org e-Print Archive

Crossref
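
The abstract does not state the cost function. The sketch below assumes the form commonly associated with this paper: for a tree T whose leaves are the points, each pair i, j pays its similarity w(i, j) times the number of leaves in the smallest subtree containing both, cost(T) = sum over pairs of w(i, j) * |leaves(T[i v j])|. Trees are encoded as nested tuples of leaf names, an encoding chosen here for illustration, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def leaves(t: Tree) -> List[str]:
    """All leaf names under t, left to right."""
    return [t] if isinstance(t, str) else [x for child in t for x in leaves(child)]


def cost(t: Tree, w: Dict[FrozenSet[str], float]) -> float:
    """Sum of w(i, j) * |leaves(T[i v j])| over all leaf pairs.

    A pair is charged at the internal node where it is first separated,
    i.e. its lowest common ancestor; missing pairs have similarity 0.
    """
    if isinstance(t, str):
        return 0.0
    size = len(leaves(t))
    total = 0.0
    child_leaves = [leaves(c) for c in t]
    for a, b in combinations(child_leaves, 2):
        for i in a:
            for j in b:
                total += w.get(frozenset((i, j)), 0.0) * size
    return total + sum(cost(c, w) for c in t)
```

With a single similar pair w(a, b) = 1, the tree ((a, b), c) costs 2 while (a, (b, c)) costs 3, so the cost rewards merging similar points low in the hierarchy.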