
    Temporal Ordered Clustering in Dynamic Networks: Unsupervised and Semi-supervised Learning Algorithms

    In temporal ordered clustering, given a single snapshot of a dynamic network in which nodes arrive at distinct time instants, we aim at partitioning its nodes into $K$ ordered clusters $\mathcal{C}_1 \prec \cdots \prec \mathcal{C}_K$ such that for $i < j$, nodes in cluster $\mathcal{C}_i$ arrived before nodes in cluster $\mathcal{C}_j$, with $K$ being a data-driven parameter not known upfront. Such a problem is of considerable significance in many applications, ranging from tracking the expansion of fake news to mapping the spread of information. We first formulate our problem for a general dynamic graph and propose an integer programming framework that finds the optimal clustering, represented as a strict partially ordered set, achieving the best precision (i.e., the fraction of successfully ordered node pairs) for a fixed density (i.e., the fraction of comparable node pairs). We then develop a sequential importance procedure and design unsupervised and semi-supervised algorithms to find temporal ordered clusters that efficiently approximate the optimal solution. To illustrate the techniques, we apply our methods to the vertex copying (duplication-divergence) model, which exhibits some edge-case challenges in inferring the clusters as compared to other network models. Finally, we validate the performance of the proposed algorithms on synthetic and real-world networks.

    Comment: 14 pages, 9 figures, and 3 tables. This version is submitted to a journal. A shorter version of this work is published in the proceedings of the IEEE International Symposium on Information Theory (ISIT), 2020. The first two authors contributed equally.
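    To make the two quality measures above concrete, the sketch below scores a candidate ordered clustering against known arrival times. It is a minimal illustration, not the paper's code: the function name is mine, and the convention that precision is taken over the comparable pairs only is an assumption about the definitions.

```python
from itertools import combinations

def precision_and_density(clusters, arrival_time):
    """Score an ordered clustering C_1 < ... < C_K against true arrival times.

    clusters: list of sets of nodes, ordered from oldest to newest cluster.
    arrival_time: dict mapping each node to its true arrival instant.

    A pair of nodes is "comparable" if its members lie in different clusters;
    it is "successfully ordered" if the cluster order matches the arrival
    order. Density is the fraction of comparable pairs; precision is the
    fraction of comparable pairs that are successfully ordered (assumed).
    """
    rank = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    total = comparable = correct = 0
    for u, v in combinations(list(rank), 2):
        total += 1
        if rank[u] == rank[v]:
            continue  # same cluster: the pair is left unordered
        comparable += 1
        if rank[u] > rank[v]:
            u, v = v, u  # reorder so u's cluster precedes v's
        if arrival_time[u] < arrival_time[v]:
            correct += 1
    precision = correct / comparable if comparable else 0.0
    return precision, comparable / total

# toy usage: three ordered clusters over six nodes that arrived at times 0..5
clusters = [{0, 1}, {2, 3}, {4, 5}]
arrival = {v: v for v in range(6)}
print(precision_and_density(clusters, arrival))  # -> (1.0, 0.8)
```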

    Semi-Supervised Learning of Hidden Markov Models via a Homotopy Method

    Hidden Markov model (HMM) classifier design is considered for the analysis of sequential data, incorporating both labeled and unlabeled data for training; the balance between the two is controlled by an allocation parameter λ ∈ [0, 1], where λ = 0 corresponds to purely supervised HMM learning (based only on the labeled data) and λ = 1 corresponds to unsupervised HMM-based clustering (based only on the unlabeled data). The associated estimation problem can typically be reduced to solving a set of fixed-point equations in the form of a “natural-parameter homotopy”. This paper applies a homotopy method to track a continuous path of solutions, starting from a local supervised solution (λ = 0) and ending at a local unsupervised solution (λ = 1). The homotopy method is guaranteed to track with probability one from λ = 0 to λ = 1 if the λ = 0 solution is unique; this condition is not satisfied for the HMM, since the maximum-likelihood supervised solution (λ = 0) is characterized by many locally optimal solutions. A modified form of the homotopy map for HMMs assures a track from λ = 0 to λ = 1. Following this track leads to a formulation for selecting λ ∈ [0, 1] for a semi-supervised solution, and it also provides a tool for selecting from among multiple (locally optimal) supervised solutions. The results of applying the proposed method to measured and synthetic sequential data verify its robustness and feasibility compared to the conventional EM approach to semi-supervised HMM training.
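    To make the role of the allocation parameter concrete, here is a minimal numpy sketch of the underlying idea: EM-style updates whose sufficient statistics blend supervised counts (weight 1 − λ) with expected counts from the unlabeled sequences (weight λ), swept from λ = 0 to λ = 1 with warm starts. This is continuation only in the naive sense; it is not the paper's natural-parameter homotopy map, and the discrete-HMM setup, smoothing constants, and function names are all assumptions.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Scaled forward-backward for a discrete HMM: returns expected state
    occupancies (gamma, T x K) and summed transition counts (xi, K x K)."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K)); c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta = np.ones((T, K)); xi = np.zeros((K, K))
    for t in range(T - 2, -1, -1):
        w = B[:, obs[t + 1]] * beta[t + 1]
        beta[t] = (A @ w) / c[t + 1]
        xi += A * np.outer(alpha[t], w) / c[t + 1]
    return alpha * beta, xi

def homotopy_track(labeled, unlabeled, K, M, steps=11, iters=20):
    """Track parameters from lambda=0 (purely supervised) to lambda=1
    (purely unsupervised), warm-starting each step from the previous one."""
    # supervised sufficient statistics from (obs, states) pairs, lightly smoothed
    trans_s = np.full((K, K), 1e-3); emit_s = np.full((K, M), 1e-3)
    for obs, states in labeled:
        for t in range(len(obs) - 1):
            trans_s[states[t], states[t + 1]] += 1
        for t in range(len(obs)):
            emit_s[states[t], obs[t]] += 1
    pi = np.full(K, 1.0 / K)  # initial distribution kept fixed for brevity
    A = trans_s / trans_s.sum(axis=1, keepdims=True)
    B = emit_s / emit_s.sum(axis=1, keepdims=True)
    path = []
    for lam in np.linspace(0.0, 1.0, steps):
        for _ in range(iters):
            trans_u = np.full((K, K), 1e-12); emit_u = np.full((K, M), 1e-12)
            for obs in unlabeled:
                gamma, xi = forward_backward(pi, A, B, obs)
                trans_u += xi
                np.add.at(emit_u.T, obs, gamma)  # expected emission counts
            trans = (1 - lam) * trans_s + lam * trans_u  # the lambda blend
            emit = (1 - lam) * emit_s + lam * emit_u
            A = trans / trans.sum(axis=1, keepdims=True)
            B = emit / emit.sum(axis=1, keepdims=True)
        path.append((lam, A.copy(), B.copy()))
    return path

# toy usage: 2 hidden states, 3 observation symbols
labeled = [([0, 1, 2, 2], [0, 0, 1, 1])]
unlabeled = [[0, 0, 1, 2, 2, 2], [2, 2, 1, 0, 0, 0]]
for lam, A, B in homotopy_track(labeled, unlabeled, K=2, M=3)[::5]:
    print(round(lam, 1), A.round(2).tolist())
```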

    Natural clustering: the modularity approach

    We show that modularity, a quantity introduced in the study of networked systems, can be generalized and used in the clustering problem as an indicator of the quality of a solution. The introduction of this measure arises very naturally for clustering algorithms that are rooted in Statistical Mechanics and use the analogy with a physical system.

    Comment: 11 pages, 5 figures; enlarged version.
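    Since the abstract's quality indicator builds on the standard networked-systems quantity, here is a minimal dense-matrix implementation of Newman-Girvan modularity for reference; the implementation and the toy graph are illustrative, not the paper's generalized measure.

```python
import numpy as np

def modularity(adj, labels):
    """Q = (1/2m) * sum_ij (A_ij - k_i*k_j/2m) * [c_i == c_j] for an
    undirected graph given as a dense adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    two_m = adj.sum()                      # each edge counted in both orders
    k = adj.sum(axis=1)                    # node degrees
    same = np.equal.outer(labels, labels)  # True where i and j share a cluster
    return float(((adj - np.outer(k, k) / two_m) * same).sum() / two_m)

# toy usage: two triangles joined by a single bridge edge
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1
print(round(modularity(adj, [0, 0, 0, 1, 1, 1]), 3))  # ~0.357
```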

    Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially.

    Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular k-center variant which, given a set S of points from some metric space and a parameter k < |S|, requires to identify a subset of k centers minimizing the maximum distance of any point of S from its closest center; in the variant with outliers, a prescribed number of points of S may be disregarded when computing this maximum distance. We present MapReduce and streaming algorithms for both formulations. For any fixed ϵ > 0, the algorithms yield solutions whose approximation ratios are a mere additive term ϵ away from those achievable by the best known polynomial-time sequential algorithms, a result that substantially improves upon the state of the art. Our algorithms are rather simple and adapt to the intrinsic complexity of the dataset, captured by the doubling dimension D of the metric space. Specifically, our analysis shows that the algorithms become very space-efficient for the important case of small (constant) D. These theoretical results are complemented with a set of experiments on real-world and synthetic datasets of up to over a billion points, which show that our algorithms yield better-quality solutions over the state of the art while featuring excellent scalability, and that they also lend themselves to sequential implementations much faster than existing ones.
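    The distributed algorithms themselves do not fit in a short sketch, but the sequential objective they approximate is easy to state in code. Below is the classical farthest-point (Gonzalez) heuristic, a well-known 2-approximation for k-center without outliers over Euclidean points; it is shown only to make the objective concrete and is not the paper's MapReduce or streaming algorithm.

```python
import numpy as np

def greedy_k_center(points, k, first=0):
    """Farthest-point traversal: pick an arbitrary first center, then
    repeatedly add the point farthest from the current centers."""
    centers = [first]
    dist = np.linalg.norm(points - points[first], axis=1)  # Euclidean metric
    for _ in range(k - 1):
        nxt = int(dist.argmax())  # point farthest from all chosen centers
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return centers, float(dist.max())  # center indices and clustering radius

# toy usage: 1,000 random points in the plane, 5 centers
pts = np.random.default_rng(1).normal(size=(1000, 2))
centers, radius = greedy_k_center(pts, k=5)
print(centers, round(radius, 3))
```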

    Bayesian Agglomerative Clustering with Coalescents

    We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman's coalescent. We develop novel greedy and sequential Monte Carlo inference algorithms which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over others, and demonstrate our approach in document clustering and phylolinguistics.

    Comment: NIPS 2007.
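    As a structural sketch of the greedy inference mode mentioned above, here is generic bottom-up agglomeration with a pluggable merge score. In the paper the score would be derived from Kingman's coalescent prior; the stand-in centroid-distance score, the function names, and the toy data below are all assumptions.

```python
import numpy as np

def greedy_agglomerate(points, score):
    """Repeatedly merge the pair of clusters with the highest score,
    recording the merge order (a bottom-up binary tree)."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = max(pairs, key=lambda p: score(points, clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return merges

def neg_centroid_distance(points, ca, cb):
    # toy stand-in for a coalescent merge likelihood: closer centroids first
    return -np.linalg.norm(points[ca].mean(axis=0) - points[cb].mean(axis=0))

# toy usage: two tight groups of three identical points each
pts = np.vstack([np.zeros((3, 2)), np.full((3, 2), 5.0)])
for left, right in greedy_agglomerate(pts, neg_centroid_distance):
    print(left, "+", right)
```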