47 research outputs found

Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms

Author: Ahmadian S
Norouzi-Fard A
Svensson O
Ward J
Publication venue: 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 17/08/2017

bibsource: dblp computer science bibliography, http://dblp.org
biburl: http://dblp.org/rec/bib/conf/focs/AhmadianNSW17
timestamp: Thu, 16 Nov 2017 15:01:42 +0100
bdsk-url-1: https://doi.org/10.1109/FOCS.2017.15
bdsk-url-2: http://dx.doi.org/10.1109/FOCS.2017.15

Queen Mary Research Online

Anomaly detection on flight route using similarity and grouping approach based-on automatic dependent surveillance-broadcast

Author: Buliali Joko Lianto
Ginardi Raden Venantius Hari
Pusadan Mohammad Yazdi
Publication venue: 'Universitas Ahmad Dahlan, Kampus 3'
Publication date: 30/11/2019

Flight anomaly detection is used to identify abnormal state data on a flight route. This study focuses on two groups: general aviation habits (C1) and anomalies (C2). Groups C1 and C2 are obtained through similarity tests against references. The methods used are: 1) normalizing the training data, 2) forming the training segments, 3) calculating the log-likelihood value and determining the maximum log-likelihood (C1) and minimum log-likelihood (C2) values, 4) determining the percentage of data meeting criteria C1 and C2 by grouping with SVM, KNN, and K-means, and 5) testing with the log-likelihood ratio. The results achieved in each segment are a log-likelihood value of -15.97 for C1 latitude and -16.97 for C1 longitude. By contrast, the log-likelihood value for C2 latitude is -19.3 (maximum) and -20.3 (minimum), and for C2 longitude it is -21.2 (maximum) and -24.8 (minimum). The largest percentage value in C1 is 96%, while the largest in C2 is 10%. Thus, the highest share of potential anomaly data is 10%, and the smallest is 3%. Performance tests based on F-measure are also used to assess accuracy and precision.

International Journal of Advances in Intelligent Informatics

International Journal of Advances in Intelligent Informatics (IJAIN)
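The log-likelihood ratio test named in step 5 of the flight-route abstract above could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which probability model is used, so a one-dimensional Gaussian is assumed here, and the function names and the `(mean, std)` parameter tuples are hypothetical rather than taken from the paper.

```python
import math


def gaussian_log_likelihood(values, mean, std):
    """Sum of log N(x | mean, std^2) over the values of one segment.

    Assumption: a Gaussian model; the abstract does not specify one.
    """
    var = std * std
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        for x in values
    )


def log_likelihood_ratio(values, normal_params, anomaly_params):
    """Positive when the segment fits the normal (C1) model better."""
    return (
        gaussian_log_likelihood(values, *normal_params)
        - gaussian_log_likelihood(values, *anomaly_params)
    )


def classify_segment(values, normal_params, anomaly_params):
    """Label a segment C1 (general habit) or C2 (anomaly) by the ratio sign."""
    ratio = log_likelihood_ratio(values, normal_params, anomaly_params)
    return "C1" if ratio >= 0 else "C2"
```

A call such as `classify_segment(latitudes, (mean_c1, std_c1), (mean_c2, std_c2))` would label one normalized segment; the threshold of zero is itself an assumption and could be replaced by a tuned value.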

Fully Dynamic Consistent Facility Location

Author: Cohen-Addad Vincent
Hjuler Niklas Oskar D.
Parotsidis Nikos
Saulpic David
Schwiegelshohn Chris Rene
Publication venue: H. Wallach and H. Larochelle and A. Beygelzimer and F. d'Alché-Buc and E. Fox and R. Garnett
Publication date: 01/01/2019

We consider classic clustering problems in fully dynamic data streams, where data elements can be both inserted and deleted. In this context, several parameters are of importance: (1) the quality of the solution after each insertion or deletion, (2) the time it takes to update the solution, and (3) how different consecutive solutions are. The question of obtaining efficient algorithms in this context for facility location, k-median and k-means has been raised in a recent paper by Hubert-Chan et al. [WWW'18] and also appears as a natural follow-up on the online model with recourse studied by Lattanzi and Vassilvitskii [ICML'17] (i.e., in insertion-only streams). In this paper, we focus on general metric spaces and mainly on the facility location problem. We give an arguably simple algorithm that maintains a constant factor approximation, with O(n log n) update time and total recourse O(n). This improves over the naive algorithm, which recomputes a solution at each time step and can take up to O(n^2) update time and O(n^2) total recourse. These bounds are nearly optimal: in a general metric space, inserting a point takes O(n) time to describe the distances to other points, and we give a simple lower bound of Ω(n) for the recourse. Moreover, we generalize this result to the k-median and k-means problems: our algorithm maintains a constant factor approximation in time Õ(n + k^2). We complement our analysis with experiments showing that the cost of the solution maintained by our algorithm at any time t is very close to the cost of a solution obtained by quickly recomputing a solution from scratch at time t, while having a much better running time.

Archivio della ricerca- Università di Roma La Sapienza
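To make the quantities in the facility location abstract above concrete, the following minimal sketch computes the cost of a solution and the recourse between two consecutive solutions. It is not the algorithm from the paper. It assumes a uniform opening cost per facility and measures recourse as the number of facilities opened or closed between two solutions; both choices, and all names used, are illustrative assumptions.

```python
def facility_location_cost(points, facilities, opening_cost, dist):
    """Opening cost per facility plus each point's distance to its nearest facility.

    Assumption: every facility has the same opening cost.
    """
    if not facilities:
        raise ValueError("at least one facility must be open")
    connection = sum(min(dist(p, f) for f in facilities) for p in points)
    return opening_cost * len(facilities) + connection


def recourse(old_facilities, new_facilities):
    """Number of facilities opened or closed between two consecutive solutions.

    Assumption: recourse is the size of the symmetric difference of the two sets.
    """
    return len(set(old_facilities) ^ set(new_facilities))
```

For points `[0, 1, 10]` on a line with `dist = lambda a, b: abs(a - b)`, opening facilities at `0` and `10` with opening cost `2` gives a cost of `2 * 2 + 1`, and moving from facilities `{0, 10}` to `{0, 1}` has recourse 2.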

Better Guarantees for $k$-Means and Euclidean $k$-Median by Primal-Dual Algorithms

Author: Ahmadian S
Norouzi-Fard A
Svensson O
Ward J
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2020

Clustering is a classic topic in optimization with $k$-means being one of the most fundamental such problems. In the absence of any restrictions on the input, the best-known algorithm for $k$-means in Euclidean space with a provable guarantee is a simple local search heuristic yielding an approximation guarantee of $9+\epsilon$, a ratio that is known to be tight with respect to such methods. We overcome this barrier by presenting a new primal-dual approach that allows us to (1) exploit the geometric structure of $k$-means and (2) satisfy the hard constraint that at most $k$ clusters are selected without deteriorating the approximation guarantee. Our main result is a 6.357-approximation algorithm with respect to the standard linear programming (LP) relaxation. Our techniques are quite general, and we also show improved guarantees for $k$-median in Euclidean metrics and for a generalization of $k$-means in which the underlying metric is not required to be Euclidean.

Infoscience - École polytechnique fédérale de Lausanne

Queen Mary Research Online
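As a rough illustration of the local search heuristic that the k-means abstract above uses as its baseline, the sketch below evaluates the k-means objective and repeatedly applies single-center swaps. It is not the primal-dual algorithm of the paper and makes no claim to either approximation guarantee quoted there. Restricting candidate centers to the input points, using single swaps, and the `min_gain` stopping threshold are all simplifying assumptions, and the function names are hypothetical.

```python
def kmeans_cost(points, centers):
    """Sum over points of the squared Euclidean distance to the nearest center."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )


def local_search(points, k, candidates=None, min_gain=1e-9):
    """Single-swap local search over a finite candidate set.

    Assumption: candidate centers default to the input points themselves.
    """
    candidates = list(candidates if candidates is not None else points)
    centers = candidates[:k]  # arbitrary starting solution
    best = kmeans_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for c in candidates:
                if c in centers:
                    continue
                # Try replacing the i-th center with candidate c.
                trial = centers[:i] + [c] + centers[i + 1:]
                cost = kmeans_cost(points, trial)
                if cost < best - min_gain:
                    centers, best = trial, cost
                    improved = True
    return centers, best
```

Points are expected as equal-length tuples of numbers, for example `local_search([(0, 0), (0, 1), (10, 0), (10, 1)], 2)`. The loop stops once no single swap lowers the cost by more than `min_gain`.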