47 research outputs found

    Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms

    bibsource: dblp computer science bibliography, http://dblp.org biburl: http://dblp.org/rec/bib/conf/focs/AhmadianNSW17 timestamp: Thu, 16 Nov 2017 15:01:42 +0100 bdsk-url-1: https://doi.org/10.1109/FOCS.2017.15 bdsk-url-2: http://dx.doi.org/10.1109/FOCS.2017.15

    Anomaly detection on flight route using similarity and grouping approach based-on automatic dependent surveillance-broadcast

    Flight anomaly detection identifies abnormal data along a flight route. This study focuses on two groups: general aviation habits (C1) and anomalies (C2). Groups C1 and C2 are obtained through a similarity test against references. The method consists of: 1) normalizing the form of the training data, 2) forming the training segments, 3) calculating the log-likelihood values and determining the maximum log-likelihood (C1) and minimum log-likelihood (C2) values, 4) determining the percentage of data meeting criteria C1 and C2 by grouping with SVM, KNN, and K-means, and 5) testing with the log-likelihood ratio. For each segment, the log-likelihood value is -15.97 for C1 latitude and -16.97 for C1 longitude. The log-likelihood value for C2 latitude is -19.3 (maximum) and -20.3 (minimum), and for C2 longitude it is -21.2 (maximum) and -24.8 (minimum). The largest percentage value in C1 is 96%, while the largest in C2 is 10%. Thus, the highest share of potential anomaly data is 10%, and the smallest is 3%. Performance tests based on the F-measure are also carried out to assess accuracy and precision.
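The log-likelihood step in the abstract above can be illustrated with a minimal sketch. The abstract does not state which probability model or threshold the study uses, so the Gaussian model, the function names `gaussian_log_likelihood` and `label_segment`, and the `threshold` parameter below are all assumptions made for illustration only.

```python
import math


def gaussian_log_likelihood(values, mean, std):
    """Sum of log N(v | mean, std^2) over the values in one segment.

    Assumption: each coordinate (e.g. latitude) is modelled as Gaussian
    around a reference route; the source does not specify the model.
    """
    var = std * std
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - (v - mean) ** 2 / (2.0 * var)
        for v in values
    )


def label_segment(values, mean, std, threshold):
    """Label a segment 'C1' (usual behaviour) or 'C2' (potential anomaly).

    Hypothetical rule: a segment whose log-likelihood falls below the
    threshold is treated as a potential anomaly.
    """
    ll = gaussian_log_likelihood(values, mean, std)
    return "C1" if ll >= threshold else "C2"
```

Under these assumptions, a segment close to the reference mean scores a higher log-likelihood and is labelled C1, while a segment far from it scores lower and is labelled C2.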

    Fully Dynamic Consistent Facility Location

    We consider classic clustering problems in fully dynamic data streams, where data elements can be both inserted and deleted. In this context, several parameters are of importance: (1) the quality of the solution after each insertion or deletion, (2) the time it takes to update the solution, and (3) how different consecutive solutions are. The question of obtaining efficient algorithms in this context for facility location, k-median and k-means has been raised in a recent paper by Hubert-Chan et al. [WWW'18] and also appears as a natural follow-up on the online model with recourse studied by Lattanzi and Vassilvitskii [ICML'17] (i.e., in insertion-only streams). In this paper, we focus on general metric spaces and mainly on the facility location problem. We give an arguably simple algorithm that maintains a constant factor approximation, with O(n log n) update time and total recourse O(n). This improves over the naive algorithm, which recomputes a solution at each time step and can take up to O(n^2) update time and O(n^2) total recourse. These bounds are nearly optimal: in a general metric space, inserting a point takes O(n) time to describe the distances to the other points, and we give a simple lower bound of Ω(n) for the recourse. Moreover, we generalize this result to the k-median and k-means problems: our algorithm maintains a constant factor approximation in time Õ(n+k^2). We complement our analysis with experiments showing that the cost of the solution maintained by our algorithm at any time t is very close to the cost of a solution obtained by quickly recomputing a solution from scratch at time t, while having a much better running time.

    Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms

    Clustering is a classic topic in optimization with k-means being one of the most fundamental such problems. In the absence of any restrictions on the input, the best-known algorithm for k-means in Euclidean space with a provable guarantee is a simple local search heuristic yielding an approximation guarantee of 9+ε, a ratio that is known to be tight with respect to such methods. We overcome this barrier by presenting a new primal-dual approach that allows us to (1) exploit the geometric structure of k-means and (2) satisfy the hard constraint that at most k clusters are selected without deteriorating the approximation guarantee. Our main result is a 6.357-approximation algorithm with respect to the standard linear programming (LP) relaxation. Our techniques are quite general, and we also show improved guarantees for k-median in Euclidean metrics and for a generalization of k-means in which the underlying metric is not required to be Euclidean.
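For orientation, the k-means objective and the kind of local search heuristic the abstract contrasts itself with can be sketched as follows. This is not the paper's primal-dual algorithm. It is a simplified single-swap local search that, as an assumption for brevity, restricts candidate centers to the input points and starts from the first k points; it makes no claim to the 9+ε guarantee, which belongs to a specific analyzed variant. The function names are illustrative.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """k-means objective: each point pays the squared distance to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def single_swap_local_search(points, k, eps=1e-9):
    """Swap one center for one input point while the cost strictly improves.

    Assumptions: candidate centers are input points only, and the start
    is the first k points; both are simplifications for illustration.
    """
    centers = list(points[:k])
    cost = kmeans_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in points:
                if cand in centers:
                    continue
                trial = centers[:i] + [cand] + centers[i + 1:]
                trial_cost = kmeans_cost(points, trial)
                if trial_cost < cost - eps:
                    centers, cost, improved = trial, trial_cost, True
    return centers, cost
```

A call such as `single_swap_local_search([(0, 0), (0, 1), (10, 0), (10, 1)], 2)` moves one center from the left pair of points to the right pair, because that swap lowers the objective.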