
Identifying uncertain galaxy morphologies using unsupervised learning

Authors: Edwards K., Gaber M.
Publication date: 09/06/2013

Portsmouth University Research Portal (Pure)

Improvements on the k-center problem for uncertain data

Authors: Alipour Sharareh, Jafari Amir
Publication date: 30/08/2017

In real applications, there are situations where we need to model problems based on uncertain data. This leads us to define an uncertain model for some classical geometric optimization problems and to propose algorithms to solve them. In this paper, we study the $k$-center problem for uncertain input. In our setting, each uncertain point $P_i$ is located, independently of the other points, at one of several possible locations $\{P_{i,1},\dots,P_{i,z_i}\}$ in a metric space with metric $d$, with specified probabilities. The goal is to compute $k$ centers $\{c_1,\dots,c_k\}$ that minimize the expected cost

$$Ecost(c_1,\dots,c_k)=\sum_{R\in\Omega} prob(R)\max_{i=1,\dots,n}\min_{j=1,\dots,k} d(\hat{P}_i,c_j),$$

where $\Omega$ is the probability space of all realizations $R=\{\hat{P}_1,\dots,\hat{P}_n\}$ of the given uncertain points and $prob(R)=\prod_{i=1}^n prob(\hat{P}_i)$.

In the restricted assigned version of this problem, an assignment $A:\{P_1,\dots,P_n\}\rightarrow\{c_1,\dots,c_k\}$ is given for any choice of centers, and the goal is to minimize

$$Ecost_A(c_1,\dots,c_k)=\sum_{R\in\Omega} prob(R)\max_{i=1,\dots,n} d(\hat{P}_i,A(P_i)).$$

In the unrestricted version, the assignment is not specified, and the goal is to compute $k$ centers $\{c_1,\dots,c_k\}$ and an assignment $A$ that minimize the above expected cost. We give several improved constant-factor approximation algorithms for the assigned versions of this problem in Euclidean space and in general metric spaces. Our results significantly improve the results of \cite{guh} and generalize the results of \cite{wang} to any dimension. Our approach is to replace each uncertain point with a certain (deterministic) center point and to study the properties of these certain points. The proposed algorithms are efficient and simple to implement.
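To make the two objectives concrete, here is a minimal brute-force sketch that evaluates $Ecost$ and $Ecost_A$ directly from their definitions by enumerating every realization $R\in\Omega$. The data layout (each uncertain point as a list of `(location, probability)` pairs), the function names `expected_cost` and `expected_cost_assigned`, and the default Euclidean metric are assumptions for illustration. This is not the authors' approximation algorithm, and because $|\Omega|=\prod_i z_i$ grows exponentially with $n$, it is only practical for tiny instances.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical layout: an uncertain point is a list of (location, probability)
# pairs whose probabilities sum to 1.
UncertainPoint = List[Tuple[Point, float]]
Metric = Callable[[Point, Point], float]


def euclidean(p: Point, q: Point) -> float:
    return math.dist(p, q)


def expected_cost(points: Sequence[UncertainPoint], centers: Sequence[Point],
                  d: Metric = euclidean) -> float:
    """Ecost(c_1..c_k) = sum_R prob(R) * max_i min_j d(P_i_hat, c_j)."""
    total = 0.0
    # itertools.product yields every realization R = (P_1_hat, ..., P_n_hat).
    for realization in itertools.product(*points):
        prob_r = math.prod(prob for _, prob in realization)  # points are independent
        cost_r = max(min(d(loc, c) for c in centers) for loc, _ in realization)
        total += prob_r * cost_r
    return total


def expected_cost_assigned(points: Sequence[UncertainPoint], centers: Sequence[Point],
                           assignment: Sequence[int], d: Metric = euclidean) -> float:
    """Ecost_A: P_i is always served by centers[assignment[i]], wherever it lands."""
    total = 0.0
    for realization in itertools.product(*points):
        prob_r = math.prod(prob for _, prob in realization)
        cost_r = max(d(loc, centers[assignment[i]])
                     for i, (loc, _) in enumerate(realization))
        total += prob_r * cost_r
    return total


# Usage: P_1 is at 0 or 2 with probability 1/2 each; P_2 is always at 10.
P1 = [((0.0,), 0.5), ((2.0,), 0.5)]
P2 = [((10.0,), 1.0)]
centers = [(1.0,), (10.0,)]
print(expected_cost([P1, P2], centers))                   # 1.0
print(expected_cost_assigned([P1, P2], centers, [1, 1]))  # 9.0
```

For any fixed assignment $A$, the unassigned cost can never exceed $Ecost_A$, since each realization pays the distance to the nearest center rather than to the assigned one. The example reflects this: 1.0 versus 9.0 when both points are sent to $c_2$.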

arXiv.org e-Print Archive

Crossref

A Short Survey on Data Clustering Algorithms

Author: Wong Ka-Chun
Publication date: 25/11/2015

With rapidly increasing data volumes, clustering algorithms have become important tools for data analytics in modern research. They have been successfully applied to a wide range of domains, for instance bioinformatics, speech recognition, and financial analysis. Formally, given a set of data instances, a clustering algorithm is expected to divide the instances into subsets that maximize intra-subset similarity and inter-subset dissimilarity, where a similarity measure is defined beforehand. In this work, state-of-the-art clustering algorithms are reviewed from design concept to methodology, and different clustering paradigms are discussed. Advanced clustering algorithms are also discussed. After that, the existing clustering evaluation metrics are reviewed. A summary with future insights is provided at the end.
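As one familiar instance of the formal definition above, the sketch below implements Lloyd's k-means, a classic partitional paradigm, in plain Python, with Euclidean distance serving as the predefined (dis)similarity measure. The function name `kmeans`, the first-k initialization, and the iteration cap are assumptions chosen for a small reproducible example. k-means is used here only for illustration and is not taken from the survey. It minimizes the within-cluster sum of squared distances, which for a fixed dataset is equivalent to maximizing the between-cluster sum of squares. Lloyd's iterations, however, only reach a local optimum.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(data: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate an assignment step and a centroid-update step."""
    # Assumption: deterministic initialization with the first k instances.
    centroids: List[Point] = [tuple(p) for p in data[:k]]
    labels: List[int] = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each instance joins its nearest centroid, with
        # Euclidean distance playing the role of the predefined (dis)similarity.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in data]
        if new_labels == labels:
            break  # converged: no instance changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member instances.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, labels


data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(data, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

The two well-separated groups end up in different clusters, with centroids near (1/3, 1/3) and (31/3, 31/3).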

arXiv.org e-Print Archive

Crossref

Graph Summarization

Authors: Bonifati Angela, Dumbrava Stefania, Kondylakis Haridimos
Publication date: 01/04/2020

The continuous and rapid growth of highly interconnected datasets, which are both voluminous and complex, calls for the development of adequate processing and analytical techniques. One method for condensing and simplifying such datasets is graph summarization. It denotes a series of application-specific algorithms designed to transform graphs into more compact representations while preserving structural patterns, query answers, or specific property distributions. As this problem is common to several areas studying graph topologies, different approaches, such as clustering, compression, sampling, or influence detection, have been proposed, primarily based on statistical and optimization methods. Our chapter pinpoints the main graph summarization methods, focusing especially on the most recent approaches and novel research trends on this topic that are not yet covered by previous surveys. Comment: To appear in the Encyclopedia of Big Data Technologie
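One of the simplest grouping-based summaries can be sketched as follows: nodes with identical neighbor sets (structurally equivalent nodes) are merged into a supernode, and original edges become superedges between supernodes. In a simple undirected graph without self-loops, members of a supernode share exactly the same neighbors, so the original graph can be rebuilt from the summary and this particular grouping is lossless. The functions `summarize` and `expand` are hypothetical names. The technique itself is an elementary choice made for clarity, not a method taken from the chapter.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]


def summarize(edges: List[Edge]) -> Tuple[Dict[int, List[str]], Set[Tuple[int, int]]]:
    """Merge nodes with identical neighbour sets into supernodes (simple undirected graph)."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Structurally equivalent nodes share the same frozen neighbour set.
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for node in sorted(adj):
        groups[frozenset(adj[node])].append(node)
    supernodes = dict(enumerate(groups.values()))
    owner = {n: sid for sid, members in supernodes.items() for n in members}
    # Each original edge maps to a superedge between the owners of its endpoints.
    superedges = {tuple(sorted((owner[u], owner[v]))) for u, v in edges}
    return supernodes, superedges


def expand(supernodes: Dict[int, List[str]],
           superedges: Set[Tuple[int, int]]) -> Set[FrozenSet[str]]:
    """Rebuild the edge set: every member of A links to every member of B."""
    return {frozenset((u, v))
            for a, b in superedges
            for u in supernodes[a]
            for v in supernodes[b]}


edges = [("h", "a"), ("h", "b"), ("h", "c")]
supernodes, superedges = summarize(edges)
print(supernodes)  # {0: ['a', 'b', 'c'], 1: ['h']}
print(superedges)  # {(0, 1)}
```

Here three edges collapse into a single superedge between {a, b, c} and {h}. Many summarization methods relax exact equivalence to gain more compression at the cost of losing information. This sketch does not attempt that.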

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot

Density-based projected clustering of data streams

Authors: Gaber M., Hassani M., Seidl T., Spaus P.
Publication date: 01/01/2012

Portsmouth University Research Portal (Pure)

Publikationsserver der RWTH Aachen University