
    Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially.

    Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular k-center variant which, given a set S of points from some metric space and a parameter k [...], the algorithms yield solutions whose approximation ratios are a mere additive term ε away from those achievable by the best known polynomial-time sequential algorithms, a result that substantially improves upon the state of the art. Our algorithms are rather simple and adapt to the intrinsic complexity of the dataset, captured by the doubling dimension D of the metric space. Specifically, our analysis shows that the algorithms become very space-efficient for the important case of small (constant) D. These theoretical results are complemented with a set of experiments on real-world and synthetic datasets of over a billion points, which show that our algorithms yield better-quality solutions than the state of the art while featuring excellent scalability, and that they also lend themselves to sequential implementations much faster than existing ones.
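For orientation, the k-center objective asks for k centers that minimize the maximum distance from any point to its nearest center. The sketch below shows the classic sequential farthest-first traversal (Gonzalez's greedy heuristic), a well-known 2-approximation for the problem without outliers. It is not the MapReduce or Streaming algorithm of the paper; the function name `farthest_first_centers` and the choice of Euclidean distance are assumptions made for illustration.

```python
import math


def farthest_first_centers(points, k, dist=math.dist):
    """Greedy farthest-first traversal for k-center (illustrative sketch).

    Returns the chosen centers and the resulting radius, i.e. the largest
    distance from any point to its nearest chosen center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= len(points)")

    # Start from an arbitrary point (here: the first one).
    centers = [points[0]]
    # nearest[j] = distance from points[j] to its closest center so far.
    nearest = [dist(p, points[0]) for p in points]

    while len(centers) < k:
        # Pick the point that is currently farthest from all centers.
        far = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[far])
        # Update each point's distance to its closest center.
        for j, p in enumerate(points):
            nearest[j] = min(nearest[j], dist(p, points[far]))

    return centers, max(nearest)
```

For example, calling `farthest_first_centers([(0, 0), (1, 0), (10, 0), (11, 0)], 2)` picks one center in each of the two well-separated groups. Each round costs one pass over the data, so the whole procedure takes O(nk) distance evaluations.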

    An objective based classification of aggregation techniques for wireless sensor networks

    Wireless Sensor Networks (WSNs) have gained immense popularity in recent years due to their ever-increasing capabilities and wide range of critical applications. A large body of research has been dedicated to finding ways to use the limited resources of sensor nodes efficiently. One common way to minimize energy consumption is the aggregation of input data. We note that every aggregation technique has an improvement objective with respect to the output it produces: each technique is designed to achieve some target, e.g. reducing data size, minimizing transmission energy, or enhancing accuracy. This paper presents a comprehensive survey of aggregation techniques that can be used in a distributed manner to improve the lifetime and energy conservation of wireless sensor networks. The main contribution of this work is a novel classification of such techniques based on the type of improvement they offer when applied to WSNs. Because many definitions of aggregation exist, we first review the meaning of the term as it applies to WSNs. The concept is then associated with the proposed classes. Each class of techniques is divided into a number of subclasses, and a brief literature review of related work in WSNs is presented for each of these.

    Research on Approximate Bayesian Computation

    This thesis presents the development of a new numerical algorithm for statistical inference problems that require sampling from intractable distributions. We propose to base our sampling algorithm on a class of Monte Carlo methods, Approximate Bayesian Computation (ABC), which are specifically designed to deal with this type of likelihood-free inference. ABC has become a fundamental tool for the analysis of complex models when the likelihood function is computationally intractable or difficult to specify mathematically. The central theme of our approach is to enhance current ABC algorithms by exploiting the structure of the mathematical models via derivative information. We introduce Progressive Correction of Gaussian Components (PCGC) as a computationally efficient algorithm for generating proposal distributions in our ABC sampler. We demonstrate on two examples that our new ABC algorithm has an acceptance rate that is one to two orders of magnitude better than that of basic ABC rejection sampling.
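The basic ABC rejection sampling that the abstract uses as its baseline can be sketched roughly as follows: draw a parameter from the prior, simulate data from the model, and keep the parameter only if a summary of the simulated data lies within a tolerance of the observed summary. This is a minimal sketch of the generic baseline, not of the PCGC-based sampler. The function name `abc_rejection`, its parameters, and the toy Gaussian model in the usage note are all assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_sample, simulate, summary, distance,
                  tolerance, n_accept, rng, max_tries=1_000_000):
    """Basic ABC rejection sampler (illustrative sketch).

    Returns the accepted parameter draws and the number of proposals tried,
    so the acceptance rate is len(accepted) / tries.
    """
    s_obs = summary(observed)
    accepted = []
    tries = 0
    while len(accepted) < n_accept and tries < max_tries:
        tries += 1
        theta = prior_sample(rng)          # 1. propose from the prior
        synthetic = simulate(theta, rng)   # 2. simulate; no likelihood needed
        # 3. accept if the simulated summary is close to the observed one
        if distance(summary(synthetic), s_obs) <= tolerance:
            accepted.append(theta)
    return accepted, tries


def toy_gaussian_demo(seed=0, n_accept=100):
    """Hypothetical toy problem: infer the mean of a unit-variance Gaussian."""
    rng = random.Random(seed)
    observed = [rng.gauss(2.0, 1.0) for _ in range(50)]
    accepted, tries = abc_rejection(
        observed,
        prior_sample=lambda r: r.uniform(-5.0, 5.0),
        simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
        summary=statistics.fmean,
        distance=lambda a, b: abs(a - b),
        tolerance=0.1,
        n_accept=n_accept,
        rng=rng,
    )
    return observed, accepted, tries
```

Calling `toy_gaussian_demo()` returns the observed data, the accepted draws, and the number of proposals. With a wide prior and a tight tolerance, most proposals are rejected, which is the inefficiency that better proposal distributions aim to reduce.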