
    Optimal strategies for observation of active galactic nuclei variability with Imaging Atmospheric Cherenkov Telescopes

    Variable emission is one of the defining characteristics of active galactic nuclei (AGN). While it provides valuable information on the nature and physics of the sources, variability is often challenging to observe with time- and field-of-view-limited astronomical observatories such as Imaging Atmospheric Cherenkov Telescopes (IACTs). In this work, we address two questions relevant to the observation of sources characterized by AGN-like variability: what is the most time-efficient way to detect such sources, and what observational bias can be introduced by the choice of observing strategy when conducting blind surveys of the sky. Different observing strategies are evaluated using simulated light curves and realistic instrument response functions of the Cherenkov Telescope Array (CTA), a future gamma-ray observatory. We show that strategies that make use of very small observing windows, spread over long periods of time, allow for faster detection of the source and are less influenced by the variability properties of the sources, as compared to strategies that concentrate the observing time in a small number of large observing windows. Although derived using CTA as an example, our conclusions are conceptually valid for any IACT facility and, in general, for all observatories with a small field of view and a limited duty cycle.
    Comment: 14 pages, 11 figures
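    As a rough illustration of how such a strategy comparison can be set up, the sketch below simulates a toy AGN-like light curve and counts how many nights each observing strategy needs before a detection threshold is reached. The random-walk light curve, the exposure and threshold values, and the simple signal-to-noise accumulation are all illustrative assumptions; the paper's actual study relies on realistic simulated light curves and CTA instrument response functions, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy AGN-like light curve: a random walk in log-flux gives lognormal,
# correlated variability (illustrative only; not the paper's simulations).
n_nights = 400
log_flux = np.cumsum(rng.normal(0.0, 0.1, n_nights))
flux = np.exp(log_flux - log_flux.mean())

def nights_to_detection(flux, window, gap, exposure=0.5, threshold=25.0):
    """Observe `window` consecutive nights, skip `gap` nights, repeat.
    Accumulate a toy significance ~ sqrt(sum of exposure * flux) and
    return the first night on which it exceeds `threshold`."""
    sig2 = 0.0
    night = 0
    while night < len(flux):
        for n in range(night, min(night + window, len(flux))):
            sig2 += exposure * flux[n]
            if np.sqrt(sig2) >= threshold:
                return n + 1
        night += window + gap
    return None  # source not detected within the simulated period

# Many short windows spread in time vs. a few long consecutive blocks
print("1-night windows, 3-night gaps :", nights_to_detection(flux, 1, 3))
print("20-night blocks, 40-night gaps:", nights_to_detection(flux, 20, 40))
```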

    Embed and Conquer: Scalable Embeddings for Kernel k-Means on MapReduce

    The kernel k-means is an effective method for data clustering which extends the commonly used k-means algorithm to work on a similarity matrix over complex data structures. The kernel k-means algorithm is, however, computationally very complex, as it requires the complete kernel matrix to be calculated and stored. Further, the kernelized nature of the kernel k-means algorithm hinders the parallelization of its computations on modern infrastructures for distributed computing. In this paper, we define a family of kernel-based low-dimensional embeddings that allows for scaling kernel k-means on MapReduce via an efficient and unified parallelization strategy. We then propose two methods for low-dimensional embedding that adhere to our definition of the embedding family. Exploiting the proposed parallelization strategy, we present two scalable MapReduce algorithms for kernel k-means. We demonstrate the effectiveness and efficiency of the proposed algorithms through an empirical evaluation on benchmark data sets.
    Comment: Appears in Proceedings of the SIAM International Conference on Data Mining (SDM), 201
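    The core trick, embedding the data so that ordinary (and easily parallelized) k-means can replace kernel k-means, can be illustrated with a generic Nyström-style embedding. The sketch below is a minimal single-machine illustration in Python, assuming an RBF kernel and using scikit-learn's KMeans for the clustering step; it is not the paper's specific embedding family or its MapReduce implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def rbf_kernel(A, B, gamma=0.5):
    """RBF kernel matrix between the rows of A and the rows of B."""
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))

def nystrom_embedding(X, n_landmarks=200, gamma=0.5, seed=0):
    """Map each point to a low-dimensional vector whose inner products
    approximate the kernel, so plain k-means (easy to parallelize, e.g.
    on MapReduce) can be run instead of kernel k-means."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    landmarks = X[idx]
    K_nl = rbf_kernel(X, landmarks, gamma)          # n x m kernel block
    K_ll = rbf_kernel(landmarks, landmarks, gamma)  # m x m landmark kernel
    eigval, eigvec = np.linalg.eigh(K_ll)
    eigval = np.clip(eigval, 1e-12, None)           # guard against tiny eigenvalues
    # Nystrom feature map: inner products of rows approximate the full kernel
    return K_nl @ (eigvec / np.sqrt(eigval))

X = np.random.default_rng(1).normal(size=(5000, 10))
Z = nystrom_embedding(X)                            # low-dimensional embedding
labels = KMeans(n_clusters=5, n_init=10).fit_predict(Z)
```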

    High-performance Kernel Machines with Implicit Distributed Optimization and Randomization

    In order to fully utilize "big data", it is often required to use "big models". Such models tend to grow with the complexity and size of the training data, and do not make strong parametric assumptions upfront about the nature of the underlying statistical dependencies. Kernel methods fit this need well, as they constitute a versatile and principled statistical methodology for solving a wide range of non-parametric modelling problems. However, their high computational costs (in storage and time) pose a significant barrier to their widespread adoption in big data applications. We propose an algorithmic framework and high-performance implementation for massive-scale training of kernel-based statistical models, based on combining two key technical ingredients: (i) distributed general-purpose convex optimization, and (ii) the use of randomization to improve the scalability of kernel methods. Our approach is based on a block-splitting variant of the Alternating Direction Method of Multipliers (ADMM), carefully reconfigured to handle very large random feature matrices, while exploiting hybrid parallelism typically found in modern clusters of multicore machines. Our implementation supports a variety of statistical learning tasks by enabling several loss functions, regularization schemes, kernels, and layers of randomized approximations for both dense and sparse datasets, in a highly extensible framework. We evaluate the ability of our framework to learn models on data from applications, and provide a comparison against existing sequential and parallel libraries.
    Comment: Work presented at MMDS 2014 (June 2014) and JSM 201
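    As a minimal illustration of ingredient (ii), the sketch below approximates an RBF kernel with random Fourier features and fits a linear ridge-regression model on those features in closed form. The synthetic data, kernel bandwidth, and regularization value are made-up assumptions, and the closed-form solve stands in for the paper's distributed block-splitting ADMM solver, which is not reproduced here.

```python
import numpy as np

def random_fourier_features(X, n_features=300, gamma=0.5, seed=0):
    """Random Fourier features (Rahimi & Recht) approximating an RBF kernel:
    z(x) . z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 20))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=2000)   # synthetic regression target

# Linear ridge regression on the random features approximates kernel ridge
# regression at a fraction of the cost; a distributed ADMM solver (as in the
# paper) would replace this closed-form solve at larger scales.
Z = random_fourier_features(X)
lam = 1e-2
w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
print("training MSE:", np.mean((Z @ w - y) ** 2))
```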
    • …