Optimal strategies for observation of active galactic nuclei variability with Imaging Atmospheric Cherenkov Telescopes
Variable emission is one of the defining characteristics of active galactic
nuclei (AGN). While providing valuable information on the nature and physics of
the sources, variability is often challenging to observe with time- and
field-of-view-limited astronomical observatories such as Imaging Atmospheric
Cherenkov Telescopes (IACTs). In this work, we address two questions relevant
to the observation of sources characterized by AGN-like variability: what is
the most time-efficient way to detect such sources, and what is the
observational bias that can be introduced by the choice of the observing
strategy when conducting blind surveys of the sky. Different observing
strategies are evaluated using simulated light curves and realistic instrument
response functions of the Cherenkov Telescope Array (CTA), a future gamma-ray
observatory. We show that strategies that make use of very short observing
windows, spread over long periods of time, allow for faster detection of the
source and are less influenced by the variability properties of the sources
than strategies that concentrate the observing time in a small number of long
observing windows. Although derived using CTA as an example, our conclusions
are conceptually valid for any IACT facility and, in general, for all
observatories with a small field of view and limited duty cycle.
Comment: 14 pages, 11 figures
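To illustrate the kind of comparison involved, the sketch below, which is an illustration rather than the authors' actual pipeline, simulates a red-noise light curve with the Timmer & König (1995) algorithm and checks which toy strategy, many short windows spread over the whole period or a few long windows concentrated early on, first catches the source above a flux threshold. The window sizes, power-law index, and threshold are arbitrary assumptions.

```python
import numpy as np

def timmer_konig_lightcurve(n, dt, beta=2.0, rng=None):
    """Red-noise light curve with power spectrum ~ f^-beta (Timmer & Koenig 1995)."""
    rng = np.random.default_rng() if rng is None else rng
    freqs = np.fft.rfftfreq(n, d=dt)[1:]            # drop the zero frequency
    amp = freqs ** (-beta / 2.0)                    # amplitude ~ sqrt(PSD)
    spec = (rng.normal(size=freqs.size) + 1j * rng.normal(size=freqs.size)) * amp
    lc = np.fft.irfft(np.concatenate(([0.0 + 0.0j], spec)), n=n)
    return (lc - lc.mean()) / lc.std()              # zero mean, unit variance

def first_detection(lc, windows, threshold):
    """Index of the first observing window whose mean flux exceeds the threshold."""
    for i, (start, stop) in enumerate(windows):
        if lc[start:stop].mean() > threshold:
            return i
    return None                                      # never detected

rng = np.random.default_rng(42)
n = 10_000                                           # hypothetical time bins
flux = 1.0 + 0.5 * timmer_konig_lightcurve(n, dt=1.0, beta=2.0, rng=rng)

# Strategy A: 50 short windows (10 bins each) spread over the full period.
short_windows = [(s, s + 10) for s in range(0, n, 200)]
# Strategy B: 5 long windows (100 bins each) with the same total exposure,
# concentrated in the first half of the period.
long_windows = [(s, s + 100) for s in range(0, 5000, 1000)]

threshold = 1.3                                      # illustrative flux threshold
print("short windows, first detecting window:", first_detection(flux, short_windows, threshold))
print("long windows,  first detecting window:", first_detection(flux, long_windows, threshold))
```

Averaging over many realizations of the simulated light curve would yield detection-probability curves analogous to the strategy comparisons described in the abstract.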
Embed and Conquer: Scalable Embeddings for Kernel k-Means on MapReduce
The kernel k-means is an effective method for data clustering that extends
the commonly used k-means algorithm to work on a similarity matrix over
complex data structures. The kernel k-means algorithm is, however,
computationally very complex, as it requires the complete kernel matrix to be
calculated and stored. Further, the kernelized nature of the kernel k-means
algorithm hinders the parallelization of its computations on modern
infrastructures for distributed computing. In this paper, we define a
family of kernel-based low-dimensional embeddings that allows for scaling
kernel k-means on MapReduce via an efficient and unified parallelization
strategy. We then propose two methods for low-dimensional embedding that
adhere to our definition of the embedding family. Exploiting the proposed
parallelization strategy, we present two scalable MapReduce algorithms for
kernel k-means. We demonstrate the effectiveness and efficiency of the
proposed algorithms through an empirical evaluation on benchmark data sets.
Comment: Appears in Proceedings of the SIAM International Conference on Data Mining (SDM), 201
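As a concrete illustration of the general idea, though not necessarily either of the two embedding methods proposed in the paper, the sketch below uses random Fourier features (one well-known kernel-based low-dimensional embedding) so that ordinary k-means approximates kernel k-means with a Gaussian kernel. The data set, kernel bandwidth, and embedding dimension are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import RBFSampler

# Toy data that plain k-means on the raw coordinates cannot cluster well.
X, y = make_moons(n_samples=2000, noise=0.05, random_state=0)

# Embed into a low-dimensional feature space whose inner products
# approximate the Gaussian kernel: k(x, z) ~= <phi(x), phi(z)>.
phi = RBFSampler(gamma=10.0, n_components=200, random_state=0)
Z = phi.fit_transform(X)

# Ordinary k-means on the embedding approximates kernel k-means. Each
# Lloyd iteration is one map step (assign each point to its nearest
# centroid) and one reduce step (average the points in each cluster),
# which is what makes k-means MapReduce-friendly.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)

# Agreement with the ground-truth moons (up to label permutation).
acc = max(np.mean(labels == y), np.mean(labels != y))
print(f"clustering accuracy: {acc:.3f}")
```

Whether the two moons are recovered exactly depends on the bandwidth choice; the point is that the clustering step never touches an n x n kernel matrix, only the n x 200 embedding.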
High-performance Kernel Machines with Implicit Distributed Optimization and Randomization
In order to fully utilize "big data", it is often required to use "big
models". Such models tend to grow with the complexity and size of the training
data, and do not make strong parametric assumptions upfront on the nature of
the underlying statistical dependencies. Kernel methods fit this need well, as
they constitute a versatile and principled statistical methodology for solving
a wide range of non-parametric modelling problems. However, their high
computational costs (in storage and time) pose a significant barrier to their
widespread adoption in big data applications.
We propose an algorithmic framework and high-performance implementation for
massive-scale training of kernel-based statistical models, based on combining
two key technical ingredients: (i) distributed general purpose convex
optimization, and (ii) the use of randomization to improve the scalability of
kernel methods. Our approach is based on a block-splitting variant of the
Alternating Direction Method of Multipliers (ADMM), carefully reconfigured to handle
very large random feature matrices, while exploiting hybrid parallelism
typically found in modern clusters of multicore machines. Our implementation
supports a variety of statistical learning tasks by enabling several loss
functions, regularization schemes, kernels, and layers of randomized
approximations for both dense and sparse datasets, in a highly extensible
framework. We evaluate the ability of our framework to learn models on data
from applications, and provide a comparison against existing sequential and
parallel libraries.
Comment: Work presented at MMDS 2014 (June 2014) and JSM 201
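As a single-machine caricature of the two key ingredients, certainly not the paper's block-splitting, hybrid-parallel implementation, the sketch below builds random Fourier features for a Gaussian kernel and fits an l1-regularised least-squares model with a plain ADMM loop. The feature count, kernel bandwidth, penalty, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data with a non-linear target.
n, d = 1000, 5
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * np.cos(X[:, 1]) + 0.1 * rng.normal(size=n)

# Random Fourier features approximating a Gaussian kernel
# (unit bandwidth is an arbitrary assumption).
D = 300                                    # number of random features
W = rng.normal(size=(d, D))                # random frequencies
b = rng.uniform(0, 2 * np.pi, size=D)      # random phases
A = np.sqrt(2.0 / D) * np.cos(X @ W + b)   # n x D feature matrix

# ADMM for l1-regularised least squares on the features:
#   minimise 0.5*||A w - y||^2 + lam*||z||_1  subject to  w = z.
lam, rho = 0.1, 1.0
AtA, Aty = A.T @ A, A.T @ y
L = np.linalg.cholesky(AtA + rho * np.eye(D))    # factor once, reuse each iteration
z = np.zeros(D)
u = np.zeros(D)
for _ in range(200):
    rhs = Aty + rho * (z - u)
    w = np.linalg.solve(L.T, np.linalg.solve(L, rhs))              # w-update
    z = np.sign(w + u) * np.maximum(np.abs(w + u) - lam / rho, 0)  # soft threshold
    u = u + w - z                                                  # dual update

print("train RMSE:", np.sqrt(np.mean((A @ z - y) ** 2)))
print("non-zero features:", int(np.count_nonzero(z)))
```

In the distributed setting the paper targets, the w-update is what gets split into blocks across machines, while the feature matrix A itself is generated on the fly from the random seeds rather than stored.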