Semantic Stability in Social Tagging Streams
One potential disadvantage of social tagging systems is that due to the lack
of a centralized vocabulary, a crowd of users may never manage to reach a
consensus on the description of resources (e.g., books, users or songs) on the
Web. Yet, previous research has provided interesting evidence that the tag
distributions of resources may become semantically stable over time as more and
more users tag them. At the same time, previous work has raised an array of new
questions such as: (i) How can we assess the semantic stability of social
tagging systems in a robust and methodical way? (ii) Does semantic
stabilization of tags vary across different social tagging systems and
ultimately, (iii) what are the factors that can explain semantic stabilization
in such systems? In this work we tackle these questions by (i) presenting a
novel and robust method which overcomes a number of limitations in existing
methods, (ii) empirically investigating semantic stabilization processes in a
wide range of social tagging systems with distinct domains and properties and
(iii) detecting potential causes for semantic stabilization, specifically
imitation behavior, shared background knowledge and intrinsic properties of
natural language. Our results show that tagging streams which are generated by
a combination of imitation dynamics and shared background knowledge exhibit
faster and higher semantic stability than tagging streams which are generated
via imitation dynamics or natural language streams alone.
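The notion of a tag distribution "becoming stable" can be made concrete with a toy measure. The sketch below (an illustrative operationalization, not the robust method the paper proposes) compares the normalized tag-frequency distributions of two halves of a tagging stream; a value near 1 means later taggers reproduce the earlier distribution, as imitation dynamics would predict. All function names here are my own.

```python
from collections import Counter

def tag_distribution(tags):
    """Normalized tag-frequency distribution for a list of tag assignments."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def stability(stream, split=0.5):
    """Similarity (1 minus half the L1 distance) between the tag
    distributions of the first and second parts of a tagging stream;
    1.0 means the distribution has stopped changing."""
    k = int(len(stream) * split)
    p, q = tag_distribution(stream[:k]), tag_distribution(stream[k:])
    keys = set(p) | set(q)
    return 1 - 0.5 * sum(abs(p.get(t, 0) - q.get(t, 0)) for t in keys)

# An imitation-heavy stream converges: later users repeat earlier tags.
converged = ["python"] * 40 + ["code"] * 30 + ["python"] * 40 + ["code"] * 30
drifting = ["a"] * 50 + ["b"] * 50  # vocabulary changes completely midway
s_conv, s_drift = stability(converged), stability(drifting)
```

Comparing `s_conv` (1.0) against `s_drift` (0.0) illustrates the axis along which the paper distinguishes tagging systems.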
The Atacama Cosmology Telescope: Data Characterization and Map Making
We present a description of the data reduction and mapmaking pipeline used
for the 2008 observing season of the Atacama Cosmology Telescope (ACT). The
data presented here at 148 GHz represent 12% of the 90 TB collected by ACT from
2007 to 2010. In 2008 we observed for 136 days, producing a total of 1423 hours
of data (11 TB for the 148 GHz band only), with a daily average of 10.5 hours
of observation. From these, 1085 hours were devoted to an 850 deg^2 stripe (11.2
hours by 9.1 deg) centered on a declination of -52.7 deg, while 175 hours were
devoted to a 280 deg^2 stripe (4.5 hours by 4.8 deg) centered at the celestial
equator. We discuss sources of statistical and systematic noise, calibration,
telescope pointing, and data selection. Out of 1260 survey hours and 1024
detectors per array, 816 hours and 593 effective detectors remain after data
selection for this frequency band, yielding a 38% survey efficiency. The total
sensitivity in 2008, determined from the noise level between 5 Hz and 20 Hz in
the time-ordered data stream (TOD), is 32 micro-Kelvin sqrt{s} in CMB units.
Atmospheric brightness fluctuations constitute the main contaminant in the data
and dominate the detector noise covariance at low frequencies in the TOD. The
maps were made by solving the least-squares problem using the Preconditioned
Conjugate Gradient method, incorporating the details of the detector and noise
correlations. Cross-correlation with WMAP sky maps, as well as analysis from
simulations, reveal that our maps are unbiased at multipoles ell > 300. This
paper accompanies the public release of the 148 GHz southern stripe maps from
2008. The techniques described here will be applied to future maps and data
releases.

Comment: 20 pages, 18 figures, 6 tables, an ACT Collaboration paper
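The map-making step described above solves a weighted least-squares problem: with pointing matrix P, noise covariance N, and time-ordered data d, the map m satisfies the normal equations (P^T N^-1 P) m = P^T N^-1 d, which are solved iteratively. The sketch below shows the core Preconditioned Conjugate Gradient idea on a tiny dense system (here with no preconditioner, i.e. an identity preconditioner); it is a minimal illustration, not the ACT pipeline, and the matrices are stand-ins for P^T N^-1 P and P^T N^-1 d.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive-definite A (list of lists),
    the same normal-equations structure a CMB map-maker solves."""
    n = len(b)
    x = [0.0] * n
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    r = [bi - Avi for bi, Avi in zip(b, matvec(x))]  # initial residual
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / sum(pi * Api for pi, Api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * Api for ri, Api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        # new search direction, conjugate to the previous ones
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Toy two-pixel "map": A plays the role of P^T N^-1 P, b of P^T N^-1 d.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
m = conjugate_gradient(A, b)
```

In the real pipeline the matrix is never formed explicitly; each iteration applies the pointing and noise operators to a map-sized vector, and a preconditioner accelerates convergence.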
Sequential Quantiles via Hermite Series Density Estimation
Sequential quantile estimation refers to incorporating observations into
quantile estimates in an incremental fashion, thus furnishing an online estimate
of one or more quantiles at any given point in time. Sequential quantile
estimation is also known as online quantile estimation. This area is relevant
to the analysis of data streams and to the one-pass analysis of massive data
sets. Applications include network traffic and latency analysis, real time
fraud detection and high frequency trading. We introduce new techniques for
online quantile estimation based on Hermite series estimators in the settings
of static quantile estimation and dynamic quantile estimation. In the static
quantile estimation setting we apply the existing Gauss-Hermite expansion in a
novel manner. In particular, we exploit the fact that Gauss-Hermite
coefficients can be updated in a sequential manner. To treat dynamic quantile
estimation we introduce a novel expansion with an exponentially weighted
estimator for the Gauss-Hermite coefficients which we term the Exponentially
Weighted Gauss-Hermite (EWGH) expansion. These algorithms go beyond existing
sequential quantile estimation algorithms in that they allow arbitrary
quantiles (as opposed to pre-specified quantiles) to be estimated at any point
in time. In doing so we provide a solution to online distribution function and
online quantile function estimation on data streams. In particular we derive an
analytical expression for the CDF and prove consistency results for the CDF
under certain conditions. In addition we analyse the associated quantile
estimator. Simulation studies and tests on real data reveal the Gauss-Hermite
based algorithms to be competitive with a leading existing algorithm.

Comment: 43 pages, 9 figures. Improved version incorporating referee comments,
as appears in Electronic Journal of Statistics
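The key property the abstract exploits is that Gauss-Hermite coefficients are expectations E[h_k(X)] of the orthonormal Hermite functions, so they admit a running-mean update for the static setting and an exponentially weighted update for the dynamic (EWGH-style) setting. The sketch below shows only this coefficient-update step; the class name and structure are my own, and the full method additionally builds an analytic CDF from these coefficients and inverts it to obtain quantiles.

```python
import math

def hermite_functions(x, K):
    """Orthonormal Hermite functions h_0..h_K evaluated at x."""
    Hs = [1.0, 2.0 * x]  # physicists' Hermite polynomials via recurrence
    for k in range(1, K):
        Hs.append(2.0 * x * Hs[k] - 2.0 * k * Hs[k - 1])
    w = math.exp(-x * x / 2.0)
    return [Hs[k] * w / math.sqrt(2.0 ** k * math.factorial(k) * math.sqrt(math.pi))
            for k in range(K + 1)]

class SequentialCoefficients:
    """Running estimates of Gauss-Hermite coefficients a_k ~ E[h_k(X)].
    lam=None gives the equal-weight (static) update; a fixed lam in (0, 1)
    gives an exponentially weighted (EWGH-style) update for drifting data."""
    def __init__(self, K, lam=None):
        self.K, self.lam, self.n = K, lam, 0
        self.a = [0.0] * (K + 1)

    def update(self, x):
        self.n += 1
        h = hermite_functions(x, self.K)
        w = self.lam if self.lam is not None else 1.0 / self.n  # step size
        self.a = [(1 - w) * ak + w * hk for ak, hk in zip(self.a, h)]
```

With `lam=None` the update is exactly a sequential mean, which is what makes one-pass processing of a data stream possible; the exponentially weighted variant discounts old observations so the estimated distribution can track change.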
Weakly Supervised Action Localization by Sparse Temporal Pooling Network
We propose a weakly supervised temporal action localization algorithm on
untrimmed videos using convolutional neural networks. Our algorithm learns from
video-level class labels and predicts temporal intervals of human actions with
no requirement of temporal localization annotations. We design our network to
identify a sparse subset of key segments associated with target actions in a
video using an attention module and fuse the key segments through adaptive
temporal pooling. Our loss function comprises two terms that minimize the
video-level action classification error and enforce the sparsity of the segment
selection. At inference time, we extract and score temporal proposals using
temporal class activations and class-agnostic attentions to estimate the time
intervals that correspond to target actions. The proposed algorithm attains
state-of-the-art results on the THUMOS14 dataset and outstanding performance on
ActivityNet1.3 even with its weak supervision.

Comment: Accepted to CVPR 2018
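The two-term objective described above can be sketched numerically: attention weights pool per-segment class scores into a video-level prediction, which is scored against the video-level label, while an L1 penalty pushes the attention toward a sparse subset of key segments. This is a hand-written toy (names and the weight `beta` are my own, and the real attention weights are produced by a learned module, not passed in).

```python
import math

def stpn_style_loss(attn, seg_scores, label, beta=0.1):
    """Toy weakly supervised loss: attention-weighted temporal pooling of
    per-segment class scores, video-level cross-entropy on the pooled
    prediction, plus an L1 sparsity penalty on the attention weights."""
    n_classes = len(seg_scores[0])
    # attention-weighted temporal pooling -> one score per class
    pooled = [sum(a * s[c] for a, s in zip(attn, seg_scores))
              for c in range(n_classes)]
    # softmax over classes -> video-level class probabilities
    m = max(pooled)
    exps = [math.exp(p - m) for p in pooled]
    total = sum(exps)
    probs = [e / total for e in exps]
    cls_loss = -math.log(probs[label])          # video-level classification term
    sparsity = sum(abs(a) for a in attn)        # L1 sparsity on attention
    return cls_loss + beta * sparsity

# Two segments, two classes: attention concentrates on the first segment.
attn = [0.9, 0.1]
seg_scores = [[2.0, 0.0], [0.0, 2.0]]
loss_correct = stpn_style_loss(attn, seg_scores, label=0)
loss_wrong = stpn_style_loss(attn, seg_scores, label=1)
```

Because attention concentrates on the segment scoring high for class 0, `loss_correct` is lower than `loss_wrong`, which is the gradient signal that lets video-level labels localize actions in time.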
Globular Cluster Streams as Galactic High-Precision Scales - The Poster Child Palomar 5
Using the example of the tidal stream of the Milky Way globular cluster
Palomar 5 (Pal 5), we demonstrate how observational data on streams can be
efficiently reduced in dimensionality and modeled in a Bayesian framework. Our
approach combines detection of stream overdensities by a
Difference-of-Gaussians process with fast streakline models, a continuous
likelihood function built from these models, and inference with MCMC. By
generating model streams, we show that the geometry of the Pal 5
debris yields powerful constraints on the solar position and motion, the Milky
Way and Pal 5 itself. All 10 model parameters were allowed to vary over large
ranges without additional prior information. Using only SDSS data and a few
radial velocities from the literature, we obtain estimates of the distance of
the Sun from the Galactic Center and of its transverse velocity, both in
excellent agreement with independent measurements of these quantities. Assuming
a standard disk and bulge model, we constrain the Galactic mass within Pal 5's
apogalactic radius of 19 kpc. Moreover, we find the flattening of the dark-halo
potential to be consistent with an essentially spherical halo within the radial
range that is effectively probed by Pal 5. We also
determine Pal 5's mass, distance and proper motion independently from other
methods, which enables us to perform vital cross-checks. We conclude that with
more observational data and by using additional prior information, the
precision of this method can be significantly increased.

Comment: 28 pages, 14 figures, submitted to ApJ (revised version), comments
welcome
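The inference loop described above — propose model parameters, generate a model stream, score it against the data with a continuous likelihood, accept or reject — is standard MCMC. The sketch below shows a minimal Metropolis sampler on a one-parameter toy problem; the Gaussian "likelihood", the parameter value, and all names are illustrative stand-ins, not the paper's streakline likelihood or its ten-parameter model.

```python
import math
import random

def metropolis(log_like, theta0, step, n_steps, seed=0):
    """Minimal Metropolis sampler: random-walk proposals accepted with
    probability min(1, exp(log_like(prop) - log_like(current)))."""
    rng = random.Random(seed)
    theta, ll = theta0, log_like(theta0)
    chain = []
    for _ in range(n_steps):
        prop = theta + rng.gauss(0.0, step)
        ll_prop = log_like(prop)
        if math.log(rng.random()) < ll_prop - ll:  # accept uphill/near moves
            theta, ll = prop, ll_prop
        chain.append(theta)
    return chain

# Toy likelihood: a Gaussian centered on an assumed "true" parameter value
# (19.0 here is purely illustrative, echoing a radius in kpc).
true_val = 19.0
log_like = lambda t: -0.5 * ((t - true_val) / 0.5) ** 2
chain = metropolis(log_like, theta0=15.0, step=0.5, n_steps=5000)
posterior_mean = sum(chain[1000:]) / len(chain[1000:])  # discard burn-in
```

The real application replaces the toy likelihood with one evaluated against stream overdensities from the Difference-of-Gaussians detection, but the accept/reject mechanics are the same.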