A Generalized Coupon Collector Problem
This paper analyzes a generalized version of the coupon collector
problem, in which the collector gets distinct coupons each run and
chooses the one that she has the fewest of so far. In the asymptotic case when the
number of coupons goes to infinity, we show that on average runs are needed to collect sets
of coupons. An efficient exact algorithm is also developed for any finite case
to compute the average needed runs exactly. Numerical examples are provided to
verify our theoretical predictions. Comment: 20 pages, 6 figures, preprint
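The collection process described in this abstract can be explored with a small Monte Carlo simulation. The sketch below is illustrative only and is not the paper's exact algorithm. The symbols `n` (number of coupon types), `d` (distinct coupons offered per run) and `m` (number of complete sets wanted) are assumptions introduced here, since the extracted abstract has lost its own symbols.

```python
import random

def runs_to_collect(n, d, m, rng):
    """Simulate one collector and return the number of runs needed until
    every one of the n coupon types has been kept at least m times."""
    counts = [0] * n
    complete = 0  # number of coupon types already held m times
    runs = 0
    while complete < n:
        offered = rng.sample(range(n), d)             # d distinct coupons this run
        pick = min(offered, key=lambda c: counts[c])  # keep the one she has fewest of
        counts[pick] += 1
        if counts[pick] == m:
            complete += 1
        runs += 1
    return runs

def average_runs(n, d, m, trials=2000, seed=0):
    """Estimate the expected number of runs by averaging independent trials."""
    rng = random.Random(seed)
    return sum(runs_to_collect(n, d, m, rng) for _ in range(trials)) / trials
```

With `d = 1` and `m = 1` this reduces to the classical coupon collector problem, whose expected number of runs is `n` times the `n`-th harmonic number, which gives a quick sanity check on the simulation.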
Approximating the Distribution of the Median and other Robust Estimators on Uncertain Data
Robust estimators, like the median of a point set, are important for data
analysis in the presence of outliers. We study robust estimators for
locationally uncertain points with discrete distributions. That is, each point
in a data set has a discrete probability distribution describing its location.
The probabilistic nature of uncertain data makes it challenging to compute such
estimators, since the true value of the estimator is now described by a
distribution rather than a single point. We show how to construct and estimate
the distribution of the median of a point set. Building the approximate support
of the distribution takes near-linear time, and assigning probability to that
support takes quadratic time. We also develop a general approximation technique
for distributions of robust estimators with respect to ranges with bounded VC
dimension. This includes the geometric median for high dimensions and the
Siegel estimator for linear regression. Comment: Full version of a paper to appear at SoCG 201
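To make concrete what "the distribution of the median" means for locationally uncertain points, the sketch below enumerates every realization of a tiny one-dimensional data set and accumulates the probability of each median value. This is a brute-force baseline whose cost is exponential in the number of points. It is not the near-linear or quadratic-time construction from the paper. The function name and input layout are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
from statistics import median_low

def exact_median_distribution(uncertain_points):
    """uncertain_points: list of points, each a list of (location, probability)
    pairs whose probabilities sum to 1. Returns {median value: probability}."""
    dist = defaultdict(float)
    # Each combo picks one (location, probability) pair per uncertain point.
    for combo in product(*uncertain_points):
        prob = 1.0
        for _, p in combo:
            prob *= p  # points are assumed independent
        dist[median_low([loc for loc, _ in combo])] += prob
    return dict(dist)

points = [
    [(0, 0.5), (10, 0.5)],  # first point is at 0 or 10 with equal probability
    [(1, 1.0)],             # second point is certain
    [(2, 0.5), (3, 0.5)],   # third point is at 2 or 3
]
print(exact_median_distribution(points))
```

Here the median is 1 whenever the first point lands at 0, and otherwise equals the location of the third point, so the estimator's value is itself a discrete distribution rather than a single number.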
Discriminative Segmental Cascades for Feature-Rich Phone Recognition
Discriminative segmental models, such as segmental conditional random fields
(SCRFs) and segmental structured support vector machines (SSVMs), have had
success in speech recognition via both lattice rescoring and first-pass
decoding. However, such models suffer from slow decoding, hampering the use of
computationally expensive features, such as segment neural networks or other
high-order features. A typical solution is to use approximate decoding, either
by beam pruning in a single pass or by beam pruning to generate a lattice
followed by a second pass. In this work, we study discriminative segmental
models trained with a hinge loss (i.e., segmental structured SVMs). We show
that beam search is not suitable for learning rescoring models in this
approach, though it gives good approximate decoding performance when the model
is already well-trained. Instead, we consider an approach inspired by
structured prediction cascades, which use max-marginal pruning to generate
lattices. We obtain a high-accuracy phonetic recognition system with several
expensive feature types: a segment neural network, a second-order language
model, and second-order phone boundary features.
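Max-marginal pruning can be illustrated on a toy lattice. The max-marginal of an edge is the score of the best complete path that passes through it. The sketch below assumes the thresholding rule commonly associated with structured prediction cascades: a convex combination, weighted by `alpha`, of the best path score and the mean max-marginal. The lattice layout, function name and parameter names are hypothetical, and this is not the paper's implementation.

```python
def max_marginal_prune(num_nodes, edges, alpha):
    """edges: list of (u, v, score) with u < v, so node ids are already in
    topological order; node 0 is the start and node num_nodes - 1 the end.
    Assumes the end node is reachable from the start node.
    Returns (kept_edges, threshold)."""
    neg_inf = float("-inf")
    start, end = 0, num_nodes - 1

    # fwd[v]: best score of any path from start to v.
    fwd = [neg_inf] * num_nodes
    fwd[start] = 0.0
    for u, v, score in sorted(edges, key=lambda e: e[0]):
        fwd[v] = max(fwd[v], fwd[u] + score)

    # bwd[u]: best score of any path from u to end.
    bwd = [neg_inf] * num_nodes
    bwd[end] = 0.0
    for u, v, score in sorted(edges, key=lambda e: e[1], reverse=True):
        bwd[u] = max(bwd[u], score + bwd[v])

    # Max-marginal of an edge: best full path constrained to use that edge.
    max_marginals = [fwd[u] + score + bwd[v] for u, v, score in edges]
    finite = [mm for mm in max_marginals if mm > neg_inf]
    best = fwd[end]
    threshold = alpha * best + (1 - alpha) * sum(finite) / len(finite)
    kept = [e for e, mm in zip(edges, max_marginals) if mm >= threshold - 1e-9]
    return kept, threshold

lattice = [(0, 1, 2.0), (1, 3, 2.0), (0, 2, 1.5), (2, 3, 1.5), (0, 3, 0.5)]
print(max_marginal_prune(4, lattice, alpha=0.0))
```

Unlike beam pruning, every edge on the best-scoring path has a max-marginal equal to the best score, so that path always survives the threshold for any `alpha` between 0 and 1.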
Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks
We present a novel learning-based approach to estimate the
direction-of-arrival (DOA) of a sound source using a convolutional recurrent
neural network (CRNN) trained via regression on synthetic data and Cartesian
labels. We also describe an improved method to generate synthetic data to train
the neural network using state-of-the-art sound propagation algorithms that
model specular as well as diffuse reflections of sound. We compare our model
against three other CRNNs trained using different formulations of the same
problem: classification on categorical labels, and regression on spherical
coordinate labels. In practice, our model achieves up to a 43% decrease in
angular error over prior methods. The use of diffuse reflection results in 34%
and 41% reductions in angular prediction error on the LOCATA and SOFA datasets,
respectively, over prior approaches based on image-source methods. Our method
results in an additional 3% error reduction over prior schemes that use
classification-based networks, and it uses 36% fewer network parameters.
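The difference between spherical and Cartesian labels, and the angular error metric, can be sketched in a few lines. The coordinate convention below (azimuth in the x-y plane, elevation measured up from it) and the function names are assumptions made for illustration. The abstract does not specify either.

```python
import math

def spherical_to_cartesian(azimuth_deg, elevation_deg):
    """Convert a direction given in degrees to a unit vector (x, y, z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def angular_error_deg(pred, target):
    """Angle in degrees between two 3-D vectors; neither needs unit length."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))

# Azimuths of 359 and 1 degrees differ by 358 as numbers but only 2 as directions.
a = spherical_to_cartesian(359.0, 0.0)
b = spherical_to_cartesian(1.0, 0.0)
print(angular_error_deg(a, b))
```

Because the error is computed on vectors, a regression output in Cartesian form can be scored directly without first normalizing it or unwrapping angles.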
Improving Image Classification with Location Context
With the widespread availability of cellphones and cameras that have GPS
capabilities, it is common for images being uploaded to the Internet today to
have GPS coordinates associated with them. In addition to research that tries
to predict GPS coordinates from visual features, this also opens up the door to
problems that are conditioned on the availability of GPS coordinates. In this
work, we tackle the problem of performing image classification with location
context, in which we are given the GPS coordinates for images in both the train
and test phases. We explore different ways of encoding and extracting features
from the GPS coordinates, and show how to naturally incorporate these features
into a Convolutional Neural Network (CNN), the current state-of-the-art for
most image classification and recognition problems. We also show how it is
possible to simultaneously learn the optimal pooling radii for a subset of our
features within the CNN framework. To evaluate our model and to help promote
research in this area, we identify a set of location-sensitive concepts and
annotate a subset of the Yahoo Flickr Creative Commons 100M dataset that has
GPS coordinates with these concepts, which we make publicly available. By
leveraging location context, we are able to achieve almost a 7% gain in mean
average precision.
Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret
The problem of distributed learning and channel access is considered in a
cognitive network with multiple secondary users. The availability statistics of
the channels are initially unknown to the secondary users and are estimated
using sensing decisions. There is no explicit information exchange or prior
agreement among the secondary users. We propose policies for distributed
learning and access which achieve order-optimal cognitive system throughput
(number of successful secondary transmissions) under self play, i.e., when
implemented at all the secondary users. Equivalently, our policies minimize the
regret in distributed learning and access. We first consider the scenario when
the number of secondary users is known to the policy, and prove that the total
regret is logarithmic in the number of transmission slots. We show that our distributed
learning and access policy achieves order-optimal regret by comparing it to an
asymptotic lower bound for regret under any uniformly-good learning and access
policy. We then consider the case when the number of secondary users is fixed
but unknown, and is estimated through feedback. We propose a policy in this
scenario whose asymptotic sum regret grows slightly faster than
logarithmic in the number of transmission slots. Comment: Submitted to IEEE JSAC on Advances in Cognitive Radio Networking and
Communications, Dec. 2009, Revised May 201
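The idea of learning channel availabilities from sensing outcomes with logarithmic regret can be illustrated with the standard UCB1 index policy for a single user. This is a deliberately simplified stand-in: it ignores collisions between multiple secondary users, which is the central difficulty the paper addresses, and it is not the policy proposed there. The availability values and function name are hypothetical.

```python
import math
import random

def ucb1_channel_access(availability, slots, seed=0):
    """Single secondary user choosing one channel per slot with UCB1.
    availability[i] is the (unknown to the user) probability channel i is free.
    Returns (pulls per channel, pseudo-regret versus always using the best)."""
    rng = random.Random(seed)
    k = len(availability)
    pulls = [0] * k
    successes = [0] * k
    for t in range(1, slots + 1):
        if t <= k:
            ch = t - 1  # sense every channel once to initialise the estimates
        else:
            ch = max(
                range(k),
                key=lambda i: successes[i] / pulls[i]
                + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        free = rng.random() < availability[ch]  # sensing outcome for this slot
        pulls[ch] += 1
        successes[ch] += free
    best = max(availability)
    regret = sum(n * (best - p) for n, p in zip(pulls, availability))
    return pulls, regret

pulls, regret = ucb1_channel_access([0.2, 0.5, 0.9], slots=5000)
print(pulls, round(regret, 1))
```

The pseudo-regret counts expected transmissions lost by sensing suboptimal channels. Under UCB1 the number of such slots grows logarithmically with the horizon, so most slots go to the best channel.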
Disclosure Frequency Induced Myopia and the Decision to be Public
This study examines whether disclosure frequency induced myopia influences the types of firms that go public and their choice of listing exchanges if they decide to do so. First, we find that the incentive to stay private in order to avoid disclosure frequency induced myopia creates a downward kink in the relation between the length of the cash conversion cycle and the proportion of public firms at the industry level around the time frame that corresponds to the mandatory reporting interval. Second, at the firm level, public firms with longer cash conversion cycles relative to industry peers are more likely to list on exchanges that require less frequent mandatory disclosure, in order to minimize disclosure frequency induced myopia. Furthermore, when the mandatory reporting frequency increased from semi-annual to quarterly, we observe a sharper decline in the percentage of public firms from industries whose cash conversion cycles are between one quarter and two quarters relative to those from other industries, both in the United States and in the United Kingdom.