2,751 research outputs found

    A Generalized Coupon Collector Problem

    Full text link
    This paper provides analysis to a generalized version of the coupon collector problem, in which the collector gets dd distinct coupons each run and she chooses the one that she has the least so far. On the asymptotic case when the number of coupons nn goes to infinity, we show that on average nlognd+nd(m1)loglogn+O(mn)\frac{n\log n}{d} + \frac{n}{d}(m-1)\log\log{n}+O(mn) runs are needed to collect mm sets of coupons. An efficient exact algorithm is also developed for any finite case to compute the average needed runs exactly. Numerical examples are provided to verify our theoretical predictions.Comment: 20 pages, 6 figures, preprin

    Approximating the Distribution of the Median and other Robust Estimators on Uncertain Data

    Get PDF
    Robust estimators, like the median of a point set, are important for data analysis in the presence of outliers. We study robust estimators for locationally uncertain points with discrete distributions. That is, each point in a data set has a discrete probability distribution describing its location. The probabilistic nature of uncertain data makes it challenging to compute such estimators, since the true value of the estimator is now described by a distribution rather than a single point. We show how to construct and estimate the distribution of the median of a point set. Building the approximate support of the distribution takes near-linear time, and assigning probability to that support takes quadratic time. We also develop a general approximation technique for distributions of robust estimators with respect to ranges with bounded VC dimension. This includes the geometric median for high dimensions and the Siegel estimator for linear regression.Comment: Full version of a paper to appear at SoCG 201

    Discriminative Segmental Cascades for Feature-Rich Phone Recognition

    Full text link
    Discriminative segmental models, such as segmental conditional random fields (SCRFs) and segmental structured support vector machines (SSVMs), have had success in speech recognition via both lattice rescoring and first-pass decoding. However, such models suffer from slow decoding, hampering the use of computationally expensive features, such as segment neural networks or other high-order features. A typical solution is to use approximate decoding, either by beam pruning in a single pass or by beam pruning to generate a lattice followed by a second pass. In this work, we study discriminative segmental models trained with a hinge loss (i.e., segmental structured SVMs). We show that beam search is not suitable for learning rescoring models in this approach, though it gives good approximate decoding performance when the model is already well-trained. Instead, we consider an approach inspired by structured prediction cascades, which use max-marginal pruning to generate lattices. We obtain a high-accuracy phonetic recognition system with several expensive feature types: a segment neural network, a second-order language model, and second-order phone boundary features

    Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks

    Full text link
    We present a novel learning-based approach to estimate the direction-of-arrival (DOA) of a sound source using a convolutional recurrent neural network (CRNN) trained via regression on synthetic data and Cartesian labels. We also describe an improved method to generate synthetic data to train the neural network using state-of-the-art sound propagation algorithms that model specular as well as diffuse reflections of sound. We compare our model against three other CRNNs trained using different formulations of the same problem: classification on categorical labels, and regression on spherical coordinate labels. In practice, our model achieves up to 43% decrease in angular error over prior methods. The use of diffuse reflection results in 34% and 41% reduction in angular prediction errors on LOCATA and SOFA datasets, respectively, over prior methods based on image-source methods. Our method results in an additional 3% error reduction over prior schemes that use classification based networks, and we use 36% fewer network parameters

    Improving Image Classification with Location Context

    Full text link
    With the widespread availability of cellphones and cameras that have GPS capabilities, it is common for images being uploaded to the Internet today to have GPS coordinates associated with them. In addition to research that tries to predict GPS coordinates from visual features, this also opens up the door to problems that are conditioned on the availability of GPS coordinates. In this work, we tackle the problem of performing image classification with location context, in which we are given the GPS coordinates for images in both the train and test phases. We explore different ways of encoding and extracting features from the GPS coordinates, and show how to naturally incorporate these features into a Convolutional Neural Network (CNN), the current state-of-the-art for most image classification and recognition problems. We also show how it is possible to simultaneously learn the optimal pooling radii for a subset of our features within the CNN framework. To evaluate our model and to help promote research in this area, we identify a set of location-sensitive concepts and annotate a subset of the Yahoo Flickr Creative Commons 100M dataset that has GPS coordinates with these concepts, which we make publicly available. By leveraging location context, we are able to achieve almost a 7% gain in mean average precision

    Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret

    Get PDF
    The problem of distributed learning and channel access is considered in a cognitive network with multiple secondary users. The availability statistics of the channels are initially unknown to the secondary users and are estimated using sensing decisions. There is no explicit information exchange or prior agreement among the secondary users. We propose policies for distributed learning and access which achieve order-optimal cognitive system throughput (number of successful secondary transmissions) under self play, i.e., when implemented at all the secondary users. Equivalently, our policies minimize the regret in distributed learning and access. We first consider the scenario when the number of secondary users is known to the policy, and prove that the total regret is logarithmic in the number of transmission slots. Our distributed learning and access policy achieves order-optimal regret by comparing to an asymptotic lower bound for regret under any uniformly-good learning and access policy. We then consider the case when the number of secondary users is fixed but unknown, and is estimated through feedback. We propose a policy in this scenario whose asymptotic sum regret which grows slightly faster than logarithmic in the number of transmission slots.Comment: Submitted to IEEE JSAC on Advances in Cognitive Radio Networking and Communications, Dec. 2009, Revised May 201

    Disclosure Frequency Induced Myopia and the Decision to be Public

    Get PDF
    This study examines whether disclosure frequency induced myopia influences the types of firms that go public and their choice of listing exchanges if they decide to do so. We find that the incentive to stay private in order to avoid disclosure frequency induced myopia creates a downward kink in the relation between the length of the cash conversion cycle and the proportion of public firms at the industry level around the time frame that corresponds to the mandatory reporting interval. Second, at the firm level, public firms with longer cash conversion cycles relative to industry peers are more likely to list on exchanges that require less frequent mandatory disclosure to minimize disclosure frequency induced myopia. Furthermore, when the mandatory reporting frequency increased from semi-annual to quarterly, we observe a sharper decline in the percentage of public firms from industries whose cash conversion cycles are between one quarter and two quarters relative to those from other industries both in the United States and in the United Kingdom
    corecore