89,411 research outputs found

    The capacity of multilevel threshold functions

    Get PDF
    Lower and upper bounds for the capacity of multilevel threshold elements are estimated, using two essentially different enumeration techniques. It is demonstrated that the exact number of multilevel threshold functions depends strongly on the relative topology of the input set. The results correct a previously published estimate and indicate that adding threshold levels enhances the capacity more than adding variables

    Bottom-k and Priority Sampling, Set Similarity and Subset Sums with Minimal Independence

    Full text link
    We consider bottom-k sampling for a set X, picking a sample S_k(X) consisting of the k elements that are smallest according to a given hash function h. With this sample we can estimate the relative size f=|Y|/|X| of any subset Y as |S_k(X) intersect Y|/k. A standard application is the estimation of the Jaccard similarity f=|A intersect B|/|A union B| between sets A and B. Given the bottom-k samples from A and B, we construct the bottom-k sample of their union as S_k(A union B)=S_k(S_k(A) union S_k(B)), and then the similarity is estimated as |S_k(A union B) intersect S_k(A) intersect S_k(B)|/k. We show here that even if the hash function is only 2-independent, the expected relative error is O(1/sqrt(fk)). For fk=Omega(1) this is within a constant factor of the expected relative error with truly random hashing. For comparison, consider the classic approach of kxmin-wise where we use k hash independent functions h_1,...,h_k, storing the smallest element with each hash function. For kxmin-wise there is an at least constant bias with constant independence, and it is not reduced with larger k. Recently Feigenblat et al. showed that bottom-k circumvents the bias if the hash function is 8-independent and k is sufficiently large. We get down to 2-independence for any k. Our result is based on a simply union bound, transferring generic concentration bounds for the hashing scheme to the bottom-k sample, e.g., getting stronger probability error bounds with higher independence. For weighted sets, we consider priority sampling which adapts efficiently to the concrete input weights, e.g., benefiting strongly from heavy-tailed input. This time, the analysis is much more involved, but again we show that generic concentration bounds can be applied.Comment: A short version appeared at STOC'1

    Empirical Bayes selection of wavelet thresholds

    Full text link
    This paper explores a class of empirical Bayes methods for level-dependent threshold selection in wavelet shrinkage. The prior considered for each wavelet coefficient is a mixture of an atom of probability at zero and a heavy-tailed density. The mixing weight, or sparsity parameter, for each level of the transform is chosen by marginal maximum likelihood. If estimation is carried out using the posterior median, this is a random thresholding procedure; the estimation can also be carried out using other thresholding rules with the same threshold. Details of the calculations needed for implementing the procedure are included. In practice, the estimates are quick to compute and there is software available. Simulations on the standard model functions show excellent performance, and applications to data drawn from various fields of application are used to explore the practical performance of the approach. By using a general result on the risk of the corresponding marginal maximum likelihood approach for a single sequence, overall bounds on the risk of the method are found subject to membership of the unknown function in one of a wide range of Besov classes, covering also the case of f of bounded variation. The rates obtained are optimal for any value of the parameter p in (0,\infty], simultaneously for a wide range of loss functions, each dominating the L_q norm of the \sigmath derivative, with \sigma\ge0 and 0<q\le2.Comment: Published at http://dx.doi.org/10.1214/009053605000000345 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
    corecore