
    Covariate assisted screening and estimation

    Consider a linear model $Y = X\beta + z$, where $X = X_{n,p}$ and $z \sim N(0, I_n)$. The vector $\beta$ is unknown but is sparse in the sense that most of its coordinates are $0$. The main interest is to separate its nonzero coordinates from the zero ones (i.e., variable selection). Motivated by examples in long-memory time series (Fan and Yao [Nonlinear Time Series: Nonparametric and Parametric Methods (2003) Springer]) and the change-point problem (Bhattacharya [In Change-Point Problems (South Hadley, MA, 1992) (1994) 28-56 IMS]), we are primarily interested in the case where the Gram matrix $G = X'X$ is nonsparse but sparsifiable by a finite-order linear filter. We focus on the regime where signals are both rare and weak, so that successful variable selection is very challenging but still possible. We approach this problem by a new procedure called covariate assisted screening and estimation (CASE). CASE first uses linear filtering to reduce the original setting to a new regression model whose Gram (covariance) matrix is sparse. The new covariance matrix induces a sparse graph, which guides us to conduct multivariate screening without visiting all the submodels. By interacting with the signal sparsity, the graph enables us to decompose the original problem into many separate small-size subproblems (if only we knew where they are!). Linear filtering also induces a so-called problem of information leakage, which can be overcome by the newly introduced patching technique. Together, these give rise to CASE, a two-stage screen-and-clean [Fan and Song Ann. Statist. 38 (2010) 3567-3604; Wasserman and Roeder Ann. Statist. 37 (2009) 2178-2201] procedure, in which we first identify candidates for these submodels by patching and screening, and then re-examine each candidate to remove false positives. Comment: Published in at http://dx.doi.org/10.1214/14-AOS1243 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
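
    As a rough numerical illustration of the sparsification step described above (a minimal sketch, not the paper's construction: the filter, the decay exponent, and the threshold below are made-up choices), a first-order differencing filter turns a Gram matrix with slowly decaying off-diagonal entries into one whose entries are essentially zero away from the diagonal:

```python
import numpy as np

# Toy long-memory-style Gram matrix: entries decay slowly in |i - j|, so G is nonsparse.
n = 200
idx = np.arange(n)
G = (1.0 + np.abs(idx[:, None] - idx[None, :])) ** (-0.4)

# Finite-order linear filter: first differences (order 1).
D = np.eye(n) - np.eye(n, k=1)

# Filtered matrix: second differences of the slowly decaying profile fall off
# much faster, so H is approximately sparse after thresholding.
H = D @ G @ D.T

def near_zero_fraction(M, tol=1e-2):
    """Fraction of entries with magnitude below tol."""
    return float(np.mean(np.abs(M) < tol))

print("near-zero fraction before filtering:", near_zero_fraction(G))
print("near-zero fraction after filtering: ", near_zero_fraction(H))
```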

    Upper Bounds on the Capacity of Binary Channels with Causal Adversaries

    In this work we consider the communication of information in the presence of a causal adversarial jammer. In the setting under study, a sender wishes to communicate a message to a receiver by transmitting a codeword $(x_1, \ldots, x_n)$ bit-by-bit over a communication channel. The sender and the receiver do not share common randomness. The adversarial jammer can view the transmitted bits $x_i$ one at a time, and can change up to a $p$-fraction of them. However, the decisions of the jammer must be made in a causal manner: for each bit $x_i$, the jammer's decision on whether to corrupt it or not may depend only on $x_j$ for $j \leq i$. This is in contrast to the "classical" adversarial jamming settings in which the jammer has no knowledge of $(x_1, \ldots, x_n)$, or knows $(x_1, \ldots, x_n)$ completely. In this work, we present upper bounds on the capacity that hold under both the average and maximal probability of error criteria, and for both deterministic and stochastic encoding schemes. Comment: To appear in the IEEE Transactions on Information Theory; shortened version appeared at ISIT 201
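
    A small simulation of the causality constraint described above may help fix ideas (a hedged sketch: the flipping rule below is an arbitrary toy strategy chosen only to illustrate that each decision uses the prefix $x_1, \ldots, x_i$; it is not the strategy analyzed in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def causal_jammer(x, p):
    """Toy causal strategy: flip x_i whenever it agrees with the majority of
    the prefix x_1..x_i, until the floor(p*n) flip budget runs out.  Each
    decision uses only bits already seen, as the causality constraint requires."""
    n = len(x)
    budget = int(np.floor(p * n))
    y = x.copy()
    ones_seen = 0
    for i in range(n):
        ones_seen += x[i]
        majority = 1 if 2 * ones_seen > i + 1 else 0
        if budget > 0 and x[i] == majority:
            y[i] ^= 1
            budget -= 1
    return y

x = rng.integers(0, 2, size=1000)
y = causal_jammer(x, p=0.1)
print("fraction of bits flipped:", np.mean(x != y))  # at most p = 0.1
```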

    Tight Bounds on List-Decodable and List-Recoverable Zero-Rate Codes

    In this work, we consider the list-decodability and list-recoverability of codes in the zero-rate regime. Briefly, a code $\mathcal{C} \subseteq [q]^n$ is $(p,\ell,L)$-list-recoverable if for all tuples of input lists $(Y_1,\dots,Y_n)$ with each $Y_i \subseteq [q]$ and $|Y_i| = \ell$, the number of codewords $c \in \mathcal{C}$ such that $c_i \notin Y_i$ for at most $pn$ choices of $i \in [n]$ is less than $L$; list-decoding is the special case of $\ell = 1$. In recent work by Resch, Yuan and Zhang (ICALP 2023), the zero-rate threshold for list-recovery was determined for all parameters: that is, the work explicitly computes $p_* := p_*(q,\ell,L)$ with the property that for all $\epsilon > 0$ (a) there exist infinite families of positive-rate $(p_*-\epsilon,\ell,L)$-list-recoverable codes, and (b) any $(p_*+\epsilon,\ell,L)$-list-recoverable code has rate $0$. In fact, in the latter case the code has constant size, independent of $n$. However, the constant size in their work is quite large in $1/\epsilon$, at least $|\mathcal{C}| \geq (\frac{1}{\epsilon})^{O(q^L)}$. Our contribution in this work is to show that for all choices of $q$, $\ell$ and $L$ with $q \geq 3$, any $(p_*+\epsilon,\ell,L)$-list-recoverable code must have size $O_{q,\ell,L}(1/\epsilon)$, and furthermore this upper bound is complemented by a matching lower bound $\Omega_{q,\ell,L}(1/\epsilon)$. This greatly generalizes work by Alon, Bukh and Polyanskiy (IEEE Trans. Inf. Theory 2018), which focused only on the case of a binary alphabet (and thus necessarily only list-decoding). We remark that we can in fact recover the same result for $q = 2$ and even $L$, as obtained by Alon, Bukh and Polyanskiy: we thus strictly generalize their work. Comment: Abstract shortened to meet the arXiv requirement
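
    The definition above can be checked by brute force on toy instances; the following sketch (the example code $\mathcal{C}$ and parameter values are made up for illustration) enumerates all list tuples and counts the codewords that disagree with the lists in at most $pn$ coordinates:

```python
from itertools import combinations, product

def is_list_recoverable(C, q, p, ell, L):
    """Brute-force check of (p, ell, L)-list-recoverability: for every tuple of
    lists (Y_1, ..., Y_n) with |Y_i| = ell, fewer than L codewords disagree with
    the lists in at most p*n coordinates.  Exponential in n; toy sizes only."""
    n = len(C[0])
    lists_per_coord = list(combinations(range(q), ell))
    for Ys in product(lists_per_coord, repeat=n):
        close = sum(
            1 for c in C
            if sum(c[i] not in Ys[i] for i in range(n)) <= p * n
        )
        if close >= L:
            return False
    return True

# Tiny made-up code over alphabet [3] (q = 3, block length n = 3).
C = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
print(is_list_recoverable(C, q=3, p=0.0, ell=2, L=2))  # False: lists {0,1}^3 capture two codewords
print(is_list_recoverable(C, q=3, p=0.0, ell=2, L=3))  # True: no list tuple captures all three
```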

    On the Measurement of Privacy as an Attacker's Estimation Error

    A wide variety of privacy metrics have been proposed in the literature to evaluate the level of protection offered by privacy-enhancing technologies. Most of these metrics are specific to concrete systems and adversarial models, and are difficult to generalize or translate to other contexts. Furthermore, a better understanding of the relationships between the different privacy metrics is needed to enable a more grounded and systematic approach to measuring privacy, as well as to assist system designers in selecting the most appropriate metric for a given application. In this work we propose a theoretical framework for privacy-preserving systems, endowed with a general definition of privacy in terms of the estimation error incurred by an attacker who aims to disclose the private information that the system is designed to conceal. We show that our framework permits interpreting and comparing a number of well-known metrics under a common perspective. The arguments behind these interpretations are based on fundamental results from information theory, probability theory and Bayesian decision theory. Comment: This paper has 18 pages and 17 figures
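
    As a concrete, if simplistic, instance of measuring privacy by the attacker's estimation error (a sketch under assumed numbers: the joint distribution below is invented, and the error measure used is the Bayes probability of error, just one of the quantities such a framework can accommodate):

```python
import numpy as np

# Private attribute X and disclosed observation Y with an assumed joint
# distribution p_xy[x, y] = P(X = x, Y = y) (numbers invented for the example).
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])

# The Bayes-optimal attacker guesses argmax_x P(X = x, Y = y) for each y;
# privacy is quantified as the attacker's residual probability of error.
prob_correct = p_xy.max(axis=0).sum()
estimation_error = 1.0 - prob_correct

print("attacker's minimum probability of error:", estimation_error)  # 0.25
```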

    Quickest Sequence Phase Detection

    A phase detection sequence is a length-$n$ cyclic sequence such that the location of any length-$k$ contiguous subsequence can be determined from a noisy observation of that subsequence. In this paper, we derive bounds on the minimal possible $k$ in the limit of $n \to \infty$, and describe some sequence constructions. We further consider multiple phase detection sequences, where the location of any length-$k$ contiguous subsequence of each sequence can be determined simultaneously from a noisy mixture of those subsequences. We study the optimal trade-offs between the lengths of the sequences, and describe some sequence constructions. We compare these phase detection problems to their natural channel coding counterparts, and show a strict separation between the fundamental limits in the multiple sequence case. Both adversarial and probabilistic noise models are addressed. Comment: To appear in the IEEE Transactions on Information Theory
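
    In the noiseless limit, the defining property reduces to a simple combinatorial condition: every length-$k$ cyclic window of the sequence must be distinct. The short check below (the example sequence is a standard binary de Bruijn sequence, used here only for illustration) verifies this necessary condition; it does not capture the noisy-observation aspect studied in the paper:

```python
def all_windows_distinct(seq, k):
    """Return True if all length-k cyclic windows of seq are distinct, i.e. the
    phase of a window is recoverable in the absence of noise."""
    n = len(seq)
    windows = {tuple(seq[(i + j) % n] for j in range(k)) for i in range(n)}
    return len(windows) == n

# The binary de Bruijn sequence 00010111 has all 3-windows distinct, but not all 2-windows.
print(all_windows_distinct([0, 0, 0, 1, 0, 1, 1, 1], k=3))  # True
print(all_windows_distinct([0, 0, 0, 1, 0, 1, 1, 1], k=2))  # False
```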

    Secure Multiterminal Source Coding with Side Information at the Eavesdropper

    The problem of secure multiterminal source coding with side information at the eavesdropper is investigated. This scenario consists of a main encoder (referred to as Alice) that wishes to compress a single source while simultaneously satisfying the desired requirements on the distortion level at a legitimate receiver (referred to as Bob) and the equivocation rate (average uncertainty) at an eavesdropper (referred to as Eve). The presence of a (public) rate-limited link between Alice and Bob is further assumed. In this setting, Eve perfectly observes the information bits sent by Alice to Bob and also has access to a correlated source which can be used as side information. A second encoder (referred to as Charlie) helps Bob in estimating Alice's source by sending a compressed version of its own correlated observation via a (private) rate-limited link, which is only observed by Bob. The problem at hand can thus be seen as a unification of the Berger-Tung and secure source coding setups. Inner and outer bounds on the so-called rates-distortion-equivocation region are derived. The inner region turns out to be tight in two cases: (i) uncoded side information at Bob and (ii) lossless reconstruction of both sources at Bob (secure distributed lossless compression). Application examples to secure lossy source coding of Gaussian and binary sources in the presence of Gaussian and binary/ternary (resp.) side information are also considered. Optimal coding schemes are characterized for some cases of interest where the statistical differences between the side information at the decoders and the presence of a non-zero distortion at Bob can be fully exploited to guarantee secrecy. Comment: 26 pages, 16 figures, 2 tables
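
    For intuition about the equivocation-rate quantity mentioned above, the following sketch (a toy calculation under assumed distributions, not the coding scheme of the paper) computes the eavesdropper's average uncertainty $H(X \mid Z)$ when her side information $Z$ is the binary source $X$ observed through a binary symmetric channel:

```python
import numpy as np

def h2(p):
    """Binary entropy in bits (assumes 0 < p < 1)."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def equivocation_bsc(p_x1, delta):
    """H(X | Z) when X ~ Bernoulli(p_x1) and Z is X flipped with probability delta
    (assumes 0 < p_x1 < 1 and 0 < delta < 1 so no zero entries arise)."""
    # Joint distribution P(X = x, Z = z).
    p = np.array([[(1 - p_x1) * (1 - delta), (1 - p_x1) * delta],
                  [p_x1 * delta,             p_x1 * (1 - delta)]])
    p_z = p.sum(axis=0)
    H_xz = -(p * np.log2(p)).sum()      # joint entropy H(X, Z)
    H_z = -(p_z * np.log2(p_z)).sum()   # marginal entropy H(Z)
    return H_xz - H_z                   # equivocation H(X | Z) = H(X, Z) - H(Z)

# With a uniform source, the equivocation equals the binary entropy of the
# crossover probability: H(X | Z) = h2(delta).
print(equivocation_bsc(p_x1=0.5, delta=0.1), h2(0.1))
```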