3 research outputs found
Support Estimation with Sampling Artifacts and Errors
The problem of estimating the support of a distribution is of great
importance in many areas of machine learning, computer science, physics and
biology. Most of the existing work in this domain has focused on settings that
assume perfectly accurate sampling approaches, which is seldom true in
practical data science. Here we introduce the first known approach to support
estimation in the presence of sampling artifacts and errors where each sample
is assumed to arise from a Poisson repeat channel which simultaneously captures
repetitions and deletions of samples. The proposed estimator is based on
regularized weighted Chebyshev approximations, with weights governed by
evaluations of so-called Touchard (Bell) polynomials. The supports in the
presence of sampling artifacts are calculated using discretized semi-infinite
programming methods. The estimation approach is tested on synthetic and textual
data, as well as on GISAID data collected to address a new problem in
computational biology: mutational support estimation in genes of the SARS-CoV-2
virus. In the latter setting, the Poisson channel captures the fact that many
individuals are tested multiple times for the presence of viral RNA, thereby
leading to repeated samples, while other individuals' results are not recorded
due to test errors. For all experiments performed, we observed significant
improvements of our integrated methods compared to those obtained through
adequate modifications of state-of-the-art noiseless support estimation
methods.
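To make the sampling model in the abstract above concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each true sample is emitted a Poisson-distributed number of times, with zero copies meaning a deletion, and it evaluates Touchard polynomials through Stirling numbers of the second kind. The function names and the rate parameter `lam` are illustrative assumptions; how the paper turns these polynomial values into estimator weights is not stated in the abstract and is not modeled here.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def poisson_repeat_channel(samples, lam, rng):
    """Assumed channel model: each sample is repeated Poisson(lam) times; 0 copies = deletion."""
    out = []
    for s in samples:
        out.extend([s] * poisson_draw(lam, rng))
    return out


def touchard(n, x):
    """Touchard polynomial T_n(x) = sum_k S(n, k) x^k, with S(n, k) the Stirling numbers
    of the second kind, built row by row from S(m, k) = k*S(m-1, k) + S(m-1, k-1)."""
    row = [1]  # S(0, 0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            prev_same = row[k] if k < len(row) else 0
            new[k] = k * prev_same + row[k - 1]
        row = new
    return sum(s * x**k for k, s in enumerate(row))


if __name__ == "__main__":
    rng = random.Random(0)
    clean = ["a", "b", "c", "d"]
    print(poisson_repeat_channel(clean, 1.0, rng))  # some symbols repeated, some missing
    print([touchard(n, 1) for n in range(5)])       # Bell numbers at x = 1
```

At x = 1 the Touchard polynomials reduce to the Bell numbers, which is a convenient sanity check; more generally T_n(x) equals the n-th moment of a Poisson variable with mean x, which is one reason these polynomials arise naturally alongside a Poisson channel.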
Sharp Analytical Capacity Upper Bounds for Sticky and Related Channels
We study natural examples of binary channels with synchronization errors.
These include the duplication channel, which independently outputs a given bit
once or twice, and geometric channels that repeat a given bit according to a
geometric rule, with or without the possibility of bit deletion. We apply the
general framework of Cheraghchi (STOC 2018) to obtain sharp analytical upper
bounds on the capacity of these channels. Previously, upper bounds were known
via numerical computations involving the computation of finite approximations
of the channels by a computer and then using the obtained numerical results to
upper bound the actual capacity. While leading to sharp numerical results,
further progress on the full understanding of the channel capacity inherently
remains elusive using such methods. Our results can be regarded as a major step
towards a complete understanding of the capacity curves. Quantitatively, our
upper bounds sharply approach, and in some cases surpass, the bounds that were
previously only known by purely numerical methods. Among our results, we
notably give a completely analytical proof that, when the number of repetitions
per bit is geometric (supported on ) with mean growing to
infinity, the channel capacity remains substantially bounded away from .
Comment: 37 pages, 12 figures. Fixed some typos and reorganized parts of
Section
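The channel models named in the abstract above can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the paper's formal definitions: the duplication probability `p`, the stopping probability `q`, and the `allow_deletion` switch are hypothetical parameter names, and the paper's own parametrization may differ.

```python
import random


def duplication_channel(bits, p, rng):
    """Each input bit is output once, and independently a second time with probability p."""
    out = []
    for b in bits:
        out.append(b)
        if rng.random() < p:
            out.append(b)
    return out


def geometric_count(q, rng, allow_deletion):
    """Number of copies of one bit: start at 0 (deletion possible) or 1 (no deletion),
    then keep adding a copy with probability 1 - q. Requires 0 < q <= 1."""
    count = 0 if allow_deletion else 1
    while rng.random() >= q:
        count += 1
    return count


def geometric_channel(bits, q, rng, allow_deletion=False):
    """Each input bit is independently repeated a geometrically distributed number of times."""
    out = []
    for b in bits:
        out.extend([b] * geometric_count(q, rng, allow_deletion))
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    x = [0, 1, 1, 0, 1]
    print(duplication_channel(x, 0.5, rng))
    print(geometric_channel(x, 0.5, rng, allow_deletion=False))
    print(geometric_channel(x, 0.5, rng, allow_deletion=True))
```

In both channels the receiver sees only the output sequence, without knowing which bits were repeated or dropped. With `allow_deletion=True` the copy count is supported on the non-negative integers with mean (1 - q)/q, so the mean grows without bound as q approaches 0.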
An Overview of Capacity Results for Synchronization Channels
Synchronization channels, such as the well-known deletion channel, are
surprisingly harder to analyze than memoryless channels, and they are a source
of many fundamental problems in information theory and theoretical computer
science.
One of the most basic open problems regarding synchronization channels is the
derivation of an exact expression for their capacity. Unfortunately, most of
the classic information-theoretic techniques at our disposal fail spectacularly
when applied to synchronization channels. Therefore, new approaches must be
considered to tackle this problem. This survey gives an account of the great
effort made over the past few decades to better understand the (broadly
defined) capacity of synchronization channels, including both the main results
and the novel techniques underlying them. Besides the usual notion of channel
capacity, we also discuss the zero-error capacity of synchronization channels.
Comment: 40 pages, 11 figures. Corrected some typos and a reference. Survey,
comments are welcome