On the Reliability Function of Distributed Hypothesis Testing Under Optimal Detection
The distributed hypothesis testing problem with full side-information is
studied. The trade-off (reliability function) between the two types of error
exponents under a limited communication rate is characterized as follows. First, the
problem is reduced to the problem of determining the reliability function of
channel codes designed for detection (in analogy to a similar result which
connects the reliability function of distributed lossless compression and
ordinary channel codes). Second, a single-letter random-coding bound based on a
hierarchical ensemble and a single-letter expurgated bound are derived
for the reliability of channel-detection codes. Both bounds are derived for a
system which employs the optimal detection rule. We conjecture that the
resulting random-coding bound is ensemble-tight, and consequently optimal
within the class of quantization-and-binning schemes.
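As background for the trade-off between the two types of error exponents discussed above, the classical (non-distributed) benchmark exponents for a simple binary hypothesis test can be computed numerically. The sketch below uses Stein's-lemma-style KL-divergence exponents on an invented binary example; the distributions are illustrative and not taken from the paper.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in nats for finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two illustrative hypotheses on a binary alphabet (invented for this sketch).
p = [0.9, 0.1]  # distribution under H0
q = [0.6, 0.4]  # distribution under H1

# By Stein's lemma, with the type-I error held fixed, the best achievable
# type-II error exponent is D(p||q); symmetrically, D(q||p) is the best
# type-I exponent when the type-II error is held fixed.  The reliability
# function in the paper traces the trade-off between such exponents under
# a rate constraint on the remote observations.
print(kl(p, q))  # type-II exponent benchmark
print(kl(q, p))  # type-I exponent benchmark
```

The asymmetry of the two divergences illustrates why the two error exponents must be traded off rather than optimized jointly.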
A Relationship between Quantization and Distribution Rates of Digitally Fingerprinted Data
This paper considers a fingerprinting system where distinct Gaussian fingerprints are embedded in respective copies of an i.i.d. Gaussian image. Copies are distributed to customers in digital form, using a fixed number of bits per image dimension. By means of a coding theorem, a rate region for the pair of quantization and fingerprinting rates is established such that (i) the average quadratic distortion between the original image and each distributed copy does not exceed a specified level; and (ii) the error probability in decoding the embedded fingerprint in the distributed copy approaches zero asymptotically in the image dimension.
A General Formula for the Mismatch Capacity
The fundamental limits of channels with mismatched decoding are addressed. A
general formula is established for the mismatch capacity of a general channel,
defined as a sequence of conditional distributions, together with a general
sequence of decoding metrics. We deduce an identity between the Verd\'{u}-Han
general channel capacity formula and the mismatch capacity formula applied to
the maximum-likelihood decoding metric. Further, several upper bounds on the capacity are
provided, and a simpler expression for a lower bound is derived for the case of
a non-negative decoding metric. The general formula is specialized to the case
of finite input and output alphabet channels with a type-dependent metric. The
closely related problem of threshold mismatched decoding is also studied, and a
general expression for the threshold mismatch capacity is obtained. As an
example of threshold mismatch capacity, we state a general expression for the
erasures-only capacity of the finite input and output alphabet channel. We
observe that for every channel there exists a (matched) threshold decoder which
is capacity achieving. Additionally, necessary and sufficient conditions are
stated for a channel to have a strong converse. Csisz\'{a}r and Narayan's
conjecture is proved for bounded metrics, providing a positive answer to the
open problem introduced in [1]: that the "product-space" improvement of the
lower random-coding bound is indeed the mismatch capacity of the discrete
memoryless channel. We conclude by presenting an identity between the threshold
capacity and this product-space bound in the DMC case.
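For context, the Verd\'{u}-Han general capacity formula referenced in the abstract above can be stated as follows; the notation is the standard one (information-spectrum liminf), not copied from the abstract itself:

```latex
% Verdu-Han general channel capacity (background, standard notation):
C = \sup_{\mathbf{X}} \, \underline{I}(\mathbf{X};\mathbf{Y}),
\qquad
\underline{I}(\mathbf{X};\mathbf{Y})
  = \sup\Bigl\{ R :
      \lim_{n\to\infty}
      \Pr\Bigl[\tfrac{1}{n}\log
        \tfrac{P_{Y^n\mid X^n}(Y^n\mid X^n)}{P_{Y^n}(Y^n)} < R \Bigr] = 0
    \Bigr\}
```

Here the supremum is over all input processes, and $\underline{I}$ is the liminf in probability of the normalized information density; the paper's identity relates this expression to the mismatch capacity formula evaluated at the maximum-likelihood metric.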
Digital Watermarking, Fingerprinting and Compression: An Information-Theoretic Perspective
The ease with which digital data can be duplicated and distributed over the media and the Internet has raised many concerns about copyright infringement. In many situations, multimedia data (e.g., images, music, movies, etc.) are illegally circulated, thus violating intellectual property rights. In an attempt to overcome this problem, watermarking has been suggested in the literature as the most effective means for copyright protection and authentication. Watermarking is the procedure whereby information (pertaining to the owner and/or copyright) is embedded into host data, such that it is: (i) hidden, i.e., not perceptually visible; and (ii) recoverable, even after a (possibly malicious) degradation of the protected work. In this thesis, we prove theoretical results that establish the fundamental limits of a general class of watermarking schemes. The main focus of this thesis is the problem of joint watermarking and compression of images, which can be briefly described as follows: due to bandwidth or storage constraints, a watermarked image is distributed in quantized form, at a given quantization rate in bits per image dimension, and is subject to some additional degradation (possibly due to malicious attacks). The hidden message is embedded at a given rate, also in bits per image dimension. Our main result is the determination of the region of allowable rate pairs such that: (i) an average distortion constraint between the original and the watermarked/compressed image is satisfied, and (ii) the hidden message is detected from the degraded image with very high probability. Using notions from information theory, we prove coding theorems that establish the rate region in the following cases: (a) general i.i.d. image distributions, distortion constraints, and memoryless attacks; (b) memoryless attacks combined with collusion (for fingerprinting applications); and (c) general, not necessarily stationary or ergodic, Gaussian image distributions and attacks, with average quadratic distortion constraints.
Moreover, we prove a multi-user version of a result by Costa on the capacity of a Gaussian channel with known interference at the encoder.
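For reference, Costa's single-user "writing on dirty paper" result, which the final sentence above extends to a multi-user setting, states that for the channel $Y = X + S + Z$ with input power constraint $P$, Gaussian noise $Z$ of variance $N$, and interference $S$ known noncausally at the encoder, the capacity is unchanged by $S$ (notation standard, not taken from the abstract):

```latex
% Costa (1983): capacity of Y = X + S + Z with S known at the encoder
C = \tfrac{1}{2}\log\!\left(1 + \frac{P}{N}\right)
```

That is, the capacity is the same as if the interference were entirely absent, which is what makes the result useful for watermarking: the host image plays the role of the known interference.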
Making Machines Learn. Applications of Cultural Analytics to the Humanities
The digitization of several million books by Google in 2011 popularized a new kind of humanities research powered by the treatment of cultural objects as data. Culturomics, as it is called, was born, and other initiatives resonated with this methodological approach, as is the case with the recently formed Digital Humanities or Cultural Analytics. Intrinsically, these new quantitative approaches to culture all borrow techniques and methods developed under the wing of the exact sciences, such as computer science, machine learning, or statistics. There are numerous examples of studies that take advantage of the possibilities that treating objects as data offers for the understanding of the human. This new data science, now applied to current trends in culture, can also be replicated to study the more traditional humanities. Guided by proper intellectual inquiry, an adequate use of technology may bring answers to questions intractable by other means, or add evidence to long-held assumptions based on a canon built from few examples. This dissertation argues in favor of such an approach. Three different case studies are considered. First, in the more general sense of big and smart data, we collected and analyzed more than 120,000 pictures of paintings from all periods of art history, to gain clear insight into how the beauty of depicted faces, in the framework of neuroscience and evolutionary theory, has changed over time. A second study covers the nuances of the modes of emotion employed by the Spanish Golden Age playwright Calderón de la Barca to empathize with his audience. By means of sentiment analysis, a technique strongly supported by machine learning, we shed some light on the different fictional characters, and on how they interact and convey messages otherwise invisible to the public. The last case is a study of non-traditional authorship attribution techniques applied to the forefather of the modern novel, the Lazarillo de Tormes.
In the end, we conclude that the successful application of cultural analytics and computer science techniques to traditional humanistic endeavours has been enriching and validating.
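As a minimal illustration of the lexicon-based flavor of sentiment analysis mentioned in the abstract above: a text is scored by averaging per-word valences from an emotion lexicon. The tiny lexicon and sample phrases below are invented for illustration; the dissertation's actual models are machine-learned and operate on Spanish Golden Age texts.

```python
# Minimal lexicon-based sentiment scorer (illustrative only; the lexicon
# and the sample phrases are invented, not taken from the dissertation).
LEXICON = {"love": 1.0, "joy": 1.0, "grief": -1.0, "fear": -0.8}

def sentiment(text):
    """Average lexicon valence over the words the lexicon covers;
    0.0 when no word in the text appears in the lexicon."""
    words = text.lower().split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(sentiment("love and joy"))    # positive score
print(sentiment("grief and fear"))  # negative score
```

Applied speech by speech, such scores let one trace the emotional arc of each character across a play, which is the kind of signal the study extracts at scale.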