6,166 research outputs found
A Proof of Entropy Minimization for Outputs in Deletion Channels via Hidden Word Statistics
From the output produced by a memoryless deletion channel from a uniformly
random input of known length , one obtains a posterior distribution on the
channel input. The difference between the Shannon entropy of this distribution
and that of the uniform prior measures the amount of information about the
channel input which is conveyed by the output of length , and it is natural
to ask for which outputs this is extremized. This question was posed in a
previous work, where it was conjectured on the basis of experimental data that
the entropy of the posterior is minimized and maximized by the constant strings
and and the alternating strings
and respectively. In the present
work we confirm the minimization conjecture in the asymptotic limit using
results from hidden word statistics. We show how the analytic-combinatorial
methods of Flajolet, Szpankowski and Vall\'ee for dealing with the hidden
pattern matching problem can be applied to resolve the case of fixed output
length and , by obtaining estimates for the entropy in
terms of the moments of the posterior distribution and establishing its
minimization via a measure of autocorrelation.Comment: 11 pages, 2 figure
Clustering, Hamming Embedding, Generalized LSH and the Max Norm
We study the convex relaxation of clustering and hamming embedding, focusing
on the asymmetric case (co-clustering and asymmetric hamming embedding),
understanding their relationship to LSH as studied by (Charikar 2002) and to
the max-norm ball, and the differences between their symmetric and asymmetric
versions.Comment: 17 page
On empirical methodology, constraints, and hierarchy in artificial grammar learning
This paper considers the AGL literature from a psycholinguistic perspective. It first presents a taxonomy of the experimental familiarization test procedures used, which is followed by a consideration of shortcomings and potential improvements of the empirical methodology. It then turns to reconsidering the issue of grammar learning from the point of view of acquiring constraints, instead of the traditional AGL approach in terms of acquiring sets of rewrite rules. This is, in particular, a natural way of handling long‐distance dependences. The final section addresses an underdeveloped issue in the AGL literature, namely how to detect latent hierarchical structure in AGL response patterns
In search of isoglosses: continuous and discrete language embeddings in Slavic historical phonology
This paper investigates the ability of neural network architectures to
effectively learn diachronic phonological generalizations in a multilingual
setting. We employ models using three different types of language embedding
(dense, sigmoid, and straight-through). We find that the Straight-Through model
outperforms the other two in terms of accuracy, but the Sigmoid model's
language embeddings show the strongest agreement with the traditional
subgrouping of the Slavic languages. We find that the Straight-Through model
has learned coherent, semi-interpretable information about sound change, and
outline directions for future research
Blind Reconciliation
Information reconciliation is a crucial procedure in the classical
post-processing of quantum key distribution (QKD). Poor reconciliation
efficiency, revealing more information than strictly needed, may compromise the
maximum attainable distance, while poor performance of the algorithm limits the
practical throughput in a QKD device. Historically, reconciliation has been
mainly done using close to minimal information disclosure but heavily
interactive procedures, like Cascade, or using less efficient but also less
interactive -just one message is exchanged- procedures, like the ones based in
low-density parity-check (LDPC) codes. The price to pay in the LDPC case is
that good efficiency is only attained for very long codes and in a very narrow
range centered around the quantum bit error rate (QBER) that the code was
designed to reconcile, thus forcing to have several codes if a broad range of
QBER needs to be catered for. Real world implementations of these methods are
thus very demanding, either on computational or communication resources or
both, to the extent that the last generation of GHz clocked QKD systems are
finding a bottleneck in the classical part. In order to produce compact, high
performance and reliable QKD systems it would be highly desirable to remove
these problems. Here we analyse the use of short-length LDPC codes in the
information reconciliation context using a low interactivity, blind, protocol
that avoids an a priori error rate estimation. We demonstrate that 2x10^3 bits
length LDPC codes are suitable for blind reconciliation. Such codes are of high
interest in practice, since they can be used for hardware implementations with
very high throughput.Comment: 22 pages, 8 figure
- …