Multiset and Set Decipherable Codes
We extend some results of Lempel and Restivo on multiset decipherable codes to set decipherable codes
Note on Decipherability of Three-Word Codes
The theory of uniquely decipherable (UD) codes has been widely developed in connection
with automata theory, combinatorics on words, formal languages, and monoid theory.
Recently, the concepts of multiset decipherable (MSD) and set decipherable (SD) codes were
developed to handle some special problems in the transmission of information. Unique
decipherability is a vital requirement in a wide range of coding applications where distinct
sequences of code words carry different information. However, in several applications,
it is necessary or desirable to communicate a description of a sequence of events where
the information of interest is the set of possible events, including multiplicity, but where
the order of occurrences is irrelevant. Codes suitable for these communication purposes
need not possess the UD property, only the weaker MSD property. In other applications,
the information of interest may be the presence or absence of possible events. The SD
property is adequate for such codes. Lempel (1986) showed that the UD and MSD properties
coincide for two-word codes and conjectured that every three-word MSD code is a UD
code. Guzmán (1995) showed that the UD, MSD, and SD properties coincide for two-word
codes and conjectured that these properties coincide for three-word codes. In an earlier
paper (2001), Blanchet-Sadri answered both conjectures positively for all three-word codes
{c1,c2,c3} satisfying |c1| = |c2| = |c3|. In this note, we answer both conjectures positively
for other special three-word codes. Our procedures are based on techniques related to
dominoes
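The three properties described in this abstract can be compared on small codes with a bounded brute-force search. The following is only an illustrative sketch, not the domino-based procedure of the note: the function names and the `max_words` bound are assumptions introduced here, and a result of `True` means only that no counterexample was found among sequences of at most `max_words` codewords.

```python
from collections import Counter, defaultdict
from itertools import product


def factorizations_by_word(code, max_words):
    """Group every sequence of 1..max_words codewords by the word it spells."""
    groups = defaultdict(list)
    for n in range(1, max_words + 1):
        for seq in product(code, repeat=n):
            groups["".join(seq)].append(seq)
    return groups


def check_decipherability(code, max_words=5):
    """Return (ud, msd, sd); True means no counterexample within the bound.

    UD:  a word has at most one factorization (as a sequence).
    MSD: all factorizations of a word use the same multiset of codewords.
    SD:  all factorizations of a word use the same set of codewords.
    """
    ud = msd = sd = True
    for seqs in factorizations_by_word(code, max_words).values():
        if len(seqs) > 1:
            ud = False
        if len({frozenset(Counter(s).items()) for s in seqs}) > 1:
            msd = False
        if len({frozenset(s) for s in seqs}) > 1:
            sd = False
    return ud, msd, sd


if __name__ == "__main__":
    # "aba" = a.ba = ab.a, and the two factorizations use different codewords.
    print(check_decipherability(["a", "ab", "ba"], max_words=3))
    # A prefix code: no word has two factorizations.
    print(check_decipherability(["0", "10", "11"], max_words=4))
```

The search is exponential in `max_words`, so it is useful only for experimenting with very small codes such as the two- and three-word codes discussed above.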
Testing decipherability of directed figure codes with domino graphs
Various kinds of decipherability of codes, weaker than unique decipherability, have been studied since the mid-1980s. We consider decipherability of directed figure codes, where directed figures are defined as labelled polyominoes with designated start and end points, equipped with a catenation operation that may use a merging function to resolve possible conflicts. This setting extends decipherability questions from words to 2D structures. In the present paper we develop a (variant of) domino graph that will allow us to decide some of the decipherability kinds by searching the graph for specific paths. Thus the main result characterizes directed figure decipherability by graph properties
Unique Decipherability in Formal Languages
We consider several language-theoretic aspects of various notions of unique decipherability (or unique factorization) in formal languages. Given a language L at some position within the Chomsky hierarchy, we investigate the language of words UD(L) in L^* that have unique factorization over L. We also consider similar notions for weaker forms of unique decipherability, such as numerically decipherable words ND(L), multiset decipherable words MSD(L) and set decipherable words SD(L). Although these notions of unique factorization have been considered before, it appears that the languages of words having these properties have not been positioned in the Chomsky hierarchy up until now. We show that UD(L), ND(L), MSD(L) and SD(L) need not be context-free if L is context-free. In fact, ND(L) and MSD(L) need not be context-free even if L is finite, although UD(L) and SD(L) are regular in this case. We show that if L is context-sensitive, then so are UD(L), ND(L), MSD(L) and SD(L). We also prove that the membership problem (resp., emptiness problem) for these classes is PSPACE-complete (resp., undecidable). We finally determine upper and lower bounds on the length of the shortest word of L^* not having the various forms of unique decipherability into elements of L
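For a finite L, membership of a single word in UD(L) can be illustrated by counting its factorizations with a simple dynamic program. This is a minimal sketch under assumptions: `count_factorizations` and `in_ud` are hypothetical names, L is given as a finite collection of strings, and the empty word is counted as having exactly one (empty) factorization, which may differ from the convention used in the paper.

```python
def count_factorizations(word, language):
    """Count the ways to write `word` as a concatenation of nonempty elements of `language`."""
    pieces = {p for p in language if p}  # ignore the empty string to keep the count finite
    ways = [0] * (len(word) + 1)
    ways[0] = 1  # the empty prefix has one (empty) factorization
    for end in range(1, len(word) + 1):
        for p in pieces:
            # Does the prefix word[:end] end with the piece p?
            if word.endswith(p, 0, end):
                ways[end] += ways[end - len(p)]
    return ways[len(word)]


def in_ud(word, language):
    """True if `word` has exactly one factorization over `language`."""
    return count_factorizations(word, language) == 1


if __name__ == "__main__":
    L = {"a", "ab", "ba"}
    print(count_factorizations("aba", L))   # a.ba and ab.a
    print(in_ud("abab", L))                 # only ab.ab
```

The table `ways` has one entry per prefix of the word, so the running time is proportional to the word length times the total length of the elements of L.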
Optimal coding and the origins of Zipfian laws
The problem of compression in standard information theory consists of
assigning codes as short as possible to numbers. Here we consider the problem
of optimal coding -- under an arbitrary coding scheme -- and show that it
predicts Zipf's law of abbreviation, namely a tendency in natural languages for
more frequent words to be shorter. We apply this result to investigate optimal
coding also under so-called non-singular coding, a scheme where unique
segmentation is not warranted but codes stand for a distinct number. Optimal
non-singular coding predicts that the length of a word should grow
approximately as the logarithm of its frequency rank, which is again consistent
with Zipf's law of abbreviation. Optimal non-singular coding in combination
with the maximum entropy principle also predicts Zipf's rank-frequency
distribution. Furthermore, our findings on optimal non-singular coding
challenge common beliefs about random typing. It turns out that random typing
is in fact an optimal coding process, in stark contrast with the common
assumption that it is detached from cost cutting considerations. Finally, we
discuss the implications of optimal coding for the construction of a compact
theory of Zipfian laws and other linguistic laws.
Comment: in press in the Journal of Quantitative Linguistics; definition of
concordant pair corrected, proofs polished, references updated
On instantaneous codes
Maximal instantaneous codes are characterized by the property that they allow unique parsing of every infinite string. The sequence of codeword lengths of a maximal instantaneous code, sequenced in lexicographic order of the codewords, completely determines the code itself. Any increasing, decreasing or unimodal reordering of such a sequence again corresponds to a maximal instantaneous code. Lexicographic length sequences are characterized by a family of Kraft-type equalities
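As a small illustration of the kind of equality involved, the classical Kraft equality can be checked for a finite code: a finite prefix-free code over an alphabet of size k is maximal exactly when the sum of k^(-length) over its codewords equals 1. The sketch below covers only this classical finite case, not the family of Kraft-type equalities for lexicographic length sequences announced in the abstract, and the function names are assumptions made for the example.

```python
from fractions import Fraction


def is_prefix_free(code):
    """True if no codeword is a prefix of another (checked on sorted neighbours)."""
    words = sorted(code)
    # If a is a prefix of some later word, it is also a prefix of its sorted successor.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def kraft_sum(code, alphabet_size=2):
    """Exact value of sum(alphabet_size ** -len(w)) over the codewords."""
    return sum(Fraction(1, alphabet_size ** len(w)) for w in code)


def is_maximal_instantaneous(code, alphabet_size=2):
    """Finite case only: prefix-free and Kraft sum exactly 1."""
    return is_prefix_free(code) and kraft_sum(code, alphabet_size) == 1


if __name__ == "__main__":
    print(is_maximal_instantaneous({"0", "10", "11"}))  # lengths 1, 2, 2
    print(is_maximal_instantaneous({"0", "10"}))        # "11" could still be added
```

Using `Fraction` keeps the sum exact, so the comparison with 1 is not affected by floating-point rounding.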
Codes, orderings, and partial words
Codes play an important role in the study of the combinatorics of words. In this paper, we introduce pcodes that play a role in the study of combinatorics of partial words. Partial words are strings over a finite alphabet that may contain a number of “do not know” symbols. Pcodes are defined in terms of the compatibility relation that considers two strings over the same alphabet that are equal except for a number of insertions and/or deletions of symbols. We describe various ways of defining and analyzing pcodes. In particular, many pcodes can be obtained as antichains with respect to certain partial orderings. Using a technique related to dominoes, we show that the pcode property is decidable