9 research outputs found
Recommended from our members
Codes for Synchronization in Channels and Sources with Edits
Edit channels are a class of communication channels where the output of the channel is
an edited version of the input. The edits are considered to be deletions and insertions.
DNA-based data storage system is one of the motivations for this model. This thesis
studies various problems related to edit channel and also edit synchronization problem.
Varshamov-Tenengolts (VT) codes are first introduced. These codes can correct a
single deletion or insertion and have a linear-time decoder. The problem of efficient
encoding of the non-binary version of VT codes is addressed, where a simple linear-time
encoding method to systematically map binary message sequences onto VT codewords
is proposed.
Another model that is studied is segmented edit channels, where we have the
additional assumption that the channel input sequence is implicitly divided into
segments such that at most one edit can occur within a segment. A code construction
is proposed for this model based on subsets of VT codes chosen with pre-determined
prefxes and/or sufxes. Also an upper bound is derived on the rate of any zero-error
code for the segmented edit channel in terms of the segment length. This upper bound
shows that the rate scaling of the proposed codes as the segment length increases is
the same as that of the maximal code.
Edit synchronization is another problem studied in this thesis. In this model, there
are two remote nodes (encoder and decoder), each having a binary sequence. The
sequence X, available at the encoder, is the updated sequence and differs from Y
(available at the decoder) by a small number of edits. The goal is to construct a message
M, to be sent via a one-way error-free link, such that the decoder can reconstruct X
using M and Y. A coding scheme is devised for this one-way synchronization model.
The scheme is based on multiple layers of VT codes combined with off-the-shelf linear
error-correcting codes and uses a list decoder.
Motivated by the sequence reconstruction problem from traces in DNA-based storage, the problem of designing codes for the deletion channel when multiple observations
(or traces) are available to the decoder is considered. A simple binary and non-binary
code is proposed that splits the codeword into blocks and employs a VT code in each
block. The availability of multiple traces helps the decoder to identify deletion-free
copies of a block, and to avoid mis-synchronization while decoding. The encoding
complexity of the proposed scheme is linear in the codeword length; the decoding
complexity is linear in the codeword length and quadratic in the number of deletions
and the number of traces. The list decoding technique for the proposed code is also
considered
On location, domination and information retrieval
The thesis is divided into two main branches: identifying and locatingdominating codes, and information retrieval. The former topics are motivated by the aim to locate objects in sensor networks (or other similar applications) and the latter one by the need to retrieve information in memories such as DNA data storage systems. Albeit the underlying applications, the study on these topics mainly belongs to discrete mathematics; more specically, to the elds of coding and graph theory.
The sensor networks are usually represented by graphs where vertices represent the monitored locations and edges the connections between the locations. Moreover, the locations of the sensors are determined by a code. Furthermore, the desired properties of the sensor network are deeply linked with the properties of the underlying code.
The number of errors in reading the data is abundant in the DNA data storage systems. In particular, there can occur more errors than a reasonable error-correcting code can handle. However, this problem is somewhat oset by the possibility to obtain multiple approximations of the same information from the data storage. Hence, the information retrieval process can be modelled by the Levenshtein's channel model, where a message is sent through multiple noisy channels and multiple outputs are received. In the rst two papers of the thesis, we introduce and study the new concepts of self- and solid-locating-dominating codes as a natural analogy to self-identifying codes with respect to locating-dominating codes. The rst paper introduces these new codes and considers them in some graphs such as the Hamming graphs. Then, in the second paper, we broaden our view on the topic by considering graph theoretical questions. We give optimal codes in multiple dierent graph classes and some more general results using concepts such as the Dilworth number and graph complements. The third paper focuses on the q-ary Hamming spaces. In particular, we disprove a conjecture proposed by Goddard and Wash related to identifying codes. In the fourth paper, we return to self- and solid-locating-dominating codes and give optimal codes in some graph classes and consider their densities in innite graphs.
In the fth paper, we consider information retrieval in memories; in particular, the Levenshtein's channel model. In the channel model, we transmit some codeword belonging to the binary Hamming space through multiple identical channels. With the help of multiple dierent outputs, we give a list of codewords which may have been sent. In the paper, we study the number of channels required to have a rather small (constant) list size when the properties of the channels, the code and the dimension of the Hamming space are xed. In particular, we give an exact relation between the number of channels and the asymptotic value of the maximum list size.Väitöskirja käsittelee kahta aihetta: identioivia ja paikantavia peittokoodeja sekä tiedon noutamista muistista. Ensimmäisen aiheen motivaationa on objektien paikantaminen sensoriverkoista (sekä muut samankaltaiset sovellukset) ja jälkimmäisen tiedonnouto DNA-muisteista. Näiden aiheiden tutkimus kuuluu diskreettiin matematiikkaan, täsmällisemmin koodaus- ja graa-teoriaan.
Sensoriverkkoja kuvataan yleensä graafeilla, joissa solmut esittävät tarkkailtuja kohteita ja viivat yhteyksiä näiden kohteiden välillä. Edelleen sensorien paikat määräytyvät annetun koodin perusteella. Tästä johtuen sensoriverkon halutut ominaisuudet pohjautuvat vahvasti alla olevaan koodiin. Luettaessa tietoa DNA-muisteista tapahtuvien virheiden määrä saattaa olla erittäin suuri; erityisesti suurempi kuin kiinnitetyn virheitä korjaavan koodin korjauskyky. Toisaalta tilanne ei ole aivan näin ongelmallinen, sillä DNA-muisteista voidaan saada useita eri arvioita muistiin tallennetusta tiedosta. Näistä syistä johtuen tietojen noutamista DNA-muisteista voidaan mallintaa käyttäen Levenshteinin kanavamallia. Kanavamallissa yksi viesti lähetetään useiden häiriöisten kanavien kautta ja näin vastaanotetaan useita viestejä (yksi jokaisesta kanavasta).
Väitöskirjan kahdessa ensimmäisessä julkaisussa esitellään ja tutkitaan uusia paikantavien peittokoodien luokkia, jotka pohjautuvat aiemmin tutkittuihin itse-identioiviin koodeihin. Ensimmäisessä julkaisussa on esitelty nämä koodiluokat sekä tutkittu niitä joissain graafeissa kuten Hammingin graafeissa. Tämän jälkeen toisessa julkaisussa käsitellään yleisiä graa-teoreettisia kysymyksiä. Julkaisussa esitetään optimaaliset koodit useille graaperheille sekä joitain yleisempiä tuloksia käyttäen mm. Dilworthin lukua sekä graakomplementteja. Kolmas julkaisu keskittyy q-arisiin Hammingin avaruuksiin. Erityisesti julkaisussa todistetaan vääräksi Goddardin ja Washin aiemmin esittämä identioivia koodeja koskeva otaksuma. Neljäs artikkeli käsittelee jo kahdessa ensimmäisessä artikkelissa esiteltyjä paikantavien peittokoodien luokkia. Artikkeli esittää optimaalisia koodeja useille graaperheille sekä käsittelee äärettömiä graafeja.
Viides artikkeli käsittelee tiedonnoutoa ja erityisesti Levenshteinin kanavamallia. Kanavamallissa binääriseen Hammingin avaruuteen kuuluva koodisana lähetetään useiden identtisten kanavien läpi. Näistä kanavista vastaanotetaan useita eri arvioita lähetetystä koodisanasta ja rakennetaan lista mahdollisesti lähetetyistä sanoista. Artikkelissa tutkitaan kuinka monta kanavaa tarvitaan, jotta tämän listan koko on pieni (vakio), kun kanavien ominaisuudet, koodi ja Hammingin avaruuden dimensio on kiinnitetty. Erityisesti löydetään täsmällinen suhde kanavien lukumäärän ja asymptoottisesti maksimaalisen listan koon välille
Coding against synchronisation and related errors
In this thesis, we study aspects of coding against synchronisation errors, such as deletions and replications, and related errors. Synchronisation errors are a source of fundamental open problems in information theory, because they introduce correlations between output symbols even when input symbols are independently distributed. We focus on random errors, and consider two complementary problems:
We study the optimal rate of reliable information transmission through channels with synchronisation and related errors (the channel capacity). Unlike simpler error models, the capacity of such channels is unknown. We first consider the geometric sticky channel, which replicates input bits according to a geometric distribution. Previously, bounds on its capacity were known only via numerical methods, which do not aid our conceptual understanding of this quantity. We derive sharp analytical capacity upper bounds which approach, and sometimes surpass, numerical bounds. This opens the door to a mathematical treatment of its capacity. We consider also the geometric deletion channel, combining deletions and geometric replications. We derive analytical capacity upper bounds, and notably prove that the capacity is bounded away from the maximum when the deletion probability is small, meaning that this channel behaves differently than related well-studied channels in this regime. Finally, we adapt techniques developed to handle synchronisation errors to derive improved upper bounds and structural results on the capacity of the discrete-time Poisson channel, a model of optical communication.
Motivated by portable DNA-based storage and trace reconstruction, we introduce and study the coded trace reconstruction problem, where the goal is to design efficiently encodable high-rate codes whose codewords can be efficiently reconstructed from few reads corrupted by deletions. Remarkably, we design such n-bit codes with rate 1-O(1/log n) that require exponentially fewer reads than average-case trace reconstruction algorithms.Open Acces