14 research outputs found

    Maximum Likelihood Associative Memories

    Full text link
    Associative memories are structures that store data in such a way that it can later be retrieved given only a part of its content -- a sort-of error/erasure-resilience property. They are used in applications ranging from caches and memory management in CPUs to database engines. In this work we study associative memories built on the maximum likelihood principle. We derive minimum residual error rates when the data stored comes from a uniform binary source. Second, we determine the minimum amount of memory required to store the same data. Finally, we bound the computational complexity for message retrieval. We then compare these bounds with two existing associative memory architectures: the celebrated Hopfield neural networks and a neural network architecture introduced more recently by Gripon and Berrou

    Codes for Graph Erasures

    Full text link
    Motivated by systems where the information is represented by a graph, such as neural networks, associative memories, and distributed systems, we present in this work a new class of codes, called codes over graphs. Under this paradigm, the information is stored on the edges of an undirected graph, and a code over graphs is a set of graphs. A node failure is the event where all edges in the neighborhood of the failed node have been erased. We say that a code over graphs can tolerate ρ\rho node failures if it can correct the erased edges of any ρ\rho failed nodes in the graph. While the construction of such codes can be easily accomplished by MDS codes, their field size has to be at least O(n2)O(n^2), when nn is the number of nodes in the graph. In this work we present several constructions of codes over graphs with smaller field size. In particular, we present optimal codes over graphs correcting two node failures over the binary field, when the number of nodes in the graph is a prime number. We also present a construction of codes over graphs correcting ρ\rho node failures for all ρ\rho over a field of size at least (n+1)/21(n+1)/2-1, and show how to improve this construction for optimal codes when ρ=2,3\rho=2,3.Comment: To appear in IEEE International Symposium on Information Theor

    Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors

    Full text link
    DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters.Comment: 11 pages, 2 figures, Latex; version accepted for publicatio

    On location, domination and information retrieval

    Get PDF
    The thesis is divided into two main branches: identifying and locatingdominating codes, and information retrieval. The former topics are motivated by the aim to locate objects in sensor networks (or other similar applications) and the latter one by the need to retrieve information in memories such as DNA data storage systems. Albeit the underlying applications, the study on these topics mainly belongs to discrete mathematics; more specically, to the elds of coding and graph theory. The sensor networks are usually represented by graphs where vertices represent the monitored locations and edges the connections between the locations. Moreover, the locations of the sensors are determined by a code. Furthermore, the desired properties of the sensor network are deeply linked with the properties of the underlying code. The number of errors in reading the data is abundant in the DNA data storage systems. In particular, there can occur more errors than a reasonable error-correcting code can handle. However, this problem is somewhat oset by the possibility to obtain multiple approximations of the same information from the data storage. Hence, the information retrieval process can be modelled by the Levenshtein's channel model, where a message is sent through multiple noisy channels and multiple outputs are received. In the rst two papers of the thesis, we introduce and study the new concepts of self- and solid-locating-dominating codes as a natural analogy to self-identifying codes with respect to locating-dominating codes. The rst paper introduces these new codes and considers them in some graphs such as the Hamming graphs. Then, in the second paper, we broaden our view on the topic by considering graph theoretical questions. We give optimal codes in multiple dierent graph classes and some more general results using concepts such as the Dilworth number and graph complements. The third paper focuses on the q-ary Hamming spaces. In particular, we disprove a conjecture proposed by Goddard and Wash related to identifying codes. In the fourth paper, we return to self- and solid-locating-dominating codes and give optimal codes in some graph classes and consider their densities in innite graphs. In the fth paper, we consider information retrieval in memories; in particular, the Levenshtein's channel model. In the channel model, we transmit some codeword belonging to the binary Hamming space through multiple identical channels. With the help of multiple dierent outputs, we give a list of codewords which may have been sent. In the paper, we study the number of channels required to have a rather small (constant) list size when the properties of the channels, the code and the dimension of the Hamming space are xed. In particular, we give an exact relation between the number of channels and the asymptotic value of the maximum list size.Väitöskirja käsittelee kahta aihetta: identioivia ja paikantavia peittokoodeja sekä tiedon noutamista muistista. Ensimmäisen aiheen motivaationa on objektien paikantaminen sensoriverkoista (sekä muut samankaltaiset sovellukset) ja jälkimmäisen tiedonnouto DNA-muisteista. Näiden aiheiden tutkimus kuuluu diskreettiin matematiikkaan, täsmällisemmin koodaus- ja graa-teoriaan. Sensoriverkkoja kuvataan yleensä graafeilla, joissa solmut esittävät tarkkailtuja kohteita ja viivat yhteyksiä näiden kohteiden välillä. Edelleen sensorien paikat määräytyvät annetun koodin perusteella. Tästä johtuen sensoriverkon halutut ominaisuudet pohjautuvat vahvasti alla olevaan koodiin. Luettaessa tietoa DNA-muisteista tapahtuvien virheiden määrä saattaa olla erittäin suuri; erityisesti suurempi kuin kiinnitetyn virheitä korjaavan koodin korjauskyky. Toisaalta tilanne ei ole aivan näin ongelmallinen, sillä DNA-muisteista voidaan saada useita eri arvioita muistiin tallennetusta tiedosta. Näistä syistä johtuen tietojen noutamista DNA-muisteista voidaan mallintaa käyttäen Levenshteinin kanavamallia. Kanavamallissa yksi viesti lähetetään useiden häiriöisten kanavien kautta ja näin vastaanotetaan useita viestejä (yksi jokaisesta kanavasta). Väitöskirjan kahdessa ensimmäisessä julkaisussa esitellään ja tutkitaan uusia paikantavien peittokoodien luokkia, jotka pohjautuvat aiemmin tutkittuihin itse-identioiviin koodeihin. Ensimmäisessä julkaisussa on esitelty nämä koodiluokat sekä tutkittu niitä joissain graafeissa kuten Hammingin graafeissa. Tämän jälkeen toisessa julkaisussa käsitellään yleisiä graa-teoreettisia kysymyksiä. Julkaisussa esitetään optimaaliset koodit useille graaperheille sekä joitain yleisempiä tuloksia käyttäen mm. Dilworthin lukua sekä graakomplementteja. Kolmas julkaisu keskittyy q-arisiin Hammingin avaruuksiin. Erityisesti julkaisussa todistetaan vääräksi Goddardin ja Washin aiemmin esittämä identioivia koodeja koskeva otaksuma. Neljäs artikkeli käsittelee jo kahdessa ensimmäisessä artikkelissa esiteltyjä paikantavien peittokoodien luokkia. Artikkeli esittää optimaalisia koodeja useille graaperheille sekä käsittelee äärettömiä graafeja. Viides artikkeli käsittelee tiedonnoutoa ja erityisesti Levenshteinin kanavamallia. Kanavamallissa binääriseen Hammingin avaruuteen kuuluva koodisana lähetetään useiden identtisten kanavien läpi. Näistä kanavista vastaanotetaan useita eri arvioita lähetetystä koodisanasta ja rakennetaan lista mahdollisesti lähetetyistä sanoista. Artikkelissa tutkitaan kuinka monta kanavaa tarvitaan, jotta tämän listan koko on pieni (vakio), kun kanavien ominaisuudet, koodi ja Hammingin avaruuden dimensio on kiinnitetty. Erityisesti löydetään täsmällinen suhde kanavien lukumäärän ja asymptoottisesti maksimaalisen listan koon välille

    Applied Advanced Error Control Coding for General Purpose Representation and Association Machine Systems

    Get PDF
    General-Purpose Representation and Association Machine (GPRAM) is proposed to be focusing on computations in terms of variation and flexibility, rather than precision and speed. GPRAM system has a vague representation and has no predefined tasks. With several important lessons learned from error control coding, neuroscience and human visual system, we investigate several types of error control codes, including Hamming code and Low-Density Parity Check (LDPC) codes, and extend them to different directions. While in error control codes, solely XOR logic gate is used to connect different nodes. Inspired by bio-systems and Turbo codes, we suggest and study non-linear codes with expanded operations, such as codes including AND and OR gates which raises the problem of prior-probabilities mismatching. Prior discussions about critical challenges in designing codes and iterative decoding for non-equiprobable symbols may pave the way for a more comprehensive understanding of bio-signal processing. The limitation of XOR operation in iterative decoding with non-equiprobable symbols is described and can be potentially resolved by applying quasi-XOR operation and intermediate transformation layer. Constructing codes for non-equiprobable symbols with the former approach cannot satisfyingly perform with regarding to error correction capability. Probabilistic messages for sum-product algorithm using XOR, AND, and OR operations with non-equiprobable symbols are further computed. The primary motivation for the constructing codes is to establish the GPRAM system rather than to conduct error control coding per se. The GPRAM system is fundamentally developed by applying various operations with substantial over-complete basis. This system is capable of continuously achieving better and simpler approximations for complex tasks. The approaches of decoding LDPC codes with non-equiprobable binary symbols are discussed due to the aforementioned prior-probabilities mismatching problem. The traditional Tanner graph should be modified because of the distinction of message passing to information bits and to parity check bits from check nodes. In other words, the message passing along two directions are identical in conventional Tanner graph, while the message along the forward direction and backward direction are different in our case. A method of optimizing signal constellation is described, which is able to maximize the channel mutual information. A simple Image Processing Unit (IPU) structure is proposed for GPRAM system, to which images are inputted. The IPU consists of a randomly constructed LDPC code, an iterative decoder, a switch, and scaling and decision device. The quality of input images has been severely deteriorated for the purpose of mimicking visual information variability (VIV) experienced in human visual systems. The IPU is capable of (a) reliably recognizing digits from images of which quality is extremely inadequate; (b) achieving similar hyper-acuity performance comparing to human visual system; and (c) significantly improving the recognition rate with applying randomly constructed LDPC code, which is not specifically optimized for the tasks
    corecore