253 research outputs found
Tree models :algorithms and information theoretic properties
La tesis estudia propiedades fundamentales y algoritmos relacionados con modelos árbol. Estos modelos requieren una cantidad relativamente pequeña de parámetros para representar fuentes de memoria finita (Markov) sobre alfabetos finitos, cuando el largo de la cantidad de sÃmbolos pasados necesaria para determinar la distribución de probabilidad condicional del siguiente sÃmbolo no es fija, sino que depende del contexto en el cual ocurre el sÃmbolo. La tesis define estructuras combinatorias como árboles de contexto generalizados y sus clausuras FSM (del inglés finite state machine), y aplica estas estructuras para describir la primera implementación en tiempo lineal de codificación y decodificación de la versión semi-predictiva del algoritmo Context, un esquema doblemente universal que alcanza una tasa de convergencia óptima a la entropÃa en la clases de modelos árbol. La tesis analiza luego clases de tipo para modelos árbol, extendiendo el método de tipos previamente estudiado para modelos FSM. Se deriva una fórmula exacta para la cardinalidad de una clase de tipo para una secuencia de largo n dada, asà como una estimación asintótica del valor esperado del logaritmo del tamaño de una clase de tipo, y una estimación asintótica del número de clases de tipo diferentes para secuencias de un largo dado. Estos resultados asintóticos se derivan con la ayuda del nuevo concepto de extensión canónica mÃnima de un árbol de contexto, un objeto combinatorio fundamental que se encuentra entre el árbol original y su clausura FSM. Como aplicaciones de las nuevas propiedades descubiertas para modelos árbol, se presentan algoritmos de codificación enumerativa doblemente universales y esquemas de simulación universal para secuencias individuales. Finalmente, la tesis presenta algunos problemas abiertos y direcciones para investigaciones futuras en esta área
Implementación de un algoritmo de codificación universal eficiente usando árboles de contexto
En este trabajo se propone una nueva variante del algoritmo context [28] usando codificacion semi-predictiva y se desarrolla una implementacion eficiente del mismo. Para manejar los problemas relacionados con la gran cantidad de parametros tıpicos de los archivos de texto y ejecutables y mejorar el nivel de compresi´on del algoritmo, se adaptaron algunas t´ecnicas principalmente de algoritmos tipo PPM [8]. El procedimiento de podado y el estimador de probabilidad fueron adaptados para tener en cuenta estos cambios. Ademas se uso una version modificada del algoritmo wotd [11] para podar el ´arbol durante su construccion reduciendo ası los requerimientos de memoria durante la fase de codificacion del algoritmo. Finalmente se realizo una comparacion experimental entre el algoritmo desarrollado aquı y algunas de las mejores implementaciones de otros algoritmos de codificacion conocidos
Computational Security Subject to Source Constraints, Guesswork and Inscrutability
Guesswork forms the mathematical framework for
quantifying computational security subject to brute-force determination
by query. In this paper, we consider guesswork
subject to a per-symbol Shannon entropy budget. We introduce
inscrutability rate to quantify the asymptotic difficulty of guessing
U out of V secret strings drawn from the string-source and
prove that the inscrutability rate of any string-source supported
on a finite alphabet X, if it exists, lies between the per-symbol
Shannon entropy constraint and log |X|. We show that for a
stationary string-source, the inscrutability rate of guessing any
fraction (1 - ϵ) of the V strings for any fixed ϵ > 0, as V
grows, approaches the per-symbol Shannon entropy constraint
(which is equal to the Shannon entropy rate for the stationary
string-source). This corresponds to the minimum inscrutability
rate among all string-sources with the same per-symbol Shannon
entropy. We further prove that the inscrutability rate of any
finite-order Markov string-source with hidden statistics remains
the same as the unhidden case, i.e., the asymptotic value of hiding
the statistics per each symbol is vanishing. On the other hand, we
show that there exists a string-source that achieves the upper limit
on the inscrutability rate, i.e., log |X|, under the same Shannon
entropy budget
The analysis of enumerative source codes and their use in Burrows‑Wheeler compression algorithms
In the late 20th century the reliable and efficient transmission, reception and storage of information proved to be central to the most successful economies all over the world. The Internet, once a classified project accessible to a selected few, is now part of the everyday lives of a large part of the human population, and as such the efficient storage of information is an important part of the information economy. The improvement of the information storage density of optical and electronic media has been remarkable, but the elimination of redundancy in stored data and the reliable reconstruction of the original data is still a desired goal. The field of source coding is concerned with the compression of redundant data and its reliable decompression. The arithmetic source code, which was independently proposed by J. J. Rissanen and R. Pasco in 1976, revolutionized the field of source coding. Compression algorithms that use an arithmetic code to encode redundant data are typically more effective and computationally more efficient than compression algorithms that use earlier source codes such as extended Huffman codes. The arithmetic source code is also more flexible than earlier source codes, and is frequently used in adaptive compression algorithms. The arithmetic code remains the source code of choice, despite having been introduced more than 30 years ago. The problem of effectively encoding data from sources with known statistics (i.e. where the probability distribution of the source data is known) was solved with the introduction of the arithmetic code. The probability distribution of practical data is seldomly available to the source encoder, however. The source coding of data from sources with unknown statistics is a more challenging problem, and remains an active research topic. Enumerative source codes were introduced by T. J. Lynch and L. D. Davisson in the 1960s. These lossless source codes have the remarkable property that they may be used to effectively encode source sequences from certain sources without requiring any prior knowledge of the source statistics. One drawback of these source codes is the computationally complex nature of their implementations. Several years after the introduction of enumerative source codes, J. G. Cleary and I. H. Witten proved that approximate enumerative source codes may be realized by using an arithmetic code. Approximate enumerative source codes are significantly less complex than the original enumerative source codes, but are less effective than the original codes. Researchers have become more interested in arithmetic source codes than enumerative source codes since the publication of the work by Cleary and Witten. This thesis concerns the original enumerative source codes and their use in Burrows–Wheeler compression algorithms. A novel implementation of the original enumerative source code is proposed. This implementation has a significantly lower computational complexity than the direct implementation of the original enumerative source code. Several novel enumerative source codes are introduced in this thesis. These codes include optimal fixed–to–fixed length source codes with manageable computational complexity. A generalization of the original enumerative source code, which includes more complex data sources, is proposed in this thesis. The generalized source code uses the Burrows–Wheeler transform, which is a low–complexity algorithm for converting the redundancy of sequences from complex data sources to a more accessible form. The generalized source code effectively encodes the transformed sequences using the original enumerative source code. It is demonstrated and proved mathematically that this source code is universal (i.e. the code has an asymptotic normalized average redundancy of zero bits). AFRIKAANS : Die betroubare en doeltreffende versending, ontvangs en berging van inligting vorm teen die einde van die twintigste eeu die kern van die mees suksesvolle ekonomie¨e in die wˆereld. Die Internet, eens op ’n tyd ’n geheime projek en toeganklik vir slegs ’n klein groep verbruikers, is vandag deel van die alledaagse lewe van ’n groot persentasie van die mensdom, en derhalwe is die doeltreffende berging van inligting ’n belangrike deel van die inligtingsekonomie. Die verbetering van die bergingsdigteid van optiese en elektroniese media is merkwaardig, maar die uitwissing van oortolligheid in gebergde data, asook die betroubare herwinning van oorspronklike data, bly ’n doel om na te streef. Bronkodering is gemoeid met die kompressie van oortollige data, asook die betroubare dekompressie van die data. Die rekenkundige bronkode, wat onafhanklik voorgestel is deur J. J. Rissanen en R. Pasco in 1976, het ’n revolusie veroorsaak in die bronkoderingsveld. Kompressiealgoritmes wat rekenkundige bronkodes gebruik vir die kodering van oortollige data is tipies meer doeltreffend en rekenkundig meer effektief as kompressiealgoritmes wat vroe¨ere bronkodes, soos verlengde Huffman kodes, gebruik. Rekenkundige bronkodes, wat gereeld in aanpasbare kompressiealgoritmes gebruik word, is ook meer buigbaar as vroe¨ere bronkodes. Die rekenkundige bronkode bly na 30 jaar steeds die bronkode van eerste keuse. Die probleem om data wat afkomstig is van bronne met bekende statistieke (d.w.s. waar die waarskynlikheidsverspreiding van die brondata bekend is) doeltreffend te enkodeer is opgelos deur die instelling van rekenkundige bronkodes. Die bronenkodeerder het egter selde toegang tot die waarskynlikheidsverspreiding van praktiese data. Die bronkodering van data wat afkomstig is van bronne met onbekende statistieke is ’n groter uitdaging, en bly steeds ’n aktiewe navorsingsveld. T. J. Lynch and L. D. Davisson het tel–bronkodes in die 1960s voorgestel. Tel– bronkodes het die merkwaardige eienskap dat bronsekwensies van sekere bronne effektief met hierdie foutlose kodes ge¨enkodeer kan word, sonder dat die bronenkodeerder enige vooraf kennis omtrent die statistieke van die bron hoef te besit. Een nadeel van tel–bronkodes is die ho¨e rekenkompleksiteit van hul implementasies. J. G. Cleary en I. H. Witten het verskeie jare na die instelling van tel–bronkodes bewys dat benaderde tel–bronkodes gerealiseer kan word deur die gebruik van rekenkundige bronkodes. Benaderde tel–bronkodes het ’n laer rekenkompleksiteit as tel–bronkodes, maar benaderde tel–bronkodes is minder doeltreffend as die oorspronklike tel–bronkodes. Navorsers het sedert die werk van Cleary en Witten meer belangstelling getoon in rekenkundige bronkodes as tel–bronkodes. Hierdie tesis is gemoeid met die oorspronklike tel–bronkodes en die gebruik daarvan in Burrows–Wheeler kompressiealgoritmes. ’n Nuwe implementasie van die oorspronklike tel–bronkode word voorgestel. Die voorgestelde implementasie het ’n beduidende laer rekenkompleksiteit as die direkte implementasie van die oorspronklike tel–bronkode. Verskeie nuwe tel–bronkodes, insluitende optimale vaste–tot–vaste lengte tel–bronkodes met beheerbare rekenkompleksiteit, word voorgestel. ’n Veralgemening van die oorspronklike tel–bronkode, wat meer komplekse databronne insluit as die oorspronklike tel–bronkode, word voorgestel in hierdie tesis. The veralgemeende tel–bronkode maak gebruik van die Burrows–Wheeler omskakeling. Die Burrows–Wheeler omskakeling is ’n lae–kompleksiteit algoritme wat die oortolligheid van bronsekwensies wat afkomstig is van komplekse databronne omskakel na ’n meer toeganklike vorm. Die veralgemeende bronkode enkodeer die omgeskakelde sekwensies effektief deur die oorspronklike tel–bronkode te gebruik. Die universele aard van hierdie bronkode word gedemonstreer en wiskundig bewys (d.w.s. dit word bewys dat die kode ’n asimptotiese genormaliseerde gemiddelde oortolligheid van nul bisse het). CopyrightDissertation (MEng)--University of Pretoria, 2010.Electrical, Electronic and Computer Engineeringunrestricte
Recommended from our members
A Neural Signal Processor for Low-Latency Spike Inference
This thesis describes the development of a system that can assign identities to a population of single-units, in multi-electrode recordings, at single-spike resolution with low-latency. The system has two parts. The first is a Field-Programmable Gate Array (FPGA)-based Neural Signal Processor (NSP) that receives raw input and generates labelled spikes as output, a process referred to as real-time spike inference. The second is a piece of software (Spiketag) that runs on a PC, communicates with the NSP, and generates a spike-sorted model to guide the real-time spike inference. The NSP provides clocks and control signals to five 32-channel INTAN RHD2132 chips to manage the acquisition of 160 channels of raw neural data. In parallel, the NSP further filters, detects and extracts extracellular spike waveforms from the raw neural data recorded by tetrodes or silicon probes and assigns single-unit identity to each detected spike. A set of Python application programming interfaces (APIs) was developed in Spiketag to enable the communication between the NSP and the PC. These APIs allow the NSP to obtain a model from the PC, which holds parameters such as reference channels, spike detection thresholds, spike feature transformation matrix and vector quantized clusters generated by spike sorting a short recording session. Using the spike-sorted model, the NSP performs data acquisition and real-time spike inference simultaneously. Algorithmic modules were implemented in the FPGA and pipelined to compute during 40 ms acquisition intervals. At the output end of the FPGA NSP, the real-time assigned single-unit identity (spike-id) is packaged with the timestamp, the electrode group, and the spike features as a spike-id packet. Spike-id packets are asynchronously transmitted through a low-latency Peripheral Component Interconnect Express (PCIe) interface to the PC, producing the real-time spike trains. The real-time spike trains can be used for further processing, such as real-time decoding. Several types of ground-truth data, including intracellular/extracellular paired recordings, synthesized
tetrode extracellular waveforms with ground-truth spike timing and high-channel-count silicon probe recordings with ground-truth animal positions during navigation were used to validate the low-latency (1 ms) and high-accuracy (as high as state-of-the-art offline sorting and decoding algorithms) of the NSP’s real-time spike inference and the NSP-based
real-time population decoding performance
- …