Data Discovery and Anomaly Detection Using Atypicality: Theory
A central question in the era of 'big data' is what to do with the enormous
amount of information. One possibility is to characterize it through
statistics, e.g., averages, or classify it using machine learning, in order to
understand the general structure of the overall data. The perspective in this
paper is the opposite, namely that most of the value in the information in some
applications is in the parts that deviate from the average, that are unusual,
atypical. We define what we mean by 'atypical' in an axiomatic way as data that
can be encoded with fewer bits in itself rather than using the code for the
typical data. We show that this definition has good theoretical properties. We
then develop an implementation based on universal source coding, and apply this
to a number of real-world data sets. Comment: 40 pages
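The definition above compares two code lengths for the same data. A minimal sketch of that comparison for binary sequences follows, assuming a fixed Bernoulli model as the "typical" code and a simple two-part code (empirical entropy plus a parameter cost) as a stand-in for universal source coding. The function names, the `overhead` term, and the choice of model class are illustrative assumptions, not the paper's actual criterion.

```python
import math

def typical_bits(bits, p=0.5):
    """Code length (in bits) under a fixed 'typical' Bernoulli(p) model."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return -(ones * math.log2(p) + zeros * math.log2(1 - p))

def universal_bits(bits, overhead=1.0):
    """Two-part code length that encodes the sequence 'in itself'.

    Uses n * H(q) for the data, (1/2) * log2(n) for the estimated
    parameter q, and a hypothetical `overhead` for signalling that
    the atypical code is in use.
    """
    n = len(bits)
    q = sum(bits) / n
    if q in (0.0, 1.0):
        entropy = 0.0
    else:
        entropy = -(q * math.log2(q) + (1 - q) * math.log2(1 - q))
    return n * entropy + 0.5 * math.log2(n) + overhead

def is_atypical(bits, p=0.5, overhead=1.0):
    """Flag a sequence whose own code is shorter than the typical code."""
    return universal_bits(bits, overhead) < typical_bits(bits, p)

balanced = [0, 1] * 32          # half ones: matches the typical model
skewed = [1] * 60 + [0] * 4     # mostly ones: deviates from it
print(is_atypical(balanced), is_atypical(skewed))
```

Because this sketch only fits an i.i.d. model, it detects deviations in the frequency of ones and not in ordering; a real universal coder would capture more structure.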
Sub-micron technology development and system-on-chip (SoC) design - data compression core
Data compression removes redundancy from the source data and thereby increases the storage capacity of a storage medium or the efficiency of data transmission over a communication link. Although several data compression techniques have been implemented in hardware, they are not flexible enough to be embedded in more complex applications. Data compression software, meanwhile, cannot meet the demands of high-speed computing applications. To address these deficiencies, in this project we develop a parameterized lossless universal data compression IP core for high-speed applications. The design of the core is based on a combination of the Lempel-Ziv-Storer-Szymanski (LZSS) compression algorithm and Huffman coding. The resulting IP core offers a data-independent throughput and can process one symbol in every clock cycle. The design is described in parameterized VHDL code so that a user can choose a suitable compromise between resource constraints, operating speed and compression saving, and adapt the core to the target application. When implemented on an Altera FLEX10KE FPGA device, the design achieves a throughput of 800 Mbps at an operating frequency of 50 MHz. This IP core is suitable for high-speed computing applications or for storage systems.
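The LZSS stage mentioned above can be illustrated with a short software sketch. The code below is a naive, sequential reference encoder and decoder, not the hardware design described in the abstract; the parameter names `window`, `min_match`, and `max_match` and their defaults are assumptions chosen for illustration, and the Huffman stage that would code the resulting tokens is omitted.

```python
def lzss_encode(data: bytes, window: int = 4096,
                min_match: int = 3, max_match: int = 18):
    """Return a list of tokens: an int literal or an (offset, length) pair."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Search the sliding window for the longest match starting at i.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            # A back-reference is only emitted when it is long enough to pay off.
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens

def lzss_decode(tokens) -> bytes:
    """Rebuild the original bytes from literals and back-references."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            # Copy byte by byte so overlapping matches decode correctly.
            for _ in range(length):
                out.append(out[-offset])
        else:
            out.append(token)
    return bytes(out)

sample = b"abcabcabcabcxyz"
encoded = lzss_encode(sample)
print(encoded)
print(lzss_decode(encoded) == sample)
```

The exhaustive window search here costs time proportional to the window size per symbol; a hardware core that processes one symbol per clock cycle would need to perform these comparisons in parallel.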
Quantifying hidden order out of equilibrium
While the equilibrium properties, states, and phase transitions of
interacting systems are well described by statistical mechanics, the lack of
suitable state parameters has hindered the understanding of non-equilibrium
phenomena in diverse settings, from glasses to driven systems to biology. The
length of a losslessly compressed data file is a direct measure of its
information content: The more ordered the data is, the lower its information
content and the shorter the length of its encoding can be made. Here, we
describe how data compression enables the quantification of order in
non-equilibrium and equilibrium many-body systems, both discrete and
continuous, even when the underlying form of order is unknown. We consider
absorbing state models on and off-lattice, as well as a system of active
Brownian particles undergoing motility-induced phase separation. The technique
reliably identifies non-equilibrium phase transitions, determines their
character, quantitatively predicts certain critical exponents without prior
knowledge of the order parameters, and reveals previously unknown ordering
phenomena. This technique should provide a quantitative measure of organization
in condensed matter and other systems exhibiting collective phase transitions
in and out of equilibrium.
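The idea that compressed length tracks order can be tried with a few lines of code. The sketch below assumes a one-dimensional lattice of 0/1 site states stored one byte per site, and uses `zlib` as the lossless compressor; both choices, along with the function name `compression_ratio`, are illustrative assumptions and not the procedure used in the paper.

```python
import random
import zlib

def compression_ratio(states):
    """Compressed size divided by raw size for a sequence of site states (0..255)."""
    raw = bytes(states)
    return len(zlib.compress(raw, 9)) / len(raw)

rng = random.Random(0)   # fixed seed so the example is reproducible
n_sites = 10_000

ordered = [0] * n_sites                                    # fully ordered configuration
disordered = [rng.randint(0, 1) for _ in range(n_sites)]   # independent random sites

print("ordered:   ", compression_ratio(ordered))
print("disordered:", compression_ratio(disordered))
```

The ordered configuration compresses to a much smaller fraction of its raw size than the disordered one, so the ratio can serve as a rough, model-free indicator of order without specifying an order parameter in advance.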