
    ENGINEERING COMPRESSED STATIC FUNCTIONS AND MINIMAL PERFECT HASH FUNCTIONS

    \emph{Static functions} are data structures meant to store arbitrary mappings from finite sets to integers; that is, given a universe of items $U$ and a set of $n \in \mathbb{N}$ pairs $(k_i, v_i)$, where $k_i \in S \subset U$, $|S| = n$, and $v_i \in \{0, 1, \ldots, m-1\}$, $m \in \mathbb{N}$, a static function retrieves $v_i$ given $k_i$ (usually, in constant time). When every key is mapped to a different value, the function is called a \emph{perfect hash function}, and when $n = m$ the data structure yields a bijection $S \to \{0, 1, \ldots, n-1\}$; this mapping is called a \emph{minimal perfect hash function} (MPHF). Big data has brought back one of the most critical challenges that computer scientists have been tackling for the last fifty years: analyzing large amounts of data that do not fit in main memory. While for small key sets these mappings can be easily implemented using hash tables, that solution does not scale to larger sets. Static functions and MPHFs circumvent the information-theoretic lower bound on storing the set $S$ because they are allowed to return \emph{any} value if the queried key is not in the original key set. The classical construction techniques achieve $O(nb)$ bits of space for static functions, where $b = \log m$, and $O(n)$ bits of space for MPHFs, always with constant access time. These features make static functions and MPHFs powerful tools when handling, for instance, large sets of strings, and they are essential building blocks of space-efficient data structures such as (compressed) full-text indexes, monotone MPHFs, Bloom filter-like data structures, and prefix-search data structures. The main challenge in their construction is lowering the multiplicative constants hidden inside the asymptotic space bounds while keeping construction times feasible.

    In this thesis, we take advantage of recent results in the theory of random linear systems, concerning the ratio between the number of variables and the number of equations, and in perfect hash data structures, to achieve practical static functions with the lowest space bounds so far and construction times comparable with those of widely used techniques. The new results, however, require solving linear systems that demand more than the simple triangulation process used in current state-of-the-art solutions. The main obstacle to making such structures usable is mitigating the cubic running time of Gaussian elimination at construction time. To this purpose, we introduce novel techniques based on \emph{broadword programming} and a heuristic derived from \emph{structured Gaussian elimination}. The resulting data structures are significantly smaller than commonly used hypergraph-based constructions, while maintaining or improving lookup times and keeping construction feasible.

    We then apply these improvements to another kind of structure: \emph{compressed static hash functions}. The theoretical construction technique for this kind of data structure encodes the set of output values with variable-length prefix-free codes. With this approach, the space usage of each element can be reduced to (essentially) the entropy of the list of output values of the function. The price is an even larger linear system of equations, which increases the time required to build the structure. In this thesis, we present the first engineered implementation of compressed hash functions.
    For example, we were able to store a function with geometrically distributed output with parameter $p = 0.5$ in just 2.28 bits per key, independently of the key set, with a construction time double that of a state-of-the-art non-compressed function, which requires $\approx \log \log n$ bits per key, where $n$ is the number of keys, and with similar lookup time. We can also store a function whose output follows a Zipfian distribution with parameters $s = 2$ and $N = 10^6$ in just 2.75 bits per key, whereas a non-compressed function would require more than 20, at the cost of a threefold increase in construction time and with significantly faster lookups.
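    To make the construction concrete, below is a minimal sketch of the classical hypergraph-based technique that the thesis improves upon (an MWHC/BDZ-style static function, which needs only the simple peeling/triangulation step mentioned above): each key is hashed to three table cells, its value must equal the XOR of those cells, and the resulting sparse linear system over GF(2) is solved by peeling. All names, the use of Python's built-in hash(), and the ratio c = 1.23 (just above the peelability threshold) are illustrative choices, not the thesis's engineered solver.

        import random

        def _positions(key, salts, seg):
            # Three table positions, one in each third of the table; disjoint
            # blocks guarantee the three positions are distinct.  A real
            # implementation would use an explicit hash family, not hash().
            return tuple(j * seg + hash((key, s)) % seg for j, s in enumerate(salts))

        def build(pairs, c=1.23, seed=0):
            """Solve the sparse GF(2) system v = T[p0]^T[p1]^T[p2] by peeling."""
            n = len(pairs)
            seg = int(c * n / 3) + 1
            for attempt in range(50):              # re-roll hashes if peeling fails
                rng = random.Random((seed, attempt))
                salts = [rng.randrange(1 << 30) for _ in range(3)]
                eqs = [(_positions(k, salts, seg), v) for k, v in pairs]
                deg = [0] * (3 * seg)              # equations touching each cell
                touching = [[] for _ in range(3 * seg)]
                for i, (pos, _) in enumerate(eqs):
                    for p in pos:
                        deg[p] += 1
                        touching[p].append(i)
                alive, order = [True] * n, []
                queue = [p for p in range(3 * seg) if deg[p] == 1]
                while queue:
                    p = queue.pop()
                    if deg[p] != 1:
                        continue
                    i = next(j for j in touching[p] if alive[j])
                    alive[i] = False
                    order.append((i, p))           # equation i is solved for cell p
                    for q in eqs[i][0]:
                        deg[q] -= 1
                        if deg[q] == 1:
                            queue.append(q)
                if any(alive):                     # nonempty 2-core: peeling stuck
                    continue
                table = [0] * (3 * seg)
                for i, p in reversed(order):       # back-substitute in reverse order
                    pos, v = eqs[i]
                    a, b = (q for q in pos if q != p)
                    table[p] = v ^ table[a] ^ table[b]
                return table, salts
            raise RuntimeError("peeling failed; increase c or the attempt count")

        def lookup(key, table, salts):
            p0, p1, p2 = _positions(key, salts, len(table) // 3)
            return table[p0] ^ table[p1] ^ table[p2]

        pairs = [("key%d" % i, i % 16) for i in range(10000)]
        table, salts = build(pairs)
        assert all(lookup(k, table, salts) == v for k, v in pairs)

    With $b$-bit values the table occupies about $1.23\,nb$ bits; that multiplicative constant is exactly what the thesis attacks by moving from peelable systems to denser ones, solved with broadword programming and structured Gaussian elimination.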
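    As a back-of-the-envelope check of the figures above (our own arithmetic, not taken from the thesis): a compressed static function can approach the zeroth-order entropy of the output list, and the entropies of the two distributions mentioned are easy to compute directly.

        import math

        def entropy_bits(probs):
            # Zeroth-order entropy, in bits, of a discrete distribution.
            return sum(p * math.log2(1 / p) for p in probs if p > 0)

        # Geometric(p = 0.5): P(v) = 2**-(v + 1), entropy exactly 2 bits,
        # so 2.28 bits/key is roughly a 14% overhead over the entropy bound.
        print(entropy_bits([2.0 ** -(v + 1) for v in range(64)]))   # ~2.0

        # Zipf(s = 2) over N = 10**6 ranks: entropy around 2.4 bits, so
        # 2.75 bits/key again sits close to the bound, whereas a fixed-width
        # code needs ceil(log2(10**6)) = 20 bits per value.
        N = 10 ** 6
        w = [1.0 / (i * i) for i in range(1, N + 1)]
        Z = sum(w)
        print(entropy_bits([x / Z for x in w]))                     # ~2.4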

    Multimethods and separate static typechecking in a language with C++-like object model

    The goal of this paper is the description and analysis of the multimethod implementation in a new object-oriented, class-based programming language called OOLANG. The implementation of multimethod typechecking and selection, analyzed in depth in the paper, is performed in two phases in order to allow static typechecking and separate compilation of modules. The first phase is performed at compile time, while the second is executed at link time and does not require the modules' source code. OOLANG has a syntax similar to C++; the main differences are the absence of pointers and the realization of polymorphism through subsumption. It adopts the C++ object model and supports multiple inheritance as well as virtual base classes. For this reason, it has been necessary to define techniques for realigning argument and return-value addresses when performing multimethod invocations. Comment: 15 pages, 18 figures
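    For readers unfamiliar with multimethods, the sketch below illustrates the selection semantics: the implementation is chosen on the dynamic types of \emph{all} arguments, not just the receiver. This is only a generic runtime illustration (the class names and the intersect multimethod are invented here); OOLANG's actual contribution is verifying statically, across the compile-time and link-time phases, that a unique most-specific method exists for every call.

        class Shape: pass
        class Circle(Shape): pass
        class Square(Shape): pass

        _impls = {}                     # one dispatch table per multimethod

        def register(t1, t2):
            def deco(fn):
                _impls[(t1, t2)] = fn
                return fn
            return deco

        @register(Circle, Circle)
        def _(a, b): return "specialized circle/circle code"

        @register(Shape, Shape)
        def _(a, b): return "generic shape/shape code"

        def intersect(a, b):
            # Select the most specific applicable implementation by scanning
            # the linearized superclass chains of both arguments.
            for t1 in type(a).__mro__:
                for t2 in type(b).__mro__:
                    if (t1, t2) in _impls:
                        return _impls[(t1, t2)](a, b)
            raise TypeError("no applicable method")

        print(intersect(Circle(), Circle()))   # specialized circle/circle code
        print(intersect(Circle(), Square()))   # generic shape/shape code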

    Compressive sampling for accelerometer signals in structural health monitoring

    In structural health monitoring (SHM) of civil structures, data compression is often needed to reduce the cost of data transfer and storage, because of the large volumes of sensor data generated by the monitoring system. The traditional framework for data compression is to first sample the full signal and then compress it. Recently, a new data compression method named compressive sampling (CS) has been presented that can acquire the data directly in compressed form by using special sensors. In this article, the potential of CS for compressing vibration data is investigated through simulation of the CS sensor algorithm. For reconstruction of the signal, both wavelet and Fourier orthogonal bases are examined. The acceleration data collected from the SHM system of the Shandong Binzhou Yellow River Highway Bridge are used to analyze the data compression ability of CS. For comparison, both wavelet-based and Huffman coding methods are employed to compress the data. The results show that the compression ratios achieved using CS are not high, because the vibration data used in SHM of civil structures are not naturally sparse in the chosen bases.
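    The pipeline the article simulates can be sketched in a few lines: project a signal that is sparse in an orthogonal basis onto $m \ll n$ random measurements, then recover the sparse coefficients greedily. The sketch below is a generic illustration (orthogonal matching pursuit on a synthetic DCT-sparse signal; all sizes are invented), not the article's exact algorithm or data; its finding is precisely that real SHM acceleration records are not this sparse, which is why the achieved compression ratios are modest.

        import numpy as np

        rng = np.random.default_rng(0)
        n, m, k = 512, 128, 5          # signal length, measurements, sparsity

        # Orthonormal DCT-II basis; the test signal is k-sparse in it.
        i = np.arange(n)
        Psi = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i[:, None] + 1)
                                        * i[None, :] / (2 * n))
        Psi[:, 0] /= np.sqrt(2.0)

        s_true = np.zeros(n)
        s_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
        x = Psi @ s_true               # synthetic "acceleration" record

        Phi = rng.normal(size=(m, n)) / np.sqrt(m)   # random measurement matrix
        y = Phi @ x                    # m << n compressed samples
        A = Phi @ Psi                  # sensing matrix in the sparse domain

        # Orthogonal matching pursuit: greedily add the column most correlated
        # with the residual, then re-fit chosen coefficients by least squares.
        support, r = [], y.copy()
        for _ in range(k):
            support.append(int(np.argmax(np.abs(A.T @ r))))
            coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            r = y - A[:, support] @ coef

        s_hat = np.zeros(n)
        s_hat[support] = coef
        x_hat = Psi @ s_hat
        print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))  # ~0 when truly sparse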