Partial fillup and search time in LC tries
Andersson and Nilsson introduced the level-compressed trie (in short: LC trie) in 1993, in which a complete subtree of a node is compressed to a single node whose degree equals the size of that subtree. Recent experimental results indicated a 'dramatic improvement' when full subtrees are replaced by partially filled subtrees. In this paper, we provide a theoretical justification of these experimental results, showing, among other things, a rather moderate improvement of the search time over the original LC tries. For this analysis, we assume that n strings are generated independently by a binary memoryless source with p denoting the probability of emitting a 1. We first prove that the so-called alpha-fillup level (i.e., the largest level in a trie at which an alpha fraction of the nodes are present) is concentrated on two values with high probability. We give these values explicitly up to O(1) and observe that the value of alpha (strictly between 0 and 1) does not affect the leading term. This result directly yields the typical depth (search time) in alpha-LC tries with p not equal to 1/2, which turns out to be C loglog n for an explicitly given constant C (depending on p but not on alpha). This should be compared with the recently found typical depth in the original LC tries, which is C' loglog n for a larger constant C'. The search time in alpha-LC tries is thus smaller but of the same order as in the original LC tries.
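A minimal sketch, assuming a toy in-memory binary trie, of the alpha-fillup quantity the abstract analyzes: the deepest level at which at least an alpha fraction of the 2^d possible nodes are present. The function names, parameters, and the small experiment at the bottom are illustrative, not taken from the paper.

```python
# Illustrative alpha-fillup computation over a binary trie of bit strings.
import random

def trie_levels(strings):
    """Return a list of sets; entry d holds the length-d prefixes present."""
    depth = max(len(s) for s in strings)
    levels = [set() for _ in range(depth + 1)]
    for s in strings:
        for d in range(len(s) + 1):
            levels[d].add(s[:d])
    return levels

def alpha_fillup_level(strings, alpha=0.5):
    """Largest level d at which at least alpha * 2**d nodes are present."""
    best = 0
    for d, nodes in enumerate(trie_levels(strings)):
        if len(nodes) >= alpha * (2 ** d):
            best = d
    return best

if __name__ == "__main__":
    random.seed(7)
    p, n = 0.3, 4096  # P(emit 1) and number of strings, as in the paper's setup
    strings = ["".join("1" if random.random() < p else "0" for _ in range(64))
               for _ in range(n)]
    print(alpha_fillup_level(strings, alpha=0.5))  # deepest alpha-full level
```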
Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes
I argue that data becomes temporarily interesting by itself to some
self-improving, but computationally limited, subjective observer once he learns
to predict or compress the data in a better way, thus making it subjectively
simpler and more beautiful. Curiosity is the desire to create or discover more
non-random, non-arbitrary, regular data that is novel and surprising not in the
traditional sense of Boltzmann and Shannon but in the sense that it allows for
compression progress because its regularity was not yet known. This drive
maximizes interestingness, the first derivative of subjective beauty or
compressibility, that is, the steepness of the learning curve. It motivates
exploring infants, pure mathematicians, composers, artists, dancers, comedians,
yourself, and (since 1990) artificial systems.
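A toy illustration, not Schmidhuber's implementation, of compression progress as an intrinsic reward: the reward for a chunk of data is how much cheaper it becomes to encode after the observer has 'learned' it. Here zlib stands in for the learned compressor, and learning is crudely simulated by appending seen data to the compression context; all names are my own.

```python
import zlib

def marginal_size(data: bytes, context: bytes) -> int:
    """Extra bytes needed for `data` when compressed together with known context."""
    return len(zlib.compress(context + data)) - len(zlib.compress(context))

def compression_progress(chunks):
    """Yield per-chunk reward: cost before 'learning' minus cost after."""
    context = b""
    for chunk in chunks:
        before = marginal_size(chunk, context)
        context += chunk          # crude stand-in for improving the model
        after = marginal_size(chunk, context)
        yield before - after      # positive when the chunk became simpler

if __name__ == "__main__":
    stream = [b"abababab" * 8, b"abababab" * 8, bytes(range(64))]
    print(list(compression_progress(stream)))
```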
Roadmap on optical security
A Novel Rate Control Algorithm for Onboard Predictive Coding of Multispectral and Hyperspectral Images
Predictive coding is attractive for onboard compression on spacecraft thanks to its low computational complexity, modest memory requirements, and the ability to control quality accurately on a pixel-by-pixel basis. Traditionally, predictive compression has focused on the lossless and near-lossless modes of operation, where the maximum error can be bounded but the rate of the compressed image is variable. Rate control is considered a challenging problem for predictive encoders due to the dependencies between quantization and prediction in the feedback loop, and the lack of a signal representation that packs the signal's energy into few coefficients. In this paper, we show that it is possible to design a rate control scheme intended for onboard implementation. In particular, we propose a general framework to select quantizers in each spatial and spectral region of an image so as to achieve the desired target rate while minimizing distortion. The rate control algorithm makes it possible to achieve lossy compression, near-lossless compression, and anything in between, e.g., lossy compression with a near-lossless constraint. While this framework is independent of the specific predictor used, in order to demonstrate its performance we tailor it in this paper to the predictor adopted by the CCSDS-123 lossless compression standard, obtaining an extension that performs lossless, near-lossless, and lossy compression in a single package. We show that the rate controller achieves excellent accuracy in the output rate and rate-distortion characteristics, and is extremely competitive with state-of-the-art transform coding.
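A hedged sketch of one standard way to realize such a quantizer-selection framework: Lagrangian rate-distortion optimization, bisecting the multiplier lambda until the target rate is met. The rate and distortion models below are textbook placeholders, not the paper's, and CCSDS-123 specifics are omitted.

```python
import math

STEPS = [1, 2, 4, 8, 16, 32]  # candidate quantization steps per block

def block_rate(variance, step):
    """Toy high-rate model: bits/pixel ~ 0.5 * log2(12 * variance / step^2)."""
    return max(0.0, 0.5 * math.log2(12.0 * variance / step**2))

def block_distortion(step):
    """Uniform-quantizer MSE model: step^2 / 12."""
    return step * step / 12.0

def choose_steps(variances, lam):
    """Per block, pick the step minimizing the Lagrangian cost D + lam * R."""
    return [min(STEPS, key=lambda q: block_distortion(q) + lam * block_rate(v, q))
            for v in variances]

def rate_control(variances, target_bpp, iters=40):
    """Bisect lambda (geometrically) so the mean rate approaches target_bpp."""
    lo, hi = 1e-6, 1e6
    for _ in range(iters):
        lam = math.sqrt(lo * hi)
        steps = choose_steps(variances, lam)
        rate = sum(block_rate(v, q) for v, q in zip(variances, steps)) / len(steps)
        # under budget -> lower lambda for finer steps; over budget -> raise it
        lo, hi = (lo, lam) if rate < target_bpp else (lam, hi)
    return steps, rate

if __name__ == "__main__":
    steps, rate = rate_control([10.0, 50.0, 200.0, 800.0], target_bpp=2.0)
    print(steps, round(rate, 3))
```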
Handling Massive N-Gram Datasets Efficiently
This paper deals with two fundamental problems concerning the handling of large n-gram language models: indexing, that is, compressing the n-gram strings and associated satellite data without compromising their retrieval speed; and estimation, that is, computing the probability distribution of the strings from a large textual source. Regarding the problem of indexing, we describe compressed, exact, and lossless data structures that achieve, at the same time, high space reductions and no time degradation with respect to state-of-the-art solutions and related software packages. In particular, we present a compressed trie data structure in which each word following a context of fixed length k, i.e., its preceding k words, is encoded as an integer whose value is proportional to the number of words that follow such a context. Since the number of words following a given context is typically very small in natural languages, we lower the space of representation to compression levels that were never achieved before. Despite the significant savings in space, our technique introduces a negligible penalty at query time. Regarding the problem of estimation, we present a novel algorithm for estimating modified Kneser-Ney language models, which have emerged as the de facto choice for language modeling in both academia and industry thanks to their relatively low perplexity. Estimating such models from large textual sources poses the challenge of devising algorithms that make parsimonious use of the disk. The state-of-the-art algorithm uses three sorting steps in external memory: we show an improved construction that requires only one sorting step by exploiting the properties of the extracted n-gram strings. With an extensive experimental analysis performed on billions of n-grams, we show an average improvement of 4.5x in the total running time over the state-of-the-art approach.
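A minimal sketch, with assumed details rather than the paper's exact layout, of the context-based remapping idea described above: within each fixed-length context, following words are re-encoded as context-local ranks, so the stored integers fall in [0, #followers), a range that is typically tiny and thus cheap to compress.

```python
from collections import defaultdict

def build_remap(ngrams):
    """Map each context (first k words) to {word: small integer rank}."""
    followers = defaultdict(dict)
    for *context, word in ngrams:
        ranks = followers[tuple(context)]
        if word not in ranks:
            ranks[word] = len(ranks)  # next free rank for this context
    return followers

def encode(ngrams, remap):
    """Encode each n-gram's last word as its context-local rank."""
    return [remap[tuple(ctx)][w] for *ctx, w in ngrams]

if __name__ == "__main__":
    trigrams = [("the", "quick", "brown"), ("the", "quick", "fox"),
                ("a", "quick", "brown"), ("the", "quick", "brown")]
    remap = build_remap(trigrams)
    print(encode(trigrams, remap))  # [0, 1, 0, 0]: small integers per context
```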
Microstructure modelling of hot deformation of Al–1%Mg alloy
This study presents the application of the finite element method and intelligent systems techniques to the prediction of microstructural mapping for aluminium alloys. Here, the material within each finite element is defined using a hybrid model. The hybrid model is based on neuro-fuzzy and physically based components and has been combined with the finite element technique. The model simulates the evolution of the internal state variables (i.e. dislocation density, subgrain size and subgrain boundary misorientation) and their effect on the recrystallisation behaviour of the stock. This paper presents the theory behind the model development, the integration between the numerical techniques, and the application of the technique to a hot rolling operation using an aluminium–1 wt% magnesium alloy. Furthermore, experimental data from plane strain compression (PSC) tests and rolling are used to validate the modelling outcome. The results show that the predicted recrystallisation kinetics agree well with the experimental results for different annealing times. This hybrid approach has proved to be more accurate than conventional methods using empirical equations.
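An illustrative sketch, with placeholder constants rather than fitted data, of the kind of internal-state-variable bookkeeping such a hybrid model couples to finite element output: a Kocks-Mecking-style dislocation density evolution during deformation, followed by JMAK (Avrami) recrystallisation kinetics during annealing.

```python
import math

def dislocation_density(strain, rho0=1e12, k1=3e8, k2=5.0, steps=1000):
    """Integrate d(rho)/d(eps) = k1*sqrt(rho) - k2*rho (storage minus recovery)."""
    rho, d_eps = rho0, strain / steps
    for _ in range(steps):
        rho += (k1 * math.sqrt(rho) - k2 * rho) * d_eps
    return rho

def recrystallised_fraction(t, t50, n=2.0):
    """JMAK kinetics: X(t) = 1 - exp(-ln(2) * (t / t50)**n)."""
    return 1.0 - math.exp(-math.log(2) * (t / t50) ** n)

if __name__ == "__main__":
    rho = dislocation_density(strain=0.7)
    # crude link: higher stored energy (rho) -> shorter time to 50% recrystallised
    t50 = 1e17 / rho
    for t in (10, 50, 100, 500):  # annealing times in seconds
        print(t, round(recrystallised_fraction(t, t50), 3))
```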