Do not forget: Full memory in memory-based learning of word pronunciation
Memory-based learning, keeping full memory of learning material, appears to be a
viable approach to learning NLP tasks, and is often superior in generalisation
accuracy to eager learning approaches that abstract from learning material.
Here we investigate three partial memory-based learning approaches which remove
from memory specific task instance types estimated to be exceptional. The three
approaches each implement one heuristic function for estimating exceptionality
of instance types: (i) typicality, (ii) class prediction strength, and (iii)
friendly-neighbourhood size. Experiments are performed with the memory-based
learning algorithm IB1-IG trained on English word pronunciation. We find that
removing instance types with low prediction strength (ii) is the only tested
method which does not seriously harm generalisation accuracy. We conclude that
keeping full memory of types rather than tokens, and excluding minority
ambiguities appear to be the only performance-preserving optimisations of
memory-based learning.
Comment: uses conll98, epsf, and ipamacs (WSU IPA
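As an illustration of heuristic (ii), here is a minimal sketch in Python of class-prediction-strength pruning over a plain 1-NN memory on hypothetical 1-D toy data. The `cps()` heuristic and the data are illustrative assumptions, not the IB1-IG algorithm or the paper's feature-weighted setup.

```python
# Sketch of partial memory-based learning: prune stored instance types whose
# class prediction strength (CPS) is low, then classify with 1-NN.
# The toy 1-D data and this particular cps() definition are assumptions for
# illustration only.

def nearest(memory, x, exclude=None):
    """Index of the stored instance nearest to x (leave-one-out if exclude is set)."""
    best, best_d = None, float("inf")
    for i, (xi, _) in enumerate(memory):
        if i == exclude:
            continue
        d = abs(xi - x)
        if d < best_d:
            best, best_d = i, d
    return best

def cps(memory, i):
    """Class prediction strength: among instances whose nearest neighbour is
    instance i, the fraction that share i's class."""
    attracted, correct = 0, 0
    for j, (xj, yj) in enumerate(memory):
        if j != i and nearest(memory, xj, exclude=j) == i:
            attracted += 1
            correct += yj == memory[i][1]
    return correct / attracted if attracted else 1.0  # attracts nobody: keep

# Memory of (feature, class) instance types; (2.5, "b") is a minority
# ambiguity sitting inside the "a" region.
memory = [(1.0, "a"), (2.0, "a"), (3.0, "a"),
          (20.0, "b"), (21.0, "b"), (22.0, "b"), (2.5, "b")]

pruned = [inst for i, inst in enumerate(memory) if cps(memory, i) >= 0.5]

def classify(mem, x):
    return mem[nearest(mem, x)][1]
```

With the full memory, a query near 2.5 is captured by the minority ambiguity; after pruning, the majority class of the region wins.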
Feat: Functional Enumeration of Algebraic Types
In mathematics, an enumeration of a set S is a bijective function from (an initial segment of) the natural numbers to S. We define "functional enumerations" as efficiently computable such bijections. This paper describes a theory of functional enumeration and provides an algebra of enumerations closed under sums, products, guarded recursion and bijections. We partition each enumerated set into numbered, finite subsets.
We provide a generic enumeration such that the number of each part corresponds to the size of its values (measured in the number of constructors). We implement our ideas in a Haskell library called testing-feat, and make the source code freely available. Feat provides efficient "random access" to enumerated values. The primary application is property-based testing, where it is used to define both random sampling (for example QuickCheck generators) and exhaustive enumeration (in the style of SmallCheck). We claim that functional enumeration is the best option for automatically generating test cases from large groups of mutually recursive syntax tree types. As a case study we use Feat to test the pretty-printer of the Template Haskell library (uncovering several bugs).
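The core idea, partitioning an enumerated set by size and giving random access within each part, can be sketched in Python for the algebraic type Tree = Leaf | Node Tree Tree. This is an illustrative reimplementation of the principle, not the testing-feat API.

```python
from functools import lru_cache

# Functional enumeration in the style of testing-feat: partition the values of
# Tree = Leaf | Node Tree Tree into finite parts indexed by size (number of
# constructors), and provide random access: index(n, k) builds the k-th tree
# of size n without materialising the others.

@lru_cache(maxsize=None)
def count(n):
    """Number of trees with exactly n constructors."""
    if n <= 0:
        return 0
    if n == 1:
        return 1                       # Leaf
    # One Node constructor plus a left subtree of size i and right of n-1-i.
    return sum(count(i) * count(n - 1 - i) for i in range(1, n - 1))

def index(n, k):
    """The k-th tree (0 <= k < count(n)) with n constructors, as nested tuples."""
    assert 0 <= k < count(n)
    if n == 1:
        return "Leaf"
    for i in range(1, n - 1):          # size of the left subtree
        part = count(i) * count(n - 1 - i)
        if k < part:
            l, r = divmod(k, count(n - 1 - i))
            return ("Node", index(i, l), index(n - 1 - i, r))
        k -= part
```

Random access is what makes uniform random sampling of large sizes cheap: draw k uniformly from range(count(n)) and call index(n, k).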
Synchronous Counting and Computational Algorithm Design
Consider a complete communication network on n nodes, each of which is a
state machine. In synchronous 2-counting, the nodes receive a common clock
pulse and they have to agree on which pulses are "odd" and which are "even". We
require that the solution is self-stabilising (reaching the correct operation
from any initial state) and that it tolerates f Byzantine failures (nodes that
send arbitrary misinformation). Prior algorithms are expensive to implement in
hardware: they require a source of random bits or a large number of states.
This work consists of two parts. In the first part, we use computational
techniques (often known as synthesis) to construct very compact deterministic
algorithms for the first non-trivial case of f = 1. While no algorithm exists
for n < 4, we show that as few as 3 states per node are sufficient for all
values n ≥ 4. Moreover, the problem cannot be solved with only 2 states per
node for n = 4, but there is a 2-state solution for all values n ≥ 6.
In the second part, we develop and compare two different approaches for
synthesising synchronous counting algorithms. Both approaches are based on
casting the synthesis problem as a propositional satisfiability (SAT) problem
and employing modern SAT-solvers. The difference lies in how to solve the SAT
problem: either in a direct fashion, or incrementally within a counter-example
guided abstraction refinement loop. Empirical results suggest that the former
technique is more efficient if we want to synthesise time-optimal algorithms,
while the latter technique discovers non-optimal algorithms more quickly.
Comment: 35 pages, extended and revised version
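A heavily simplified illustration of the "direct" synthesis approach: exhaustively search a space of candidate transition functions for a toy, fault-free variant of 2-counting, checking self-stabilisation from every initial configuration. Brute force stands in for a SAT solver here, and, unlike the paper, there are no Byzantine nodes.

```python
from itertools import product

# Toy algorithm synthesis for synchronous 2-counting: n = 3 anonymous nodes,
# 2 states {0, 1}, NO Byzantine failures (a strong simplification of the
# paper's setting). A candidate algorithm is a map
#   g(own_state, number_of_ones_observed) -> next_state.
# Since identical nodes observe identical counts, a global configuration is
# fully described by c = number of nodes currently in state 1.

N = 3

def step(g, c):
    """One synchronous step: all nodes apply g simultaneously."""
    return (N - c) * g[(0, c)] + c * g[(1, c)]

def self_stabilising(g):
    """From every configuration, the system must reach the 0 <-> N cycle."""
    if step(g, 0) != N or step(g, N) != 0:
        return False                    # must count correctly once synchronised
    for c0 in range(N + 1):
        c = c0
        for _ in range(N + 2):          # more steps than configurations
            c = step(g, c)
        if c not in (0, N):
            return False                # never converged to the counting cycle
    return True

# "Direct" search: try all 2^(2*(N+1)) candidate transition tables.
keys = [(s, c) for s in (0, 1) for c in range(N + 1)]
solutions = []
for values in product((0, 1), repeat=len(keys)):
    g = dict(zip(keys, values))
    if self_stabilising(g):
        solutions.append(g)
```

For instance, the count-only rule "go to state 1 iff you see at most one 1" synchronises everyone in a single step, while the naive "flip your own state" rule counts correctly from synchronised configurations but never recovers from mixed ones.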
Weighted dynamic finger in binary search trees
It is shown that the online binary search tree data structure GreedyASS
performs asymptotically as well on a sufficiently long sequence of searches as
any static binary search tree where each search begins from the previous search
(rather than the root). This bound is known to be equivalent to assigning each
item i in the search tree a positive weight w_i and bounding the search cost of
an item s_i in the search sequence by
O(1 + log((sum of w_x over all x between s_{i-1} and s_i) / min(w_{s_{i-1}}, w_{s_i})))
amortized. This result is the strongest finger-type bound to be proven for
binary search trees. By setting the weights to be equal, one observes that our
bound implies the dynamic finger bound. Compared to the previous proof of the
dynamic finger bound for Splay trees, our result is significantly shorter,
stronger, simpler, and has reasonable constants.
Comment: An earlier version of this work appeared in the Proceedings of the
Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms.
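A small sketch that evaluates the weighted finger quantity for a given weight vector and search sequence (constant factors ignored; the helper name and the "1 +" smoothing inside the logarithm are our assumptions). With equal weights the summed weight between consecutive searches is just their distance plus one, recovering the dynamic finger bound.

```python
from math import log2

# Evaluate the weighted dynamic-finger quantity: for consecutive searches
# s_{i-1}, s_i over items 0..n-1 with positive weights w, sum
#   log2(1 + (total weight between the two items) / min of their weights).
# Constant factors from the O(.) bound are ignored.

def weighted_finger_cost(w, searches):
    total = 0.0
    for prev, cur in zip(searches, searches[1:]):
        lo, hi = min(prev, cur), max(prev, cur)
        between = sum(w[lo:hi + 1])     # weights of all items between them
        total += log2(1 + between / min(w[prev], w[cur]))
    return total
```

Nearby searches are cheap, and making the fingers themselves heavy makes even distant jumps between them cheap, which is exactly what the weighted bound expresses.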
Adaptive Threshold Sampling and Estimation
Sampling is a fundamental problem in both computer science and statistics. A
number of issues arise when designing a method based on sampling. These include
statistical considerations such as constructing a good sampling design and
ensuring there are good, tractable estimators for the quantities of interest as
well as computational considerations such as designing fast algorithms for
streaming data and ensuring the sample fits within memory constraints.
Unfortunately, existing sampling methods are only able to address all of these
issues in limited scenarios.
We develop a framework that can be used to address these issues in a broad
range of scenarios. In particular, it addresses the problem of drawing and
using samples under some memory budget constraint. This problem can be
challenging since the memory budget forces samples to be drawn
non-independently and consequently, makes computation of resulting estimators
difficult.
At the core of the framework is the notion of a data adaptive thresholding
scheme where the threshold effectively allows one to treat the non-independent
sample as if it were drawn independently. We provide sufficient conditions for
a thresholding scheme to allow this and provide ways to build and compose such
schemes.
Furthermore, we provide fast algorithms for sampling efficiently under these
thresholding schemes.
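One classic scheme that fits this description is bottom-k priority sampling, where the (k+1)-st smallest priority serves as the data-adaptive threshold. The sketch below is this standard construction, used here only to illustrate the idea of an adaptive threshold under a memory budget; it is not the paper's full framework.

```python
import heapq
import random

# Bottom-k sampling under a fixed memory budget k: each item draws a uniform
# priority, we keep the k items with the smallest priorities, and the
# (k+1)-st smallest priority acts as the adaptive threshold tau. Kept items
# are then treated as if each had been included independently with
# probability tau, giving a Horvitz-Thompson style estimator.

def bottom_k_sample(stream, k, rng):
    """Return (sample of k items, threshold tau)."""
    heap = []                            # max-heap via negation, size <= k+1
    for item in stream:
        u = rng.random()                 # item's priority
        heapq.heappush(heap, (-u, item))
        if len(heap) > k + 1:
            heapq.heappop(heap)          # evict the largest priority so far
    tau = -heap[0][0]                    # (k+1)-st smallest priority
    sample = [item for neg_u, item in heap if -neg_u < tau]
    return sample, tau

def estimate_total(sample, tau, f):
    """Estimate the sum of f over the whole stream: each kept item stands in
    for 1/tau items."""
    return sum(f(x) for x in sample) / tau
```

Note the dependence the abstract mentions: whether one item is kept depends on the priorities of the others through tau, yet dividing by tau lets the sample be used as if it were drawn independently.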
The Cost of Address Translation
Modern computers are not random access machines (RAMs). They have a memory
hierarchy, multiple cores, and virtual memory. In this paper, we address the
computational cost of address translation in virtual memory. The starting point
for our work is the observation that the analysis of some simple algorithms (random
scan of an array, binary search, heapsort) in either the RAM model or the EM
model (external memory model) does not correctly predict growth rates of actual
running times. We propose the VAT model (virtual address translation) to
account for the cost of address translations and analyze the algorithms
mentioned above and others in the model. The predictions agree with the
measurements. We also analyze the VAT-cost of cache-oblivious algorithms.
Comment: An extended abstract of this paper was published in the proceedings of
ALENEX13, New Orleans, USA.
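The effect the VAT model accounts for can be illustrated with a toy translation-cache (TLB) simulation contrasting a sequential scan with a random scan of the same array. The page size and TLB capacity below are illustrative parameters, not the VAT model's formal cost function.

```python
import random
from collections import OrderedDict

# Toy illustration of why address translation cost depends on access pattern:
# a small LRU translation cache (TLB) over 4 KiB pages. A sequential scan of
# an array incurs one miss per page; a random scan of the same array misses
# on most accesses once the array spans more pages than the TLB has entries.

PAGE = 4096
TLB_ENTRIES = 64

def tlb_misses(addresses):
    tlb = OrderedDict()                  # page -> None, in LRU order
    misses = 0
    for a in addresses:
        page = a // PAGE
        if page in tlb:
            tlb.move_to_end(page)        # refresh LRU position
        else:
            misses += 1
            tlb[page] = None
            if len(tlb) > TLB_ENTRIES:
                tlb.popitem(last=False)  # evict least recently used page
    return misses

n = 100_000                              # 4-byte elements spanning 98 pages
sequential = [4 * i for i in range(n)]
rng = random.Random(1)
randomized = [4 * rng.randrange(n) for _ in range(n)]
```

Both scans touch exactly n elements, so the RAM model assigns them the same cost; the translation-miss counts differ by orders of magnitude, which is the discrepancy the VAT model is designed to capture.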