Applying MDL to Learning Best Model Granularity
The Minimum Description Length (MDL) principle is solidly based on a provably
ideal method of inference using Kolmogorov complexity. We test how the theory
behaves in practice on a general problem in model selection: that of learning
the best model granularity. The performance of a model depends critically on
the granularity, for example the choice of precision of the parameters. Too
high a precision generally means modeling accidental noise, while too low a
precision may conflate models that should be distinguished. In practice this
precision is often chosen ad hoc. In MDL the best model is the one that
most compresses a two-part code of the data set: this embodies ``Occam's
Razor.'' In two quite different experimental settings the theoretical value
determined using MDL coincides with the best value found experimentally. In the
first experiment the task is to recognize isolated handwritten characters in
one subject's handwriting, irrespective of size and orientation. Based on a new
modification of elastic matching, using multiple prototypes per character, the
optimal prediction rate is predicted for the learned parameter (length of
sampling interval) considered most likely by MDL, which is shown to coincide
with the best value found experimentally. In the second experiment the task is
to model a robot arm with two degrees of freedom using a three layer
feed-forward neural network where we need to determine the number of nodes in
the hidden layer giving best modeling performance. The optimal model (the one
that extrapolates best on unseen examples) is predicted for the number of nodes
in the hidden layer considered most likely by MDL, which again is found to
coincide with the best value found experimentally.
Comment: LaTeX, 32 pages, 5 figures. Artificial Intelligence journal, to appear.
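The two-part code idea in this abstract can be made concrete with a toy model-selection problem. This is an illustrative sketch only, not the paper's handwriting or neural-network experiments: the histogram density model, the Laplace smoothing, and the particular bit-counting scheme are all assumptions chosen for simplicity. The "granularity" here is the bin count k, and MDL picks the k that minimizes model bits plus data bits.

```python
import math
import random

def mdl_codelength(data, k):
    """Two-part code length (in bits) for a k-bin histogram model of
    data in [0, 1): model bits to encode the k bin counts, plus data
    bits under the (Laplace-smoothed) histogram density."""
    n = len(data)
    counts = [0] * k
    for x in data:
        counts[min(k - 1, int(x * k))] += 1
    model_bits = k * math.log2(n + 1)  # encode each of the k counts
    data_bits = 0.0
    for x in data:
        p = (counts[min(k - 1, int(x * k))] + 1) / (n + k)  # Laplace smoothing
        data_bits += -math.log2(p * k)  # density = p * k on bins of width 1/k
    return model_bits + data_bits

def best_granularity(data, max_k=64):
    """Pick the bin count (granularity) minimizing the two-part code length."""
    return min(range(1, max_k + 1), key=lambda k: mdl_codelength(data, k))
```

Too few bins underfit (high data bits); too many bins overfit (high model bits, and the smoothed density flattens out), so the minimum sits at an intermediate granularity, mirroring the precision trade-off the abstract describes.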
Advances in All-Neural Speech Recognition
This paper advances the design of CTC-based all-neural (or end-to-end) speech
recognizers. We propose a novel symbol inventory, and a novel iterated-CTC
method in which a second system is used to transform a noisy initial output
into a cleaner version. We present a number of stabilization and initialization
methods we have found useful in training these networks. We evaluate our system
on the commonly used NIST 2000 conversational telephony test set, and
significantly exceed the previously published performance of similar systems,
both with and without the use of an external language model and decoding
technology.
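CTC-based recognizers emit one symbol (or a blank) per audio frame; decoding collapses repeated symbols and removes blanks. The collapse step can be sketched as follows. This is a minimal illustration of standard CTC greedy decoding, not the paper's system or its iterated-CTC refinement:

```python
def ctc_collapse(frame_symbols, blank="_"):
    """Greedy CTC decoding: drop blanks and merge adjacent repeats.
    A repeated symbol separated by a blank is kept twice, which is how
    CTC represents genuine double letters."""
    out = []
    prev = None
    for s in frame_symbols:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return "".join(out)
```

For example, the frame sequence `hh_e_ll_llo` collapses to `hello`: the blank between the two `l` runs preserves the double letter, while the repeated `h` frames merge into one.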
System and method for character recognition
A character recognition system is disclosed in which each character in a retina, defining a scanning raster, is scanned with random lines uniformly distributed over the retina. For each type of character to be recognized, the system stores a probability density function (PDF) of the random-line intersection lengths and/or a PDF of the number of random-line intersections. As an unknown character is scanned, the random-line intersection lengths and/or the number of random-line intersections are accumulated, and, based on a comparison with the prestored PDFs, the unknown character is classified.
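The random-line scheme can be sketched in a few lines of code. This is a loose toy illustration, not the patented system: the glyph-as-text-grid representation, the segment sampling, the run-length histogram, and the L1 comparison are all simplifying assumptions standing in for the patent's retina scanning and PDF comparison:

```python
import random

def line_length_pdf(glyph, n_lines=400, n_samples=64, seed=0):
    """Scan a binary glyph (list of equal-length strings, '#' = ink)
    with random line segments and return a normalized histogram (an
    empirical PDF) of the ink run lengths seen along those lines."""
    rng = random.Random(seed)
    h, w = len(glyph), len(glyph[0])
    hist, total = {}, 0
    for _ in range(n_lines):
        # Random chord: a segment between two random points on the retina.
        x0, y0 = rng.uniform(0, w), rng.uniform(0, h)
        x1, y1 = rng.uniform(0, w), rng.uniform(0, h)
        run = 0
        for t in range(n_samples):
            f = t / (n_samples - 1)
            x = int(min(w - 1, x0 + f * (x1 - x0)))
            y = int(min(h - 1, y0 + f * (y1 - y0)))
            if glyph[y][x] == "#":
                run += 1  # still inside an ink intersection
            elif run:
                hist[run] = hist.get(run, 0) + 1  # intersection ended
                total += 1
                run = 0
        if run:
            hist[run] = hist.get(run, 0) + 1
            total += 1
    return {k: v / total for k, v in hist.items()} if total else {}

def classify(unknown_pdf, prototypes):
    """Return the label whose prestored prototype PDF is closest
    to the unknown character's accumulated PDF (L1 distance)."""
    def l1(p, q):
        keys = set(p) | set(q)
        return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)
    return min(prototypes, key=lambda label: l1(unknown_pdf, prototypes[label]))
```

A solid glyph yields long intersection runs while a hollow one yields many short runs, so their run-length PDFs separate cleanly, which is the property the patent's classifier exploits.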