Converting English text to speech: a machine learning approach
The task of mapping spelled English words into strings of phonemes and stresses ("reading aloud") has many practical applications. Several commercial systems perform this task by applying a knowledge base of expert-supplied letter-to-sound
rules. This dissertation presents a set of machine learning methods for automatically constructing letter-to-sound rules by analyzing a dictionary of words and their pronunciations. Taken together, these methods provide a substantial performance
improvement over the best commercial system, DECtalk from Digital
Equipment Corporation. In a performance test, the learning methods were trained on a dictionary of 19,002 words. Then, human subjects were asked to compare the performance of the resulting letter-to-sound rules against the dictionary for an additional
1,000 words not used during training. In a blind procedure, the subjects rated the pronunciations of both the learned rules and the DECtalk rules according to whether they were noticeably different from the dictionary pronunciation. The error rate for the learned rules was 28.8% (288 words noticeably different), while the error rate for the DECtalk rules was 44.3% (443 words noticeably different). If, instead of using human judges, we required that the pronunciations of the letter-to-sound rules exactly match the dictionary to be counted correct, then the error
rate for our learned rules was 35.2% and the error rate for DECtalk was 63.6%. Similar results were observed at the level of individual letters, phonemes, and stresses. To achieve these results, several techniques were combined. The key learning
technique represents the output classes by the codewords of an error-correcting code. Boolean concept learning methods, such as the standard ID3 decision-tree algorithm, can be applied to learn the individual bits of these codewords. This converts the multiclass learning problem into a number of boolean concept learning problems. This method is shown to be superior to several other methods: multiclass ID3, one-tree-per-class ID3, the domain-specific distributed code employed by T. Sejnowski and C. Rosenberg in their NETtalk system, and a method developed by D. Wolpert. Similar results in the domain of isolated-letter speech recognition with the backpropagation algorithm show that error-correcting output codes provide a domain-independent, algorithm-independent approach to multiclass learning problems.
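The error-correcting output code (ECOC) idea is straightforward to prototype. The sketch below is a minimal illustration, not the dissertation's implementation: it assigns each class a random binary codeword (a designed error-correcting code would do better), trains one decision tree per bit using scikit-learn's DecisionTreeClassifier as a stand-in for ID3, and decodes a query to the class whose codeword is nearest in Hamming distance to the predicted bit string.

```python
# Minimal ECOC sketch. Assumptions beyond the abstract: random codewords
# rather than a designed error-correcting code, and scikit-learn's
# DecisionTreeClassifier standing in for ID3.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class ECOCClassifier:
    def __init__(self, n_bits=15, seed=0):
        self.n_bits = n_bits
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One random binary codeword (row of bits) per class.
        self.code_ = self.rng.integers(0, 2, size=(len(self.classes_), self.n_bits))
        rows = np.searchsorted(self.classes_, y)
        # One boolean learner per codeword bit: the multiclass problem
        # becomes n_bits two-class problems.
        self.learners_ = [DecisionTreeClassifier().fit(X, self.code_[rows, b])
                          for b in range(self.n_bits)]
        return self

    def predict(self, X):
        bits = np.column_stack([t.predict(X) for t in self.learners_])
        # Decode: the class whose codeword is nearest in Hamming distance wins.
        hamming = (bits[:, None, :] != self.code_[None, :, :]).sum(axis=2)
        return self.classes_[hamming.argmin(axis=1)]
```

The appeal of the decoding step is that a few wrong bit predictions can still land nearest to the correct codeword, which is where the error-correcting behaviour comes from.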
Forgetting Exceptions is Harmful in Language Learning
We show that in language learning, contrary to received wisdom, keeping
exceptional training instances in memory can be beneficial for generalization
accuracy. We investigate this phenomenon empirically on a selection of
benchmark natural language processing tasks: grapheme-to-phoneme conversion,
part-of-speech tagging, prepositional-phrase attachment, and base noun phrase
chunking. In a first series of experiments we combine memory-based learning
with training set editing techniques, in which instances are edited based on
their typicality and class prediction strength. Results show that editing
exceptional instances (with low typicality or low class prediction strength)
tends to harm generalization accuracy. In a second series of experiments we
compare memory-based learning and decision-tree learning methods on the same
selection of tasks, and find that decision-tree learning often performs worse
than memory-based learning. Moreover, the decrease in performance can be linked
to the degree of abstraction from exceptions (i.e., pruning or eagerness). We
provide explanations for both results in terms of the properties of the natural
language processing tasks and the learning algorithms.
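The editing experiments are easy to mimic in miniature. The sketch below is a toy illustration rather than the paper's exact setup: as a crude stand-in for class prediction strength, each training instance is scored by whether its nearest other training instance shares its class, and editing then discards the low-scoring ("exceptional") instances before memory-based (1-nearest-neighbour) classification.

```python
# Toy illustration of editing "exceptional" instances from a memory-based
# learner's training set. The score below is a crude stand-in for class
# prediction strength, not the paper's exact measure.
import numpy as np

def pairwise_sq_dists(A, B):
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2).astype(float)

def nn_predict(train_X, train_y, X):
    # Memory-based classification: the nearest stored instance decides.
    return train_y[pairwise_sq_dists(X, train_X).argmin(axis=1)]

def edit_exceptions(train_X, train_y):
    d = pairwise_sq_dists(train_X, train_X)
    np.fill_diagonal(d, np.inf)                # ignore self-matches
    typical = train_y[d.argmin(axis=1)] == train_y
    return train_X[typical], train_y[typical]  # drop the "exceptions"
```

On the benchmark tasks in the paper, this kind of editing tends to lower held-out accuracy, which is exactly the paper's point: the discarded exceptions carry usable information.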
A study of instance-based algorithms for supervised learning tasks: mathematical, empirical, and psychological evaluations
This dissertation introduces a framework for specifying instance-based algorithms that can solve supervised learning tasks. These algorithms input a sequence of instances and yield a partial concept description, represented by a set of stored instances and associated information. This description can be used to predict values for subsequently presented instances. The thesis of this framework is that extensional concept descriptions and lazy generalization strategies can support efficient supervised learning behavior.

The instance-based learning framework consists of three components. The pre-processor component transforms an instance into a more palatable form for the performance component, which computes the instance's similarity with a set of stored instances and yields a prediction for its target value(s). The similarity and prediction functions thus impose generalizations on the stored instances to inductively derive predictions. The learning component assesses the accuracy of these predictions and updates partial concept descriptions to improve their predictive accuracy.

This framework is evaluated in four ways. First, its generality is evaluated by mathematically determining the classes of symbolic concepts and numeric functions that can be closely approximated by IB_1, a simple algorithm specified by this framework. Second, the framework is empirically evaluated for its ability to specify algorithms that improve IB_1's learning efficiency. Significant efficiency improvements are obtained by instance-based algorithms that reduce storage requirements, tolerate noisy data, and learn domain-specific similarity functions. Alternative component definitions for these algorithms are empirically analyzed in a set of five high-level parameter studies. Third, the framework is evaluated for its ability to specify psychologically plausible process models for categorization tasks. Results from subject experiments indicate a positive correlation between a model's ability to utilize attribute correlation information and its ability to explain psychological phenomena. Finally, the framework is evaluated for its ability to explain and relate a dozen prominent instance-based learning systems. The survey shows that this framework requires only slight modifications to fit these highly diverse systems. Relationships with edited nearest neighbor algorithms, case-based reasoners, and artificial neural networks are also described.
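A minimal rendering of the three components might look as follows; the class and method names are illustrative, not the dissertation's code, and the similarity function is plain (negated) Euclidean distance.

```python
# Sketch of the instance-based framework in the spirit of IB_1. Names and
# details are illustrative, not the dissertation's implementation.
import math

class InstanceBasedLearner:
    def __init__(self):
        self.memory = []   # stored instances: the extensional description
        self.seen = 0
        self.correct = 0

    def _preprocess(self, x):
        return tuple(float(v) for v in x)      # trivial pre-processor

    def _similarity(self, a, b):
        return -math.dist(a, b)                # negated Euclidean distance

    def predict(self, x):
        # Performance component: the most similar stored instance
        # supplies the prediction.
        x = self._preprocess(x)
        _, label = max(self.memory, key=lambda m: self._similarity(m[0], x))
        return label

    def train(self, x, label):
        # Learning component: assess the prediction, then update memory.
        if self.memory:
            self.seen += 1
            self.correct += (self.predict(x) == label)
        self.memory.append((self._preprocess(x), label))
```

Storage-reducing and noise-tolerant variants change only the learning component (storing instances selectively instead of unconditionally), which is the modularity the framework is arguing for.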
A study of distance-based machine learning algorithms
Distance-based algorithms are machine learning algorithms that classify queries by computing distances between these queries and a number of internally stored exemplars. Exemplars that are closest to the query have the largest influence on the classification assigned to the query. Two specific distance-based algorithms, the nearest neighbor algorithm and the nearest-hyperrectangle algorithm, are studied in detail.
It is shown that the k-nearest neighbor algorithm (kNN) outperforms the first-nearest neighbor algorithm only under certain conditions. Data sets must contain moderate amounts of noise. Training examples from the different classes must belong to clusters that allow an increase in the value of k without reaching into clusters of other classes. Methods for choosing the value of k for kNN are investigated. It is shown that one-fold cross-validation on a restricted number of values for k suffices for best performance. It is also shown that for best performance the votes of the k-nearest neighbors of a query should be weighted in inverse proportion to their distances from the query.
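Those two findings translate directly into code. In the sketch below, the candidate values of k, the fold count, and the epsilon guard are illustrative choices, not the dissertation's exact protocol: neighbour votes are weighted by inverse distance, and k is picked by cross-validation over a restricted candidate set.

```python
# Sketch of distance-weighted kNN with k chosen by cross-validation over a
# restricted candidate set. The candidate ks, fold count, and eps are
# illustrative choices, not the dissertation's exact protocol.
import numpy as np

def knn_predict(train_X, train_y, X, k, eps=1e-9):
    d = np.sqrt(((X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2))
    idx = np.argsort(d, axis=1)[:, :k]                 # k nearest neighbours
    w = 1.0 / (np.take_along_axis(d, idx, 1) + eps)    # inverse-distance votes
    preds = []
    for row_idx, row_w in zip(idx, w):
        votes = {}
        for j, wt in zip(row_idx, row_w):
            votes[train_y[j]] = votes.get(train_y[j], 0.0) + wt
        preds.append(max(votes, key=votes.get))        # heaviest class wins
    return np.array(preds)

def choose_k(train_X, train_y, candidates=(1, 3, 5, 7), folds=5, seed=0):
    rng = np.random.default_rng(seed)
    splits = np.array_split(rng.permutation(len(train_X)), folds)
    scores = {}
    for k in candidates:
        acc = []
        for f in range(folds):
            test = splits[f]
            train = np.concatenate([splits[g] for g in range(folds) if g != f])
            p = knn_predict(train_X[train], train_y[train], train_X[test], k)
            acc.append((p == train_y[test]).mean())
        scores[k] = np.mean(acc)
    return max(scores, key=scores.get)
```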
Principal component analysis is shown to reduce the number of relevant dimensions substantially in several domains. Two methods for learning feature weights for a weighted Euclidean distance metric are proposed. These methods improve the performance of kNN and NN in a variety of domains.
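The dissertation's two weight-learning methods are not reproduced here; as a generic illustration of the idea, the sketch below weights each feature by a simple between-class to within-class variance ratio before computing distances.

```python
# Generic illustration of a weighted Euclidean metric. The weighting rule
# (between-class over within-class variance per feature) is a simple
# stand-in, not one of the dissertation's two proposed methods.
import numpy as np

def separability_weights(X, y, eps=1e-9):
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    between = means.var(axis=0)            # spread of the class means
    within = np.mean([X[y == c].var(axis=0) for c in classes], axis=0)
    return between / (within + eps)        # large weight = discriminative

def weighted_distances(train_X, X, w):
    diff = X[:, None, :] - train_X[None, :, :]
    return np.sqrt((w * diff ** 2).sum(axis=2))
```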
The nearest-hyperrectangle algorithm (NGE) is found to give predictions that are substantially inferior to those given by kNN in a variety of domains. Experiments performed to understand this inferior performance led to the discovery of several improvements to NGE. Foremost of these is BNGE, a batch algorithm that avoids construction of overlapping hyperrectangles from different classes. Although it is generally superior to NGE, BNGE is still significantly inferior to kNN in a variety of domains. Hence, a hybrid algorithm (KBNGE), that uses BNGE in parts of the input space that can be represented by a single hyperrectangle and kNN otherwise, is introduced.
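The geometric core shared by NGE, BNGE, and KBNGE is the distance from a query to an axis-aligned hyperrectangle: zero if the query falls inside, otherwise the Euclidean length of the per-dimension overshoot beyond the nearest face. A sketch, omitting NGE's exemplar weights and nested exceptions:

```python
# Sketch of the core NGE operation: distance from a query point to an
# axis-aligned hyperrectangle (zero if the query lies inside). NGE's
# exemplar weights and nested exceptions are omitted for brevity.
import numpy as np

def rect_distance(q, lower, upper):
    # Per-dimension overshoot beyond the nearest face, clipped at zero.
    over = np.maximum(lower - q, 0) + np.maximum(q - upper, 0)
    return np.sqrt((over ** 2).sum())

def nge_predict(rects, q):
    """rects: list of (lower, upper, label) hyperrectangles."""
    return min(rects, key=lambda r: rect_distance(q, r[0], r[1]))[2]
```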
The primary contributions of this dissertation are (a) several improvements to existing distance-based algorithms, (b) several new distance-based algorithms, and (c) an experimentally supported understanding of the conditions under which various distance-based algorithms are likely to give good performance.
Guaranteeing generalisation in neural networks
Neural networks need to be able to guarantee their intrinsic generalisation abilities if they are to be used reliably.
Mitchell's concept and version spaces technique is able to guarantee generalisation in the symbolic concept-learning environment in which it is implemented. According to Mitchell, generalisation is guaranteed when, given the user's bias, no concept other than the current one is consistent with all the examples presented so far. A form of bidirectional convergence is used to recognise when this no-alternative situation has been reached.
Mitchell's technique has problems of search and storage feasibility in its symbolic environment. This thesis aims to show that by evolving the technique further in a neural environment, these problems can be overcome.
Firstly, the biasing factors which affect the kind of concept that can be learned are explored in a neural network context. Secondly, approaches for abstracting the underlying features of the symbolic technique that enable recognition of the no-alternative situation are discussed. The discussion generates neural techniques for guaranteeing generalisation and culminates in a neural technique that is able to recognise when the best-fit neural weight state has been found for a given set of data and topology.
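For reference, the symbolic mechanism the thesis starts from can be sketched directly. Below is a minimal candidate-elimination implementation for conjunctive concepts over discrete attributes (the classical algorithm, not the thesis's neural reformulation); the no-alternative situation corresponds to the two boundaries meeting.

```python
# Minimal sketch of Mitchell's candidate-elimination algorithm for
# conjunctive concepts over discrete attributes. A hypothesis is a tuple
# with one constraint per attribute: a concrete value, '?' (any value),
# or '0' (matches nothing). Assumes the data are consistent with some
# conjunctive concept.

def covers(h, x):
    return all(c != '0' and (c == '?' or c == v) for c, v in zip(h, x))

def more_general(h, s):
    return all(sc == '0' or hc == '?' or hc == sc for hc, sc in zip(h, s))

def candidate_elimination(examples, domains):
    S = ('0',) * len(domains)       # most specific boundary
    G = [('?',) * len(domains)]     # most general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]
            # Minimally generalize S to cover the positive example.
            S = tuple(v if c == '0' else (c if c == v else '?')
                      for c, v in zip(S, x))
        else:
            # Minimally specialize each g in G so it excludes x.
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                for i, c in enumerate(g):
                    if c == '?':
                        for v in domains[i]:
                            if v != x[i]:
                                h = g[:i] + (v,) + g[i + 1:]
                                if more_general(h, S):
                                    new_G.append(h)
            G = new_G
    return S, G

def converged(S, G):
    # The "no-alternative situation": the boundaries have met, so exactly
    # one concept remains consistent with all examples seen so far.
    return G == [S]
```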