Stable Encoding of Large Finite-State Automata in Recurrent Neural Networks with Sigmoid Discriminants
We propose an algorithm for encoding deterministic finite-state automata (DFAs)
in second-order recurrent neural networks with sigmoidal discriminant functions,
and we prove that the languages accepted by the
constructed network and the DFA are identical. The desired finite-state
network dynamics are achieved by programming a small subset of all weights.
A worst case analysis reveals a relationship between the weight strength
and the maximum allowed network size which guarantees finite-state
behavior of the constructed network.
We illustrate the method by encoding random DFAs with 10, 100, and 1,000
states. While the theory predicts that the weight strength scales with
the DFA size, we find the weight strength to be almost constant for all
the experiments. These results can be explained by noting that the
generated DFAs represent average cases. We empirically demonstrate the
existence of extreme DFAs for which the weight strength scales with DFA size.
(Also cross-referenced as UMIACS-TR-94-101)
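The abstract does not spell out which weights are programmed. The following is a minimal sketch, assuming the usual second-order encoding: one state neuron per DFA state, update S_i(t+1) = g(b_i + sum_j W_ijk S_j(t)) for the symbol k read at time t, weight +H for every transition delta(q_j, a_k) = q_i, weight -H on the source neuron's self-connection when that transition leaves q_j, and bias -H/2 on every neuron. The function names, the argmax decoding of the current state and the choice H = 10 are illustrative assumptions. The paper's exact construction and its decoding of accept/reject may differ.

```python
import math
import random


def sigmoid(x):
    """Sigmoidal discriminant function g."""
    return 1.0 / (1.0 + math.exp(-x))


def encode_dfa(delta, n_states, n_symbols, H):
    """Program a sparse second-order weight tensor W[i][j][k] and biases from a DFA.

    delta[(j, k)] = i means: in state j, reading symbol k, go to state i.
    Weights not set here stay 0 (hypothetical sketch of the encoding).
    """
    W = [[[0.0] * n_symbols for _ in range(n_states)] for _ in range(n_states)]
    for (j, k), i in delta.items():
        W[i][j][k] = H  # drive the target-state neuron high
        if i != j:
            W[j][j][k] = -H  # drive the source-state neuron low when leaving state j
    bias = [-H / 2.0] * n_states
    return W, bias


def run_network(W, bias, start, string):
    """Feed a string of symbol indices through the programmed network."""
    n = len(bias)
    S = [0.0] * n
    S[start] = 1.0  # one-hot encoding of the start state
    for k in string:
        # With one-hot input I, sum_{j,k} W[i][j][k] * S[j] * I[k] reduces to sum_j W[i][j][k] * S[j]
        S = [sigmoid(bias[i] + sum(W[i][j][k] * S[j] for j in range(n))) for i in range(n)]
    return S


def decode_state(S):
    """Read off the DFA state as the most active state neuron (assumed decoding)."""
    return max(range(len(S)), key=lambda i: S[i])


def run_dfa(delta, start, string):
    """Reference DFA run, for comparison with the network."""
    q = start
    for k in string:
        q = delta[(q, k)]
    return q


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 5, 2  # small random DFA: 5 states, 2 input symbols
    delta = {(j, k): rng.randrange(n) for j in range(n) for k in range(m)}
    W, bias = encode_dfa(delta, n, m, H=10.0)
    s = [rng.randrange(m) for _ in range(1000)]
    print(decode_state(run_network(W, bias, 0, s)), run_dfa(delta, 0, s))
```

Only at most 2mn of the n^2 m weights are nonzero, in line with the "small subset of all weights" in the abstract. In this sketch, a neuron that should stay low can receive +H from up to n - 1 weakly active neurons. In the worst case, a larger DFA therefore needs a larger H to keep those small contributions below the bias.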
Provably Stable Interpretable Encodings of Context Free Grammars in RNNs with a Differentiable Stack
Given a collection of strings belonging to a context free grammar (CFG) and
another collection of strings not belonging to the CFG, how might one infer the
grammar? This is the problem of grammatical inference. Since the languages
generated by CFGs are exactly those recognized by pushdown automata (PDAs), it
suffices to determine the state transition rules and stack action rules of the
corresponding PDA. One approach would be to train a recurrent neural network
(RNN) to classify the sample data and then attempt to extract these PDA rules.
However, neural networks are not a priori aware of the structure of a PDA and
would likely require many samples to infer this structure. Furthermore,
extracting the PDA rules from the RNN is nontrivial. We build an RNN
specifically structured like a PDA, where weights
correspond directly to the PDA rules. This requires a stack architecture that
is somehow differentiable (to enable gradient-based learning) and stable (an
unstable stack will show deteriorating performance with longer strings). We
propose a stack architecture that is differentiable and that provably exhibits
orbital stability. Using this stack, we construct a neural network that
provably approximates a PDA for strings of arbitrary length. Moreover, our
model and method of proof can easily be generalized to other state machines,
such as a Turing Machine. (Comment: 20 pages, 2 figures)
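The abstract does not describe its stack architecture. To illustrate what "differentiable stack" means, the following sketch swaps in a different, well-known design: a fixed-depth continuous stack in the style of Joulin and Mikolov's stack-augmented RNN. In that design, the new stack is a weighted mixture of the pushed, popped and unchanged stacks, so it is differentiable in the action weights. It is not the authors' stack and makes no claim of orbital stability. The function name and the fixed depth are assumptions.

```python
def stack_step(stack, x, p_push, p_pop, p_noop):
    """One soft update of a fixed-depth continuous stack.

    stack[0] is the top element. The action weights are expected to be
    non-negative and sum to 1 (e.g. the output of a softmax), so the new
    stack is a convex mixture of the pushed, popped and unchanged stacks.
    """
    depth = len(stack)
    new = []
    for i in range(depth):
        above = x if i == 0 else stack[i - 1]           # what a push moves into slot i
        below = stack[i + 1] if i + 1 < depth else 0.0  # what a pop lifts into slot i
        new.append(p_push * above + p_pop * below + p_noop * stack[i])
    return new


if __name__ == "__main__":
    s = [0.0] * 5
    for v in (1.0, 2.0, 3.0):
        s = stack_step(s, v, 1.0, 0.0, 0.0)  # hard pushes
    print(stack_step(s, 0.0, 0.0, 1.0, 0.0))  # hard pop -> [2.0, 1.0, 0.0, 0.0, 0.0]
    print(stack_step(s, 4.0, 0.5, 0.5, 0.0))  # 50/50 push/pop -> [3.0, 2.0, 1.0, 0.5, 0.0]
```

With one-hot actions, this stack behaves exactly like a discrete stack. With mixed actions, the contents blur (the 0.5 entry above). Errors of this kind accumulate over long strings, and a provably stable stack is designed to prevent them.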
Constructing Deterministic Finite-State Automata in Recurrent Neural Networks
Recurrent neural networks that are trained to behave like
deterministic finite-state automata (DFAs) can show deteriorating
performance when tested on long strings. This deteriorating performance
can be attributed to the instability of the internal representation of the
learned DFA states. The use of a sigmoidal discriminant function together
with the recurrent structure contributes to this instability. We prove
that a simple algorithm can construct second-order recurrent neural
networks with a sparse interconnection topology and sigmoidal discriminant
function such that the internal DFA state representations are stable, i.e.,
the constructed network correctly classifies strings of arbitrary
length. The algorithm is based on encoding strengths of weights directly
into the neural network. We derive a relationship between the weight
strength and the number of DFA states for robust string classification.
For a DFA with $n$ states and $m$ input alphabet symbols, the constructive
algorithm generates a "programmed" neural network with $O(n)$ neurons and $O(mn)$
weights. We compare our algorithm to other methods proposed in the
literature.
Revised in February 1996
(Also cross-referenced as UMIACS-TR-95-50)
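The abstract does not state the relationship between weight strength and the number of DFA states. The sketch below is a deliberately simplified, hypothetical worst-case model, not the paper's derivation. A neuron that should stay low has bias -H/2 and receives +H from r other low neurons, so its activation l obeys l <- g(-H/2 + r*H*l). Starting from l = 0, the code searches a grid for the smallest H at which this iteration settles below an arbitrary threshold of 0.2.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def low_level(H, r, iters=2000):
    """Iterate l <- g(-H/2 + r*H*l) from l = 0 (hypothetical worst-case model)."""
    l = 0.0
    for _ in range(iters):
        l = sigmoid(-H / 2.0 + r * H * l)
    return l


def min_weight_strength(r, threshold=0.2, H_max=50.0, step=0.5):
    """Smallest H on the grid whose low activation stays below the threshold, else None."""
    H = step
    while H <= H_max:
        if low_level(H, r) < threshold:
            return H
        H += step
    return None


if __name__ == "__main__":
    for r in (1, 10, 100):
        print(r, min_weight_strength(r))
```

Because each iterate grows with r, the required H in this model never decreases as more low neurons feed a neuron. In the worst case, that number grows with the number of DFA states.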
Symbolic and connectionist learning techniques for grammatical inference
This thesis is structured in four parts comprising ten chapters. The first part, introduction and review (Chapters 1 to 4), presents an extensive state-of-the-art review of both symbolic and connectionist grammatical inference (GI) methods, which also introduces most of the basic material needed to describe the contributions of the thesis. These contributions make up the remaining parts (Chapters 5 to 10). The second part, contributions on symbolic and connectionist techniques for regular grammatical inference (Chapters 5 to 7), describes the contributions to the theory and methods of regular GI, including related topics such as the representation of finite-state machines (FSMs) in recurrent neural networks (RNNs). The third part of the thesis, augmented regular expressions and their inductive inference, comprises Chapters 8 and 9. Augmented regular expressions (AREs) are defined and proposed as a new representation for a subclass of context-sensitive languages (CSLs) that does not contain all the context-free languages but does contain a large class of languages capable of describing patterns with symmetries and other (context-sensitive) structures of interest in pattern recognition problems. The fourth part of the thesis consists only of Chapter 10, conclusions and future research. Chapter 10 summarizes the main results and points out the lines of further research that should be followed, both to deepen some of the theoretical aspects raised and to facilitate the application of the developed GI tools to real-world problems in computer vision.
Combined optimization algorithms applied to pattern classification
Accurate classification by minimizing the error on test samples is the main
goal in pattern classification. Combinatorial optimization is a well-known
method for solving minimization problems; however, only a few examples of
classifiers are described in the literature where combinatorial optimization is
used in pattern classification. Recently, there has been a growing interest
in combining classifiers and improving the consensus of results for greater
accuracy. In the light of the "No Free Lunch Theorems", we analyse the combination
of simulated annealing, a powerful combinatorial optimization method
that produces high quality results, with the classical perceptron algorithm.
This combination is called the LSA machine. Our analysis aims at finding paradigms
for problem-dependent parameter settings that ensure high classification
results. Our computational experiments on a large number of benchmark
problems lead to results that either outperform or are at least competitive with
results published in the literature. Apart from parameter settings, our analysis
focuses on a difficult problem in computation theory, namely the network
complexity problem. The depth vs size problem of neural networks is one of
the hardest problems in theoretical computing, with very little progress over
the past decades. In order to investigate this problem, we introduce a new
recursive learning method for training hidden layers in constant depth circuits.
Our findings make contributions to a) the field of Machine Learning, as the
proposed method is applicable to training feedforward neural networks, and to
b) the field of circuit complexity by proposing an upper bound for the number
of hidden units sufficient to achieve a high classification rate. One of the major
findings of our research is that the size of the network can be bounded by
the input size of the problem, with an approximate upper bound of $8 + \sqrt{2^n/n}$
threshold gates being sufficient for a small error rate, where $n := \log |S_L|$
and $S_L$ is the training set.
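The abstract does not give the LSA machine's internals. The following is a generic, hypothetical sketch of one way to combine the two ingredients it names. A perceptron update on a randomly chosen misclassified sample (with a random step size) serves as the neighbourhood move. Simulated annealing's Metropolis rule, applied to the training-error count, decides whether to accept the move under a geometric cooling schedule. All names, step sizes and schedule constants are assumptions.

```python
import math
import random


def n_errors(w, data):
    """Number of samples (x, y), y in {-1, +1}, misclassified by weight vector w."""
    return sum(1 for x, y in data if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0)


def anneal_perceptron(data, dim, steps=2000, t0=2.0, cooling=0.995, seed=0):
    """Simulated annealing over perceptron weights (hypothetical sketch).

    Returns the best weights seen and their training-error count.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    cur = n_errors(w, data)
    best_w, best = list(w), cur
    t = t0
    for _ in range(steps):
        wrong = [(x, y) for x, y in data if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0]
        if not wrong:
            break  # every training sample is classified correctly
        x, y = rng.choice(wrong)
        eta = rng.uniform(0.1, 1.0)
        cand = [wi + eta * y * xi for wi, xi in zip(w, x)]  # perceptron-rule neighbour
        c = n_errors(cand, data)
        # Metropolis rule: always accept non-worse moves, worse ones with prob. exp(-delta/T)
        if c <= cur or rng.random() < math.exp(-(c - cur) / t):
            w, cur = cand, c
            if cur < best:
                best_w, best = list(w), cur
        t *= cooling  # geometric cooling schedule
    return best_w, best


if __name__ == "__main__":
    rng = random.Random(3)
    data = []
    for _ in range(100):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((x1, x2, 1.0), 1 if x1 + 2 * x2 > 0.1 else -1))  # last input acts as bias
    print(anneal_perceptron(data, 3)[1])
```

Unlike the plain perceptron, the annealing acceptance step occasionally keeps a worse weight vector, which lets the search leave poor regions on non-separable data.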
The application of artificial neural networks to interpret acoustic emissions from submerged arc welding
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Automated fusion welding processes play a fundamental role in modern manufacturing industries. The proliferation of joint geometries, together with the large number of associated process-variable configurations, has given rise to research into complex system modelling and control strategies. Many of these techniques involve monitoring not only the electrical characteristics of the process but also visual and acoustic information. The use of acoustic information from certain welding processes is well documented: it is an established fact that skilled manual welders utilise such information as an aid to creating an optimum weld. The experimental investigation presented in this thesis examines the feasibility of monitoring the airborne acoustic emissions of Submerged Arc Welding (SAW) for diagnostic and real-time control purposes. The experimental method adopted for this research takes a cybernetic approach to data processing and interpretation in an attempt to replicate the robustness of human biological functions. A custom-designed audio hardware system was used to analyse signals obtained from bead-on-plate fusion welds on mild steel. Time- and frequency-domain analyses were used in an attempt to establish salient characteristics or identify the signatures associated with changes in the process variables. The featured parameters were voltage/current and weld travel speed, chosen for their ease of validation. However, weld defect prediction arising from process instabilities was also considered. As the data proved to be highly correlated and erratic when subjected to offline statistical analysis, the application of artificial neural networks to signal processing and real-time control scenarios was investigated extensively.
As a consequence, a dedicated neural based software system was developed, utilising supervised and unsupervised neural techniques to monitor the process. The research was aimed at proving the feasibility of monitoring the electrical process parameters and stability of the welding process in real time. It was shown to be possible, by the exploitation of artificial neural networks, to generate a number of monitoring parameters indicative of the welding process state. The limitations of the present neural method and proposed developments are discussed, together with an overview of applied neural network technology and its impact on artificial intelligence and robotic control. Further developments are considered together with recommendations for future areas of research
Mining a Small Medical Data Set by Integrating the Decision Tree and t-test
Although several researchers have used statistical methods to show that aspiration followed by injection of 95% ethanol left in situ (retention) is an effective treatment for ovarian endometriomas, very few discuss the different conditions that could lead to different recovery rates for the patients. This study therefore combines statistical methods and decision tree techniques to analyze the postoperative status of ovarian endometriosis patients under different conditions. Since our data set is small, containing only 212 records, we use all of the data for training. Therefore, instead of generating rules directly from the resulting tree, we first use the value at each node as a cut point to generate all possible rules from the tree. We then verify these rules with a t-test to discover useful descriptive rules. Experimental results show that our approach can find new and interesting knowledge about recurrent ovarian endometriomas under different conditions. (Journal type: foreign; citation index: EI; format: print; country code: FI)
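The abstract does not say which t-test variant is used or how the rules are verified. The following is a minimal sketch, assuming Welch's two-sample t statistic and hypothetical field names. Each cut point c taken from a tree node splits the records on a numeric attribute into "<= c" and "> c" groups, and the candidate rule is scored by the t statistic of the outcome between the two groups. Converting t into a p-value requires a t distribution (for example from SciPy), which is omitted here to keep the example dependency-free.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (each sample needs at least two values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rules_from_cut_points(records, feature, cut_points, outcome):
    """Score candidate rules 'feature <= c' vs 'feature > c' by Welch's t on the outcome."""
    rules = []
    for c in cut_points:
        low = [r[outcome] for r in records if r[feature] <= c]
        high = [r[outcome] for r in records if r[feature] > c]
        if len(low) < 2 or len(high) < 2:
            continue  # too few records on one side to estimate a variance
        if variance(low) == 0 and variance(high) == 0:
            continue  # t is undefined when the standard error is zero
        rules.append((c, welch_t(low, high)))
    return rules


if __name__ == "__main__":
    # Hypothetical toy records: "x" is an attribute, "y" an outcome score.
    pairs = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 2), (6, 3), (7, 4), (8, 5)]
    records = [{"x": x, "y": y} for x, y in pairs]
    print(rules_from_cut_points(records, "x", [1, 4], "y"))
```

Enumerating every node's cut point, rather than only the tree's leaf paths, matches the abstract's idea of generating all possible rules first and filtering them statistically afterwards.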
State of the Art in Face Recognition
Notwithstanding the tremendous effort devoted to the face recognition problem, it is not yet possible to design a face recognition system with performance close to that of humans. New computer vision and pattern recognition approaches need to be investigated. Knowledge and perspectives from other fields, such as psychology and neuroscience, must also be incorporated into face recognition research to design a robust face recognition system. Indeed, much more effort is required to achieve a human-like face recognition system. This book attempts to narrow the gap between the existing state of face recognition research and its future state.
A novel approach to handwritten character recognition
A number of new techniques and approaches for off-line handwritten character recognition are presented which individually make significant advancements in the field.
First, an outline-based vectorization algorithm is described which gives improved accuracy in producing vector representations of the pen strokes used to draw characters. Later, vectorization and other types of preprocessing are criticized, and an approach to recognition is suggested which avoids separate preprocessing stages by incorporating them into later stages. Apart from the increased speed of this approach, it allows more effective alteration of the character images, since more is known about them at the later stages. It also allows the possibility of alterations being corrected if they are initially detrimental to recognition.
A new feature measurement, the Radial Distance/Sector Area feature, is presented which is highly robust, tolerant to noise, distortion and style variation, and gives high accuracy results when used for training and testing in a statistical or neural classifier. A very powerful classifier is therefore obtained for recognizing correctly segmented characters. The segmentation task is explored in a simple system of integrated over-segmentation, character classification and approximate dictionary checking. This can be extended to a full system for handprinted word recognition.
In addition to the advancements made by these methods, a powerful new approach to handwritten character recognition is proposed as a direction for future research. This proposal combines the ideas and techniques developed in this thesis in a hierarchical network of classifier modules to achieve context-sensitive, off-line recognition of handwritten text. A new type of "intelligent" feedback is used to direct the search to contextually sensible classifications. A powerful adaptive segmentation system is proposed which, when used as the bottom layer in the hierarchical network, allows initially incorrect segmentations to be adjusted according to the hypotheses of the higher-level context modules.