Neural architectures for database query processing, syntax analysis, knowledge representation, and inference by Chen, Chun-Hsien
Retrospective Theses and Dissertations, Iowa State University Capstones, Theses and Dissertations
1997
Recommended Citation
Chen, Chun-Hsien, "Neural architectures for database query processing, syntax analysis, knowledge representation, and inference"
(1997). Retrospective Theses and Dissertations. 11833.
https://lib.dr.iastate.edu/rtd/11833
Neural architectures for database query processing, syntax analysis, knowledge 
representation, and inference 
by 
Chun-Hsien Chen 
A dissertation submitted to the graduate faculty 
in partial fulfillment of the requirements for the degree of 
DOCTOR OF PHILOSOPHY 
Major: Computer Science 
Major Professor: Vasant Honavar
Iowa State University 
Ames, Iowa 
1997 
Copyright © Chun-Hsien Chen, 1997. All rights reserved. 
UMI Number: 9826594
Graduate College 
Iowa State University 
This is to certify that the Doctoral dissertation of 
Chun-Hsien Chen 
has met the dissertation requirements of Iowa State University 
Committee Member
Committee Member 
Committee Member 
Major Professor 
College 
Signature was redacted for privacy.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
1 INTRODUCTION
1.1 Artificial Neural Networks
1.1.1 Artificial neural units
1.1.2 Activation functions
1.1.3 Types of artificial neural networks and their computational capabilities
1.1.4 Implementation of artificial neural networks
1.2 A Brief Review of Artificial Neural Networks
1.3 An Overview of the Dissertation
2 A NEURAL MEMORY FOR CONTENT AS WELL AS ADDRESS-BASED STORAGE AND RECALL
2.1 Introduction
2.1.1 Information retrieval and binary mapping
2.1.2 Associative memory (Content-addressed memory)
2.1.3 Address-based memory
2.1.4 Perceptrons
2.2 Multi-layer Perceptrons as Neural Memories
2.2.1 The application of linear separability of binary vertices in pattern classification
2.2.2 Best match: pattern classification with precision control
2.2.3 Storage capacity
2.2.4 Synthesis of associative and address-based memories
2.2.5 Exact match: binary mapping Perceptron (BMP) module
2.2.6 Conversion between memory models using bipolar and binary inputs
2.3 Properties of the Proposed Neural Associative Memory
2.3.1 Partial match: associative recall from a partially specified input
2.3.2 Multiple associative recalls
2.3.3 Fault tolerance
2.4 Summary and Discussion
3 NEURAL ARCHITECTURES FOR INFORMATION RETRIEVAL AND DATABASE QUERY PROCESSING
3.1 Introduction
3.1.1 Information retrieval in neural associative memories
3.2 Query Processing Using Neural Associative Memories
3.2.1 Realization of lexical access for a machine-readable lexicon using a neural associative memory
3.2.2 Realization of a library query system using a neural associative memory
3.2.3 The implementation of case insensitive pattern matching
3.3 Comparison with Other Database Query Processing Techniques
3.3.1 Performance of current electronic realization for neural networks
3.3.2 Analysis of query processing in conventional computer systems
3.4 Summary and Discussion
4 NEURAL ARCHITECTURES FOR ELEMENTARY LOGICAL INFERENCE
4.1 Introduction
4.2 Neural Assemblies for the Recognition of Partial Patterns
4.2.1 A neural assembly for inclusive pattern recognition
4.2.2 A neural assembly for exclusive pattern recognition
4.3 A Neural Assembly for Executing a Logical AND (AND Neural Assembly)
4.4 Neural Assemblies for Executing Logic ORs (OR Neural Assemblies)
4.4.1 A general OR neural assembly
4.4.2 A monotone OR neural assembly
4.5 A Neural Architecture for Realizing DNF Boolean Functions
4.6 Summary and Discussion
5 NEURAL ARCHITECTURES FOR SEQUENCE PROCESSING
5.1 Introduction
5.1.1 Symbolic functions and binary mappings
5.2 Neural Network Design for Deterministic Finite Automata (NN DFA)
5.2.1 Deterministic finite automata (DFA)
5.2.2 Architecture of NN DFA
5.3 Neural Network Design for Deterministic Pushdown Automata (NN DPDA)
5.3.1 Deterministic pushdown automata (DPDA)
5.3.2 Architecture of NN DPDA
5.4 Neural Network Design for Stack (NN Stack)
5.4.1 Symbolic representation of stack
5.4.2 Architecture of NN Stack
5.4.3 NN Stack in action
5.5 Neural Network Design for Nondeterministic Finite Automata (NN NFA)
5.5.1 Nondeterministic finite automata (NFA)
5.5.2 Model for concurrently tracking all the possible nondeterministic moves in the operation of an NFA using RNN
5.5.3 Architecture of NN NFA
5.5.4 Proof of correctness
5.5.5 NN NFA in Action
5.6 Summary and Discussion
6 NEURAL ARCHITECTURES FOR SYNTAX ANALYSIS
6.1 Introduction
6.1.1 Review of related research on neural architectures for syntax analysis
6.2 Neural Network Design for a Lexical Analyzer (NNLexAn)
6.2.1 Neural network design for a word segmenter (NNSeg)
6.2.2 Neural network design for a word lookup table (NNLTab)
6.3 A Modular Neural Architecture for LR Parser (NNLR Parser)
6.3.1 Representation of parse table
6.3.2 Representation of parsing moves and parse trees
6.3.3 Architecture of NNLR parser
6.3.4 NNLR Parser in action
6.4 Performance Analysis
6.4.1 Performance analysis of lexical analyzer
6.4.2 Performance analysis of LR parser
6.5 Summary and Discussion
7 CONCLUSION
APPENDIX. ACRONYMS
BIBLIOGRAPHY
LIST OF FIGURES
Figure 1.1 A typical computing unit of an ANN
Figure 2.1 Examples of memory pattern, noisy patterns, and partial pattern
Figure 2.2 The spatial distribution of a 3-dimensional and an n-dimensional binary hypercube
Figure 2.3 The setting of connection weights and hidden node threshold in the proposed neural memory (a 2-layer Perceptron with binary input) for a given associated memory pair
Figure 2.4 The settings of connection weights and hidden node threshold in the proposed BMP module for an associated binary mapping ordered pair
Figure 2.5 The setting of connection weights and hidden node threshold in the proposed neural memory (a 2-layer Perceptron with bipolar input) for a given associated memory pair
Figure 3.1 A modular design of the proposed ANN memory for easy expansion. This 1-dimensional array structure can be easily extended to 2- or 3-dimensional array structures
Figure 4.1 A 1-layer Perceptron which recognizes all the 5-dimensional binary patterns that contain the partial pattern <1,?,0,?,1>, where ? denotes don't care
Figure 4.2 A 1-layer Perceptron which recognizes all the 5-dimensional binary patterns that don't contain the partial pattern <1,?,0,?,1>, where ? denotes don't care
Figure 4.3 An AND neural assembly which realizes the logical AND function C(v)
Figure 4.4 An OR neural assembly which realizes the logical OR function D(v)
Figure 4.5 A neural architecture which realizes the DNF Boolean function E(v)
Figure 5.1 The proposed modular neural network architecture for DFA
Figure 5.2 The proposed modular neural network architecture for DPDA
Figure 5.3 The proposed neural network architecture for stack mechanism
Figure 5.4 The state diagram of an NFA that accepts any input string containing the sub-string abaa
Figure 5.5 The state diagram of a DFA that accepts any input string containing the sub-string abaa
Figure 5.6 The proposed recurrent neural network architecture for concurrently tracking all the nondeterministic computations of a given NFA
Figure 6.1 The simplified state diagram of a DFA which recognizes keywords: begin, end, if, then, and else
Figure 6.2 The state diagram of a DFA which simulates a simple word segmenter carving continuous input stream of characters into words including integer constants, keywords and identifiers
Figure 6.3 The proposed neural network architecture for LR(1) parser
Figure 6.4 The state diagram of the DFA $M_{L_1}$ for the lexical analyzer $L_1$
LIST OF TABLES
Table 2.1 The corresponding maximal storage capacity of a 1-layer Perceptron with 100 input neurons for classifying binary patterns for a range of allowable noise levels
Table 3.1 A comparison of the estimated performance of the proposed neural associative memory with that of other techniques commonly used in conventional computer systems for locating a record pointer in key-based organizations
Table 3.2 A comparison of the capabilities of the proposed neural associative memory with those of other techniques commonly used in conventional computer systems for exact match and partial match
Table 6.1 The parse table of the LR(1) parser for grammar $G_1$
Table 6.2 The transition function $\delta_{M_{L_1}}$ of the DFA $M_{L_1}$
Table 6.3 Moves of the LR(1) parser for grammar $G_1$ on input string 1x1+1
Table 6.4 A comparison of the estimated performance of the proposed NNLR Parser with that of conventional computer systems for syntax analysis
ACKNOWLEDGEMENTS 
I would like to express my sincere appreciation to Dr. Vasant Honavar for his support, 
assistance, and guidance, as well as challenging and inspiring criticism in the preparation of 
this dissertation and in my studies at Iowa State University. I would also like to thank my 
Graduate Committee: Dr. Tom Barta, Dr. Julie Dickerson, Dr. Les Miller, and Dr. Johnny 
Wong for their helpful advice, comments, and suggestions. 
I am deeply indebted to the Extended and Continuing Education Office, Iowa State
University, for the warm friendship of the staff, the valuable work experience, and the timely
financial support without which my study for a Ph.D. would not have been possible. I am grateful
to Russell Meier and the Artificial Intelligence Research Group: Karthik Balakrishnan, Armin
Mikler, Rajesh Parekh and Jihoon Yang for their friendship and inspiring discussion. I am
also grateful to my friends in Ames, Iowa, for keeping me company in my leisure time and for
nourishing me with friendship.
Also, I would like to express my deepest gratitude to my parents for their everlasting
support and love.
ABSTRACT 
Artificial neural networks (ANN), due to their inherent parallelism, potential for fault
tolerance, and adaptation through learning, offer an attractive computational paradigm for a
variety of applications in computer science and engineering, artificial intelligence, robotics, and
cognitive modeling. Despite the success in the application of ANN to a broad range of numeric
tasks in pattern classification, control, function approximation, and system identification, the
integration of ANN and symbolic computing is only beginning to be explored. This dissertation
explores the integration of ANN with some essential symbolic computations for content-based
associative symbolic processing. This offers an opportunity to explore the potential benefits of
ANN's inherent parallelism in the design of high performance computing systems for real-time
content-based symbolic processing applications. We develop methods to systematically design
massively parallel architectures for pattern-directed symbol processing using neural associative
memories as key components. In particular, we propose neural architectures for content-based
as well as address-based data storage and recall, information retrieval and database query
processing, elementary logical inference, sequence processing, and syntax analysis. Their
potential advantages over conventional serial computer implementations of the same functions
are examined in the dissertation.
1 INTRODUCTION 
The goal of artificial intelligence (AI), broadly interpreted, is to understand and engineer
intelligent systems. It is often suggested that the traditionally serial symbol processing systems
of AI and the inherently massively parallel artificial neural networks (ANN) offer two radically,
perhaps even irreconcilably, different paradigms for modelling minds and brains, both artificial
and natural [130, 160]. AI has been successful in applications such as theorem proving,
knowledge-based expert systems, mathematical reasoning, syntax analysis, and related
applications which mainly involve systematic symbol manipulation. On the other hand, ANN have
been particularly successful in applications such as pattern recognition, function approximation,
and nonlinear control [60, 150] which involve primarily numeric computation. Meyerowitz
has suggested that the design of neural architectures capable of supporting dynamic
representations for symbol manipulation is one of the grand challenges of neural network
research [113]. As shown by Church, Kleene, McCulloch, Post, Turing, and others through their
work on the theory of computation [117, 100], both AI and ANN represent particular realizations
of a universal (Turing-equivalent) model of computation [185]. Thus, despite assertions by some
to the contrary, any task that can be realized by one can, in principle, be accomplished by
the other. However, most AI systems have traditionally been programmed in languages that
were influenced by Von Neumann's design of a serial stored program computer. ANN systems,
on the other hand, have been inspired by (albeit overly simplified) models of biological neural
networks. They represent different commitments regarding the architecture and the primitive
building blocks used to implement the necessary computations. Thus they occupy different
regions, characterized by possibly different cost-performance tradeoffs, in a much larger space of
potentially interesting designs for intelligent systems. Recently, several researchers have begun
to explore previously unexplored parts of this design space.
Given the reliance of both traditional AI and ANN on essentially equivalent formal models
of computation, a central issue in the design and analysis of intelligent systems has to do with
the identification and implementation, under a variety of design, cost, and performance
constraints, of a suitable subset of Turing-computable functions that adequately model the desired
behaviors. Today's AI and ANN systems each demonstrate at least one way of performing a
certain task (e.g., logical inference, pattern recognition, syntax analysis) naturally and thus
pose for the other the interesting problem of doing the same task, perhaps more elegantly,
efficiently, robustly, or cost-effectively. In this context, it is beneficial to critically
examine the often implicit and unstated assumptions on which current AI and ANN
systems are based and to identify alternative (and potentially better) approaches to designing
such systems. Massively parallel symbol processing architectures for AI systems and highly
structured (as opposed to homogeneous, fully connected) ANN are just two examples of a
wide range of approaches to designing intelligent systems [185, 72, 73]. Of particular interest
are alternative designs (including synergistic hybrids of ANN and AI designs) for intelligent
systems [47, 65, 70, 72, 73, 99, 136, 172, 179, 185]. Examples of such systems include neural
architectures for information retrieval and database query processing [23, 24], generation
of context-free languages [187], rule-based inference [5, 31, 141, 167, 176], computer vision
[11, 119], natural language processing [14, 32], learning [46, 69, 168], and knowledge-based
systems [94, 145]. We strongly believe that a judicious and systematic exploration of the
design space of such systems is essential for understanding the nature of key cost-performance
tradeoffs in the synthesis of intelligent systems.
This dissertation explores the integration of ANN with some essential symbolic computations
for content-based associative symbolic processing. This offers an opportunity to explore the
potential benefits of ANN's inherent parallelism in the design of high performance computing
systems for real-time content-based symbolic processing applications.
1.1 Artificial Neural Networks 
ANN are biologically inspired by the neural systems of the human brain, which are massively
parallel interconnected networks of hierarchically organized nerve cells (neurons). ANN are
extremely simplified models of biological neural systems in many aspects, such as the structure
of basic computational units, the mechanism for information processing, and the network
architecture. Compared to most current digital computer systems, ANN are particularly well-suited
for pattern-directed problems (pattern completion, pattern classification, and pattern association
[29]) which arise frequently in applications such as language processing, speech recognition, and
pattern recognition.
It is worth mentioning that the primary goal of ANN research (unlike neural modelling
or computational neuroscience research) is not to discover a computational model for the
detailed processes of the human brain but to pursue, as a technology, a computing paradigm which
can effectively realize and efficiently perform high-level intelligent processes.
1.1.1 Artificial neural units 
A typical computing unit (node) in an ANN has n input and m output connections, each of
which has an associated weight. The node computes the weighted sum of its inputs, compares
the sum to its node threshold, and produces its output based on an activation function. A
commonly used activation function is the threshold function. The resulting output is sent along
the output connections to other nodes. The output of such a node used in this dissertation is
defined by
\[ y = f\left( \sum_{i=1}^{n} w_i x_i - \theta \right) \tag{1.1} \]
where $x_i$ is the value of input $i$, $w_i$ is the associated weight on input connection $i$, $\theta$ is the
node threshold, $y$ is the output value, and $f$ is the activation function. Figure 1.1 shows such
a node.
Figure 1.1 A typical computing unit of an ANN
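The node computation described above (a weighted sum of the inputs, offset by the node threshold and passed through an activation function) can be sketched in a few lines of Python. This is only an illustration: the names `node_output` and `threshold`, the example weights, and the choice of a strict comparison in the threshold function are assumptions made for the sketch, not definitions taken from the dissertation.

```python
from typing import Callable, Sequence


def threshold(s: float) -> int:
    """Binary threshold activation: 1 if s > 0, else 0 (strictness is an assumption)."""
    return 1 if s > 0 else 0


def node_output(
    x: Sequence[float],
    w: Sequence[float],
    theta: float,
    f: Callable[[float], float] = threshold,
) -> float:
    """Output of one node: f(sum_i w_i * x_i - theta)."""
    if len(x) != len(w):
        raise ValueError("each input needs exactly one weight")
    s = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return f(s)


# Hypothetical 3-input node: it fires only when at least two inputs are on.
weights = [0.5, 0.5, 0.5]
print(node_output([1, 0, 1], weights, theta=0.75))  # weighted sum 1.0 exceeds 0.75
print(node_output([1, 0, 0], weights, theta=0.75))  # weighted sum 0.5 does not
```

Passing a different callable as `f` swaps the activation function without touching the weighted-sum logic, which mirrors the separation between the sum and the activation in the definition above.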
1.1.2 Activation functions 
The types of activation functions used by an ANN affect its expressiveness, computational 
capabilities, and performance. Several typical activation functions are linear, binary sigmoidal, 
bipolar sigmoidal, binary hardlimiter, bipolar hardlimiter, gaussian, and ramp [56, 102] defined 
as follows: 
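linear: $f_L(s) = cs$, where $s = \sum_{i=1}^{n} w_i x_i - \theta$ and $c > 0$
binary sigmoidal: $f_S(s) = \frac{1}{1 + e^{-cs}}$, where $s = \sum_{i=1}^{n} w_i x_i - \theta$ and $c > 0$
bipolar sigmoidal: $f_S(s) = \frac{1 - e^{-cs}}{1 + e^{-cs}}$, where $s = \sum_{i=1}^{n} w_i x_i - \theta$ and $c > 0$
binary hardlimiter: $f_H(s) = \begin{cases} 1 & \text{if } s > 0 \\ 0 & \text{otherwise} \end{cases}$, where $s = \sum_{i=1}^{n} w_i x_i - \theta$
bipolar hardlimiter: $f_H(s) = \begin{cases} +1 & \text{if } s > 0 \\ -1 & \text{otherwise} \end{cases}$, where $s = \sum_{i=1}^{n} w_i x_i - \theta$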
gaussian: /G(S) = EI^, where s = Yli=oi'"^i - ^ i) 
ramp: $f_R(s) = \begin{cases} +1 & \text{if } cs > +1 \\ cs & \text{if } |cs| \le 1 \\ -1 & \text{if } cs < -1 \end{cases}$, where $s = \sum_{i=1}^{n} w_i x_i - \theta$ and $c > 0$
In the above equations, c is the activation gain. Note that most of the activation functions
produce output in the range [0,1] for binary signals and [-1,1] for bipolar signals.
This dissertation uses two types of threshold functions: binary hardlimiter and bipolar 
hardlimiter. Their simplicity allows simple and efficient hardware implementation of such 
threshold functions. 
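As a rough illustration of the activation functions listed in this section, the following Python sketch implements the linear, sigmoidal, hardlimiter, and ramp functions of the net input s. It assumes the standard logistic form for the binary sigmoid and its rescaling to [-1, 1] for the bipolar sigmoid, and it uses a strict comparison (s > 0) for the hardlimiters; these forms, and all function names, are assumptions of the sketch. The gaussian function is left out.

```python
import math


def linear(s: float, c: float = 1.0) -> float:
    """Linear activation with gain c > 0."""
    return c * s


def binary_sigmoid(s: float, c: float = 1.0) -> float:
    """Logistic function with gain c; output lies in (0, 1)."""
    # Note: math.exp overflows for very large negative c * s in this simple form.
    return 1.0 / (1.0 + math.exp(-c * s))


def bipolar_sigmoid(s: float, c: float = 1.0) -> float:
    """Binary sigmoid rescaled to (-1, 1); equals tanh(c * s / 2)."""
    return 2.0 * binary_sigmoid(s, c) - 1.0


def binary_hardlimiter(s: float) -> int:
    """1 if s > 0, else 0."""
    return 1 if s > 0 else 0


def bipolar_hardlimiter(s: float) -> int:
    """+1 if s > 0, else -1."""
    return 1 if s > 0 else -1


def ramp(s: float, c: float = 1.0) -> float:
    """c * s clipped to the interval [-1, 1]."""
    return max(-1.0, min(1.0, c * s))


for s in (-2.0, 0.0, 0.25, 2.0):
    print(s, binary_sigmoid(s), bipolar_sigmoid(s),
          binary_hardlimiter(s), bipolar_hardlimiter(s), ramp(s, c=2.0))
```

The two hardlimiters need only a single comparison each, which is consistent with the remark above that their simplicity favors hardware implementation.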
1.1.3 Types of artificial neural networks and their computational capabilities 
ANN can be classified into three basic categories: feedforward networks, feedback
networks, and recurrent networks [28, 56, 131], according to their architectures, functionalities,
and the signal propagation direction of their connections. The output of a feedforward network is
a function of the current input, and its connections are unidirectional. The output of a feedback
network is a function of the current input (and past inputs in some cases), and its connections
are not necessarily unidirectional. The output of a recurrent network is a function of current
and past inputs, and its connections are unidirectional. Architecturally, a recurrent network is
a feedforward network with recurrent connections, but functionally it is a feedback network.
Since the output of both feedback networks and recurrent networks can be a function of past
inputs, they are suitable for sequence processing. Mathematically, the computation
of feedforward networks approximates a function mapping, and that of feedback networks
approximates finite state machines, pushdown automata, or Turing machines.
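The difference between a mapping of the current input and a computation that depends on past inputs can be made concrete with a small Python sketch. The feedforward unit below computes a fixed function (logical OR) of its current inputs, while the recurrent network feeds a one-bit state back into itself and so behaves like a two-state finite state machine that tracks the parity of the bits seen so far. The weights and thresholds are hand-picked illustrative values, and the function names are hypothetical; none of this is taken from the dissertation's designs.

```python
from typing import Iterable


def step(s: float) -> int:
    """Binary hardlimiter: 1 if s > 0, else 0."""
    return 1 if s > 0 else 0


def feedforward_or(x1: int, x2: int) -> int:
    """Feedforward unit: the output depends only on the current input."""
    return step(x1 + x2 - 0.5)


def recurrent_parity(bits: Iterable[int]) -> int:
    """Recurrent network of threshold units: the output depends on past inputs.

    The fed-back state is XOR(state, x), built from an OR unit and an
    AND unit in a hidden layer followed by one output unit.
    """
    state = 0
    for x in bits:
        h_or = step(state + x - 0.5)       # fires if state or x is on
        h_and = step(state + x - 1.5)      # fires if both are on
        state = step(h_or - h_and - 0.5)   # OR and not AND, i.e. XOR
    return state


print(feedforward_or(0, 1))            # same answer whatever came before
print(recurrent_parity([1, 0, 1, 1]))  # odd number of 1s seen so far
```

Removing the feedback of `state` would leave a purely feedforward XOR circuit, which matches the architectural description of a recurrent network as a feedforward network with recurrent connections.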
Typically, feedforward and recurrent networks have a layer of input neurons to
receive input, a layer of output neurons to produce output, and often layers of hidden neurons
to extend the computing capability of the network. Usually, the neurons of a feedback network
are classified into input, hidden, and output neurons functionally but not architecturally.
Perceptrons [155] and multi-layer Perceptrons [156] are two examples of feedforward networks.
Elman networks [33] and Jordan networks [81] are two examples of recurrent networks. Hopfield
networks [75] and BAM [90] are two examples of feedback networks.
1.1.4 Implementation of artificial neural networks 
Because an ANN contains an enormous number of nodes, each of which must compute a weighted
sum of the inputs on its input connections and a thresholded activation, ANN systems
generally require more computational power, but simpler types of computation, than current
computer systems do. Many technologies are available for implementing ANN. Software
simulation is the most widely used because digital computer systems are readily available for
writing and testing simulation programs. Electronic hardware realization (digital VLSI, analog
VLSI, hybrids of digital and analog VLSI, etc.) currently offers the potential benefits of both
high performance and cost-effectiveness, because VLSI provides relatively high performance and
is extensively used in current computer systems. Optical computing potentially has the
highest performance because it computes at the speed of light. Biological implementation
is the closest to biological nervous systems.
1.2 A Brief Review of Artificial Neural Networks
Since the resurgence of research on ANN in the 1980s, ANN have attracted the interest of
many researchers from various science and engineering disciplines, as shown by the explosive
growth in applications and published technical papers on ANN in the 1980s and 1990s. It
is beyond the scope of this dissertation to review in detail the rich literature in every research
area of ANN. Instead, this section only briefly reviews several representative concepts and
landmarks, up to the late 1980s, on the common research ground of ANN. The literature
specific to the research topics covered by this dissertation, which explores methods for
systematically designing neural architectures for associative memories, database query
processing, elementary logical inference, sequence processing, and syntax analysis, is reviewed
in much more detail in the related chapters. The reference book [189], which provides more than
4000 references, is a good source of research material for a general and in-depth
understanding of ANN research.
Two ANN research problem domains for which few conventional computing solutions
exist are
• associative memories, which are anticipated to provide the same advantageous capabilities
as human memory and are currently used mainly in pattern classification applications,
based on their capability for best match and partial match, and
• learning, which is anticipated to be used as an efficient and cost-effective alternative to
knowledge engineering for automated knowledge acquisition without intensive programming.
The following brief review of the ANN literature proceeds mainly along these two intermingled
themes, which have driven the development of new ANN architectures, models, and algorithms
for information processing. The development of formal mathematical models for ANN can
be traced back to the early 1940s in the work of McCulloch and Pitts [108], which showed
that any logical proposition can be represented by a network of interconnected two-state
neurons if enough neurons are provided. The computational capability of McCulloch-Pitts neural
networks was proved to be equivalent to that of Turing machines [183], which are the essential
model of symbolic computation and can perform any computation that can be described by a finite
program in any general purpose language [26].
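To make the claim about logical propositions concrete, the following Python sketch builds AND, OR, and NOT from two-state threshold units and composes them into a network for one example proposition. The weights, thresholds, the strict firing rule, and the example proposition are illustrative assumptions chosen for this sketch rather than the constructions of McCulloch and Pitts [108].

```python
from itertools import product
from typing import Sequence


def mp_unit(inputs: Sequence[int], weights: Sequence[float], theta: float) -> int:
    """Two-state unit: fires (1) when the weighted input sum exceeds theta."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) - theta > 0 else 0


def AND(a: int, b: int) -> int:
    return mp_unit((a, b), (1, 1), 1.5)


def OR(a: int, b: int) -> int:
    return mp_unit((a, b), (1, 1), 0.5)


def NOT(a: int) -> int:
    return mp_unit((a,), (-1,), -0.5)


def proposition(a: int, b: int, c: int) -> int:
    """Network for the example proposition (a AND b) OR (NOT c)."""
    return OR(AND(a, b), NOT(c))


# Compare the network against the Boolean expression on every input.
for a, b, c in product((0, 1), repeat=3):
    expected = int((a and b) or (not c))
    print(a, b, c, proposition(a, b, c), expected)
```

Because AND, OR, and NOT suffice to express any propositional formula, larger formulas follow by composing more units in the same way, which is the sense in which enough neurons must be provided.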
In 1949, Hebb proposed the first learning rule for neurons [63]. In the late 1950s and
early 1960s, Rosenblatt introduced a class of neural networks [155], called perceptrons, which
can learn to classify patterns through supervised learning. Rosenblatt's work stimulated a
large amount of research activity in this early era of ANN research. In 1969, Minsky and Papert
showed in their landmark book Perceptrons that the perceptron's single-layer learning
algorithm can solve only linearly separable problems and not a large
class of other problems [118]. With the misinterpretation of this result, research funding for and
interest in ANN dropped drastically in the 1970s. In the dark ages of the 1970s, the
dedicated and persistent efforts of Amari [7, 8, 9], Anderson [10], Fukushima [39], Grossberg
[51, 52, 53], Kohonen [84, 85], and many other researchers ultimately brought about the renaissance
of ANN in the 1980s.
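The limitation to linearly separable problems can be observed directly with a small experiment. The Python sketch below applies the classic error-correction perceptron rule to a single threshold unit: training converges for logical AND, which is linearly separable, but never reaches an error-free pass for XOR, which is not. The function name, learning rate, initial weights, and epoch limit are assumptions of this sketch; the text above only names the algorithm.

```python
from typing import List, Tuple

Sample = Tuple[Tuple[int, int], int]


def train_perceptron(samples: List[Sample], epochs: int = 100, lr: float = 1.0):
    """Train one threshold unit; return (weights, theta, converged)."""
    w = [0.0, 0.0]
    theta = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            y = 1 if w[0] * x1 + w[1] * x2 - theta > 0 else 0
            delta = target - y
            if delta != 0:
                errors += 1
                # Error-correction update of the weights and the threshold.
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                theta -= lr * delta
        if errors == 0:  # a full pass with no mistakes: the unit fits the data
            return w, theta, True
    return w, theta, False


AND_DATA: List[Sample] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA: List[Sample] = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(AND_DATA))  # converges: AND is linearly separable
print(train_perceptron(XOR_DATA))  # does not converge: XOR is not
```

An error-free pass over XOR would require a single line separating the inputs (0,1) and (1,0) from (0,0) and (1,1), which does not exist, so raising the epoch limit does not change the outcome.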
The tremendous resurgence of interest in ANN research in the 1980s was mainly due to the
invention of Hopfield networks [75], which can serve as content-addressable memories or solve
combinatorial optimization problems [76], and the introduction of the Backpropagation learning
algorithm [156], which overcomes the restriction of the perceptron's single-layer learning algorithm
to linearly separable problems and can be used to train multi-layer perceptrons to solve
nonlinearly separable problems. Since then, the Backpropagation multi-layer perceptron has been
successfully applied in a variety of applications and has become the most widely used neural
network paradigm. The Backpropagation learning algorithm was independently derived by
Werbos [193], Parker [138, 139], and LeCun [98], but its popularity was mainly due to the
efforts of Rumelhart, McClelland, and the PDP Group. Other representative ANN models of
the bright 1980s, to name a few, include Hinton, Sejnowski, and Ackley's Boltzmann machine
model [1, 67], which can be used to find the global optimum solution for a given problem;
Kohonen's Self-Organizing Feature Map [86], which can be trained without supervision to find
the organization of relationships among training patterns; Kosko's BAM [89, 90], which can
serve as hetero-associative memory and temporal associative memory; Carpenter and
Grossberg's ART networks [16, 17, 18], which are typically used to cluster training patterns via
unsupervised training; the Radial Basis Function method [114, 146, 147], which was originally used
for function interpolation and was also applied to other applications [128, 149]; Hecht-Nielsen's
Counterpropagation network [64], which has both supervised and unsupervised training
stages and can be trained to perform pattern mapping, data compression, and associative
recall; Fukushima, Miyake, and Ito's Neocognitron [40, 41], which can be trained with supervision
to recognize handwritten characters; and recurrent neural networks [33, 81, 144], which allow
recursive processing of input strings of variable length. A more detailed taxonomy of most
neural network architectures and learning algorithms can be found in [56, 102, 112].
1.3 An Overview of the Dissertation 
Artificial neural networks, due to their inherent parallelism, potential for fault tolerance,
and adaptation through learning, offer an attractive computational paradigm for a variety
of applications in computer science and engineering, artificial intelligence, robotics, and
cognitive modeling. Despite the success in the application of ANN to a broad range of
numeric tasks in pattern classification, control, function approximation, and system
identification, the integration of ANN and symbolic computing is only beginning to be explored
[22, 23, 47, 70, 72, 73, 99, 145, 172, 179, 185] and is currently viewed as one of the important
research goals in massively parallel computing and artificial intelligence [65].
Pattern-directed associative processing relies on associative pattern matching and retrieval,
is central to many problem solving paradigms in AI (e.g., knowledge based expert systems,
case based reasoning) as well as computer science (e.g., database query processing, information
retrieval) [54, 97, 181], and dominates the computational requirements of many applications
in AI and computer science [55, 97, 127]. This dissertation proposes methods to
systematically design massively parallel architectures for pattern-directed symbol processing using
neural associative memories as key components. In particular, we propose neural architectures
for content-based as well as address-based data storage and recall, database query processing,
elementary logical inference, sequence processing, and syntax analysis. Their potential
advantages over conventional serial computer implementations of the same functions are examined
in the dissertation.
Chapter 2 proposes an approach for the design of a neural memory which supports both 
content-based (associative) and address-based data storage and retrieval. The proposed neu­
ral associative memory allows efficient access of stored data by way of massively parallel best 
match, partial match and exact match. When used as a content-addressed memory, the pro­
posed neural memory supports recall from partial input patterns, (sequential) multiple recalls, 
fault-tolerance, precision control and sorted extraction of all stored memory patterns. When 
used as an address-based memory, the memory module can provide working space for dynamic 
representations for symbol processing and shared message-passing among neural network mod­
ules within an integrated neural network system. It also provides for real-time update of 
memory contents by one-shot learning without interference with other stored patterns. 
The proposed neural associative memory matches a given pattern against all stored patterns
in parallel within one step: its massive connections provide the communication bandwidth and
its nodes provide the processing units. Pattern matching and retrieval in this memory can
therefore be far more efficient (in terms of computation time) than in a key-based
organization of the sort used in conventional computer systems. Chapter 3 takes advantage
of this fact to explore the potential benefits of the proposed neural associative memory in
the implementation of efficient, noise-tolerant information retrieval and query modules in large
database systems.
Since most current digital computer systems store data in address-based memories
that are accessed via shared buses, the retrieval of a desired data item satisfying certain
criteria (patterns) from a set of candidate data items stored in the memories is inherently
sequential and requires a data organization, manipulated and interpreted by one or more
relatively complex programs, to provide adequate performance. Although parallel pattern matching
can be achieved by current digital computer systems when the systems are provided with mul­
tiple processors and memory buses, it would not be cost effective to dedicate such systems to 
applications which mainly involve intensive pattern matching. The proposed neural associative 
memory is a cost effective SIMD computer dedicated to pattern association. Therefore, such 
SIMD capability of the proposed neural associative memory is further explored for relational 
database queries. The potential merits of ANN's inherent parallelism and noise-tolerance for 
database query processing are demonstrated by comparing the estimated performance of the 
proposed neural architecture with that of other techniques commonly used in conventional 
computer systems for database query processing. 
Chapter 4 explores how neural architectures for binary pattern recognition can be extended 
for elementary logical inference. The proposed neural assemblies for propositional logic are
based on geometrical/mathematical analysis. Logical operations such as AND and OR are realized
by neural assemblies for the recognition of binary subpatterns. It is known that any proposition
(or equivalently a Boolean function) can be represented in disjunctive normal form (DNF), and
hence can be realized by a 2-layer neural architecture assembled using the proposed AND and OR
neural assemblies. Since logical AND, logical OR, as well as DNF representation are essential to
logical inference and Boolean functions are basic to many applications in science and
engineering, we expect the proposed neural assemblies to find use in the construction of modular
neural networks for a variety of applications. For instance, Chapter 5 illustrates their use in
a neural architecture for sequence processing.
Chapter 5 proposes methods for systematic design of neural architectures for sequence 
processing, which are used as building blocks to systematically assemble neural architectures 
for syntax analysis in Chapter 6. Current digital computer systems are essentially composed of
memories and sequence processing mechanisms (with flow control capability). They are driven by
sequences of binary codes translated from symbolic program representations that humans can
read, write, and reason about efficiently and effectively. Therefore, a computing system
integrated from the proposed neural architectures for memories and sequence processing is
expected to possess computational capability corresponding to that of current digital
computer systems.
Chapter 6 explores the advantages of ANN's inherent parallelism and associative processing 
capability in the design of modular neural architectures for syntax analysis using a pre-specified 
grammar — a prototypical symbol processing task. A more general goal of this chapter is to 
explore the systematic design of massively parallel architectures for symbol processing using
the neural associative memory proposed in Chapter 2 and the neural architectures for sequence 
processing proposed in Chapter 5 as key components. 
Since each component in the proposed neural architectures for syntax analysis computes
a well-defined symbolic function, the resulting symbolic computation can be systematically
synthesized and analyzed at a fairly abstract (symbolic) level. This facilitates
rapid design and testing of other provably correct prototypes of modular neural architectures
for complex symbolic processing, built from simpler building blocks by way of recursion,
composition of elementary symbolic functions, and the data representations manipulated by them.
The elementary symbolic functions are represented in terms of binary mappings, which are
realized in a provably correct manner by basic neural modules using one-shot learning.
Chapter 7 concludes with a summary of the key contributions of this dissertation.
2 A NEURAL MEMORY FOR CONTENT AS WELL AS 
ADDRESS-BASED STORAGE AND RECALL 
2.1 Introduction 
This chapter presents an approach to the design of a neural architecture for both associative
(content-addressed) and address-based memories. Several interesting properties of the memories
are analyzed mathematically in detail. The analysis shows that, by systematically adjusting
the node thresholds and connection weights, the same neural architecture can serve as a
memory with precision control that performs best match, exact match, and partial match, which
are the main knowledge retrieval techniques used extensively in numerous artificial
intelligence systems [191]. When used as an associative memory, the proposed neural architecture
supports recall from partial input patterns, (sequential) multiple recalls and fault tolerance.
When used as an address-based memory, the memory can provide working space for dynamic
representations for symbol processing and shared message-passing among neural network modules within
an integrated neural network system. It also provides for real-time update of memory contents 
by one-shot learning without interference with other stored patterns. 
It is generally agreed that artificial neural networks (ANN) have demonstrated success in 
low-level perceptual tasks (e.g., signal processing, pattern recognition) [62, 93, 111, 113]. However,
despite their generality (as computational models) and despite the potential advantages
of using them as components in general-purpose artificial intelligence systems which usually in­
volve content-based or memory-based knowledge storing and retrieving [47, 70, 72, 73, 99, 173, 
179], detailed design and performance tradeoffs in integrated systems of this sort are yet to be 
fully understood and working prototypes of such systems are only beginning to be developed. 
Towards this end, an innovative design and careful analysis of neural associative memories is
needed, with emphasis on the problems and prospects of integrating them into larger systems
that combine the advantages of both traditional symbol processing and neural network
approaches to artificial intelligence.
A particular class of neural memories built from threshold logic units (Perceptrons or 
McCulloch-Pitts neurons) is explored from a geometrical/mathematical perspective in this 
chapter. This analysis provides mathematical foundations for understanding several interesting 
properties of such memories including: auto- as well as hetero-associative recall from partially
specified patterns, (sequential) sorted recall of multiple stored patterns with different degrees of 
match with an input pattern, incremental learning, fault tolerance, and address-based storage 
and recall (mimicking the behavior of memories used in conventional digital computers). The 
mathematical analysis also suggests efficient hardware realizations of such memories. This 
chapter is organized as follows: 
• Section 2.1 reviews associative memory, address-based memory, and key properties of 
multi-layer Perceptrons which form the basis of the proposed neural memories. 
• Section 2.2 develops the theoretical foundations and examines the storage capacity of the 
proposed binary/bipolar neural memories through an investigation of the spatial distri­
bution and linear separability of vertices in binary/bipolar hypercubes from a geometric 
perspective. 
• Section 2.3 explores several interesting properties of the proposed memory modules in­
cluding: recall from partially specified input patterns, (sequential) multiple recalls, and 
fault tolerance by examining and extending the physical meanings of the settings of 
connection weights and neuron thresholds in the proposed neural memories. 
• Section 2.4 concludes with a summary of the chapter and a brief discussion of related
research.
2.1.1 Information retrieval and binary mapping 
In general, most classification and information retrieval problems using discrete input/output
values can be viewed in terms of a binary random mapping $f_I$, where $f_I$ is rigidly defined from
a set $U$ of $k$ distinct binary input vectors $u_1, \ldots, u_k$ of dimension $n$ to a set $V$ of $k$
binary output vectors $v_1, \ldots, v_k$ of dimension $m$ such that $f_I : U \to V$ and

$$f_I(u_i) = v_i \quad \text{for } 1 \le i \le k \qquad (2.1)$$

Note that $f_I$ is a partial function.
2.1.2 Associative memory (Content-addressed memory) 
Since the resurgence of ANN in the 1980s, ANN have been applied in many science and
engineering disciplines. This is shown by the explosive growth in the number of published
technical papers on ANN in the 1980s and 1990s. In particular, neural architectures for
associative memories have been the subject of considerable research, because of their potential applications in
several areas of artificial intelligence, computer science, and cognitive modelling. 
The term associative memory (AM) or content-addressed memory refers to a memory sys­
tem where recall of a stored pattern is accomplished by providing a noisy or partially specified 
input pattern. Examples of such memory models include Hopfield networks [75], correlation 
matrix memories [84], bidirectional associative memories [90], among others [9, 59, 91, 125]. A 
precise definition of binary/bipolar associative memories follows: 
Let $D_H(u, u')$ denote the Hamming distance between binary (bipolar) vectors $u$ and $u'$.
Hamming distance is the number of bits that differ between two binary (bipolar) vectors.
Suppose we are given a set $U$ of $k$ binary input vectors $u_1, \ldots, u_k$ of dimension $n$ and a
set $V$ of $k$ desired binary output vectors $v_1, \ldots, v_k$ of dimension $m$. Then the task is to
design an associative memory that can store each of the input-output pattern pairs.
In many applications, it is useful to be able to control the degree of mismatch that is
tolerated during information retrieval. This is accomplished by introducing the concept of
precision control in associative memory as follows: Define
$U_i^n(p_i) = \{u \mid u \in B^n \;\&\; D_H(u, u_i) \le p_i\}$, $1 \le i \le k$. $U_i^n(p_i)$ is the set
of n-dimensional binary vectors which have Hamming distance less than or equal to $p_i$ away
from the given n-dimensional binary vector $u_i$, where $B^n$ is the universe of n-dimensional
binary vectors, and $p_i$ is called the allowable precision level and is an adjustable integer
parameter.
Information retrieval in a binary associative memory can be specified in terms of a binary
associative mapping $f_A : U^n \to V$ as follows:

$$f_A(x) = v_i \quad \text{if } x \in U_i^n(p_i),\ 1 \le i \le k \qquad (2.2)$$

where $U^n = \bigcup_{i=1}^{k} U_i^n(p_i) = U_1^n(p_1) \cup U_2^n(p_2) \cup \cdots \cup U_k^n(p_k)$ and
conventionally $U_i^n(p_i) \cap U_j^n(p_j) = \emptyset$ for $i \ne j$, $1 \le i, j \le k$. For example, if
such a memory is used to store and recall uppercase English characters, then $U = V$ and
$u_i = v_i$, $1 \le i \le 26$. Suppose the allowable precision levels (i.e., all of the $p_i$'s) are
set equal to (Hamming distance) 4. Then in Figure 2.1, the noisy input patterns 1 and 2 would
result in the recall of the stored memory pattern T. Multiple recalls are possible in the
proposed neural memory when $\exists\, i \ne j$ such that $U_i^n(p_i) \cap U_j^n(p_j) \ne \emptyset$,
in which case $f_A$ is a one-to-many mapping. Most conventional associative memory models
seldom tackle the problem of multiple recalls.
Note that $f_I \subset f_A$ if functions $f_I$ (expression 2.1) and $f_A$ are viewed as sets of
input-output ordered pairs of the functions $f_I$ and $f_A$ respectively. That is,

$$f_I = \{(x, f_I(x)) \mid x \in U\}$$
$$f_A = \{(x, f_A(x)) \mid x \in U^n\}$$

The partial function $f_A$ may be extended to a full function
$\hat{f}_A : B^n \to (V \cup \{\langle 0^m \rangle\})$ for binary associative (information retrieval)
memory as follows:

$$\hat{f}_A(x) = \begin{cases} f_A(x) & \text{if } x \in U^n \\ \langle 0^m \rangle & \text{if } x \in (B^n - U^n) \end{cases} \qquad (2.3)$$

where $\langle 0^m \rangle$ is the m-dimensional binary vector of all zeros and denotes a value which is
undefined.
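The extended mapping of expression 2.3 can be illustrated with a short Python sketch. It is only a functional model of the mapping, not the neural realization developed later in the chapter. The names `hamming` and `recall` are hypothetical, and the sketch assumes the precision balls do not overlap, so the first match is the only match.

```python
from typing import List, Sequence, Tuple

Bits = Tuple[int, ...]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x != y for x, y in zip(a, b))


def recall(x: Bits, pairs: List[Tuple[Bits, Bits]],
           precision: List[int], m: int) -> Bits:
    """Return v_i if x is within Hamming distance p_i of u_i, else all zeros.

    Assumption: the balls U_i^n(p_i) are pairwise disjoint, so at most one
    stored pair can match.
    """
    for (u, v), p in zip(pairs, precision):
        if hamming(x, u) <= p:
            return v
    return (0,) * m  # the "undefined" value <0^m>


pairs = [((1, 1, 0, 0, 1, 1), (1, 0)), ((0, 0, 1, 1, 0, 0), (0, 1))]
precision = [1, 1]
noisy = (1, 1, 0, 0, 1, 0)      # one bit away from the first stored input
unknown = (1, 0, 1, 0, 1, 0)    # three bits away from both stored inputs
```

Here `recall(noisy, pairs, precision, 2)` falls inside the first precision ball, while `unknown` lies outside both balls and maps to the all-zero vector.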
[Figure 2.1 Examples of memory pattern, noisy patterns, and partial pattern. Panels: memory
pattern; noisy pattern 1 (HD = 2); noisy pattern 2 (HD = 4); noisy pattern 3 (HD = 6); half
pattern; partial pattern.]

Content-addressed memories can be divided into two categories: auto-associative memories
(used primarily for reconstructing a pattern from a noisy or partially specified pattern) and
hetero-associative memories which can be used to store associated pattern pairs so that when an
input pattern is provided, the associated pattern is retrieved. The types of pattern associations
that can be stored in neural associative memories depend on various factors such as: the choice
of neural network architecture, the choice of activation functions computed by the neurons,
and the algorithm used to set up the parameters (thresholds and weights) associated with
the neurons and connections. Thus, a linear associative memory with n input neurons can 
store and recall perfectly at most n pattern associations. Similar storage capacity results are 
known for several content-addressed memory models such as the Hopfield network [75, 109], 
bi-directional associative memories [90], correlation matrix memories [84], etc. A variety of 
associative memory models are discussed in [62, 93]. As already pointed out, many simple 
content-addressed memory models studied in the literature are incapable of stable storage 
and recall of associations between arbitrary pairs of patterns (except under certain restricted 
circumstances). In such models, whether a pattern can be associated with another critically 
depends on how the two patterns are coded as bit vectors as well as on all the other pattern 
associations that have already been stored in memory. The ability to reliably store and recall 
associations between arbitrary patterns is regarded by many to be a prerequisite for higher 
level cognitive activity (e.g., logical inference) [35]. The associative memory model proposed 
in this chapter is designed to reliably store and recall associations between arbitrary pairs of 
patterns. 
2.1.3 Address-based memory 
Address-based memory is extensively used for storing both data as well as programs in
current computer systems. In cognitive models and artificial intelligence programs based on
the von Neumann model of computation, i.e., models within the so-called symbolic paradigm
[126], address-based memory often serves as the working memory (or scratch-pad) for storing
intermediate results during the execution of a program. On the surface, storage and recall of
patterns using addresses appear to be very different in spirit from the recall of patterns based
on their content (as judged by their similarity to a stored pattern). Indeed, many authors have
suggested this to be a primary difference between neural networks (or connectionist models)
and traditional artificial intelligence systems. However, this perceived difference is rather
superficial given the demonstrable Turing-equivalence of sufficiently powerful neural network
models [69, 70]. Therefore, it is rather straightforward to design neural memories capable of
address-based storage and recall of patterns as the following discussion illustrates.
A mathematical model for information retrieval in address-based memory can be formulated
in terms of a binary random mapping $f_I$ (expression 2.1) by extending the partial function $f_I$
to a full function $\hat{f}_I : B^n \to (V \cup \{\langle 0^m \rangle\})$ for address-based (information
retrieval) memory as follows:

$$\hat{f}_I(x) = \begin{cases} f_I(x) & \text{if } x \in U \\ \langle 0^m \rangle & \text{if } x \in (B^n - U) \end{cases} \qquad (2.4)$$

$\hat{f}_I$ maps from the set of n-bit binary addresses to the set of m-bit binary values. The
retrieved value (or content of a memory address) is undefined if no pattern has been stored at
the corresponding address.
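As a functional illustration of expression 2.4 only (not of its neural realization), the address-based mapping behaves like an exact-match table whose default value is the all-zero vector. The helper name `read_address` is hypothetical.

```python
from typing import Dict, Sequence, Tuple

Bits = Tuple[int, ...]


def read_address(memory: Dict[Bits, Bits], address: Sequence[int], m: int) -> Bits:
    """Exact-match lookup; an address with nothing stored yields <0^m>."""
    return memory.get(tuple(address), (0,) * m)


# Two 3-bit addresses holding 4-bit values.
memory = {(0, 0, 1): (1, 0, 1, 1), (1, 1, 0): (0, 1, 1, 0)}
```

Reading `(0, 0, 1)` returns the stored value, while reading an unused address such as `(1, 1, 1)` returns four zeros.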
It is well known (in the literature on the design of memory systems for digital computers) 
that this approach to address-based memory design is not necessarily the most efficient for 
large address spaces. In this case, hierarchical memory organization using multiple levels of 
address decoding and multiple memory modules of the type specified above is a more practical 
alternative [171]. 
2.1.4 Perceptrons 
A 1-layer Perceptron has $n$ input neurons, $m$ output neurons and one layer of connection
weights. The output $y_i$ of output neuron $i$ is given by
$y_i = f_H(\sum_{j=1}^{n} w_{ij} x_j - \theta_i)$. $w_{ij}$ denotes the weight on the link from input
neuron $j$ to output neuron $i$, $\theta_i$ is the threshold of output neuron $i$, $x_j$ is the input
value at input neuron $j$, and $f_H$ is the binary hardlimiter function, where

$$f_H(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad (2.5)$$

It is well known that such a 1-layer Perceptron can implement only linearly separable functions
from $R^n$ to $\{0,1\}^m$ [118]. We can see the connection weight vector
$w_i = \langle w_{i1}, \ldots, w_{in} \rangle^T$ and the node threshold $\theta_i$ as defining a linear
hyperplane $H_i$ which partitions the n-dimensional pattern space into two half-spaces, where
$[\cdot]^T$ denotes the transpose of a vector or a matrix.
A 2-layer Perceptron has one layer of $k$ hidden neurons (and hence two layers of connection
weights with each hidden neuron being connected to every input neuron as well as every output
neuron). In this chapter, we use a 2-layer Perceptron in which each hidden neuron uses the binary
hardlimiter function $f_H$ as its activation function. The output of output neuron $i$ is given by
$y_i = f(\sum_{l=1}^{k} w_{il} z_l - 1)$, where $z_l$ is the output of hidden neuron $l$, $f$ is the
binary hardlimiter function $f_H$ in the model using binary output, and $f$ is the bipolar
hardlimiter function $f_h$ in the model using bipolar output. (The thresholds of all output
neurons are set to 1.) The bipolar hardlimiter function $f_h$ is defined as

$$f_h(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ -1 & \text{otherwise} \end{cases} \qquad (2.6)$$
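The two hardlimiters and the 1-layer forward pass can be sketched in a few lines of Python. The function names are hypothetical. The sketch assumes a unit fires when its net input equals zero, which is what the later description of hidden neurons ("exceeds or equals its threshold") requires.

```python
from typing import Callable, List, Sequence


def binary_hardlimiter(x: float) -> int:
    """Binary hardlimiter: 1 when the net input is non-negative, else 0."""
    return 1 if x >= 0 else 0


def bipolar_hardlimiter(x: float) -> int:
    """Bipolar hardlimiter: 1 when the net input is non-negative, else -1."""
    return 1 if x >= 0 else -1


def one_layer_perceptron(weights: Sequence[Sequence[float]],
                         thresholds: Sequence[float],
                         x: Sequence[float],
                         f: Callable[[float], int] = binary_hardlimiter) -> List[int]:
    """Compute y_i = f(sum_j w_ij * x_j - theta_i) for every output neuron i."""
    return [f(sum(w * xj for w, xj in zip(row, x)) - theta)
            for row, theta in zip(weights, thresholds)]


# A single output neuron with weights (1, 1) and threshold 2 computes logical AND.
and_weights, and_thresholds = [[1, 1]], [2]
```

With these settings the unit outputs 1 only for the input `(1, 1)`, since that is the only binary input whose weighted sum reaches the threshold.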
2.2 Multi-layer Perceptrons as Neural Memories 
This section describes the synthesis of a binary address-based memory or a binary
associative memory using a 2-layer Perceptron. The binary address-based memory has a storage
capacity of $N = 2^n$ while the binary associative memory has a storage capacity
$N = \lfloor 2^n / \sum_{i=0}^{p} C(n, i) \rfloor$, where $n$ is the number of input neurons and $p$ is the
adjustable precision level (allowable noise level) measured in terms of Hamming distance. A
hidden neuron is used for each stored associative pair of input and output patterns. The
numbers of input and output neurons are fixed in these models. In the case of associative
memory, this amounts to fixing the dimensionality of input and output patterns; while in the
address-based memory, it is tantamount to fixing the maximum size of the address space and the
dimensionality of the patterns stored in memory.
2.2.1 The application of linear separability of binary vertices in pattern
classification
Note that every n-dimensional binary vector is a binary vertex of an n-dimensional
hypercube. Hereafter, we will use the terms binary vertex and binary vector interchangeably.
The following theorem and its proof facilitate the systematic synthesis of the proposed neural 
memories. 
Theorem 2.1: Let $u$ be a binary vector of dimension $n$, i.e., $u = \langle u_1, \ldots, u_n \rangle^T$
where $u_i \in \{0,1\}$ for $1 \le i \le n$. Let $\bar{u} = \langle \bar{u}_1, \ldots, \bar{u}_n \rangle^T$ be the
complement of the binary vector $u$. That is, $u_i + \bar{u}_i = 1$ for $1 \le i \le n$. Let
$u - \bar{u} = u^{ref_u} = \langle u_1^{ref_u}, \ldots, u_n^{ref_u} \rangle^T$. Note that
$u_i^{ref_u} \in \{-1, 1\}$ for $1 \le i \le n$. Let us call $u^{ref_u}$ the reference vector. Let $S_p^u$
be the set of n-dimensional binary vertices which are at a Hamming distance $p$ away from vertex
$u$, $0 \le p \le n$. Then every binary vertex $x \in S_p^u$ falls on an n-dimensional linear
hyperplane $H_p^u$ which is perpendicular to the reference vector $u^{ref_u}$. Furthermore, if
$H^u = \{H_p^u \mid 0 \le p \le n\}$, the n-dimensional linear hyperplanes in $H^u$ are mutually
parallel.
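Proof: Let $x$ be a binary vertex in $S_p^u$,
$x - \bar{u} = x^{ref_u} = \langle x_1^{ref_u}, \ldots, x_n^{ref_u} \rangle^T$, and $r$ be the length of the
projection of $x^{ref_u}$ onto the reference vector $u^{ref_u}$. Note that $x_i^{ref_u} = 0$ or $1$
if $u_i^{ref_u} = 1$, and $x_i^{ref_u} = 0$ or $-1$ if $u_i^{ref_u} = -1$ for $1 \le i \le n$. Note also
that there are $p$ components $x_i^{ref_u}$ of $x^{ref_u}$ such that $x_i^{ref_u} = 0$ and $(n - p)$
components $x_j^{ref_u}$ of $x^{ref_u}$ such that $x_j^{ref_u} = 1$ or $-1$, where $1 \le i, j \le n$.
Let $\| \cdot \|$ denote the length of a vector. Then

$$r = \frac{(u^{ref_u})^T x^{ref_u}}{\|u^{ref_u}\|} \qquad (2.7)$$
$$\;\; = \frac{1}{\|u^{ref_u}\|} \sum_{i=1}^{n} u_i^{ref_u} x_i^{ref_u} \qquad (2.8)$$
$$\;\; = \frac{1}{\|u^{ref_u}\|} \Big( \sum_{x_j^{ref_u} = 1 \text{ or } -1} u_j^{ref_u} x_j^{ref_u} + \sum_{x_i^{ref_u} = 0} u_i^{ref_u} x_i^{ref_u} \Big) \qquad (2.9)$$
$$\;\; = \frac{1}{\|u^{ref_u}\|} (n - p) \qquad (2.10)$$
$$\;\; = \frac{1}{\sqrt{n}} (n - p) \qquad (2.11)$$

Thus, $\forall x \in S_p^u$, the length of the projection of $x^{ref_u}$ onto the reference vector
$u^{ref_u}$ is $(n - p)/\sqrt{n}$. That is, all binary vertices in $S_p^u$ lie on the same
n-dimensional linear hyperplane $H_p^u$ which is perpendicular to the reference vector
$u^{ref_u}$ and located at a distance of $(n - p)/\sqrt{n}$ from the vertex $\bar{u}$, that is, a
distance $p/\sqrt{n}$ to vertex $u$. Hence, every hyperplane $H_p^u \in H^u$, $0 \le p \le n$, is
parallel to every other hyperplane in $H^u$. There are $n+1$ such mutually parallel
linear hyperplanes $H_p^u$ (Figure 2.2). Among them, $u$ is on $H_0^u$ and $\bar{u}$ is on $H_n^u$. The
hyperplanes have the same normal vector $(u - \bar{u})/\sqrt{n}$. $\Box$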
The expression defining the n-dimensional linear hyperplane $H_p^u$, $0 \le p \le n$, is

$$\Big( \sum_{i=1}^{n} (2u_i - 1) x_i \Big) - \Big( \sum_{i=1}^{n} u_i - p \Big) = 0 \qquad (2.12)$$

which can be derived as follows. Let $x$ be a binary vertex on hyperplane $H_p^u$ where
$0 \le p \le n$. From expressions 2.7 and 2.11, we have:

$$\frac{1}{\sqrt{n}} (u - \bar{u})^T (x - \bar{u}) = \frac{1}{\sqrt{n}} (n - p) \qquad (2.13)$$

Thus,

$$(u - \bar{u})^T (x - \bar{u}) = (n - p) \qquad (2.14)$$

So the defining expression of the hyperplane $H_p^u$, $0 \le p \le n$, is given by:

$$H_p^u \equiv (u - \bar{u})^T (x - \bar{u}) = (n - p) \qquad (2.15)$$
$$\equiv (u - \bar{u})^T x - (u - \bar{u})^T \bar{u} - (n - p) = 0, \quad \text{note } u^T \bar{u} = 0 \qquad (2.16)$$
$$\equiv (u - \bar{u})^T x - (n - \|\bar{u}\|^2 - p) = 0, \quad \text{note } \|u\|^2 + \|\bar{u}\|^2 = n \qquad (2.17)$$
$$\equiv (u - \bar{u})^T x - (\|u\|^2 - p) = 0 \qquad (2.18)$$
$$\equiv (u_1 - \bar{u}_1) x_1 + \ldots + (u_n - \bar{u}_n) x_n - (\|u\|^2 - p) = 0 \qquad (2.19)$$
$$\equiv (2u_1 - 1) x_1 + \ldots + (2u_n - 1) x_n - (\|u\|^2 - p) = 0 \qquad (2.20)$$
$$\equiv \Big( \sum_{i=1}^{n} (2u_i - 1) x_i \Big) - (\|u\|^2 - p) = 0 \qquad (2.21)$$
$$\equiv \Big( \sum_{i=1}^{n} (2u_i - 1) x_i \Big) - \Big( \sum_{i=1}^{n} u_i - p \Big) = 0 \qquad (2.22)$$

[Figure 2.2 The spatial distribution of a 3-dimensional and an n-dimensional binary hypercube]
Note that $\|u\|^2 = \sum_{i=1}^{n} u_i$ since $u$ is a binary vector. From above, it is known that the
expressions defining the $n+1$ n-dimensional mutually parallel linear hyperplanes $H_p^u$,
$0 \le p \le n$, have the same coefficients but different constant terms. Every hyperplane $H_p^u$,
where $0 \le p \le n$, can serve as a linear separating hyperplane to partition all n-dimensional
binary vertices into two sets. Such a linear separating hyperplane can be efficiently
implemented for 2-class pattern classification by a 1-layer Perceptron with one output neuron.
The output neuron has a threshold of $\sum_{i=1}^{n} u_i - p$ and the connection weight on the link
from input neuron $i$ is given by $2u_i - 1$ $(= u_i - \bar{u}_i)$ for $1 \le i \le n$, where $n$ is the
number of input neurons.
When the separating hyperplane $H_p^u$ is realized by a 1-layer, 1-output Perceptron, the
value of $2u_i - 1$ is either 1 or $-1$, $x_i$ is either 1 or 0, and $(2u_i - 1)x_i$ can therefore be
1, 0 or $-1$; and $\sum_{i=1}^{n} u_i$ is an integer. Also note that the maximum activation of the
1-layer Perceptron is $p$ and the minimum value is $-(n - p)$; since the separating hyperplane
$H_p^u$ is defined as $(u - \bar{u})^T (x - \bar{u}) - (n - p) = 0$, the maximum value of
$(u - \bar{u})^T (x - \bar{u})$ is $n$ when $x = u$, and the minimum value of
$(u - \bar{u})^T (x - \bar{u})$ is 0 when $x = \bar{u}$.
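The hyperplane expression can be checked by brute force for small $n$. The Python sketch below (function names are hypothetical) enumerates every vertex of the n-cube and verifies that the left-hand side of expression 2.12 equals $p - D_H(x, u)$. It is therefore zero exactly on the vertices at Hamming distance $p$ from $u$, non-negative inside that radius, and negative outside it.

```python
from itertools import product
from typing import Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def activation(u: Sequence[int], x: Sequence[int], p: int) -> int:
    """Left-hand side of expression 2.12: sum (2u_i - 1) x_i - (sum u_i - p)."""
    return sum((2 * ui - 1) * xi for ui, xi in zip(u, x)) - (sum(u) - p)


def check_hyperplane(u: Sequence[int], p: int) -> bool:
    """Verify activation == p - Hamming distance for every vertex of the n-cube."""
    return all(activation(u, x, p) == p - hamming(u, x)
               for x in product((0, 1), repeat=len(u)))
```

For example, `check_hyperplane((1, 0, 1, 1), 2)` runs over all 16 vertices of the 4-cube. The extreme values match the text: the activation is $p$ at $x = u$ and $-(n - p)$ at the complement of $u$.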
2.2.2 Best match: pattern classification with precision control 
Since each binary vertex of dimension $n$ on hyperplane $H_p^u$ is Hamming distance $p$ away
from vertex $u$, there are $N_p = C(n, p) = \frac{n!}{(n-p)!\,p!}$ binary vertices of dimension $n$ on
hyperplane $H_p^u$, where $0 \le p \le n$. The separating hyperplane $H_p^u$ partitions all the
binary vertices of a binary n-hypercube into two sets. One set contains
$N_a = \sum_{i=0}^{p} N_i = \sum_{i=0}^{p} C(n, i)$ binary vertices that are at a Hamming distance less
than or equal to $p$ away from vertex $u$, and the other contains
$N_b = \sum_{i=p+1}^{n} N_i = \sum_{i=p+1}^{n} C(n, i)$ binary vertices that are at a Hamming
distance more than $p$ away from vertex $u$. Let us call the former partition the associative
partition (denoted by $\alpha_p^u$) of $u$, the vertex $u$ the center of that associative partition,
and $p$ the radius of the associative partition. Note that both $H_p^u$ and $\alpha_p^u$ are defined
by the given binary exemplar pattern $u$ and its precision level $p$.
In theory, an n-hypercube can be almost equally partitioned by
$N = \lfloor 2^n / \sum_{i=0}^{p} C(n, i) \rfloor$ such associative partitions as $\alpha_p^u$, with each
associative partition containing $\sum_{i=0}^{p} C(n, i)$ n-dimensional binary vertices which are all
at a Hamming distance less than or equal to $p$ away from their corresponding partition center.
The partition centers correspond to the given binary exemplar patterns.
We say that an associative partition $\alpha_{p_i}^{u_i}$ is not isolated from another associative
partition $\alpha_{p_j}^{u_j}$ ($i \ne j$) if $\alpha_{p_i}^{u_i} \cap \alpha_{p_j}^{u_j} \ne \emptyset$. Thus,
if two associative partitions are not isolated from each other, they overlap and as a result,
there is at least one binary vector that is a member of both partitions. The separating
hyperplanes (or equivalently, associative partitions) can be implemented in a 1-layer
Perceptron with $N$ output neurons to recognize $N$ patterns with precision level (allowable
noise level) up to Hamming distance $p_i$ for exemplar pattern $u_i$, $1 \le i \le N$, provided the
precision levels (the $p_i$'s) are chosen to ensure that each associative partition is isolated
from every other. When $x \in \alpha_{p_i}^{u_i}$ is fed into the 1-layer Perceptron, the
output neuron that corresponds to the separating hyperplane $H_{p_i}^{u_i}$ is activated to
produce an output of 1. In order to ensure that the associative partitions corresponding to two
exemplar patterns $u_i$ and $u_j$ are isolated from each other, $D_H(u_i, u_j)$ has to be greater
than $(p_i + p_j)$ where $p_i$ and $p_j$ are the allowable precision levels. Otherwise, the
associative partitions of $u_i$ and $u_j$ would overlap with each other, and when an input
pattern $x$, where $D_H(x, u_i) \le p_i$ and $D_H(x, u_j) \le p_j$, is fed into the 1-layer
Perceptron, the output neurons for the two exemplar patterns $u_i$ and $u_j$ will produce 1 as
their outputs. In this case, the input pattern $x$ cannot be unambiguously classified as it
falls in the region of overlap between the associative partitions $\alpha_{p_i}^{u_i}$
and $\alpha_{p_j}^{u_j}$.
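The isolation condition is easy to test before a set of exemplars is stored. The following Python sketch (the name `all_isolated` is hypothetical) checks that every pair of exemplars is separated by a Hamming distance greater than the sum of their precision levels.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def all_isolated(exemplars: Sequence[Sequence[int]],
                 precisions: Sequence[int]) -> bool:
    """True if D_H(u_i, u_j) > p_i + p_j for every pair i != j."""
    return all(
        hamming(exemplars[i], exemplars[j]) > precisions[i] + precisions[j]
        for i, j in combinations(range(len(exemplars)), 2)
    )


exemplars = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 0)]  # Hamming distance 3 apart
```

With precision levels `[1, 1]` the two partitions are isolated (3 > 2). With `[1, 2]` they are not (3 is not greater than 3): the vertex `(1, 0, 0, 0, 0)` is within distance 1 of the first exemplar and within distance 2 of the second.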
2.2.3 Storage capacity 
Suppose input patterns are 10x10 arrays of binary pixels (see Figure 2.1). Then 100 input
neurons are required to implement such a 1-layer Perceptron for pattern classification. The
number of possible input patterns is $2^{100} \approx 10^{30}$. An output neuron is needed for each
distinct exemplar pattern. Table 2.1 shows the corresponding maximal storage capacity of the
1-layer Perceptrons designed for a range of different allowable noise levels. Table 2.1 also
suggests that a 1-layer Perceptron with $n$ input neurons has very high storage capacity for
classifying binary patterns and that allowable precision (noise) levels of less than 30% are
desirable for reliable classification.
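The capacity formula can be evaluated exactly with integer arithmetic. The sketch below uses a hypothetical helper name, `capacity`, and computes $\lfloor 2^n / \sum_{i=0}^{p} C(n, i) \rfloor$ for a given input dimension and precision level. Because it uses the exact value of $2^{100}$ rather than the rounded $10^{30}$, its results may differ slightly from the rounded figures in Table 2.1.

```python
from math import comb


def capacity(n: int, p: int) -> int:
    """Maximal number of isolated associative partitions of radius p in the n-cube."""
    if not 0 <= p <= n:
        raise ValueError("precision level must satisfy 0 <= p <= n")
    ball_size = sum(comb(n, i) for i in range(p + 1))  # vertices within distance p
    return 2 ** n // ball_size


# Capacities for 100 inputs at noise levels of 0%, 10%, ..., 50%.
table = {noise: capacity(100, noise) for noise in (0, 10, 20, 30, 40, 50)}
```

Two entries can be confirmed by hand. At 0% noise every vertex is its own partition, so the capacity is $2^{100}$. At 50% noise a single partition already covers more than half of the hypercube, so the capacity is 1.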
2.2.4 Synthesis of associative and address-based memories 
Given a set $U$ of $k$ distinct binary input vectors $u_1, \ldots, u_k$ of dimension $n$, where
$u_i = \langle u_{i1}, \ldots, u_{in} \rangle^T$ and $u_{ig} \in \{0,1\}$ for $1 \le i \le k$ & $1 \le g \le n$; and a
set $V$ of $k$ desired binary (bipolar) output vectors $v_1, \ldots, v_k$ of dimension $m$, where
$v_i = \langle v_{i1}, \ldots, v_{im} \rangle^T$ and $v_{ih} \in \{0,1\}$ (or $\{-1,1\}$) for $1 \le i \le k$ &
$1 \le h \le m$. Assume the Hamming distance between any two binary vectors in $U$ is at least
$2p+1$, where $p \in N$. This ensures that all associative partitions would be isolated with the
precision level being set at $p$.
Table 2.1 The corresponding maximal storage capacity of a 1-layer Perceptron with 100 input
neurons for classifying binary patterns for a range of allowable noise levels

allowable noise    maximal capacity
0%                 $N = \lfloor 2^{100} / \sum_{i=0}^{100 \times 0} C(100, i) \rfloor \approx 1.0 \times 10^{30}$
10%                $N = \lfloor 2^{100} / \sum_{i=0}^{100 \times 0.1} C(100, i) \rfloor \approx 5.0 \times 10^{16}$
20%                $N = \lfloor 2^{100} / \sum_{i=0}^{100 \times 0.2} C(100, i) \rfloor \approx 1.4 \times 10^{9}$
30%                $N = \lfloor 2^{100} / \sum_{i=0}^{100 \times 0.3} C(100, i) \rfloor \approx 2.4 \times 10^{4}$
40%                $N = \lfloor 2^{100} / \sum_{i=0}^{100 \times 0.4} C(100, i) \rfloor \approx 38$
50%                $N = \lfloor 2^{100} / \sum_{i=0}^{100 \times 0.5} C(100, i) \rfloor = 1$

We can now design a neural architecture for information retrieval using address-based
memory, denoted by function $\hat{f}_I$ (defined by expression 2.4), or associative memory, denoted
by function $\hat{f}_A$ (defined by expression 2.3). For this purpose, a memory module of a 2-layer
Perceptron can be synthesized using the 1-layer Perceptrons, proposed for pattern classification
in Section 2.2.1, as follows:
The memory module (using binary input) has $n$ input, $k$ hidden, and $m$ output neurons.
For each associative ordered pair $(u_i, v_i)$, where $1 \le i \le k$, we create a hidden neuron $i$
with threshold $\sum_{g=1}^{n} u_{ig} - p_i$ (see Figure 2.3), where $p_i \in N$ and $p_i \le p$ is the
adjustable precision level for that associative pair. The connection weight from input neuron
$g$ to hidden neuron $i$ is $2u_{ig} - 1$ $(= u_{ig} - \bar{u}_{ig})$ and that from hidden neuron $i$ to
output neuron $h$ is $v_{ih}$. The threshold for each of the output neurons is set to 1. The
activation function at the hidden neurons is the binary hardlimiter function $f_H$. The
activation function at the output neurons is the binary hardlimiter function $f_H$
(expression 2.5) if the desired output of the output neurons is binary, and the bipolar
hardlimiter function $f_h$ (expression 2.6) if the desired output of the output neurons is
bipolar.
Figure 2.3 The setting of connection weights and hidden node threshold in
the proposed neural memory (a 2-layer Perceptron with binary
input) for a given associated memory pair

Since input is binary, the weights in the 1st-layer connections of the memory module are
either 1 or -1. A bit of an input pattern that is wrongly on (with respect to a stored pattern)
contributes -1 to the activation of the corresponding hidden neuron and a bit of an input
pattern that is rightly on (with respect to a stored pattern) contributes +1 to the activation
of the corresponding hidden neuron. A bit of an input pattern that is (rightly or wrongly)
off (with respect to a stored pattern) contributes 0 to the activation of the corresponding
hidden neuron. Each hidden neuron sums up the contributions to its activation from its
1st-layer connections, compares the result with its threshold (which equals the number of 1s in
the stored memory pattern minus its desired precision level), and produces output value 1 if its
activation exceeds or equals its threshold. If one of the hidden neurons is turned on, one of the
stored memory patterns will be recalled by that hidden neuron. Note that an input pattern is
matched against all the stored memory patterns in parallel. If the time delay for computing
the activation at a neuron is fixed, the time complexity for such a pattern matching process is
O(1). Note that this is attained at the cost of a hidden neuron (and its connections) for each
stored association.
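The recall rule described above can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions, not the hardware realization the text has in mind: the function name `binary_memory_recall` and the list-based representation of patterns are hypothetical, and the parallel hidden neurons are simulated by a sequential loop.

```python
def binary_memory_recall(pairs, precisions, x):
    """Simulate recall in the binary-input memory module (illustrative sketch).

    pairs      -- list of (u, v): stored input/output patterns as lists of 0/1
    precisions -- list of precision levels p_i, one per stored pair
    x          -- binary input pattern (list of 0/1)
    """
    m = len(pairs[0][1])
    net_out = [0] * m
    for (u, v), p in zip(pairs, precisions):
        weights = [2 * bit - 1 for bit in u]      # +1 where u is on, -1 where it is off
        threshold = sum(u) - p                    # number of 1s in u minus precision level
        activation = sum(w * xi for w, xi in zip(weights, x))
        if activation >= threshold:               # binary hardlimiter at the hidden neuron
            for h in range(m):
                net_out[h] += v[h]                # 2nd-layer weight from i to h is v_ih
    return [1 if s >= 1 else 0 for s in net_out]  # output neurons have threshold 1
```

With this rule a hidden neuron fires exactly when the number of wrongly-on bits plus the number of wrongly-off bits is at most its precision level, which is the Hamming-distance criterion used in the text.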
Since all the associative partitions are isolated from each other, when the memory module
is presented with a binary input vector $x \in U_i^n(p_i)$ (the set of vectors within Hamming distance
$p_i$ of $u_i$), only the hidden neuron i produces an output of 1 and the output values from all other
hidden neurons are 0. So the value at output neuron j is $v_{ij}$, and hence the output binary vector
will be $\langle v_{i1}, \ldots, v_{im} \rangle^T = v_i$. Since for each memory association pair a hidden neuron is created
and its creation or deletion is independent of other stored associative pairs, this particular
design of associative memory lends itself to rapid one-shot incremental learning with no
interference with previously stored associations.
It is also worth pointing out that exactly the same network architecture can be used to
realize both associative as well as address-based memory. If $p_i$ is set to 0, $|U_i^n(p_i)| = 1$ and
the memory module functions as an address-based memory when $2^n$ hidden neurons are used
to resolve all possible addresses; and if $1 \le p_i \le n$, $|U_i^n(p_i)| > 1$ and it can be used as an
associative memory with adjustable precision control. ($|A|$ denotes the cardinality of a
set A.) Address-based memory, extensively used in current computer systems, can serve as
working space of dynamic representations for symbol processing and shared message-passing
space among neural network modules in an integrated neural network system. As working space
for symbol manipulation, neural memories have to allow run-time update without learning and
must not degrade when the number of stored memory patterns increases. Note that the proposed
neural address-based memory has these two properties.
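To illustrate one-shot storage, deletion, and the address-based mode ($p_i = 0$), the following Python sketch keeps one record per hidden neuron. The class name `NeuralMemory` and its methods are hypothetical names chosen for this example; the dictionary merely stands in for the set of hidden neurons and their connections.

```python
class NeuralMemory:
    """Illustrative model: one (weights, threshold, output) record per hidden neuron."""

    def __init__(self, m):
        self.m = m            # number of output neurons
        self.hidden = {}      # stored pattern (as tuple) -> (weights, threshold, v)

    def store(self, u, v, p=0):
        # One-shot learning: creating (or overwriting) a single hidden neuron
        # does not touch any other stored association.
        weights = [2 * bit - 1 for bit in u]
        self.hidden[tuple(u)] = (weights, sum(u) - p, list(v))

    def delete(self, u):
        # Deleting an association removes only its own hidden neuron.
        self.hidden.pop(tuple(u), None)

    def recall(self, x):
        out = [0] * self.m
        for weights, threshold, v in self.hidden.values():
            if sum(w * xi for w, xi in zip(weights, x)) >= threshold:
                out = [a + b for a, b in zip(out, v)]
        return [1 if s >= 1 else 0 for s in out]
```

With `p=0` every stored pattern acts as an exact address, so `store` behaves like a memory write and `recall` like a memory read; with `p >= 1` the same structure behaves as an associative memory.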
2.2.5 Exact match: binary mapping Perceptron (BMP) module 
Let U be a set of k distinct binary input vectors $u_1, \ldots, u_k$ of dimension n, where
$u_i = \langle u_{i1}, \ldots, u_{in} \rangle^T$ and $u_{ig} \in \{0,1\}$ for $1 \le i \le k$ and $1 \le g \le n$; and V be a set of k desired binary
output vectors $v_1, \ldots, v_k$ of dimension m, where $v_i = \langle v_{i1}, \ldots, v_{im} \rangle^T$ and $v_{ih} \in \{0,1\}$ for
$1 \le i \le k$ and $1 \le h \le m$. Consider a binary mapping function $f_{BMP} : B^n \to (V \cup \{\langle 0^m \rangle\})$
defined as follows:
$$
f_{BMP}(x) =
\begin{cases}
v_i & \text{if } x = u_i,\ 1 \le i \le k \\
\langle 0^m \rangle & \text{if } x \in (B^n - U)
\end{cases}
\tag{2.23}
$$
where $B^n$ is the n-dimensional binary space. A BMP module for the binary mapping function
$f_{BMP}$ can be synthesized using a 2-layer Perceptron as follows: The BMP module (see Figure
2.4) has n input, k hidden and m output neurons. For each binary mapping ordered pair
$(u_i, v_i)$, where $1 \le i \le k$, we create a hidden neuron i with threshold $\sum_{g=1}^{n} u_{ig}$. The connection
weight from input neuron g to hidden neuron i is $2u_{ig} - 1$ ($= u_{ig} - \bar{u}_{ig}$) and that from hidden
neuron i to output neuron h is $v_{ih}$. The threshold for each of the output neurons is set to 1.
The activation functions at hidden and output neurons are binary hardlimiter function $f_H$.
Figure 2.4 The settings of connection weights and hidden node threshold 
in the proposed BMP module for an associated binary mapping 
ordered pair 
Note that for the binary input vector $u_i$, only the hidden neuron i outputs a 1, and the
rest of the hidden neurons output 0. Thus the output of the j-th output neuron is $v_{ij}$, and so
the binary output vector is $\langle v_{i1}, \ldots, v_{im} \rangle^T = v_i$. For an input vector $x \notin U$, no hidden
neuron is activated and the output is $\langle 0^m \rangle$.
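As a small worked instance of the BMP construction, the sketch below builds an exact-match module from a list of (input, output) pairs and uses it to realize the two-input XOR function. The helper name `make_bmp` is an assumption made for this example, and XOR is chosen only because it is a familiar binary mapping.

```python
def make_bmp(mapping):
    """Build an exact-match binary mapping function from (u, v) pairs (sketch)."""
    m = len(mapping[0][1])
    # One hidden neuron per pair: weights 2u - 1, threshold = number of 1s in u.
    hidden = [([2 * b - 1 for b in u], sum(u), list(v)) for u, v in mapping]

    def f_bmp(x):
        out = [0] * m
        for weights, threshold, v in hidden:
            if sum(w * xi for w, xi in zip(weights, x)) >= threshold:
                out = [a + b for a, b in zip(out, v)]
        return [1 if s >= 1 else 0 for s in out]   # inputs outside U map to all 0s

    return f_bmp


# XOR: only the inputs <0,1> and <1,0> are stored, each mapped to <1>.
xor = make_bmp([((0, 1), (1,)), ((1, 0), (1,))])
```

Only the input patterns whose desired output is non-null need a hidden neuron, since every unstored input is mapped to the all-zero vector.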
2.2.6 Conversion between memory models using bipolar and binary inputs
Much of the analysis in previous subsections assumed binary input patterns. It turns out
that the use of bipolar instead of binary inputs simplifies the implementation of the proposed
associative memory design especially when recall from partially specified input patterns is
desired (see Section 2 for details). This subsection explores the relationship between memory
models using binary and bipolar inputs and the conversion between the two.
The spatial distribution and geometrical characteristics of bipolar vertices in a bipolar
hypercube are very similar to those of binary vertices in a binary hypercube, except that the
distance between any two bipolar vertices is twice that between their corresponding binary
vertices. Given a bipolar vertex u, there also exist n+1 mutually parallel linear hyperplanes
which have features similar to those described in Theorem 2.1. The expression defining $H_p^u$ in an
n-dimensional bipolar hypercube is
$$
H_p^u :\quad \sum_{i=1}^{n} u_i x_i - (n - 2p) = 0 \tag{2.24}
$$
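which will be derived in the following. All notations here are same as those in Section 2.2.4,
with one exception: bipolar vector (vertex) is used in place of binary vector (vertex). Let x be a
bipolar vertex at Hamming distance p from u. In this case $u_i \in \{-1, 1\}$; $u_i + \bar{u}_i = 0$;
$(u_i - \bar{u}_i) \in \{2, -2\}$; $(x_i - \bar{u}_i) = 0$ or 2 if $(u_i - \bar{u}_i) = 2$, and $(x_i - \bar{u}_i) = 0$ or -2 if
$(u_i - \bar{u}_i) = -2$; there are p components of $(x - \bar{u})$ such that $(x_i - \bar{u}_i) = 0$, and (n - p)
components of $(x - \bar{u})$ such that $(x_i - \bar{u}_i) = 2$ or -2. Then
$$
\begin{aligned}
(u - \bar{u})^T (x - \bar{u}) &= \sum_{i=1}^{n} (u_i - \bar{u}_i)(x_i - \bar{u}_i) \\
&= \sum_{x_i - \bar{u}_i = \pm 2} (u_i - \bar{u}_i)(x_i - \bar{u}_i) + \sum_{x_i - \bar{u}_i = 0} (u_i - \bar{u}_i)(x_i - \bar{u}_i) \\
&= 4(n - p) + 0
\end{aligned}
$$
since each of the (n - p) components with $(x_i - \bar{u}_i) = \pm 2$ has $x_i = u_i$ and therefore contributes
$(\pm 2)(\pm 2) = 4$. Thus,
$$
(u - \bar{u})^T (x - \bar{u}) = 4(n - p) \tag{2.32}
$$
So the hyperplane $H_p^u$ is given by
$$
\begin{aligned}
H_p^u :\quad & (u - \bar{u})^T (x - \bar{u}) = 4(n - p) && (2.33) \\
& (u - \bar{u})^T x - (u - \bar{u})^T \bar{u} - 4(n - p) = 0 && (2.34) \\
& (u - \bar{u})^T x - (u^T \bar{u} - \|\bar{u}\|^2 + 4n - 4p) = 0, \quad \text{note } u^T \bar{u} = -n \text{ and } \|\bar{u}\|^2 = n && (2.35) \\
& (u - \bar{u})^T x - (-n - n + 4n - 4p) = 0 && (2.36) \\
& (u - \bar{u})^T x - (2n - 4p) = 0 && (2.37) \\
& (u_1 - \bar{u}_1) x_1 + \cdots + (u_n - \bar{u}_n) x_n - (2n - 4p) = 0 && (2.38) \\
& 2 u_1 x_1 + \cdots + 2 u_n x_n - (2n - 4p) = 0 && (2.39) \\
& u_1 x_1 + \cdots + u_n x_n - (n - 2p) = 0 && (2.40) \\
& \Big( \sum_{i=1}^{n} u_i x_i \Big) - (n - 2p) = 0 && (2.41)
\end{aligned}
$$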
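The two identities at the ends of this derivation, expression 2.32 and expression 2.41, can be checked numerically by brute force. The Python sketch below enumerates every pair of bipolar vertices for a small n; the function names are hypothetical and the check is an illustration, not a proof.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(1 for ai, bi in zip(a, b) if ai != bi)


def check_hyperplane_identities(n):
    """Verify (u - u_bar)^T (x - u_bar) = 4(n - p) and u^T x = n - 2p for all vertices."""
    for u in product((-1, 1), repeat=n):
        u_bar = tuple(-ui for ui in u)              # complement of u
        for x in product((-1, 1), repeat=n):
            p = hamming(u, x)                       # Hamming distance from u to x
            lhs_232 = sum((ui - ubi) * (xi - ubi)
                          for ui, ubi, xi in zip(u, u_bar, x))
            lhs_241 = sum(ui * xi for ui, xi in zip(u, x))
            if lhs_232 != 4 * (n - p) or lhs_241 != n - 2 * p:
                return False
    return True
```

The second identity is the one used in practice: the inner product of a stored bipolar pattern with an input equals n minus twice their Hamming distance, so a threshold of n - 2p accepts exactly the inputs within distance p.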
Every hyperplane $H_p^u$, where $0 \le p \le n$, can serve as a linear separating hyperplane to
partition all n-dimensional bipolar vertices into two sets. Such a linear separating hyperplane
can be efficiently implemented for pattern classification by a 1-layer Perceptron with one output
neuron. The output neuron has a threshold of n - 2p and the connection weight on the link
from input neuron i is given by $u_i$ for $1 \le i \le n$, where n is the number of input neurons.
Since input is bipolar, the connection weight in the 1-layer Perceptron is either 1 or -1.
A connection weight of 1 matches the corresponding bit of an input pattern if it is on while a
connection weight of -1 matches the corresponding bit of an input pattern if it is off. A match
contributes 1 unit to the activation of the corresponding hidden neuron while a mismatch
contributes -1 unit. Each hidden neuron sums up the contributions to its activation from
each of its input links, compares it with its threshold and activates the corresponding output
neuron if the degree of match (similarity measurement) for the entire pattern exceeds or equals
the threshold.
Note that the value passed from each connection is either 1 or -1, compared to the
three values {1, 0, -1} in the binary model. This property can further simplify the hardware
implementation requirement for a 1-layer Perceptron using bipolar (as opposed to binary)
input.
Based on this 1-layer Perceptron and the method described in previous subsections for
setting the weights of the second layer connections, the synthesis of a memory module (using
bipolar input, see Figure 2.5) of 2-layer Perceptron is rather straightforward given a set of
desired pattern associations.
The memory module (using bipolar input) has n input, k hidden, and m output neurons.
For each associative ordered pair $(u_i, v_i)$, where $1 \le i \le k$, we create a hidden neuron i with
threshold $n - 2p_i$, where $p_i \in \mathbb{N}$ and $p_i \le n$ is the adjustable precision level for that associative
pair. The connection weight from input neuron g to hidden neuron i is $u_{ig}$ and that from
hidden neuron i to output neuron h is $v_{ih}$. The threshold for each of the output neurons is
set to 1. The activation function at the hidden neurons is the binary hardlimiter function $f_H$. The
activation function at the output neurons is the binary hardlimiter function $f_H$ if the desired
output of the output neurons is binary, and the bipolar hardlimiter function (expression 2.6) if
the desired output of the output neurons is bipolar.
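A minimal Python sketch of the bipolar memory module just described is given below. The function name `bipolar_memory_recall` and the `bipolar_output` flag are assumptions introduced for illustration; the loop over stored pairs stands in for hidden neurons that would operate in parallel.

```python
def bipolar_memory_recall(pairs, precisions, x, bipolar_output=True):
    """Simulate recall in the bipolar-input memory module (illustrative sketch).

    pairs      -- list of (u, v): u is a list of +1/-1, v the associated output
    precisions -- list of precision levels p_i, one per stored pair
    x          -- bipolar input pattern (list of +1/-1)
    """
    n = len(x)
    m = len(pairs[0][1])
    net_out = [0] * m
    for (u, v), p in zip(pairs, precisions):
        activation = sum(ui * xi for ui, xi in zip(u, x))  # 1st-layer weights are u itself
        if activation >= n - 2 * p:                        # threshold does not depend on u
            for h in range(m):
                net_out[h] += v[h]                         # 2nd-layer weight is v_ih
    low = -1 if bipolar_output else 0                      # bipolar vs. binary hardlimiter
    return [1 if s >= 1 else low for s in net_out]         # output threshold is 1
```

Unlike the binary model, the hidden threshold here is computed from n and the precision level alone, which is what later allows it to be rescaled for partially specified inputs.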
It is worth pointing out that the bipolar associative neural memory model derived here
turns out to be exactly equivalent to the memory model proposed by [59], which uses real-valued
neuron thresholds and proves the effectiveness of the bipolar memory model algebraically,
using Hamming distance as the difference measurement between input pattern and memory
patterns. In this subsection, the spatial distribution and linear separability of bipolar vertices
in a bipolar hypercube are examined from a geometrical perspective to locate a set of mutually
parallel linear hyperplanes, each of which cleanly separates all the bipolar vertices into two
sets, to facilitate the design of bipolar neural memories. The linear separating hyperplanes
can be efficiently implemented in a 1-layer Perceptron with connection weights and neuron
thresholds of integer values.
Some notable differences between the binary and bipolar associative memory models developed
above are:
• The binary model uses binary hardlimiter as the activation function at both hidden and
output neurons, and so does the bipolar model if the associated output is binary. If the
associated output is bipolar, binary and bipolar hardlimiters (respectively) are used as
activation functions at hidden and output neurons.
• Threshold setting for a hidden neuron in the binary model equals the number of on
bits of the corresponding memory pattern minus the desired precision level (measured
in Hamming distance), and therefore depends on the corresponding memory pattern;
whereas threshold setting for a hidden neuron in the bipolar model equals the number of
input neurons minus twice the value of the desired precision level, which is independent
of all the memory patterns. This has a special advantage when the associative memory
is used to recall a pattern based on a partially specified input (see Section 2 for details).

Figure 2.5 The setting of connection weights and hidden node threshold in
the proposed neural memory (a 2-layer Perceptron with bipolar
input) for a given associated memory pair.
2.3 Properties of the Proposed Neural Associative Memory 
The following three subsections explore and develop mathematical models for several interesting
properties of the proposed bipolar neural associative memory including: recall from
partially specified input patterns, (sequential) multiple recalls, and fault tolerance. A set U
of k bipolar input vectors $u_1, \ldots, u_k$ of dimension n and a set V of k desired binary/bipolar
output vectors $v_1, \ldots, v_k$ of dimension m are given. In the discussion that follows, we assume
that the m-dimensional null pattern (a vector of all 0s in the binary case or a vector of all -1s
in the bipolar case) is excluded from V.
2.3.1 Partial match: associative recall from a partially specified input
This subsection examines the problem of recall from a partially specified bipolar input
pattern. The analysis that follows assumes that the unavailable components of a bipolar partial
input pattern have a default value of 0. Thus, a bipolar partial input pattern is completed by
filling in a 0 for each of the unavailable components (the 0s as a whole also can serve as noise
mask or filter). This makes it possible to handle the problem of associative recall from partially
specified input pattern in a manner that is analogous to that of recall from completely specified
input pattern. The 1st-layer connections of the neural memory module perform similarity
measurements on the available components of a partial input pattern, ignore the similarity
measurements on the unavailable components, and pass the similarity measurements to the
corresponding hidden neurons to decide whether to activate a corresponding hidden neuron.
Let u be a partially specified n-dimensional bipolar pattern with the values of some of its
components being unknown. Define
• bits(u): a function which counts the number of components with known values (+1 or
-1) of bipolar partial pattern u
• pad0(u): a function which pads the unavailable bits of bipolar partial pattern u with 0s
• $u \sqsubseteq v$: a binary predicate which tests whether "u is a partial pattern of v", where "u is
a partial pattern of v" means that the values of available components of u are same as
those of their corresponding components in v
For example, let $u = \langle ?, -1, 1, 1, ? \rangle$ be a 5-dimensional partial pattern whose first and
fifth components have unspecified values. Then bits(u) = 3, pad0(u) = $\langle 0, -1, 1, 1, 0 \rangle$ and
$u \sqsubseteq \langle 1, -1, 1, 1, 1 \rangle$ is true. (Note that this definition of a partial pattern respects the positions
of the components and does not accommodate shifts or translation.)
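The three definitions above translate directly into code. In the sketch below an unavailable component is represented by `None`; that representation and the name `is_partial_of` (for the partial-pattern predicate) are assumptions made for this example.

```python
def bits(u):
    """Count the components of partial pattern u that have known values."""
    return sum(1 for c in u if c is not None)


def pad0(u):
    """Replace every unavailable component of u with 0."""
    return [0 if c is None else c for c in u]


def is_partial_of(u, v):
    """True if every available component of u equals the same component of v."""
    return len(u) == len(v) and all(c is None or c == d for c, d in zip(u, v))


# The worked example from the text: first and fifth components unspecified.
u = [None, -1, 1, 1, None]
```

Because the comparison is position by position, a shifted copy of a stored pattern is not treated as one of its partial patterns, in line with the remark in the text.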
Let $D_H(u, v)$ denote the Hamming distance between two bipolar partial patterns u and v
which have the same corresponding unavailable components. If bits(u) = j, pad0(u) is a padded
j-bit partial pattern derived from partially specified pattern u. Define
$U_{P_i}^{j} = \{ u \mid bits(u) = j \ \&\ u \sqsubseteq u_i \}$, $1 \le j \le n$ & $1 \le i \le k$; i.e., $U_{P_i}^{j}$ is the set of partial
patterns, with j available bits, of a bipolar pattern $u_i$. Define
$U_{P_i}^{j}(p_i) = \{ pad0(u) \mid \exists v,\ v \in U_{P_i}^{j} \ \&\ D_H(u, v) \le \lfloor (j/n) \times p_i \rfloor \}$,
$1 \le j \le n$ & $1 \le i \le k$; i.e., $U_{P_i}^{j}(p_i)$ is the set of padded j-bit partial patterns which are at
Hamming distance less than or equal to $\lfloor (j/n) \times p_i \rfloor$ to any one of the j-bit partial patterns of
full pattern $u_i$.
Practical applications may place limits on the range of usable settings of $p_i$ (allowable noise
level) (see Section 2 for details). It may also be necessary to limit recall from partial input
patterns to cases in which a sufficiently large number of bits in the input pattern, say $j \ge c$,
have available values. For instance, when dealing with patterns of 10x10 pixels, we may set
$p_i = 0.3 \times 100 = 30$ and require that at least 40% of the pixels be available in the input pattern
(c = 40).
To simplify matters in what follows, we use the same precision level p for each stored
pattern. That is, $p_i = p$, $1 \le i \le k$. However, note that particular applications may require
the use of different values of $p_i$ under different circumstances. For example, punctuation
symbols and letters of the alphabet may need different values of $p_i$ for successful recognition
of printed English characters.
Let $U_{P_i}^{c \sim n}(p) = \bigcup_{j=c}^{n} U_{P_i}^{j}(p)$ and $U_{P}^{c \sim n}(p) = \bigcup_{i=1}^{k} U_{P_i}^{c \sim n}(p)$. Let
$f_P : U_{P}^{c \sim n}(p) \to V$ denote the function of recall from padded bipolar partial pattern. Then
$f_P$ is defined as follows:
$$
f_P(x) = v_i \quad \text{if } x \in U_{P_i}^{c \sim n}(p),\ 1 \le i \le k \tag{2.42}
$$
$f_P$ is a partial function and is extended to a function $\hat{f}_P : B^{c \sim n} \to (V \cup \{\langle (-1)^m \rangle\})$ for
recall from padded bipolar partial pattern using associative memory as follows:
$$
\hat{f}_P(x) =
\begin{cases}
f_P(x) & \text{if } x \in U_{P}^{c \sim n}(p) \\
\langle (-1)^m \rangle & \text{if } x \in (B^{c \sim n} - U_{P}^{c \sim n}(p))
\end{cases}
\tag{2.43}
$$
where $B^{c \sim n}$ is the universe of n-dimensional vectors each of whose components is 1, 0, or -1
and which have at least c non-zero components (corresponding to the available bits) and thus
at most (n - c) zeros for the unavailable bits (as a result of padding).
It is easy to see that if the 1st-layer connection weights in the bipolar neural memory
(described in Section 2) were set up using only a part of a complete memory pattern, the
connection weights set for the available components of its partial pattern would be the same as those
that would have been obtained if the complete memory pattern were used in establishing the
weights. The threshold setting for a hidden neuron in the bipolar memory equals the number
of input neurons minus twice the value of the desired precision level, which is independent of
all stored memory patterns but depends on the dimensionality of input pattern. Hence the
bipolar neural memory module designed for recall from a fully specified input pattern can be
used for associative recall from a partially specified input pattern by only adjusting the
thresholds of the hidden neurons as follows: multiply the threshold of each hidden neuron by the ratio
of the number of available components of a partial input pattern to that of a complete pattern.
That is, reduce the threshold of each hidden neuron i from $(n - 2p_i)$ to $(n - 2p_i) \times n_a/n$, where
$n_a \le n$ is the number of the available bits of a partial input pattern.
Note that $p_i$ is the precision level for memory pattern i in the problem of recall from full
pattern and $(n - 2p_i) \times n_a/n = n_a - 2(p_i \times n_a/n)$ is the new threshold for recall from a partial
pattern. The expression for the new threshold is similar to that for the old threshold. In the new
threshold, $n_a$ equals the number of available bits of a partial input pattern and $p_i \times n_a/n$ is
the new precision level. In the interest of efficiency of a hardware realization, it is desirable to
use $\lceil p_i \times n_a/n \rceil$ or $\lfloor p_i \times n_a/n \rfloor$ as the new precision level, where $\lceil \cdot \rceil$ and $\lfloor \cdot \rfloor$ respectively denote
the integer ceiling and floor of a real value.
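The threshold adjustment for partially specified inputs can be sketched as follows. This is an illustrative Python simulation under stated assumptions: the function name `partial_recall` and the `rounding` argument are hypothetical, a single precision level p is shared by all stored patterns, and the function returns the list of recalled patterns rather than their superposition.

```python
import math


def partial_recall(pairs, p, x_padded, rounding="floor"):
    """Recall from a 0-padded bipolar partial pattern with a rescaled threshold.

    pairs    -- list of (u, v) with u a list of +1/-1
    p        -- precision level used for fully specified inputs
    x_padded -- input with 0 in every unavailable position
    rounding -- "floor" or "ceil": how p * n_a / n is turned into an integer
    """
    n = len(x_padded)
    n_a = sum(1 for c in x_padded if c != 0)        # number of available bits
    scaled = p * n_a / n
    p_new = math.ceil(scaled) if rounding == "ceil" else math.floor(scaled)
    threshold = n_a - 2 * p_new                     # same form as n - 2p, with n_a for n
    recalled = []
    for u, v in pairs:
        # Padded 0s contribute nothing, so only available bits are compared.
        activation = sum(ui * xi for ui, xi in zip(u, x_padded))
        if activation >= threshold:
            recalled.append(v)
    return recalled
```

Choosing the floor gives the stricter match (fewer tolerated mismatches among the available bits), while the ceiling is more permissive; which is preferable would depend on the application.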
2.3.2 Multiple associative recalls 
The memory retrieval process in the neural memory described in Section 2 can be viewed
as a two-stage process: identification and recall. During identification of an input pattern, the
1st-layer connections perform similarity measurements and sufficiently activate zero or more
hidden neurons so that they produce outputs of 1. The actual choice of hidden neurons to be
turned on is a function of the 1st-layer weights, the input pattern, and the threshold settings
of the hidden neurons. During recall, if only one hidden neuron is turned on, one of the
stored memory patterns will be recalled by that hidden neuron along with its associated
2nd-layer connections. Without any additional control, if multiple hidden neurons are enabled, the
corresponding output pattern will be a superposition of the output patterns associated with
each of the activated hidden neurons. With the addition of appropriate control circuitry, this
behavior can be modified to yield sequential recall of more than one stored pattern. This has
a number of practical applications such as information retrieval, database query processing
[23, 24] (see Chapter 3), knowledge-based diagnosis systems, etc. This has the effect of
searching through memory for patterns that are sufficiently close to a given input pattern and
then recalling them one after another.
Multiple recalls are possible if some of the associative partitions realized in the memory 
module are not isolated (see Section 2 for details). An input pattern (a bipolar vertex) located 
in a region of overlap among several partitions is close enough to the corresponding partition 
centers (stored memory patterns) at the same time and hence can turn on more than one 
hidden neuron. The following explores this phenomenon in more detail. 
Define $U_i^n(p_i) = \{ u \mid u \in B^n \ \&\ D_H(u, u_i) \le p_i \}$, $1 \le i \le k$, where $B^n$ is the universe of
n-dimensional bipolar vectors; i.e., $U_i^n(p_i)$ is the set of n-dimensional bipolar vectors which
have Hamming distance less than or equal to $p_i$ away from the given n-dimensional bipolar
vector $u_i$, where $p_i$ is a specified precision level. Let $p_i = p$, $1 \le i \le k$.
Define $f_M : U^n(p) \to (2^V - \emptyset)$ as follows:
$$
f_M(x) = \{ v_i \mid x \in U_i^n(p),\ 1 \le i \le k \} \tag{2.44}
$$
where $U^n(p) = U_1^n(p) \cup U_2^n(p) \cup \cdots \cup U_k^n(p)$, $U_i^n(p) \cap U_j^n(p) \ne \emptyset$ for some $i \ne j$, and $2^V$ is the
power set of V (i.e., the set of all subsets of V). The output of $f_M$ is a set of bipolar vectors
that correspond to the set of patterns that should be recalled given the bipolar input vector x.
$f_M$ is a partial function and is extended to a full function $\hat{f}_M : B^n \to (2^{V \cup \{\langle (-1)^m \rangle\}} - \emptyset)$
to describe multiple recall in the neural associative memory as follows:
$$
\hat{f}_M(x) =
\begin{cases}
f_M(x) & \text{if } x \in U^n(p) \\
\{ \langle (-1)^m \rangle \} & \text{if } x \in (B^n - U^n(p))
\end{cases}
\tag{2.45}
$$
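The extended multiple-recall function of expression 2.45 can be written out directly from its set-theoretic definition. The Python sketch below does so using Hamming distance; the function names are hypothetical, and a list is used in place of a set so that the recalled patterns keep their storage order.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(1 for ai, bi in zip(a, b) if ai != bi)


def multiple_recall(pairs, p, x):
    """Return every v_i whose stored u_i lies within Hamming distance p of x.

    If no stored pattern is close enough, the null pattern (all -1s) is
    returned as the only element, mirroring the second case of expression 2.45.
    """
    recalled = [v for u, v in pairs if hamming(u, x) <= p]
    if not recalled:
        m = len(pairs[0][1])
        return [[-1] * m]
    return recalled
```

In the network itself the same selection is made by the hidden neurons, since an inner product of at least n - 2p is equivalent to a Hamming distance of at most p.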
Recall of multiple patterns is likely to be all the more useful when the input pattern is only 
partially specified. The following extends the mathematical model for multiple recall outlined 
above to deal with recall from a partially specified bipolar input pattern. 
Let $f_{MP} : U_{P}^{c \sim n}(p) \to (2^V - \emptyset)$ be a function defined as follows:
$$
f_{MP}(x) = \{ v_i \mid x \in U_{P_i}^{c \sim n}(p),\ 1 \le i \le k \} \tag{2.46}
$$
where $U_{P_i}^{h}(p) \cap U_{P_j}^{h}(p) \ne \emptyset$ for some h's and $i \ne j$'s, $c \le h \le n$ and $1 \le i, j \le k$. $f_{MP}$ is
a partial function and is extended to a function $\hat{f}_{MP} : B^{c \sim n} \to (2^{V \cup \{\langle (-1)^m \rangle\}} - \emptyset)$ for
multiple recalls from padded bipolar partial patterns in associative memory as follows:
$$
\hat{f}_{MP}(x) =
\begin{cases}
f_{MP}(x) & \text{if } x \in U_{P}^{c \sim n}(p) \\
\{ \langle (-1)^m \rangle \} & \text{if } x \in (B^{c \sim n} - U_{P}^{c \sim n}(p))
\end{cases}
\tag{2.47}
$$
Another interesting property of the proposed ANN memory is that it allows sorted multiple
recall, described as follows: If the input pattern is held constant and the thresholds of all hidden
neurons are decremented at each time step, then gradually more and more hidden neurons will
be turned on. Decrementing the threshold of a hidden neuron results in an enlargement of the
corresponding associative partition in a geometrical sense, and hence more and more partitions
will overlap at the input vector (vertex) from iteration to iteration. In the absence of any other
control circuits, the recalled pattern will be a superposition of the outputs resulting from all
of the hidden neurons that are enabled at any time step. However, in many applications,
we need different patterns to be recalled individually. This can be accomplished by adding a
habituation mechanism that forces a hidden neuron to turn itself off automatically after it has
been on for one time step unless a new input pattern is presented. This results in a serialized
or sequential recall of patterns in increasing order of dissimilarity (as measured by Hamming
distance) from the input pattern. Alternatively, one can perform a set difference operation
on the hidden neuron outputs from every pair of consecutive time steps before allowing the
hidden neurons to influence the 2nd-layer connections and the output neurons. It is rather
straightforward to implement such set-theoretic operations using neural networks [25].
As already pointed out, the ability to perform multiple recalls is more likely to be useful
when dealing with partially specified input patterns. Such information retrieval applications
of practical interest include: database lookup using keywords, diagnosis of diseases or faults
from a partially specified set of symptoms or test results, and DNA sequence recognition from
available DNA segments.
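The sorted multiple recall with habituation described above can be simulated as follows. This Python sketch makes two assumptions that are not in the text: all thresholds start at n (precision level 0) and are lowered by 2 per time step, which corresponds to raising the precision level by 1; and neurons that fire in the same time step are reported in storage order rather than superimposed. The function name `sorted_recall` is hypothetical.

```python
def sorted_recall(pairs, x):
    """Recall stored patterns in increasing order of Hamming distance from x.

    pairs -- list of (u, v) with u a list of +1/-1
    x     -- fully specified bipolar input pattern, held constant
    """
    n = len(x)
    activations = [sum(ui * xi for ui, xi in zip(u, x)) for u, _ in pairs]
    habituated = [False] * len(pairs)   # a neuron that has fired stays off afterwards
    order = []
    threshold = n                       # start with precision level p = 0
    for _ in range(n + 1):              # p = 0, 1, ..., n
        for idx, (_, v) in enumerate(pairs):
            if not habituated[idx] and activations[idx] >= threshold:
                order.append(v)
                habituated[idx] = True
        threshold -= 2                  # n - 2p drops by 2 when p grows by 1
    return order
```

The `habituated` flags play the role of the habituation mechanism; taking the set difference of the fired neurons between consecutive steps, as the text also suggests, would give the same ordering.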
2.3.3 Fault tolerance 
This section discusses the performance of the proposed neural memory in the presence of 
two basic types of faults — connection fault and neuron fault. In the discussion that follows, 
it is assumed that: 
• When a connection fails and stops passing a value, it is assumed that default value 0 is 
passed from that faulty connection. 
• When an input or hidden neuron fails and stops functioning, it is assumed that default 
value 0 is passed along each of its outgoing connections. 
• When an output neuron fails, it is assumed that default value —1 (or 0 for binary output) 
is produced by that output neuron. 
2.3.3.1 Connection fault 
First we note that a single fault in a 1st-layer connection has less deleterious effect on the
performance of the memory than that caused by a noisy bit in an input pattern. This is because
a faulty 1st-layer connection will adversely affect only one of the similarity measurements
between the input pattern and the stored memory patterns whereas a noisy bit of an input
pattern affects all the similarity measurements.
For example, assume $u_i = \langle 1, -1, 1, -1, 1 \rangle^T$ and $u_j = \langle 1, 1, 1, 1, 1 \rangle^T$ are two of the
memory patterns stored by hidden neurons i, j and their respective connections in an
auto-associative memory. Note that $D_H(u_i, u_j) = 2$. Let $w_i^1$ denote the weight vector of the 1st-layer
connections connected to hidden neuron i. Suppose the precision levels $p_i = p_j = 1$. Then
$w_i^1 = \langle 1, -1, 1, -1, 1 \rangle^T$, $w_j^1 = \langle 1, 1, 1, 1, 1 \rangle^T$, and the thresholds at hidden neurons i and j
are $\theta_i = \theta_j = 3$ in the neural memory according to expression 2.41. Suppose a noisy input
pattern $x = \langle 1, -1, 1, 1, 1 \rangle^T$ is fed into the neural memory module. Then both hidden neurons
i and j are activated and as a result, the output is a superposition of memory patterns $u_i$ and
$u_j$. Suppose the connection from the second input neuron to hidden neuron i is faulty, which
is equivalent to $w_i^1 = \langle 1, 0, 1, -1, 1 \rangle^T$. Assume that $w_j^1$ is unaffected by the connection
fault. When the noisy input pattern x is fed into the neural memory module, the summation
values (net input minus threshold) at hidden neurons i and j are -1 and 0 respectively. Only
hidden neuron j is activated and memory pattern $u_j$ is recalled.
Since each of the 2nd-layer connections emanating from a hidden unit stores 1 bit of the
corresponding stored memory pattern, a faulty 2nd-layer connection corrupts at most 1 bit
of the recalled memory pattern. Suppose the connection from hidden neuron j to the third
output neuron is faulty. When only hidden neuron j is activated to recall memory pattern $u_j$,
a default value 0 is passed from that faulty connection to the third output neuron under the
assumption of connection fault. The recalled output is $\langle 1, 1, -1, 1, 1 \rangle^T$, which is at Hamming
distance one from memory pattern $u_j$. When only hidden neuron i is activated to recall memory
pattern $u_i$ with a connection fault from hidden neuron i to the second output neuron, the
recalled output is $\langle 1, -1, 1, -1, 1 \rangle^T$, which equals $u_i$.
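The numbers in the connection-fault example can be reproduced with a short Python sketch. The constant and function names are hypothetical; a faulty connection is modelled, as assumed in the text, by a weight that passes the value 0, and the "summation value" is taken to be the net input minus the threshold.

```python
U_I = [1, -1, 1, -1, 1]          # memory pattern stored by hidden neuron i
U_J = [1, 1, 1, 1, 1]            # memory pattern stored by hidden neuron j
X = [1, -1, 1, 1, 1]             # noisy input pattern
THETA = len(X) - 2 * 1           # threshold n - 2p with n = 5, p = 1


def summation_value(weights, x, theta):
    """Net input of a hidden neuron minus its threshold (fires when >= 0)."""
    return sum(w * xi for w, xi in zip(weights, x)) - theta


def bipolar_hardlimiter(net, theta=1):
    """Output-layer activation: +1 if the net input reaches theta, else -1."""
    return [1 if s >= theta else -1 for s in net]


def first_layer_fault_demo():
    healthy = (summation_value(U_I, X, THETA), summation_value(U_J, X, THETA))
    w_i_faulty = list(U_I)
    w_i_faulty[1] = 0            # connection from input neuron 2 to hidden neuron i fails
    faulty = (summation_value(w_i_faulty, X, THETA), summation_value(U_J, X, THETA))
    return healthy, faulty


def second_layer_fault_demo():
    v_j_faulty = list(U_J)
    v_j_faulty[2] = 0            # connection from hidden neuron j to output neuron 3 fails
    v_i_faulty = list(U_I)
    v_i_faulty[1] = 0            # connection from hidden neuron i to output neuron 2 fails
    return bipolar_hardlimiter(v_j_faulty), bipolar_hardlimiter(v_i_faulty)
```

The second 2nd-layer case shows why such a fault corrupts at most one bit: when the stored bit is already -1, the default value 0 still produces -1 at the output.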
2.3.3.2 Neuron fault 
A fault in one of the input neurons has less of an adverse effect on the performance of the
memory than 1 bit of noise in the input pattern. However, it is easy to see that an input
neuron fault is more serious than a fault in a single 1st-layer connection. This is because a
fault in one of the input neurons adversely affects each of the stored memory patterns. For
example, suppose $u_i$, $u_j$ and x are as before. Let us consider the following three cases:
• Case 1: one of the first, third, or fifth input neurons is faulty. When x is fed into the
neural memory module, the summation values at hidden neurons i and j are both -1.
No memory pattern is recalled.
• Case 2: the second input neuron is faulty. Then the summation values at hidden neurons
i and j are -1 and 1 respectively. The hidden neuron j is activated to recall memory
pattern $u_j$.
• Case 3: the fourth input neuron is faulty. Then the summation values at hidden neurons
i and j are 1 and -1 respectively. The hidden neuron i is activated to recall memory
pattern $u_i$.
If a hidden neuron is faulty, the memory pattern associated with that hidden neuron cannot
be recalled. If an output neuron is faulty, the corresponding bit of all recalled memory patterns
will have value -1. So the recalled memory patterns having value 1 at that corresponding bit
are corrupted by one bit of noise.
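The three input-neuron fault cases can be checked in the same way. In this Python sketch a faulty input neuron is modelled, as assumed in the text, by forcing its output to 0; the names used are hypothetical and the summation value is again the net input minus the threshold.

```python
U_I = [1, -1, 1, -1, 1]          # memory pattern stored by hidden neuron i
U_J = [1, 1, 1, 1, 1]            # memory pattern stored by hidden neuron j
X = [1, -1, 1, 1, 1]             # noisy input pattern
THETA = len(X) - 2 * 1           # threshold n - 2p with n = 5, p = 1


def summation_values_with_faulty_input(faulty_index):
    """Summation values at hidden neurons i and j when one input neuron fails.

    faulty_index -- 0-based position of the faulty input neuron
    """
    x = list(X)
    x[faulty_index] = 0          # a faulty input neuron passes 0 on all its connections
    at_i = sum(w * xi for w, xi in zip(U_I, x)) - THETA
    at_j = sum(w * xi for w, xi in zip(U_J, x)) - THETA
    return at_i, at_j
```

A hidden neuron fires when its summation value is at least 0, so Case 1 recalls nothing, Case 2 recalls only the pattern of neuron j, and Case 3 recalls only the pattern of neuron i.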
2.4 Summary and Discussion 
This chapter has discussed the analysis and synthesis of a neural memory for both
address-based as well as associative (content-based) storage and recall of patterns. When used as
content-addressed memory, the proposed ANN memory allows adjustable precision and sorted
extraction of all stored memory patterns, has high potential storage capacity, and exhibits
several interesting properties: recall from partial pattern, multiple recall and fault tolerance.
It also lends itself to one-shot incremental learning without interference with previously
memorized patterns. A detailed mathematical analysis of the properties of the proposed neural
memory architecture is presented. Address-based memory can serve as working space of
dynamic representations for symbol processing and shared message-passing space among neural
network modules in an integrated neural network system. It provides for reliable content
modification in real time, a necessary feature for symbol processing applications.
The pattern matching process of the proposed content-addressed memory in which data 
parallelism is achieved and all memory patterns are compared with input pattern in parallel 
within one step can be far more efficient (in terms of computation time) than searching for 
data in a key-based organization of the sort commonly used in conventional computer systems 
[23]. With the need for real-time response in language translation and with the increased 
number of users as well as increased use of large networked databases over the Internet, efficient 
architectures for high-speed table lookup, message routing and database query processing have 
assumed great practical significance. Extensions of the proposed ANN memory architecture 
for efficiently handling database queries and syntax analysis are proposed in Chapters 3 and 
6 (also see [22, 23]) respectively. 
It is worth mentioning that the proposed neural memory supports realization of many-to-one
binary random mappings, which are extensively used in the design of digital logic devices
such as logic circuitry built from AND/OR (or NAND/NOR) gates. The design and optimization of such
circuitry has always been one of the key research problems in the VLSI research community
and industry. The hardware realization of the proposed neural architecture provides the same
flexibility as a PLA (programmable logic array) and thus a higher abstraction level than AND/OR
gates for logic functions (many-to-one binary mappings). The anticipated performance of
hardware realizations of the proposed memory architecture is evaluated in Section 3.2.1.1.
If the hardware realization provides for run-time loading of connection weights and neuron
thresholds under software control, it provides an efficient, time-saving, and error-preventing
alternative to the implementation of PLA and combinational circuitry of AND/OR gates for logic
circuitry.
3 NEURAL ARCHITECTURES FOR INFORMATION RETRIEVAL 
AND DATABASE QUERY PROCESSING 
3.1 Introduction 
This chapter explores the application of neural associative memory to efficient implementation
of noise-tolerant information retrieval and query modules in large database systems. Based
on the neural associative memory proposed in Chapter 2, a library query system and a query
system for a text-based machine-readable lexicon are explored by exploiting the
capability of neural associative memory for massively parallel associative pattern matching and
retrieval. The performance of the ANN-based database query module is analyzed and
compared with other techniques commonly used in current computer systems. The results of this
analysis suggest that the proposed ANN design offers an attractive approach for the
realization of query modules in large database and knowledge base systems, especially for information
retrieval based on partial matches.
Artificial neural networks offer an attractive computational model for a variety of
applications in pattern classification, language processing, complex systems modelling, control,
optimization, prediction and automated reasoning for a variety of reasons, including potential
for massively parallel, high-speed processing and resilience in the presence of faults (failure of
components) and noise. Despite a large number of successful applications of ANNs in the
aforementioned areas, their use in complex symbolic computing tasks (including storage and retrieval
of records in large databases, and inference in deductive knowledge bases) is only beginning to
be explored [21, 22, 23, 24, 47, 72, 99, 179].
Database query entails a process of content-based table lookup (associative search and re­
trieval) which is used in a wide variety of computing applications. Examples of such lookup 
tables include: routing tables used in routing of messages in communication networks, symbol 
tables used in compiling computer programs written in high level languages, knowledge bases 
which store facts and rules in relational form, fact and rule tables used in unification process of 
logic programming systems, keyword tables (inverted and signature files) used in information 
retrieval applications [37], and machine-readable lexicons, dictionaries, and the variety of
tables used in memory-based parsing [82] for natural language processing. In such tables, every
table entry is an associated input-output ordered pair. As the number of table entries and 
the occurrence of partially specified inputs increase, the delay of locating an associative table 
entry can become a severe bottleneck in large-scale information processing tasks which involve 
extensive associative table lookup. Therefore, many researchers have explored augmenting
conventional database systems with subsystems which effectively exploit associative processing to
enhance the performance of the systems [30, 101, 121, 135, 157, 162, 196]. Many applications 
require associative table lookup mechanism or query processing system to be capable of re­
trieving items based on partial matches (some features of the input are noisy or missing) or 
retrieval of multiple records matching the specified query criteria. This capability is compu­
tationally rather expensive in many current computer systems. The ANN-based approach to 
database query processing that is proposed in this chapter exploits the fact that an associative 
table lookup task can be viewed at an abstract level in terms of associative pattern matching 
and retrieval which can be efficiently realized using neural associative memories. The rest of 
the chapter is organized as follows: 
• The rest of Section 3.1 briefly discusses how to represent symbolic information in terms 
of binary codings to facilitate symbolic information manipulation on the proposed neural 
associative memory which operates on bipolar/binary values. 
• Section 3.2 explores information retrieval and query processing using neural associative 
memory. ANN designs are developed respectively for a library query system and a query 
system for text-based machine-readable lexicon by taking advantage of the capability 
of the proposed neural associative memory for massively parallel pattern matching and 
retrieval. 
• Section 3.3 compares the performance of the proposed ANN-based query processing sys­
tem with that of several commonly used techniques. 
• Section 3.4 concludes with a summary. 
3.1.1 Information retrieval in neural associative memories 
Most database systems store (symbolic) data in the form of structured records. When a 
query is made, the database system searches and retrieves records that match the user's query 
criteria which typically only partially specify the contents of records to be retrieved. Also, 
there are usually multiple records that match a query (e.g., books written by a particular 
author in a library or the lexical specifications of the words matching the partially specified 
input pattern ma?e in a machine-readable lexicon, where the symbol ? means the English 
letter at that position is unavailable). Thus, query processing in a database can be viewed as 
an instance of the task of recall of multiple stored patterns given a partial specification of the 
patterns to be recalled. The proposed neural associative memory which is capable of massively 
parallel best match, exact match, and partial match and recall of binary (bipolar) patterns 
can serve to efficiently handle information retrieval and query processing in large database 
systems. 
The proposed neural associative memory operates on binary (bipolar) values. Since humans 
find it difficult to work with binary codes, we use symbolic representations when the neural 
associative memory is used for information storage and retrieval. The translation from symbolic 
representations to binary codings can be done automatically and is not discussed here. 
In general, symbolic information retrieval (lookup) from a table can be viewed in terms of
a binary random mapping $f : U \to V$, defined in expression 2.1. A binary vector $u_i \in U$ can
be used to represent an ordered set of $r$ binary-coded symbols from symbol sets
$\Gamma_1, \Gamma_2, \ldots, \Gamma_r$ respectively (i.e., $\exists \alpha_1 \in \Gamma_1, \ldots, \alpha_r \in \Gamma_r$
s.t. $u_i = \alpha_1 \cdot \alpha_2 \cdot \ldots \cdot \alpha_r$, where $\cdot$ denotes the concatenation
of two binary codes), and a binary vector $v_i \in V$ can be used to represent an ordered set of
$t$ symbols from symbol sets $\Lambda_1, \Lambda_2, \ldots, \Lambda_t$ respectively, where $1 \le i \le k$.
In the context of Section 3, every $\Gamma_i$ denotes the set of ASCII-coded English letters, $r$ is
the length (in number of English letters) of the input, $t = 1$, and $\Lambda_1$ is the set of $M$-bit
record pointers, where $1 \le i \le r$. Let $|U| = |\Gamma_1||\Gamma_2| \cdots |\Gamma_r|$. Then, $f$ defines
a symbolic mapping function
$f_s : \Gamma_1 \times \Gamma_2 \times \cdots \times \Gamma_r \to \Lambda_1 \times \Lambda_2 \times \cdots \times \Lambda_t$.
In this case, the I/O mapping of symbolic function $f_s$
(information retrieval from a symbolic table, given a query criterion) can be viewed in terms
of the binary (bipolar) mapping operations of $f$ which is realized by the proposed neural
associative memory.
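The coding of an ordered set of symbols as one binary vector can be sketched as follows. This is only an illustration: the function name `encode_symbols` is hypothetical, and the 8-bit width (a 7-bit ASCII code with a leading 0) is the convention adopted later in Section 3.2.1.

```python
def encode_symbols(symbols):
    """Concatenate the 8-bit ASCII codes of the symbols into one list of bits."""
    bits = []
    for symbol in symbols:
        code = ord(symbol)
        if code > 127:
            raise ValueError(f"{symbol!r} is not a 7-bit ASCII character")
        # Most significant bit first; the leading bit is the 0 used as padding.
        bits.extend((code >> shift) & 1 for shift in range(7, -1, -1))
    return bits


u = encode_symbols("make")
print(len(u))  # 4 letters x 8 bits = 32
print(u[:8])   # code of "m" is 0110 1101
```

With r letters drawn from the same alphabet, the input vector always has 8r bits, which is what allows a fixed number of input neurons to be assigned to a table.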
3.2 Query Processing Using Neural Associative Memories 
This section describes the use of the neural associative memory described in Chapter 2 
to implement high-speed database query systems. An ANN-based library query system and 
an ANN-based query system for a text-based machine-readable lexicon for natural language 
processing (NLP) are presented respectively to illustrate the key concepts. As the quantity of 
entries (records) of a database increases, the cost of locating an entry can become a significant 
cost for real-time, large-scale machine processing of text and for a library system with a huge
number of stored volumes and a large number of users. For example, the library at Iowa State University has over 2
million volumes, and the number of words a native English speaker knows is estimated to be 
between 50,000 and 250,000 [4]. In the proposed ANN-based query systems, such a cost can be 
reduced significantly by taking advantage of the capability of the proposed neural associative 
memory for massively parallel associative pattern matching and retrieval. 
3.2.1 Realization of lexical access for a machine-readable lexicon using a neural 
associative memory 
In the analysis, interpretation, and generation of natural languages, the lexicon is one of the 
central components of many NLP applications. Basically, the lexical specification for a word 
in a lexicon contains phonological, morpho-syntactic, syntactic, semantic, and other fields [58]. 
Each field may contain several sub-fields. In a lexical database which realizes a machine-
readable lexicon for real-time NLP, the lengths of the fields and sub-fields are usually fixed to 
allow efficient random access to them. This is where a computational lexicon is distinguished 
from a dictionary in which the format of lexical entries is mostly irregular and hence the access 
of the lexical fields for a word (lexeme) is sequential. Typically, a dictionary contains much 
free text including definitions, examples, cross-reference words, and others. 
Generally, there are two basic conceptions about the form of the items which serve as access 
keys in a lexicon. One is minimal listing hypothesis [15] which only lists lexemes and results 
in a root lexicon. A lexeme may have several variants, e.g., in English, the words: produces, 
produced, producing, producer, productive and production are variants of the lexeme 
produce, and the words: shorter, shortest and shortly are variants of the lexeme short. 
The other is full listing hypothesis which lists all possible words of a language and results in 
a full-form lexicon. A root lexicon is more compact and requires a rule system to process 
the variants of lexemes, while a full-form lexicon is more computationally efficient in terms of 
lexical access and more user-friendly in terms of lexicon editing and extension [58]. Therefore, a 
hybrid of the two conceptions is often adopted in many computational lexicon applications. In 
the following, the term access key is used to stand for either word or lexeme in a computational 
lexicon no matter whether it is a root or full-form lexicon. 
There are several models of lexical access in a computational lexicon. Our ANN-based 
query system for NLP lexicon is based on the search model of lexical access (indirect access) 
[36, 58]. In such a model, a text-based computational lexicon which associates every access key 
with its lexical specification contains two organizations: one is called master file which stores 
entries of lexical specifications, and the other is called access file which consists of pairs of 
(<access key>, <lexical pointer>). The access keys are organized to allow location of desired 
access keys and their associated lexical pointers efficiently. The lexical pointers point to the 
lexical specifications of their corresponding entries in the master file. The process of lexical 
access in the search model is similar to that of locating a book in a library. To locate a book 
from a collection of shelves (the master file) in a library, the book catalog (the access file) is 
searched using author name(s) and/or book title to find the call number (a pointer indicating 
the location) of a desired book. 
A noise-tolerant neural associative memory which can efficiently support the process of 
search and retrieval of desired lexical pointers for a text-based machine-readable English lexicon 
is designed as follows. Suppose English letters of the lexicon are represented using 8-bit ASCII 
codes (extended to 8 bits by padding each 7-bit ASCII code with a leading 0). Assume the 
maximal length of an English word is L letters. Since each letter is represented by an 8-bit 
ASCII code, $8L$ input neurons are used in the ANN memory. Each binary bit $x_b$ of the ASCII
input is converted into a bipolar bit $x_p$ by the expression $x_p = 2x_b - 1$ before it is fed into the
ANN memory to execute a query. (This is motivated by the relative efficiency of the hardware 
implementations of binary and bipolar neural associative memories - see Chapter 2 for details). 
Let the output (the lexical pointer) be represented as an $M$-bit binary vector which can access
at most $2^M$ lexical specifications in the lexical database. So, the ANN memory uses $M$ output
neurons. 
For every associative ordered pair of an access key and a lexical pointer, a hidden neuron 
is used in the ANN memory. Suppose there are k such pairs. Then, the ANN memory uses 
k hidden neurons. Every access key is represented by padding its corresponding English word 
with trailing spaces and each binary bit $x_b$ of every access key is converted into a bipolar bit
$x_p$ by the expression $x_p = 2x_b - 1$ to be stored in the ANN memory. For example, if an English
word has $j$ letters ($j \le L$), then the first $j$ letters of its corresponding access key are from the
English word and the last $L - j$ letters of the access key are spaces. The reason for such
padding will become obvious in the coming examples. The ASCII code for the special symbol
space is $20_{16} = 0010\ 0000_2$. During storage of an associated pair, the connection weights
are set as explained in Section 2.2.6. Note that the input and output of an associated pair are 
represented in bipolar and binary values respectively. During recall, the thresholds of hidden 
neurons are adjusted for each query as outlined in Section 2.3.1 (where for each query, the 
value of Ua can be set either by centralized check on the number of letters of the input access 
key, or distributed circuitry embedded in input neurons). The precision level p is set at 0 for 
this associative ANN memory. 
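The encoding of an access key described above can be sketched as follows. The function names and the value chosen for L are assumptions made for illustration; only the padding with trailing spaces, the 8-bit ASCII coding, and the conversion $x_p = 2x_b - 1$ are taken from the text.

```python
def letter_to_bipolar(letter):
    """8-bit ASCII code of one letter, mapped bit by bit with x_p = 2*x_b - 1."""
    code = ord(letter)
    return [2 * ((code >> shift) & 1) - 1 for shift in range(7, -1, -1)]


def encode_access_key(word, max_len):
    """Pad the word with trailing spaces to max_len letters; return 8*max_len bipolar values."""
    if len(word) > max_len:
        raise ValueError("word is longer than the maximal length L")
    padded = word.ljust(max_len)  # trailing spaces, ASCII code 0x20
    return [bit for letter in padded for bit in letter_to_bipolar(letter)]


L = 10  # assumed maximal word length for this illustration
stored = encode_access_key("product", L)
print(len(stored))   # 8 * L = 80 input values
print(stored[-8:])   # last letter is a space (0010 0000): [-1, -1, 1, -1, -1, -1, -1, -1]
```

Because every stored key has exactly 8L bipolar values, a fully specified query that matches a stored key agrees with it in all 8L positions, which is the value used as the hidden-neuron threshold for exact match.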
3.2.1.1 Examples of query processing in the neural lexicon
The following examples illustrate how the proposed ANN memory for NLP lexicon retrieves 
desired lexical pointers by processing a query which may contain a partially specified input 
(target access key). 
• Example 1 (exact match): Suppose the lexical pointer of the word product is to be 
retrieved from the ANN memory. Then, the first 7 letters of the target access key to be 
searched are p, r, o, d, u, c and t, and the last L — 7 letters are spaces. In this 
case, no letter of the target access key is unavailable. Therefore, in the ANN memory, 
the threshold set at all hidden neurons is L x 8 = 8L. Suppose a hidden neuron i is 
used for the association of this access key and its associated lexical pointer. When the 
target access key is presented to the ANN memory, only hidden neuron i has net input 
of 0 and other hidden neurons have net input less than 0 (see Section 2 for details). So, 
hidden neuron i is activated to recall the desired lexical pointer using the weights on the 
2nd-layer connections associated with hidden neuron i. 
• Example 2 (prefix match): Suppose the lexical pointer(s) of the word(s) matching the 
pattern product* is to be retrieved from the ANN memory, where the symbol * means 
the trailing English letters starting from that position are unavailable. In this case, 
the last L — 7 letters of the target access key are viewed as unavailable, only the first 
7 letters are available, and its first 7 letters are p, r, o, d, u, c and t. Therefore, 
in the ANN memory, only the first 7 x 8 = 56 input neurons have input value either 
1 or -1, the other input neurons are fed with 0, and the threshold set at all hidden 
neurons is 7 x 8 = 56. Suppose, in the lexicon, product, production, productive, 
productively, productiveness and productivity are the words the first 7 letters of 
which match the pattern product*. In this case, six hidden neurons are used for the 
associations of these six access keys and their lexical pointers respectively in the ANN 
memory. When the partially specified target access key is presented to the ANN memory, 
only these six hidden neurons have net input of 0 and other hidden neurons have net input 
less than 0. So, these six hidden neurons get activated one at a time to sequentially recall 
the associated lexical pointers using the weights on the 2nd-layer connections respectively 
associated with these six hidden neurons. 
• Example 3 (partial match): Suppose the lexical pointer(s) of a noisy 7-letter word
pro??ct is to be retrieved from the ANN memory, where the symbol ? means the 
English letter at that position is unavailable. In this case, 2 of the letters (the 4th and 
5th letters) of the target access key are viewed as unavailable, its first 3, 6th and 7th 
letters are p, r, o, c and t respectively, and the last L—7 letters are spaces. Therefore, 
in the ANN memory, the input neurons representing the 4th and 5th input letters are 
fed with 0, other input neurons have input value either 1 or -1, and the threshold set at 
all hidden neurons is (L - 2) x 8 = 8(L - 2). Suppose, in the lexicon, product, project,
and protect are the only 7-letter words which match the pattern pro??ct. Therefore, 
three hidden neurons are used for the associations of these three access keys and their 
lexical pointers respectively in the ANN memory. When the partially specified target 
access key is presented to the ANN memory, only these three hidden neurons have net 
input of 0 and other hidden neurons have net input less than 0. So, these three hidden 
neurons get activated one at a time to sequentially recall the associated lexical pointers 
using the weights on the 2nd-layer connections respectively associated with them. 
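The three examples can be reproduced with a small software simulation. This is a sketch under stated assumptions, not the hardware design of Chapter 2: it assumes that the 1st-layer weights of a hidden neuron equal the stored bipolar pattern, that the net input is the weighted sum minus the threshold, and that the threshold equals the number of available input bits. Integer indices stand in for the M-bit lexical pointers, the word list is illustrative, and all function names are hypothetical.

```python
L = 14  # assumed maximal word length (long enough for "productiveness")


def letter_to_bipolar(letter):
    code = ord(letter)
    return [2 * ((code >> shift) & 1) - 1 for shift in range(7, -1, -1)]


def encode_stored(word):
    """Bipolar pattern of an access key padded with trailing spaces."""
    return [bit for letter in word.ljust(L) for bit in letter_to_bipolar(letter)]


def encode_query(pattern):
    """'?' marks one unavailable letter; a trailing '*' marks all remaining letters unavailable."""
    if pattern.endswith("*"):
        letters = list(pattern[:-1]) + ["?"] * (L - len(pattern) + 1)
    else:
        letters = list(pattern.ljust(L))
    inputs = []
    for letter in letters:
        # Unavailable letters are fed with 0 on all 8 of their input neurons.
        inputs.extend([0] * 8 if letter == "?" else letter_to_bipolar(letter))
    return inputs


def recall(memory, pattern):
    """Return the pointers of all hidden neurons whose net input is non-negative."""
    inputs = encode_query(pattern)
    threshold = sum(1 for value in inputs if value != 0)  # number of available bits
    matches = []
    for weights, pointer in memory:  # one (weights, pointer) pair per hidden neuron
        net_input = sum(w * x for w, x in zip(weights, inputs)) - threshold
        if net_input >= 0:
            matches.append(pointer)
    return matches


words = ["product", "production", "productive", "productively",
         "productiveness", "productivity", "project", "protect", "produce"]
memory = [(encode_stored(word), pointer) for pointer, word in enumerate(words)]

print(recall(memory, "product"))   # Example 1: [0]
print(recall(memory, "product*"))  # Example 2: [0, 1, 2, 3, 4, 5]
print(recall(memory, "pro??ct"))   # Example 3: [0, 6, 7]
```

The loop over hidden neurons is sequential only in this simulation; in the proposed memory all hidden neurons compute their net inputs in parallel, and only the activation of the matching neurons is serialized.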
The large number of hidden neurons in such an ANN module poses a problem for hardware 
realization because of the large fan-out for input neurons and large fan-in for output neurons. 
One solution to this problem is to divide the whole module into several sub-modules which 
contain same number of input, hidden, and output neurons. These sub-modules are linked 
together by shared input and output bus (see Figure 3.1). Such a bus topology also makes 
it possible to easily expand the size of the ANN memory. The 1-dimensional array structure
shown in Figure 3.1 can be easily extended to 2- or 3-dimensional array structures.
[Figure 3.1: modules 1, 2, ..., n, each containing input neurons, hidden neurons, and output
neurons, linked by a shared input bus and a shared output bus.]
Figure 3.1 A modular design of the proposed ANN memory for easy expansion.
This 1-dimensional array structure can be easily extended
to 2- or 3-dimensional array structures.
3.2.2 Realization of a library query system using a neural associative memory 
A neural associative memory that can be used to support a library system queried by name 
can be designed as follows: Suppose the input is a name (provided in a format with last name 
followed by first name) of an author, and characters that appear in the name are represented 
using 8-bit ASCII codes. Assume the last and first names are each truncated to at
most $L$ characters. Since each ASCII code consists of 8 binary bits, $16L$ input neurons
are used in the ANN memory. The first $8L$ input neurons are for the last name and the last $8L$
neurons for the first name. Each binary bit $x_b$ of the ASCII input is converted into a bipolar bit $x_p$
by the expression $x_p = 2x_b - 1$ before it is fed into the ANN memory module for database queries.
Let output be a M-dimensional binary vector pointing to a record in the library database 
that contains information about a volume (or the binary vector can encode information about 
a volume directly). The output binary vector in turn can therefore be used to locate the 
title, author, call number and other relevant information about a volume. Using M output 
neurons, we can access at most $2^M$ records from the library database. Each hidden neuron in
the associative memory module is used to realize an ordered pair associating an author's name 
with an M-bit pointer that points to a record which contains information about a corresponding 
volume. The last and first names of an author of an associated pair are represented by padding 
the names with trailing spaces and each binary bit $x_b$ of the padded names is converted into
a bipolar bit $x_p$ by the expression $x_p = 2x_b - 1$ to be stored in the ANN memory. For example,
if Smith John is the name part of an associated pair, the first 5 letters for the last name part 
of the associated pair are Smith and the other L — 5 letters are spaces, and the first 4 letters 
for the first name part of the associated pair are John and the other L — 4 letters are spaces. 
During storage of an associated pair, the connection weights are set as explained in Section 
2.2.6. Note that the input and output of an associated pair are represented in bipolar and 
binary values respectively. During recall, the thresholds of hidden neurons are adjusted for 
each query as outlined in Section 2.3.1. The precision level p is set at 0 for this associative 
ANN memory module. 
The following cases illustrate how the ANN-based library query system retrieves desired 
record pointers by processing a query which may contain a partially specified input. 
• Case 1: Suppose a user enters "Smith *" to search for the books written by authors with
last name Smith. In this case, that part of input for first name is viewed as unavailable,
the first 5 letters for the part of input for last name are S, m, i, t, and h, and the
other L - 5 letters are spaces. Therefore, in the ANN memory, the first 8 x L = 8L input
neurons have input value either 1 or -1, the last 8 x L = 8L input neurons which together
represent the part of input for first name are fed with 0, and the threshold set at all
hidden neurons is 8 x L = 8L. Suppose the library database contains k volumes written
by authors with last name Smith. In this case, the ANN memory module contains k 
hidden neurons for these k volumes (one for each volume written by an author whose 
last name is Smith). During the recall process all these hidden neurons will have net input 
of 0 and other hidden neurons have net input less than 0 (see Chapter 2 for details). The 
neurons with non-negative net input get activated one at a time to sequentially recall 
the desired M-bit pointers pointing to the books written by authors with the specified 
last name. 
• Case 2: Suppose a user enters "* John" to search for the books written by authors with
first name John. In this case, that part of input for last name is viewed as unavailable,
the first 4 letters for the part of input for first name are J, o, h, and n, and the other
L - 4 letters are spaces. Therefore, in the ANN memory, the last 8 x L = 8L input
neurons have input value either 1 or -1, the first 8 x L = 8L input neurons which together
represent the part of input for last name are fed with 0, and the threshold set at all hidden
neurons is 8 x L = 8L. The recall of the associated pointers proceeds as in Case 1.
• Case 3: Suppose a user enters "Smith J*" to search for the books written by authors
with last name Smith and first name beginning with a J. In this case, the rest of
the letters of first name are viewed as unavailable, the first 5 letters for the part of input
for last name are S, m, i, t, and h, and the other L - 5 letters are spaces. Therefore,
in the ANN memory, the first 8 x (L + 1) = 8(L + 1) input neurons have input value
either 1 or -1, the last 8 x (L - 1) = 8(L - 1) input neurons are fed with 0, and the
threshold set at all hidden neurons is 8 x (L + 1) = 8(L + 1). The recall of the associated
pointers proceeds as in Case 1. 
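The three cases can be checked with a short simulation of the two-field input. As before, this is a sketch under assumptions: the 1st-layer weights are taken to be the stored bipolar pattern, the threshold is the number of available input bits, integer indices stand in for the M-bit record pointers, and the author list (apart from Smith John) and all function names are hypothetical.

```python
L = 8  # assumed maximal length of each name field


def letter_to_bipolar(letter):
    code = ord(letter)
    return [2 * ((code >> shift) & 1) - 1 for shift in range(7, -1, -1)]


def encode_field(text, is_query=False):
    """Encode one name field as 8*L values.

    In a query, "*" alone marks the whole field as unavailable, and a trailing
    "*" marks the letters after the given prefix as unavailable (fed with 0).
    """
    if is_query and text.endswith("*"):
        letters = text[:-1][:L]          # only the prefix is available
    else:
        letters = text[:L].ljust(L)      # truncate, then pad with trailing spaces
    values = [bit for letter in letters for bit in letter_to_bipolar(letter)]
    return values + [0] * (8 * L - len(values))


def query(memory, last, first):
    """Return (threshold, pointers of the hidden neurons with net input >= 0)."""
    inputs = encode_field(last, True) + encode_field(first, True)
    threshold = sum(1 for value in inputs if value != 0)
    hits = [pointer for weights, pointer in memory
            if sum(w * x for w, x in zip(weights, inputs)) - threshold >= 0]
    return threshold, hits


authors = [("Smith", "John"), ("Smith", "Jane"), ("Smith", "Adam"), ("Jones", "John")]
memory = [(encode_field(last) + encode_field(first), pointer)
          for pointer, (last, first) in enumerate(authors)]

print(query(memory, "Smith", "*"))   # Case 1: (64, [0, 1, 2]), threshold 8L
print(query(memory, "*", "John"))    # Case 2: (64, [0, 3]), threshold 8L
print(query(memory, "Smith", "J*"))  # Case 3: (72, [0, 1]), threshold 8(L + 1)
```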
3.2.3 The implementation of case insensitive pattern matching 
It is rather straightforward to modify the proposed ANN-based query system to make it 
case-insensitive. The following shows ASCII codes of English letters, which are denoted in 
hexadecimal and binary codes. 
A = $41_{16}$ = $0100\ 0001_2$, ..., Z = $5A_{16}$ = $0101\ 1010_2$
a = $61_{16}$ = $0110\ 0001_2$, ..., z = $7A_{16}$ = $0111\ 1010_2$
The binary codes for the upper-case and lower-case forms of the same English letter
differ only at the 3rd bit counted from the left-hand side. If that bit is viewed as "don't care" (or
unavailable), this query system will be case insensitive. This effect can be achieved by treating
the corresponding input value as though it were unavailable.
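A minimal sketch of this masking is given below. It assumes, as in the earlier examples, that a hidden neuron matches when the weighted sum of the inputs reaches the number of available input bits; the function names are hypothetical, and for brevity the two words compared are assumed to have the same length.

```python
import string

# Upper- and lower-case letters differ only in the bit with value 0x20,
# which is the 3rd bit from the left of the 8-bit code.
for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase):
    assert ord(upper) ^ ord(lower) == 0b0010_0000


def letter_to_bipolar(letter, ignore_case=False):
    code = ord(letter)
    values = [2 * ((code >> shift) & 1) - 1 for shift in range(7, -1, -1)]
    if ignore_case:
        values[2] = 0  # feed 0 for the case bit, i.e. treat it as unavailable
    return values


def matches(stored_word, query_word, ignore_case):
    """True if a hidden neuron storing stored_word would have net input 0."""
    weights = [b for ch in stored_word for b in letter_to_bipolar(ch)]
    inputs = [b for ch in query_word for b in letter_to_bipolar(ch, ignore_case)]
    threshold = sum(1 for value in inputs if value != 0)
    return sum(w * x for w, x in zip(weights, inputs)) - threshold >= 0


print(matches("Smith", "SMITH", ignore_case=False))  # False
print(matches("Smith", "SMITH", ignore_case=True))   # True
```

Masking the bit on every input position also makes a few non-letter characters indistinguishable (for example "@" and "`", which differ only in that bit), so a practical design might apply the mask only where letters are expected.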
3.3 Comparison with Other Database Query Processing Techniques 
This section compares the anticipated performance of the proposed neural architecture for 
database query processing with other approaches that are widely used in current computer 
systems. Such a comparison takes into account the performance of hardware used in these 
systems and the process used for locating data items. It is assumed that the systems have 
comparable I/O characteristics which are not discussed here. First, the performance of the 
proposed neural network is estimated, based on current CMOS technology for realizing neu­
ral networks. Next, the operation of conventional database systems is examined, and their 
performance is estimated and compared to that of the proposed neural architecture. 
3.3.1 Performance of current electronic realization for neural networks 
Electronic hardware realizations of ANN have been explored by several authors [49, 50, 
57, 103, 106, 107, 120, 152, 184, 190]. Such implementations typically employ CMOS analog, 
digital, or hybrid (analog/digital) electronic circuits. Analog circuits typically consist of pro­
cessing elements for multiplication, summation and thresholding. Analog CMOS technology 
is attractive for realization of ANN because it can yield compact circuits that are capable of 
high-speed asynchronous operation [48]. [184] reports a measured propagation delay of 104ns 
in a digital circuit with each synapse containing an 8-bit memory, an 8-bit subtractor and an 
8-bit adder. [50] reports throughput at the rate of 10 MHz (or equivalently, delay of 100 ns) in
a Hamming Net pattern classifier using analog circuits. [106] describes a hybrid analog-digital
design with 5-bit (4 bits + sign) binary synapse weight values and current-summing circuits
that is used to realize a 2-layer feed-forward ANN with a network computation delay of less 
than 20ns. 
The 1st-layer and 2nd-layer subnetworks of the proposed neural architecture for database
query processing are each very similar to the 1st-layer subnetwork of a Hamming Net,
and the neural architecture with 2 connection layers in the proposed ANN is exactly the same
as that implemented by [106] except that [106] uses discretized inputs, 5-bit synaptic weights, and
a sigmoid-like activation function. The proposed ANN uses bipolar inputs, weights in {-1, 0, 1}
and binary hardlimiter as activation function. Hence the computation delay of the proposed 
ANN can be expected to be at worst of the order of 100 ns and at best 20 ns given the current 
CMOS technology for realizing ANN. 
The development of specialized hardware for implementation of ANN is still in its early 
stages. Conventional CMOS technology that is currently the main technology for VLSI
implementation of ANN is known to be slow [92, 104]. Other technologies, such as BiCMOS,
NCMOS [92], pseudo-NMOS logic, standard N-P domino logic, and quasi N-P domino logic
[104], may provide better performance for the realization of ANN. Thus, the performance of 
the hardware implementation of ANN is likely to improve with technological advances in VLSI. 
3.3.2 Analysis of query processing in conventional computer systems 
Accessing information based on a key is central to information retrieval systems [37, 157, 
158] and database systems [186]. In relational database systems implemented on conventional 
computer systems, given the value for a key, a record is located efficiently by using key-based 
organizations including hashing, index-sequential access files and B-trees [186]. Such a
key-based organization usually contains two data structures: index file(s) and master file. In an
index file, every key is organized and usually associated with a record pointer which points to 
a corresponding record in the master file which is typically stored in secondary storage devices 
like hard disks for large databases. Conventionally, estimated cost of locating a record is based 
on the number of physical block accesses of secondary storage devices [186] since the access 
latency with current cost-effective disk systems is around 5~10 ms (milliseconds) and every one
of the repetitive search steps which together facilitate locating a desired record pointer from 
index files (loaded into the main memory) takes only several CPU clock cycles. The clock 
cycle of current cost-effective CPUs is around 2~10 ns. With the large number of entries in 
index files of large databases and with the low price of current memory chips, master files of 
databases for real-time applications tend to be loaded into the main memory to avoid accessing 
records from low-speed secondary storage devices (compared to memory chips) and thus the 
cost of locating a desired record pointer can become a dominant cost for record retrieval in 
large databases. 
The following compares the anticipated performance of the proposed neural associative 
memory with other approaches that are widely used in current computer systems for locating 
a record pointer associated with a given key. In the following analysis, it is assumed that all 
program and index files for processing queries using current computer systems are pre-loaded 
into the main memory. The effect of data dependency among instructions which offsets pipeline 
and superscalar effects and thus much reduces the average performance of current computer 
systems is not considered here. 
To simplify the comparison, it is assumed that each instruction on a conventional computer
takes $\tau$ ns on average. For instance, on a relatively cost-effective 100 MIPS processor, a
typical instruction would take 10 ns (the MIPS measure for speed combines clock speed, effect
of caching, pipelining and superscalar design into a single figure for speed of a microprocessor).
Similarly, we will assume that a single identification and recall operation cycle by a neural
associative memory takes $\alpha$ ns. Assuming hardware implementation based on current CMOS
VLSI technology, $\alpha$ is around 20~100 ns. Table 3.1 summarizes, from the following analysis,
the estimated performance of the proposed neural associative memory and other techniques
commonly used in conventional computer systems for locating a desired record pointer. The
summary assumes that the value of the key is given, the data structures and programs are
loaded into the main memory of the computer systems used, index search occurs in a balanced
binary tree of $2^M - 1$ records, and partial match occurs in a k-d-tree of $N$ records. $L$ is the
total number of bytes of a key, $n$ is the data bus width of the computer systems used, $h$ is
the average number of executed instructions in a hashing cycle, $\tau$ is the average time delay
for executing an instruction, $b$ is the average number of executed instructions in a comparison
cycle for every $n$ bits in a binary search cycle, $\alpha$ is the time delay of the proposed neural
memory, $K$ is the number of index fields used in the k-d-tree, and $J$ is the number of index
fields specified in a query criterion. Table 3.2 summarizes the capabilities of the proposed
neural associative memory and other techniques commonly used in conventional computer
systems for exact match, prefix match and partial match mentioned in Section 3.2.1.

Table 3.1 A comparison of the estimated performance of the proposed neural
associative memory with that of other techniques commonly
used in conventional computer systems for locating a record
pointer in key-based organizations

Method                      Estimated time (ns)
hashing                     $\lceil 8L/n \rceil h \tau$
index search                $(M - 1) \lceil 4L/n \rceil b \tau$
ANN memory                  $\alpha$
k-d-tree (partial match)    $O(N^{(K-J)/K})$

Table 3.2 A comparison of the capabilities of the proposed neural associative
memory with those of other techniques commonly used
in conventional computer systems for exact match and partial
match

Method          Exact match     Prefix match    Partial match
hashing         efficient       unable          unable
index search    efficient       efficient       inefficient
ANN memory      efficient       efficient       efficient
k-d-tree        satisfactory    satisfactory    inefficient
3.3.2.1 Analysis of locating a record pointer using hashing functions 
Hashing structure is the fastest of all key-based searching techniques for locating a record 
pointer for a single record. However, although it is effective in locating a single record by 
exact match (e.g., example 1 of Section 3.2.1), it is inefficient at or incapable of locating 
related records in response to a partially specified input (e.g., examples 2 and 3 of Section 
3.2.1). Let us consider the time needed for locating a record pointer using a hash function 
in current computer systems. Commonly used hash functions are based on multiplication, 
division and addition operations [87, 163]. In hardware implementation addition is faster than 
multiplication which in turn is far faster than division. Assume that computing a hashing 
function on a key with a length of $L$ bytes (characters) takes $\lceil 8L/n \rceil$ cycles using a processor
with an $n$-bit data bus and every cycle takes $h$ instructions. Then, the estimated computation
time for locating a record pointer is $\lceil 8L/n \rceil h \tau$. Other overheads in computing a hashing
function in such systems include the time for handling the potential problem of collisions in 
hash functions. If a single-CPU 100 MIPS processor with a 32-bit data bus is used, it is 
expected that the total computation time for locating a record pointer will typically be in 
excess of 100ns (If L = 15 and h = 5, the total computation time is [8 x 15/32] x 5 x 10 as = 
[120/32] x50 ns= 200 ns). 
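The arithmetic above can be reproduced with a minimal sketch. The function name and parameter names are ours and purely illustrative; the only assumption beyond the text is that a 100 MIPS processor executes one instruction every 10 ns.

```python
import math

def hash_lookup_time_ns(key_bytes, bus_bits, instr_per_cycle, mips):
    """Estimate ceil(8L/n) * h * tau in nanoseconds."""
    tau_ns = 1000.0 / mips                      # 100 MIPS -> 10 ns per instruction
    cycles = math.ceil(8 * key_bytes / bus_bits)
    return cycles * instr_per_cycle * tau_ns

print(hash_lookup_time_ns(key_bytes=15, bus_bits=32, instr_per_cycle=5, mips=100))  # 200.0
```

The ceiling matters here: 120 bits do not fit in three 32-bit words, so four cycles are charged.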
3.3.2.2 Analysis of locating a record pointer using index search 
A perfectly balanced binary search tree is another popular, efficient data structure used 
in conventional database systems to locate a single record by exact match (e.g., example 1 of 
Section 3.2.1) or several related records by partial match (e.g., example 2 but not example 3 
of Section 3.2.1). Assume every non-terminal node in a perfectly balanced binary search tree
links two child subtrees and there are $(2^M - 1)$ nodes in the tree. Assume the length of the
index key is L bytes (characters). The average number of nodes visited for locating a desired
key would be $\frac{1}{2^M - 1} \sum_{i=1}^{M} i \, 2^{i-1} \approx M - 1$. On average, every visit takes
$\lceil \frac{1}{2}(8L)/n \rceil = \lceil 4L/n \rceil$ comparison cycles for a processor with an n-bit data bus.
Suppose every comparison cycle takes b instructions. Then, the estimated computation time for
locating a desired record pointer is $(M - 1) \lceil 4L/n \rceil \, b \, \tau$. If L = 15, a 100 MIPS processor
with a 32-bit data bus is used, the comparison cycle for every 32 bits takes 5 instructions on
average, and there are $2^{16} - 1 = 65,535$ records (the number of words a native English speaker
knows is estimated to be between 50,000 and 250,000 [4]), then the overhead for locating a desired
record pointer is about $(16 - 1) \times \lceil 4 \times 15/32 \rceil \times 5 \times 10$ ns = 1500 ns, which compares
unfavorably with 100 ns. Note that this is only the cost of locating a record pointer for a single
record. The cost of locating several record pointers of related records using user-entered data in
an index file containing multiple index fields is examined in the next section.
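The following sketch checks both the approximation of the average search depth and the 1500 ns figure. It assumes a perfect binary tree in which level i holds $2^{i-1}$ keys, each reached after visiting i nodes; the function names are illustrative only.

```python
import math

def avg_nodes_visited(levels):
    """Average search depth over all 2**levels - 1 keys of a perfect binary tree."""
    total = sum(i * 2 ** (i - 1) for i in range(1, levels + 1))
    return total / (2 ** levels - 1)

def index_lookup_time_ns(levels, key_bytes, bus_bits, instr_per_cmp, tau_ns):
    """Estimate (M - 1) * ceil(4L/n) * b * tau in nanoseconds."""
    cycles_per_visit = math.ceil(4 * key_bytes / bus_bits)
    return (levels - 1) * cycles_per_visit * instr_per_cmp * tau_ns

print(avg_nodes_visited(16))                     # very close to M - 1 = 15
print(index_lookup_time_ns(16, 15, 32, 5, 10))   # 1500
```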
3.3.2.3 The cost of partial-match queries 
One of the most commonly used data structures for processing partial-match queries on 
multiple index fields is the k-d-tree [12]. It can provide approximately satisfactory performance
for locating a single record by exact match or several related records by partial match. In the
worst case, the number of visited nodes in an ideal k-d-tree of N nodes (one for each record 
stored) for locating the desired record pointers for a partial-match query is 
~ - 1] w (3.1) 
where K is the number of index fields used to construct the k-d-tree, and J out of K index fields 
are explicitly specified by a user query. For typical values of N, K, and J, the performance of 
such systems is far worse than that of the proposed ANN based model according to expression 
3.1. 
3.4 Summary and Discussion 
Artificial neural networks, due to their inherent parallelism and potential for noise toler­
ance, offer an attractive paradigm for efficient implementations of a broad range of information 
processing tasks. In this chapter, we have explored the use of artificial neural networks for 
pattern-based (key-based) query processing in large databases. The use of the proposed
approach was demonstrated using the examples of a library query system and a query system for
a text-based machine-readable lexicon used in natural language processing. The performance of
a CMOS hardware realization of the proposed neural associative memory for database query 
processing system was estimated and compared with that of other approaches which are widely 
used in conventional databases implemented on current computer systems. The comparison 
shows that ANN architectures for query processing offer an attractive alternative to conven­
tional approaches, especially for dealing with partial-match queries in large databases. With 
the need for real-time response in language translation and with the explosive growth of the 
Internet as well as increased use of large networked databases over the Internet, efficient ar­
chitectures for high-speed information retrieval, associative table lookup, message routing and 
database query processing have assumed great practical significance. 
4 NEURAL ARCHITECTURES FOR ELEMENTARY LOGICAL 
INFERENCE 
4.1 Introduction 
Inference often involves tasks which look for interesting patterns in the input or memory 
to solve questions such as "What is the most likely answer?", "Is there sufficient evidence to
adopt a conclusion or is more evidence needed?" [42, 61, 191], etc. Such tasks are important for 
inference from partial information, and they generally involve a process of pattern recognition 
by way of best, partial, and/or exact matches. Artificial neural networks, due to their inherent 
massive parallelism, potential for fault tolerance and adaptation capability through learning, 
have attracted extensive interest for robust and efficient implementations of logical inference 
systems. Many of the systems proposed in the literature are motivated by the need for mas­
sively parallel architecture for AI applications, and some of them are proposed to model human 
cognitive processes robustly. In particular, they explore neural mechanisms for variable binding 
to facilitate complex reasoning based on predicate logic [5, 31, 95, 175, 176]; and connectionist
realizations of production systems [182], expert systems [42, 43], hybrid knowledge processing
systems [136], semantic networks [68, 167], frame representation [79], planning [145, 188], non­
monotonic reasoning [142], legal reasoning [148], commonsense reasoning [177, 178], and logical 
theorem proving [141]. This chapter explores how neural architectures for binary partial pat­
tern recognition can be extended for elementary logical inference based on propositional logic. 
The proposed neural architectures, like the ones proposed in Chapters 2 and 3 for associa­
tive memory and query processing, exploit the massively parallel computational capability of 
artificial neural networks. 
Propositional logic, which typically operates on propositions and the logical connectives AND,
OR, and negation, is basic to logical inference. For this reason, it is customary to use
propositional logic for demonstrating the feasibility of new tools for logical inference. This 
chapter proposes a method based on geometrical/mathematical analysis to systematically de­
sign neural architectures for realizing logical ANDs, logical ORs, and DNF (Disjunctive Normal 
Form) propositions (sum of products). A DNF proposition is a disjunction of conjunctions. 
The evaluation of a conjunction corresponds to that of a logical AND function, and the evalua­
tion of a disjunction corresponds to that of a logical OR function. The evaluation of logical AND 
and OR functions can be respectively realized by the AND and OR neural assemblies proposed in 
this chapter through a process of pattern recognition. It is known that any proposition can be 
represented in DNF. Therefore, any proposition can be realized by a 2-layer neural architecture 
assembled from an OR neural assembly and a fixed number of AND neural assemblies. The rest 
of the chapter is organized as follows: 
• Section 4.2 develops two types of neural assemblies for the recognition of binary partial 
patterns. 
• Section 4.3 develops a general AND neural assembly which can be used to realize any 
arbitrary logical AND function of a finite number of Boolean variables. 
• Section 4.4 develops a general OR neural assembly which can be used to realize any 
arbitrary logical OR function of a finite number of Boolean variables. Then, a monotone 
OR neural assembly is derived. 
• Section 4.5 discusses how to use AND and OR neural assemblies to realize arbitrary Boolean 
functions. 
• Section 4.6 concludes with a summary of the chapter and a brief discussion. 
4.2 Neural Assemblies for the Recognition of Partial Patterns 
This section develops two types of neural assemblies for the recognition of binary partial
patterns. One of them is used for the recognition of patterns which contain a specific
sub-pattern, and the other is used for the recognition of patterns which don't contain a specific
sub-pattern. Let us call the former the neural assembly for inclusive pattern recognition, and
the latter, the neural assembly for exclusive pattern recognition. The two assemblies are used
to build the AND neural assembly proposed in Section 4.3 and the OR neural assembly proposed
in Section 4.4 respectively.
4.2.1 A neural assembly for inclusive pattern recognition 
Let $u = \langle u_1, \ldots, u_n \rangle$ be a binary vertex (vector) of dimension n, where $u_i \in \{0,1\}$
for $1 \le i \le n$. Let $H_0^{n,u}$ be an n-dimensional separating hyperplane which can be used
to implement a 1-layer Perceptron to distinguish the vertex u from all other n-dimensional
vertices. According to expression 2.12, among a set of possible expressions for $H_0^{n,u}$ we
choose:
$$H_0^{n,u}: \quad \sum_{i=1}^{n} (2u_i - 1)\,x_i - \sum_{i=1}^{n} u_i = 0 \qquad (4.1)$$
Therefore, in the n-input, 1-output Perceptron implemented to recognize the vertex u
• the threshold of the output neuron is set as $\sum_{i=1}^{n} u_i$, and
• the connection weight from the ith input neuron to the output neuron is set as $2u_i - 1$.
Now consider a binary vector v of dimension m, where m > n. Suppose, in a system of
m variables, only the values of n of the m components of vector v are of interest. For two given
binary vectors u and v of dimensions n and m respectively, we are concerned only with whether
$v_{j_1} = u_1, v_{j_2} = u_2, \ldots, v_{j_n} = u_n$, where $1 \le n \le m$ and $1 \le j_1 < j_2 < \cdots < j_n \le m$. Let us call
$J^n = \{j_1, j_2, \ldots, j_n\}$ the interest set, and $v(J^n) = \langle v_{j_1}, \ldots, v_{j_n} \rangle$ the partial vector
of the binary vector v. Note that several interest sets could be defined concurrently for a given
problem in an m-dimensional binary space. In the following, the expression for the separating
hyperplane $H_0^{n,u}$ (expression 4.1) in an n-dimensional binary space is re-defined as a separating
hyperplane $H_0^{m,u,J^n}$ in an m-dimensional binary space to implement a 1-layer Perceptron to
recognize all the m-dimensional binary vectors whose $J^n$-set partial vectors equal the
given n-dimensional binary vector $u = v(J^n) = \langle v_{j_1}, \ldots, v_{j_n} \rangle = \langle u_1, u_2, \ldots, u_n \rangle$:
$$H_0^{m,u,J^n}: \quad \sum_{j_i \in J^n} (2u_i - 1)\,x_{j_i} + \sum_{i \notin J^n} 0 \cdot x_i - \sum_{k=1}^{n} u_k = 0 \qquad (4.2)$$
Let $w_i$ be the connection weight from the ith input neuron to the output neuron and $\theta$ be
the threshold of the output neuron in a 1-layer, 1-output Perceptron. Then, in the m-input,
1-output Perceptron,
• $\forall j_i \in J^n$ & $1 \le i \le n$, $w_{j_i} = 2u_i - 1$,
• $\forall i \notin J^n$ & $1 \le i \le m$, $w_i = 0$, and
• $\theta = \sum_{k=1}^{n} u_k$.
The values from those input neurons which are not in the interest set $J^n$ will not affect the
net input of the output neuron since the weights on the connections from those input neurons
are set as 0. These connections together act as a don't-care filter.
For example, suppose one wants to use a 1-layer Perceptron to recognize all the 5-dimensional
binary vertices whose 1st, 3rd, and 5th components are 1, 0, and 1 respectively. Then, the
corresponding interest set would be $J^3 = \{1,3,5\}$, and the implemented Perceptron is shown
in Figure 4.1.
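The example can be checked exhaustively with a short sketch. It assumes that the output neuron fires when its net input reaches the threshold (the text fixes the weights and threshold but the activation rule is stated elsewhere), and the function names are ours.

```python
from itertools import product

def inclusive_perceptron(u, interest, m):
    """Weights and threshold of expression 4.2 (interest holds 1-based indices)."""
    w = [0] * m                        # don't-care inputs keep weight 0
    for u_i, j in zip(u, interest):
        w[j - 1] = 2 * u_i - 1
    return w, sum(u)

def fires(w, theta, x):
    # Assumed activation: output 1 when the net input reaches the threshold.
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= theta)

w, theta = inclusive_perceptron(u=(1, 0, 1), interest=(1, 3, 5), m=5)
accepted = [x for x in product((0, 1), repeat=5) if fires(w, theta, x)]
print(w, theta)       # [1, 0, -1, 0, 1] 2
print(len(accepted))  # 4
```

Exactly the four vectors of the form <1,?,0,?,1> are accepted, one for each setting of the two don't-care inputs.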
4.2.2 A neural assembly for exclusive pattern recognition
For a given n-dimensional binary vertex $u = \langle u_1, \ldots, u_n \rangle$, all n-dimensional binary vertices
can be partitioned into n + 1 parallel layers according to their Hamming distance p to the given
binary vertex u (Theorem 2.1). Those n + 1 layers are respectively on n + 1 mutually parallel
n-dimensional hyperplanes $H_p^{n,u}$ (expression 2.12), $0 \le p \le n$, where
$$H_p^{n,u}: \quad \sum_{i=1}^{n} (2u_i - 1)\,x_i - \Big(\sum_{i=1}^{n} u_i - p\Big) = 0$$
and u is the only vertex of the first layer, which is on $H_0^{n,u}$, where
$$H_0^{n,u}: \quad \sum_{i=1}^{n} (2u_i - 1)\,x_i - \sum_{i=1}^{n} u_i = 0 \qquad (4.3)$$
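The layering can be verified numerically. The sketch below picks an arbitrary 4-dimensional vertex u (our choice, not from the text) and confirms that every vertex at Hamming distance p from u has weighted sum $\sum_i u_i - p$, so that it lies on the hyperplane for that p.

```python
from collections import Counter
from itertools import product

def net_input(u, x):
    """Weighted sum with weights 2*u_i - 1, as used in the layer hyperplanes."""
    return sum((2 * ui - 1) * xi for ui, xi in zip(u, x))

def hamming(u, x):
    return sum(ui != xi for ui, xi in zip(u, x))

u = (1, 0, 1, 1)                       # arbitrary example vertex
layers = Counter()
for x in product((0, 1), repeat=len(u)):
    p = hamming(u, x)
    assert net_input(u, x) == sum(u) - p   # x lies on the hyperplane for distance p
    layers[p] += 1
print(sorted(layers.items()))  # [(0, 1), (1, 4), (2, 6), (3, 4), (4, 1)]
```

The first and last layers each hold a single vertex: u itself and its complement.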
[Figure 4.1: 1-layer Perceptron with inputs x1, ..., x5, output y, and threshold θ = 2]
Figure 4.1 A 1-layer Perceptron which recognizes all the 5-dimensional
binary patterns that contain the partial pattern <1,?,0,?,1>,
where ? denotes don't care
Let $\bar{u} = \langle \bar{u}_1, \ldots, \bar{u}_n \rangle$ be the complement vertex of binary vertex u, i.e., $u_i + \bar{u}_i = 1$ for
$1 \le i \le n$. Then $\bar{u}$ is the only vertex of the (n+1)th layer, which is on $H_n^{n,u}$, where
$$H_n^{n,u}: \quad \sum_{i=1}^{n} (2u_i - 1)\,x_i - \Big(\sum_{i=1}^{n} u_i - n\Big) = 0 \qquad (4.4)$$
The hyperplane $H_{n-1}^{n,u}$, which is defined as
$$H_{n-1}^{n,u}: \quad \sum_{i=1}^{n} (2u_i - 1)\,x_i - \Big(\sum_{i=1}^{n} u_i - n + 1\Big) = 0 \qquad (4.5)$$
can be used to implement a 1-layer Perceptron to distinguish the binary vertex $\bar{u}$ from all
other n-dimensional binary vertices (i.e., the Perceptron recognizes all the n-dimensional binary
vertices which are not $\bar{u}$) by setting
• the threshold of the output neuron as $\sum_{i=1}^{n} u_i - n + 1$, and
• the connection weight from the ith input neuron to the output neuron as $2u_i - 1$
in the n-input, 1-output Perceptron.
Now consider a binary vector v of dimension m, where m > n. Suppose only the values of
n of the m components of vector v are of interest and an interest set $J^n = \{j_1, j_2, \ldots, j_n\}$ is defined.
The expression for the hyperplane $H_{n-1}^{n,u}$ in an n-dimensional binary space is re-defined as a
hyperplane $H_{n-1}^{m,u,J^n}$ in an m-dimensional binary space to implement a 1-layer Perceptron to
recognize all the $2^m - 2^{m-n}$ m-dimensional binary vectors whose $J^n$-set partial vectors don't equal
the n-dimensional binary vector $\bar{u}$. Then,
$$H_{n-1}^{m,u,J^n}: \quad \sum_{j_i \in J^n} (2u_i - 1)\,x_{j_i} + \sum_{i \notin J^n} 0 \cdot x_i - \Big(\sum_{k=1}^{n} u_k - n + 1\Big) = 0 \qquad (4.6)$$
and, in the 1-layer, m-input, 1-output Perceptron,
• $\forall j_i \in J^n$ & $1 \le i \le n$, $w_{j_i} = 2u_i - 1$,
• $\forall i \notin J^n$ & $1 \le i \le m$, $w_i = 0$, and
• $\theta = \sum_{k=1}^{n} u_k - n + 1$.
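A minimal sketch of these settings follows, for a 5-input system in which the partial pattern <1,0,1> on inputs 1, 3 and 5 must not appear. We assume, as an illustration, that the neuron fires when its net input reaches the threshold; the helper name and its `forbidden` parameter are ours, and u is obtained as the complement of the forbidden pattern.

```python
from itertools import product

def exclusive_perceptron(forbidden, interest, m):
    """Expression 4.6 for a partial pattern that must NOT appear (1-based indices)."""
    u = [1 - b for b in forbidden]     # u is the complement of the forbidden pattern
    w = [0] * m
    for u_i, j in zip(u, interest):
        w[j - 1] = 2 * u_i - 1
    return w, sum(u) - len(u) + 1

w, theta = exclusive_perceptron(forbidden=(1, 0, 1), interest=(1, 3, 5), m=5)
rejected = [x for x in product((0, 1), repeat=5)
            if sum(wi * xi for wi, xi in zip(w, x)) < theta]
print(w, theta)        # [-1, 0, 1, 0, -1] -1
print(len(rejected))   # 4
```

Only the four vectors carrying the forbidden partial pattern fall below the threshold; the remaining 28 are recognized.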
For example, suppose one wants to use a 1-layer Perceptron to recognize all the 5-dimensional
binary vertices whose 1st, 3rd, and 5th components are not 1, 0, and 1 respectively. Then, the
corresponding interest set would be $J^3 = \{1,3,5\}$, and the implemented Perceptron is shown
in Figure 4.2.
Figure 4.2 A 1-layer Perceptron which recognizes all the 5-dimensional
binary patterns that don't contain the partial pattern
<1,?,0,?,1>, where ? denotes don't care
4.3 A Neural Assembly for Executing a Logical AND (AND Neural Assembly)
This section develops an AND neural assembly which can realize any arbitrary logical AND
function of a finite number of Boolean variables. First, we develop notations to represent
Boolean variables (atomic propositional variables) and logical AND expressions to facilitate
such a realization. Let $\neg v_i$ be the negation of the Boolean variable $v_i$. Further, let $\neg v_i$ be
denoted by $v_i^0$, and $v_i$ by $v_i^1$. Then a logical expression $v_1 \wedge \neg v_2 \wedge v_3$ (a conjunction of three
Boolean variables) can be denoted as $v_1^1 \wedge v_2^0 \wedge v_3^1$. Let $v = \langle v_1, \ldots, v_n \rangle$ and $z = \langle z_1, \ldots, z_n \rangle$,
where $v_i, z_i \in \{0,1\}$ for $1 \le i \le n$. Then, for a logical AND function denoted by
$C^{n,z}(v) = v_1^{z_1} \wedge v_2^{z_2} \cdots \wedge v_n^{z_n}$, we have
$$C^{n,z}(v) = \begin{cases} 1 & \text{if } v = z \\ 0 & \text{if } v \text{ is any other n-dimensional binary vertex} \end{cases} \qquad (4.7)$$
The evaluation of the logical AND function $C^{n,z}(v)$ can be viewed as a process of binary
pattern recognition. Thus, it can be realized by a 1-layer Perceptron that implements the
hyperplane $H_0^{n,z}$ to recognize the binary vertex z. Let $H_{AND}^{n,z}$ be used for $H_0^{n,z}$. Then, according
to expression 4.1 and its associated Perceptron implementation,
$$H_{AND}^{n,z} = H_0^{n,z}: \quad \sum_{i=1}^{n} (2z_i - 1)\,x_i - \sum_{i=1}^{n} z_i = 0 \qquad (4.8)$$
and the logical AND function $C^{n,z}(v)$ can be realized by a 1-layer Perceptron with n input
neurons and one output neuron. The corresponding Perceptron has
• the threshold of the output neuron set to $\sum_{i=1}^{n} z_i$, and
• the connection weight from the ith input neuron to the output neuron set to $2z_i - 1$.
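As a small check of these settings, consider the hypothetical two-variable case z = <0,1> (our illustration, not taken from the text), i.e. the conjunction $\neg v_1 \wedge v_2$, and assume the output neuron fires when its net input reaches the threshold:

```latex
\begin{align*}
w &= (2z_1 - 1,\; 2z_2 - 1) = (-1,\, 1), \qquad \theta = z_1 + z_2 = 1,\\
\mathrm{net}(v) &= -v_1 + v_2:\quad
\mathrm{net}(0,0) = 0,\;\; \mathrm{net}(0,1) = 1,\;\; \mathrm{net}(1,0) = -1,\;\; \mathrm{net}(1,1) = 0.
\end{align*}
```

Only v = <0,1> reaches the threshold, so the Perceptron outputs 1 exactly when v = z.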
For example, suppose $v = \langle v_1, v_2, v_3 \rangle$ and $C(v) = v_1 \wedge \neg v_2 \wedge v_3 = v_1^1 \wedge v_2^0 \wedge v_3^1$. Then, we
have
$$C(v) = \begin{cases} 1 & \text{if } v = \langle 1,0,1 \rangle \\ 0 & \text{if } v \text{ is any other n-dimensional binary vertex} \end{cases} \qquad (4.9)$$
and the corresponding Perceptron which realizes the logical AND function C(v) is shown in
Figure 4.3.
Figure 4.3 An AND neural assembly which realizes the logical AND function
C(v)
In order to be able to realize all possible logical AND functions in a system of m Boolean
variables using their corresponding 1-layer Perceptrons, the expression for the separating
hyperplane $H_{AND}^{n,z}$ is extended from an n-dimensional binary space to an m-dimensional binary
space to recognize all the m-dimensional binary patterns whose partial patterns equal the
n-dimensional binary vector z for certain interest sets $J^n$, where m > n. Suppose
$v = \langle v_1, v_2, \ldots, v_m \rangle$ is a binary vertex of dimension m. We define an interest set $J^n = \{j_1, j_2, \ldots, j_n\}$,
$1 \le j_1 < j_2 < \cdots < j_n \le m$. Let $C^{m,z,J^n}(v) = v_{j_1}^{z_1} \wedge v_{j_2}^{z_2} \cdots \wedge v_{j_n}^{z_n}$. Then
$$C^{m,z,J^n}(v) = \begin{cases} 1 & \text{if } v(J^n) = z \\ 0 & \text{if } v(J^n) \text{ is any other n-dimensional binary vector} \end{cases} \qquad (4.10)$$
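The logical AND function $C^{m,z,J^n}(v)$ can be realized by a 1-layer Perceptron that implements
the hyperplane $H_0^{m,z,J^n}$ to recognize all the m-dimensional binary vectors whose $J^n$-set partial
vectors equal the n-dimensional binary vector z. Let $H_{AND}^{m,z,J^n}$ be used for $H_0^{m,z,J^n}$. Then,
according to expression 4.2 and its associated Perceptron implementation,
$$H_{AND}^{m,z,J^n}: \quad \sum_{j_i \in J^n} (2z_i - 1)\,x_{j_i} + \sum_{i \notin J^n} 0 \cdot x_i - \sum_{k=1}^{n} z_k = 0 \qquad (4.11)$$
and the logical AND function $C^{m,z,J^n}(v) = v_{j_1}^{z_1} \wedge \cdots \wedge v_{j_n}^{z_n}$ can be realized by a 1-layer
Perceptron with m input neurons and one output neuron. In the Perceptron,
• $\forall j_i \in J^n$ & $1 \le i \le n$, $w_{j_i} = 2z_i - 1$,
• $\forall i \notin J^n$ & $1 \le i \le m$, $w_i = 0$, and
• $\theta = \sum_{k=1}^{n} z_k$.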
Such an AND neural assembly will be used as a building block to assemble the neural
architectures for realizing Boolean functions represented in DNF (see Section 4.5).
Examples of such a neural assembly will be shown in Section 4.5 to assemble a neural architecture
which realizes a given DNF Boolean function.
4.4 Neural Assemblies for Executing Logic ORs (OR Neural Assemblies) 
This section develops OR neural assemblies which can realize any arbitrary logical OR
functions of a finite number of Boolean variables. First, a general OR neural assembly is described.
The assembly will be used as a building block to assemble the neural architecture proposed in
Section 4.5 for realizing DNF Boolean functions. Then, a monotone OR neural assembly is
derived from the general OR neural assembly. The monotone OR neural assembly will be used as
a building block to assemble the neural architecture proposed in Section 5.4.2.4 for realizing
NFA.
4.4.1 A general OR neural assembly
This subsection investigates how a 1-layer Perceptron can realize a general logical OR
function which contains negated Boolean variables. The notation used here follows that in Section
4.3. Let $D_G^{n,z}(v) = v_1^{z_1} \vee v_2^{z_2} \cdots \vee v_n^{z_n}$, where $\vee$ is the logical connective OR, the $v_i$'s are Boolean
variables, $v = \langle v_1, \ldots, v_n \rangle$, $z = \langle z_1, \ldots, z_n \rangle$, and $v_i, z_i \in \{0,1\}$ for $1 \le i \le n$. Then, we have
$$D_G^{n,z}(v) = \begin{cases} 0 & \text{if } v = \bar{z} = \langle \bar{z}_1, \bar{z}_2, \ldots, \bar{z}_n \rangle \\ 1 & \text{if } v \text{ is any other n-dimensional binary vertex} \end{cases} \qquad (4.12)$$
The logical OR function $D_G^{n,z}(v)$ can be realized by a 1-layer Perceptron that implements
the hyperplane $H_{n-1}^{n,z}$ to recognize all n-dimensional binary vertices which are not $\bar{z}$. Let
$H_{OR}^{n,z}$ be used for $H_{n-1}^{n,z}$. Then, according to expression 4.5 and its associated Perceptron
implementation,
$$H_{OR}^{n,z}: \quad \sum_{i=1}^{n} (2z_i - 1)\,x_i - \Big(\sum_{i=1}^{n} z_i - n + 1\Big) = 0 \qquad (4.13)$$
and, in the n-input, 1-output Perceptron which realizes $D_G^{n,z}(v)$,
• the threshold of the output neuron is set to $\sum_{i=1}^{n} z_i - n + 1$, and
• the connection weight from the ith input neuron to the output neuron is set to $2z_i - 1$.
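To see these settings at work, take the hypothetical two-variable case z = <0,1> (our illustration), i.e. the disjunction $\neg v_1 \vee v_2$, again assuming the neuron fires when its net input reaches the threshold:

```latex
\begin{align*}
w &= (2z_1 - 1,\; 2z_2 - 1) = (-1,\, 1), \qquad \theta = (z_1 + z_2) - n + 1 = 1 - 2 + 1 = 0,\\
\mathrm{net}(v) &= -v_1 + v_2:\quad
\mathrm{net}(0,0) = 0,\;\; \mathrm{net}(0,1) = 1,\;\; \mathrm{net}(1,0) = -1,\;\; \mathrm{net}(1,1) = 0.
\end{align*}
```

The net input falls below the threshold only for v = <1,0>, the complement of z, which is the single assignment that falsifies $\neg v_1 \vee v_2$.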
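For example, suppose $v = \langle v_1, v_2, v_3 \rangle$ and $D(v) = v_1 \vee \neg v_2 \vee v_3 = v_1^1 \vee v_2^0 \vee v_3^1$. Then, we
have
$$D(v) = \begin{cases} 0 & \text{if } v = \langle \bar{1}, \bar{0}, \bar{1} \rangle = \langle 0,1,0 \rangle \\ 1 & \text{if } v \text{ is any other n-dimensional binary vertex} \end{cases} \qquad (4.14)$$
and the corresponding Perceptron which realizes the logical OR function D(v) is shown in Figure
4.4.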
In order to be able to realize all the possible logical OR functions in a system of m Boolean
variables using their corresponding 1-layer Perceptron implementations, the expression for
the separating hyperplane $H_{OR}^{n,z}$ is extended from an n-dimensional binary space to an
m-dimensional binary space to recognize all the m-dimensional binary patterns whose partial
patterns don't equal the n-dimensional binary vector $\bar{z}$ for certain interest sets $J^n$, where
m > n. Suppose $v = \langle v_1, v_2, \ldots, v_m \rangle$ is a binary (Boolean) vertex of dimension m. We
define an interest set $J^n = \{j_1, j_2, \ldots, j_n\}$, $1 \le j_1 < j_2 < \cdots < j_n \le m$. Let
$D_G^{m,z,J^n}(v) = v_{j_1}^{z_1} \vee v_{j_2}^{z_2} \cdots \vee v_{j_n}^{z_n}$. Then
$$D_G^{m,z,J^n}(v) = \begin{cases} 0 & \text{if } v(J^n) = \bar{z} \\ 1 & \text{if } v(J^n) \text{ is any other n-dimensional binary vector} \end{cases} \qquad (4.15)$$
The logical OR function $D_G^{m,z,J^n}(v)$ can be realized by a 1-layer Perceptron that implements
the hyperplane $H_{n-1}^{m,z,J^n}$ to recognize the m-dimensional binary vectors whose $J^n$-set partial
vectors don't equal the n-dimensional binary vector $\bar{z}$. Let $H_{OR}^{m,z,J^n}$ be used for $H_{n-1}^{m,z,J^n}$.
Then, according to expression 4.6 and its associated Perceptron implementation,
$$H_{OR}^{m,z,J^n}: \quad \sum_{j_i \in J^n} (2z_i - 1)\,x_{j_i} + \sum_{i \notin J^n} 0 \cdot x_i - \Big(\sum_{k=1}^{n} z_k - n + 1\Big) = 0 \qquad (4.16)$$
and, in the m-input, 1-output Perceptron which realizes $D_G^{m,z,J^n}(v)$,
• $\forall j_i \in J^n$ & $1 \le i \le n$, $w_{j_i} = 2z_i - 1$,
• $\forall i \notin J^n$ & $1 \le i \le m$, $w_i = 0$, and
• $\theta = \sum_{k=1}^{n} z_k - n + 1$.
Figure 4.4 An OR neural assembly which realizes the logical OR function
D(v)
Figure 4.2 is a corresponding Perceptron implementation which realizes the logical OR
function $y = D(x) = \neg x_1 \vee x_3 \vee \neg x_5 = x_1^0 \vee x_3^1 \vee x_5^0$ in a system which contains Boolean
variables $x_1, x_2, x_3, x_4$, and $x_5$, where $x = \langle x_1, \ldots, x_5 \rangle$.
4.4.2 A monotone OR neural assembly
Consider an n-variable Boolean expression represented as a monotone disjunction:
$$v_1 \vee v_2 \cdots \vee v_n \qquad (4.17)$$
where the $v_i$'s are Boolean variables. Monotone disjunctions are simply disjunctions which don't
contain negated Boolean variables. For example, $v_1 \vee v_2$ is a monotone disjunction, but $\neg v_1 \vee v_2$
is not a monotone disjunction. Let $v = \langle v_1, \ldots, v_n \rangle$ and $D_M^{n}(v) = v_1^1 \vee v_2^1 \cdots \vee v_n^1$. Then, we
have
$$D_M^{n}(v) = \begin{cases} 0 & \text{if } v = \langle \bar{1}^n \rangle = \langle 0^n \rangle \\ 1 & \text{if } v \text{ is any other n-dimensional binary vertex} \end{cases} \qquad (4.18)$$
The logical monotone OR function $D_M^{n}(v)$ is a special case of a general logical OR function
$D_G^{n,z}(v)$ with $z = \langle 1^n \rangle$. Then, according to the Perceptron implementation for $D_G^{n,z}(v)$
(which corresponds to expression 4.13), the logical monotone OR function $D_M^{n}(v)$ can be
realized by an n-input, 1-output Perceptron with the threshold of the output neuron being set
as 1, and the connection weight from every input neuron to the output neuron being set as 1.
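The values 1 and 1 follow by substituting $z_i = 1$ for every i into the settings of the general OR neural assembly; the short derivation below spells this out, under the same assumption as before that the neuron fires when its net input reaches the threshold:

```latex
\begin{align*}
w_i &= 2z_i - 1 = 2 \cdot 1 - 1 = 1 \quad (1 \le i \le n),\\
\theta &= \sum_{i=1}^{n} z_i - n + 1 = n - n + 1 = 1,\\
\mathrm{net}(v) &= \sum_{i=1}^{n} v_i \;\ge\; 1 \iff \text{at least one } v_i = 1 .
\end{align*}
```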
In order to be able to realize all the possible logical monotone OR functions in a system of m
Boolean variables using their corresponding 1-layer Perceptron implementations, the logical
monotone OR function $D_M^{n}(v)$ is extended from an n-dimensional Boolean space to
an m-dimensional Boolean space, where m > n. Suppose $v = \langle v_1, v_2, \ldots, v_m \rangle$ is a Boolean
vector of dimension m. Assume an interest set $J^n = \{j_1, j_2, \ldots, j_n\}$, $1 \le j_1 < j_2 < \cdots < j_n \le m$,
is defined. Define $D_M^{m,J^n}(v) = v_{j_1}^1 \vee v_{j_2}^1 \cdots \vee v_{j_n}^1$. Then
$$D_M^{m,J^n}(v) = \begin{cases} 0 & \text{if } v(J^n) = \langle \bar{1}^n \rangle = \langle 0^n \rangle \\ 1 & \text{if } v(J^n) \text{ is any other n-dimensional binary vector} \end{cases} \qquad (4.19)$$
The logical monotone OR function $D_M^{m,J^n}(v)$ is a special case of a general logical OR function
$D_G^{m,z,J^n}(v)$ with $z = \langle 1^n \rangle$. Then, according to the Perceptron implementation for $D_G^{m,z,J^n}(v)$
(which corresponds to expression 4.16), the logical monotone OR function $D_M^{m,J^n}(v)$ can be
realized by an m-input, 1-output Perceptron with
• $\forall j_i \in J^n$ & $1 \le i \le n$, $w_{j_i} = 1$,
• $\forall i \notin J^n$ & $1 \le i \le m$, $w_i = 0$, and
• $\theta = 1$.
Such a monotone OR neural assembly will be used as a building block to assemble the neural
architectures for realizing NFA in Section 5.4.2.4.
4.5 A Neural Architecture for Realizing DNF Boolean Functions 
Let $\neg C_i$ be the negation of the conjunction $C_i$. Further, let $\neg C_i$ be denoted by $(C_i)^0$, and
$C_i$ by $(C_i)^1$. Let $v = \langle v_1, \ldots, v_m \rangle$ and $C_i$ be defined on v for $1 \le i \le n$. Then, a DNF Boolean
function $B(v) = (C_1)^{z_1} \vee \cdots \vee (C_n)^{z_n}$, where $z_i \in \{0,1\}$, can be realized by a 2-layer Perceptron.
The first layer of the Perceptron consists of n m-input AND neural assemblies defined by expression
4.11, and the second layer is an n-input OR neural assembly defined by expression 4.13. Each
of the n AND neural assemblies is used to realize a conjunction $C_i$, where $1 \le i \le n$.
For example, let $v = \langle v_1, v_2, v_3, v_4, v_5 \rangle$, $J_1^3 = \{1,2,3\}$, and $J_2^3 = \{3,4,5\}$. Then,
$$B(v) = (v_1 \wedge \neg v_2 \wedge v_3) \vee \neg (v_3 \wedge v_4 \wedge v_5) \qquad (4.20)$$
$$= (v_1^1 \wedge v_2^0 \wedge v_3^1)^1 \vee (v_3^1 \wedge v_4^1 \wedge v_5^1)^0 \qquad (4.21)$$
$$= \big(C^{5,\langle 1,0,1 \rangle,J_1^3}(v)\big)^1 \vee \big(C^{5,\langle 1,1,1 \rangle,J_2^3}(v)\big)^0 \qquad (4.22)$$
where $C^{5,\langle 1,0,1 \rangle,J_1^3}(v) = v_1 \wedge \neg v_2 \wedge v_3$ and $C^{5,\langle 1,1,1 \rangle,J_2^3}(v) = v_3 \wedge v_4 \wedge v_5$. The corresponding
2-layer Perceptron which realizes the DNF Boolean function B(v) is shown in Figure 4.5.
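The example can be assembled and checked exhaustively with a minimal sketch. The helper names are ours, and we assume each neuron outputs 1 when its net input reaches its threshold; the weights and thresholds follow expressions 4.11 and 4.13.

```python
from itertools import product

def step(net, theta):
    # Assumed activation: fire when the net input reaches the threshold.
    return int(net >= theta)

def and_assembly(z, interest, m):
    """Weights and threshold of expression 4.11 (1-based interest set)."""
    w = [0] * m
    for z_i, j in zip(z, interest):
        w[j - 1] = 2 * z_i - 1
    return w, sum(z)

def or_assembly(z):
    """Weights and threshold of expression 4.13."""
    return [2 * z_i - 1 for z_i in z], sum(z) - len(z) + 1

layer1 = [and_assembly((1, 0, 1), (1, 2, 3), 5),
          and_assembly((1, 1, 1), (3, 4, 5), 5)]
w_or, theta_or = or_assembly((1, 0))      # first conjunction plain, second negated

def network(v):
    hidden = [step(sum(wi * vi for wi, vi in zip(w, v)), th) for w, th in layer1]
    return step(sum(wi * hi for wi, hi in zip(w_or, hidden)), theta_or)

def formula(v):
    v1, v2, v3, v4, v5 = v
    return int((v1 and not v2 and v3) or not (v3 and v4 and v5))

assert all(network(v) == formula(v) for v in product((0, 1), repeat=5))
print([th for _, th in layer1], theta_or)   # [2, 3] 0
```

The network agrees with expression 4.20 on all 32 inputs, using thresholds 2 and 3 in the first layer and 0 in the second.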
4.6 Summary and Discussion 
Artificial neural networks, due to their inherent massive parallelism, potential fault tol­
erance and adaptation capability through learning, offer an alternative paradigm for robust 
and efficient implementations of logical inference systems. In this chapter, a method based 
on geometrical/mathematical analysis has been proposed for systematically designing neural 
architectures for elementary logical inference. Particularly, neural architectures for realizing 
logical ANDs, logical ORs, and DNF propositions have been synthesized by way of binary pattern 
recognition. 
The input to a Boolean function can be represented as a binary (bipolar) code. Therefore, 
the evaluation of a Boolean function can be viewed as a process of binary (bipolar) pattern
recognition. It is known that every Boolean function can be represented as a DNF expression 
[43]. A DNF expression is a disjunction of conjunctions. The evaluations of conjunction and 
disjunction can be realized by the proposed AND and OR neural assemblies respectively. Hence, 
any Boolean function (except the constant 0) can be realized by a 2-layer neural architec­
ture (Perceptron) assembled from a fixed number of AND and OR neural assemblies. Besides, 
Perceptrons have space and speed advantages over DNF representations for representing and 
evaluating Boolean functions (see [43] for details). Since logical AND, logical OR, as well as 
DNF representation are essential to logical inference and Boolean functions are basic to many 
applications in science and engineering, we expect the proposed neural assemblies would find 
use in the construction of modular neural networks for a variety of applications. But, in order
to apply neural networks to applications involving more complex logical inference, neural
networks would need to be able to do variable binding, logical proof, unification, resolution,
etc.
[Figure 4.5: two AND assemblies (thresholds θ = 2 and θ = 3) over inputs v1, ..., v5 feeding an OR assembly (θ = 0) with output B(v)]
Figure 4.5 A neural architecture which realizes the DNF Boolean function
B(v)
It is worth pointing out that the derivation of AND and OR neural assemblies which operate 
on bipolar values is straightforward given the methods proposed in this chapter and the method 
proposed in Section 2.2.6 for the conversion between models using bipolar and binary inputs. 
We expect that the resulting bipolar AND and OR neural assemblies will be exactly equivalent 
to those proposed in [43]. Since an input value 0 can be used to stand for unknown in a bipolar
model, which denotes true by 1 and false by -1, the bipolar model is more flexible than the binary
model, which denotes true by 1 and false by 0.
5 NEURAL ARCHITECTURES FOR SEQUENCE PROCESSING 
5.1 Introduction 
Artificial neural networks (ANN), due to their inherent parallelism, offer an attractive 
paradigm for efficient implementations of functional modules for symbol processing. This 
chapter focuses on systematic designs for neural network architectures for sequence processing 
which is essential to many practical applications involving symbol processing in computer 
science, linguistics, systems modeling and control, artificial intelligence, and structural pattern 
recognition. 
The capabilities of neural network models (in particular, recurrent networks of threshold 
logic units or McCulloch-Pitts neurons) in processing and generating sequences (strings defined 
over some finite alphabet) and hence their formal equivalence with finite state automata or 
regular language generators/recognizers have been known for several decades [83, 108, 117]. 
More recently, recurrent neural network realizations of finite state automata for recognition 
and learning of finite state (regular) languages have been explored by numerous authors [6, 20,
33, 38, 44, 45, 77, 122, 129, 132, 133, 134, 159, 166, 192]. There has been considerable work
on extending the computational capabilities of recurrent neural network models by providing
some form of external memory in the form of a tape [194] or a stack [13, 27, 66, 116, 123, 144,
161, 169, 174, 197]. To the best of our knowledge, to date, most of the research on neural
architectures for sequence processing has focused on the investigation of neural networks that
are designed to learn to handle sequence processing.
This chapter presents designs of several modular ANN modules for basic sequence processing.
The ANN modules which are used as building blocks for the neural architectures proposed
in Chapter 6 for syntax analysis include neural network architectures for realizing deterministic
finite automata, stacks, and deterministic pushdown automata. These ANN modules are
systematically synthesized from the BMP modules proposed in Section 2.2.5. Besides, a neural
network architecture for realizing nondeterministic finite automata is proposed to explore the
potential benefits of ANN in the design of high performance systems for parallel symbolic
computing applications. The rest of the chapter is organized as follows:
• The rest of Section 5.1 briefly discusses how to represent symbolic functions in terms of 
binary mappings to facilitate symbolic information manipulation via the proposed BMP 
module which operates on binary values. 
• Sections 5.2, 5.3, 5.4 and 5.5 respectively explore the systematic synthesis of neural 
network architectures for realizing deterministic finite automata, deterministic pushdown 
automata, stack and nondeterministic finite automata. 
• Section 5.6 concludes with a summary and a brief discussion. 
5.1.1 Symbolic functions and binary mappings 
In general, most simple, non-recursive symbolic functions and table lookup functions can
be viewed in terms of a binary random mapping $f_b : U \to V$ (expression 2.1). For example, $f_b$
may define a symbolic mapping function $f_s : \Gamma_1 \times \Gamma_2 \cdots \times \Gamma_r \to \Delta_1 \times \Delta_2 \cdots \times \Delta_s$ as described
in Section 3.1.1. In this case, the operations of $f_s$ on its associated symbols can be viewed in
terms of the binary mapping operations of $f_b$, which in turn can be realized by a BMP module
proposed in Section 2.2.5.
Therefore, modular neural network modules for complex symbol processing can be
synthesized through a composition of appropriate primitive symbolic functions which are directly
realized by suitable BMP modules. Two basic ways of recursively composing composite
symbolic functions from component symbolic functions (which may themselves be composite
functions or primitive functions) are discussed here. Let f and g be two symbolic functions
defined as follows:
$$f : \Gamma_1 \times \Gamma_2 \cdots \times \Gamma_p \to \Delta_1 \times \Delta_2 \cdots \times \Delta_s \qquad (5.1)$$
$$g : \Delta_1 \times \cdots \times \Delta_s \to \Lambda_1 \times \cdots \times \Lambda_t \qquad (5.2)$$
The composition of f and g is denoted by $g \circ f$ such that
$$g \circ f : \Gamma_1 \times \cdots \times \Gamma_p \to \Lambda_1 \times \cdots \times \Lambda_t \qquad (5.3)$$
and for every $(\alpha_1, \ldots, \alpha_p)$ in $\Gamma_1 \times \cdots \times \Gamma_p$
$$g \circ f(\alpha_1, \ldots, \alpha_p) = g(f(\alpha_1, \ldots, \alpha_p)) \qquad (5.4)$$
Suppose $f_i$ is a symbolic function such that
$$f_i : \Gamma_1 \times \cdots \times \Gamma_p \to \Delta_i \quad \text{for } 1 \le i \le s \qquad (5.5)$$
The composition c of symbolic functions $g, f_1, \ldots, f_s$ is defined as:
$$c : \Gamma_1 \times \cdots \times \Gamma_p \to \Lambda_1 \times \cdots \times \Lambda_t \qquad (5.6)$$
and for every $(\alpha_1, \ldots, \alpha_p)$ in $\Gamma_1 \times \cdots \times \Gamma_p$
$$c(\alpha_1, \ldots, \alpha_p) = g(f_1(\alpha_1, \ldots, \alpha_p), \ldots, f_s(\alpha_1, \ldots, \alpha_p)) \qquad (5.7)$$
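Both forms of composition can be sketched in a few lines of ordinary code. The helpers `compose` and `compose_many` and the toy string functions below are hypothetical illustrations of expressions 5.4 and 5.7; symbol tuples are modeled as Python tuples of strings, which is our assumption rather than a representation used in this chapter.

```python
def compose(g, f):
    """g o f: apply f to the argument tuple, then g to f's output tuple (expression 5.4)."""
    return lambda *args: g(*f(*args))

def compose_many(g, *fs):
    """c(a) = g(f_1(a), ..., f_s(a)) as in expression 5.7."""
    return lambda *args: g(*(f(*args) for f in fs))

# Hypothetical symbolic functions over small alphabets of strings.
f = lambda a, b: (a + b, a)            # pair of symbols -> pair of symbols
g = lambda x, y: (x.upper(),)          # pair of symbols -> 1-tuple
print(compose(g, f)("a", "b"))         # ('AB',)

f1 = lambda a, b: a                    # component functions, each yielding one symbol
f2 = lambda a, b: b
c = compose_many(lambda x, y: (x, y), f2, f1)
print(c("a", "b"))                     # ('b', 'a')
```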
The recursive processing of input strings of variable length (of the sort needed in lexical
analysis and parsing) can be handled by composite functions $\hat{f} : \Gamma^* \to \Delta^*$, $\hat{g} : \Delta \times \Gamma^* \to \Delta$,
and $\hat{c} : \Delta \times \Gamma^* \to \Delta \times \Delta^*$, which are respectively realized by the modular recurrent neural
architectures proposed in this chapter and Chapter 6, where $\Gamma^*$ ($\Delta^*$) denotes the set of all
strings over the alphabet $\Gamma$ ($\Delta$). Here, function $\hat{f}$ denotes the recursive processing of input
strings of variable length by a parser or a lexical analyzer (see Chapter 6); function $\hat{g}$ denotes
the recursive evaluation of input strings of variable length by the extended transition function
of a DFA (see Section 5.2); and function $\hat{c}$ denotes the recursive parsing of syntactically tagged
input tokens by the extended transition function of an LR(1) parser (see Chapter 6). The
functions $\hat{f}$, $\hat{g}$, and $\hat{c}$ that process input strings of variable length can be composed using
symbolic functions f, g, c, output selector function, and string concatenation function by
recursion on the length of the input string (see Section 5 for an example). Other recursive
symbolic functions can also be composed using composition and recursion [140, 154, 195].
The operation of a desired composite function on its symbolic input (string) can be fully
characterized analytically in terms of its component symbolic functions on their respective
symbolic inputs and outputs. The component symbolic functions are either composite functions
of other symbolic functions or primitive symbolic functions which are realized directly by
appropriate BMP modules. This makes it possible to systematically (and provably correctly)
synthesize any desired symbolic function using BMP modules. (Such designs often require
recurrent links for realizing recursive functions such as the extended transition function $\hat{\delta}$ of a
DFA or a more complex recursive function, as we shall see later and in Chapter 6.)
5.2 Neural Network Design for Deterministic Finite Automata (NN DFA) 
Deterministic finite automata (finite state machines) are a basic computing model which 
is essential to many science and engineering applications involving sequence processing. This 
section first briefly reviews the symbolic computing model for deterministic finite automata 
and then presents a method to systematically design neural network architectures for realizing 
deterministic finite automata [20]. 
5.2.1 Deterministic finite automata (DFA) 
A deterministic finite automaton is a 5-tuple $M_{DFA} = (Q, \Gamma, \delta, q_0, F)$ [74], where $Q$ is a
finite non-empty set of states, $\Gamma$ is a finite non-empty input alphabet, $q_0 \in Q$ is the initial state,
$F \subseteq Q$ is the set of final or accepting states, and $\delta : Q \times \Gamma \rightarrow Q$ is the transition function. A
finite automaton is deterministic if there is at most one transition that is applicable for each
pair of state and input symbol.
The extended transition function $\hat{\delta}$ of a DFA with transition function $\delta$ is a mapping from
$Q \times \Gamma^*$ to $Q$ defined by recursion on the length of the input string as follows:
• Basis: $\hat{\delta}(q_i, \epsilon) = q_i$, where $\epsilon$ is the empty string.
• Recursive step: $\hat{\delta}(q_i, ua) = \delta(\hat{\delta}(q_i, u), a)$ for all input symbols $a \in \Gamma$ and strings $u \in \Gamma^*$.
The computation of the machine $M_{DFA}$ in state $q_i$ with string $w$ halts in state $\hat{\delta}(q_i, w)$.
The evaluation of the function $\hat{\delta}(q_0, w)$ simulates the repeated application of the transition
function $\delta$ required to process the string $w$ from initial state $q_0$. A string $w$ is accepted by
$M_{DFA}$ if $\hat{\delta}(q_0, w) \in F$; otherwise it is rejected. The set of strings accepted by $M_{DFA}$ is denoted
as $L(M_{DFA}) = \{w \mid \hat{\delta}(q_0, w) \in F\}$, called the language of $M_{DFA}$.
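The recursive definition of the extended transition function translates almost literally into code. The following Python sketch is only an illustration: the dictionary-based transition table, the function names `delta_hat` and `accepts`, and the example automaton (strings over {a, b} with an even number of a's) are assumptions made for this example, not part of the original design.

```python
from typing import Dict, Optional, Set, Tuple

Delta = Dict[Tuple[str, str], str]  # (state, symbol) -> next state


def delta_hat(delta: Delta, q: str, w: str) -> Optional[str]:
    """Extended transition function, by recursion on the length of w.

    Basis: delta_hat(q, "") = q.
    Recursive step: delta_hat(q, ua) = delta(delta_hat(q, u), a).
    Returns None when an unspecified transition is encountered.
    """
    if w == "":
        return q
    prev = delta_hat(delta, q, w[:-1])
    if prev is None:
        return None
    return delta.get((prev, w[-1]))


def accepts(delta: Delta, q0: str, finals: Set[str], w: str) -> bool:
    """w is accepted iff delta_hat(q0, w) is a final state."""
    return delta_hat(delta, q0, w) in finals


# Hypothetical example DFA: even number of a's over {a, b}.
EVEN_A: Delta = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}
```

For instance, `accepts(EVEN_A, "even", {"even"}, "abba")` follows the states even, odd, odd, odd, even and therefore accepts.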
A Mealy machine is a DFA augmented with an output function. It is defined by a 6-tuple
$M_{Mealy} = (Q, \Gamma, \Delta, \delta, \lambda, q_0)$ [74], where $Q$, $\Gamma$, $\delta$, and $q_0$ are as in the DFA $M_{DFA}$, $\Delta$ is a finite
non-empty output alphabet, and $\lambda$ is the output function mapping from $Q \times \Gamma$ to $\Delta$. $\lambda(q, a)$ is the
output associated with the transition from state $q$ on input symbol $a$. The output of $M_{Mealy}$
in response to input string $a_1 a_2 \cdots a_n$ is the output string $\lambda(q_0, a_1)\lambda(q_1, a_2) \cdots \lambda(q_{n-1}, a_n)$, where
$q_0, q_1, \ldots, q_n$ is the sequence of states such that $\delta(q_{i-1}, a_i) = q_i$ for $1 \le i \le n$.
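The output string of a Mealy machine can be computed with a short loop. The sketch below assumes dictionary-encoded $\delta$ and $\lambda$; the function name `mealy_output` and the running-parity machine used as an example are hypothetical and serve only to illustrate the definition.

```python
from typing import Dict, Tuple

Table = Dict[Tuple[str, str], str]  # (state, input symbol) -> value


def mealy_output(delta: Table, lam: Table, q0: str, w: str) -> str:
    """Return lam(q0, a1) lam(q1, a2) ... lam(q_{n-1}, a_n)."""
    out = []
    q = q0
    for a in w:
        out.append(lam[(q, a)])  # output attached to the transition taken
        q = delta[(q, a)]        # move to the next state
    return "".join(out)


# Hypothetical example: emit the running parity of the 1s read so far.
PARITY_DELTA: Table = {
    ("e", "0"): "e", ("e", "1"): "o",
    ("o", "0"): "o", ("o", "1"): "e",
}
PARITY_LAMBDA: Table = {
    ("e", "0"): "0", ("e", "1"): "1",
    ("o", "0"): "1", ("o", "1"): "0",
}
```

On input `"1101"` the machine visits states e, o, e, e, o and emits `"1001"`.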
5.2.2 Architecture of NN DFA 
A partially recurrent neural network architecture can be used to realize a DFA as shown in 
[20]. Its central concept is to use a BMP module to realize the transition function of a DFA. 
The neural representation in the BMP module is described as follows. 
• The input neurons are divided into two groups. One group of input neurons has no
recurrent connections and receives the binary-coded current input symbol. There are
$n = \lceil \log_2 |\Gamma| \rceil$ such input neurons. The second group has $m = \lceil \log_2(|Q| + 1) \rceil$ input
neurons and holds the current state (coded in binary). Each input neuron in this group
has a recurrent connection from the corresponding output neuron.
• The output neurons together hold the next state (coded in binary). There are
$m = \lceil \log_2(|Q| + 1) \rceil$ output neurons.
• Every transition is represented as an ordered pair of binary codes. For each such ordered
pair, a hidden neuron and its associated connections are used to realize the ordered pair
in terms of binary mapping. Thus the number of required hidden neurons equals the
number of valid transitions in the transition function. For example, suppose $p, q \in Q$, $a \in \Gamma$,
$\delta(p, a) = q$ is a valid transition, and $p$, $q$ as well as $a$ are encoded as binary codes such
that $p = \langle p_1, \ldots, p_m \rangle$, $q = \langle q_1, \ldots, q_m \rangle$ and $a = \langle a_1, \ldots, a_n \rangle$, where $p_i, q_i, a_j \in \{0, 1\}$
for $1 \le i \le m$ and $1 \le j \le n$. Then the transition $\delta(p, a) = q$ is represented as a binary
mapping ordered pair $(\langle p_1, \ldots, p_m, a_1, \ldots, a_n \rangle, \langle q_1, \ldots, q_m \rangle)$ implemented by a BMP
module (see Section 2).
• An explicit synchronization mechanism is used to support the repetitive evaluation of
the transition function $\delta$ on input strings of variable length.
The transition function of a DFA can be represented as a 2-dimensional table with current 
state and current input symbol as indices. The operation of such a DFA involves repetitive 
lookup of the value for next state from the table using current state and current input symbol 
at each move until an error state or an accepting state is reached. Such a repetitive table 
lookup process involves content-based pattern matching and retrieval wherein the indices of 
the table are used as input patterns to retrieve the next state. This process can exploit the 
massively parallel associative processing capabilities of the neural associative memory proposed 
in Chapter 2. 
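The binary coding and the table lookup described above can be emulated in software. The following Python sketch is an assumption-laden illustration, not the BMP module itself: a dictionary with one entry per valid transition stands in for the hidden neurons (one per transition), exact-match lookup stands in for parallel pattern matching, and an unmatched pattern yields the all-zero state code. The helper names and the choice to reserve code 0 for the dead state are hypothetical.

```python
from math import ceil, log2


def to_bits(value: int, width: int) -> tuple:
    """Big-endian binary code of value using exactly width bits."""
    return tuple((value >> i) & 1 for i in reversed(range(width)))


def build_transition_table(states, alphabet, delta):
    """Encode each valid transition as (input pattern -> next-state code)."""
    m = ceil(log2(len(states) + 1))        # state bits; code 0 is kept free
    n = max(1, ceil(log2(len(alphabet))))  # symbol bits (at least one bit)
    state_code = {q: to_bits(i + 1, m) for i, q in enumerate(states)}
    symbol_code = {a: to_bits(i, n) for i, a in enumerate(alphabet)}
    table = {
        state_code[p] + symbol_code[a]: state_code[q]
        for (p, a), q in delta.items()
    }
    return m, n, state_code, symbol_code, table


def lookup(table, m, pattern):
    """Return the stored next-state code, or all zeros if nothing matches."""
    return table.get(pattern, (0,) * m)


# Hypothetical DFA with one transition deliberately left unspecified.
STATES = ["e", "o"]
ALPHABET = ["a", "b"]
DELTA = {("e", "a"): "o", ("e", "b"): "e", ("o", "a"): "e"}
M, N, STATE_CODE, SYMBOL_CODE, TABLE = build_transition_table(
    STATES, ALPHABET, DELTA
)
```

In this example three table entries are needed for three valid transitions, and looking up the unspecified pair (o, b) returns the all-zero code, mirroring the behaviour the text attributes to unspecified transitions.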
Figure 5.1 shows the neural network architecture for realizing a DFA. Let 0, 1, 2, ..., t
denote a succession of points along the discrete time line. The current and next states are
denoted by state(t) and state(t+1) respectively. The current input symbol is denoted by
input(t). This NN DFA module consists of two BMP modules, one accepting state trapping
module (AST module) and three buffers. One buffer stores the current state state(t), another
stores the input symbol input(t), and the other stores the next state state(t+1) which exists only
logically but not physically. The first two buffers operate under synchronization control which
enforces discrete time 0, 1, ..., t. The reset link resets the NN DFA to the initial state.
BMP module 1, called the NN DFA transition module, realizes the transition function of a DFA.
BMP module 2 is optional, and it allows the output of the NN DFA to be remapped from the
output of BMP module 1. The AST module is optional and can be implemented by a BMP
module. It enables BMP module 2 to produce an output only when the NN DFA goes into an
accepting state. A connection from the AST module to upper-layer control would be needed
to alert it when the AST module traps a rejecting state, i.e., when this NN DFA goes into a
rejecting state. Let $\langle 0^m \rangle$ denote the encoded binary value of the dead state (garbage state), a
state which is not a final state and has transitions to itself on all input symbols. Note that any
unspecified transition will automatically have the next state coded as $\langle 0^m \rangle$ as a consequence
of our design of a BMP module (see Section 2.2.5). This simplifies the implementation of a
DFA, since any transition to a rejecting state does not need to be implemented using a hidden
neuron in the NN DFA transition module.
Figure 5.1 The proposed modular neural network architecture for DFA
5.3 Neural Network Design for Deterministic Pushdown Automata (NN DPDA)
The capability of DFA is limited to recognition and production of the set of regular languages,
the simplest class of languages in the Chomsky hierarchy [74]. The capability of DFA can
be extended by adding a stack. The resulting automata can recognize the set of deterministic
context-free languages (DCFL), a more complex and widely used class of languages in the
Chomsky hierarchy [74]. This section describes a method to systematically synthesize neural
network architectures for deterministic pushdown automata [20].
5.3.1 Deterministic pushdown automata (DPDA) 
A pushdown automaton $M_{PDA}$ is a 7-tuple $(Q, \Gamma, \Delta, \delta, q_0, \bot, F)$ [74], where $Q$ is a finite set
of states, $\Gamma$ is a finite input alphabet, $\Delta$ is a finite stack alphabet, $q_0 \in Q$ is the initial state,
$\bot \in \Delta$ is a particular stack symbol called the stack start symbol, $F \subseteq Q$ is the set of final states, and
$\delta$ is the transition function mapping from $Q \times (\Gamma \cup \{\epsilon\}) \times \Delta$ to $Q \times \Delta^*$. A pushdown automaton
is deterministic if there is at most one transition that is applicable for each combination of
state, input symbol and stack top symbol. We denote a DPDA by $M_{DPDA}$. An input string is
accepted if the automaton processes the entire string and ends in an accepting state with an
empty stack.
To implement a DPDA in a neural network, we let $\delta$ map from $Q \times (\Gamma \cup \{\epsilon\}) \times \Delta$
to $Q \times \{pop, push, noop\} \times (\Delta \cup \{*\})$ so that stack operations are expressed explicitly
during the computation of a DPDA, where $*$ denotes a don't care value, $\{pop, push, noop\}$
is the set of possible stack operations, and noop denotes no operation.
5.3.2 Architecture of NN DPDA 
A partially recurrent neural network architecture can be used to realize a DPDA as shown 
in [20]. Its central concept is to use a BMP module to realize the transition function of a 
DPDA. The neural representation in the BMP module is described as follows. 
• The input neurons are divided into three groups. The first group has $m = \lceil \log_2(|Q| + 1) \rceil$
neurons and holds the binary-coded current state. Each input neuron in this set has a
recurrent connection from the corresponding output neuron. The second group receives
the binary-coded current input symbol and has $n = \lceil \log_2 |\Gamma| + 1 \rceil$ neurons. The third
group receives the binary-coded stack top symbol and has $k = \lceil \log_2 |\Delta| + 1 \rceil$ neurons.
The last two groups have no recurrent connections.
• The output neurons are divided into three groups. The first group represents the
binary-coded next state and has $m = \lceil \log_2(|Q| + 1) \rceil$ neurons. The second group has two neurons
and represents the binary-coded stack operation. The third group has $k = \lceil \log_2 |\Delta| + 1 \rceil$
neurons and represents the binary-coded stack symbol to be pushed onto the stack or a
don't care (denoted as $*$) when the stack action to be performed is a pop.
• Every transition is represented as an ordered pair of binary codes. For each such ordered
pair, a hidden neuron and its associated connections are used to realize the ordered
pair in terms of binary mapping. Thus the number of required hidden neurons equals
the number of valid transitions in the transition function. For example, suppose $p, q \in Q$,
$a \in (\Gamma \cup \{\epsilon\})$, $\alpha, \beta \in \Delta$, $s \in \{pop, push, noop\}$, $\delta(p, a, \alpha) = (q, s, \beta)$ is a valid transition,
and $p$, $q$, $a$, $\alpha$, $\beta$ and $s$ are encoded into binary vectors such that $p = \langle p_1, \ldots, p_m \rangle$,
$q = \langle q_1, \ldots, q_m \rangle$, $a = \langle a_1, \ldots, a_n \rangle$, $\alpha = \langle \alpha_1, \ldots, \alpha_k \rangle$, $\beta = \langle \beta_1, \ldots, \beta_k \rangle$ and $s = \langle s_1, s_2 \rangle$,
where $p_i, q_i, a_j, \alpha_l, \beta_l, s_1, s_2 \in \{0, 1\}$ for $1 \le i \le m$, $1 \le j \le n$, and $1 \le l \le k$. Note
that our representation of a transition of a DPDA is different from the conventional
representation in that we express the stack pop/push action explicitly. Stack push, pop, and
noop actions are denoted by $s = \langle 0, 1 \rangle$, $s = \langle 1, 0 \rangle$, and $s = \langle 0, 0 \rangle$ respectively.
Then the transition $\delta(p, a, \alpha) = (q, s, \beta)$ is represented as the binary mapping ordered pair
$(\langle p_1, \ldots, p_m, a_1, \ldots, a_n, \alpha_1, \ldots, \alpha_k \rangle, \langle q_1, \ldots, q_m, s_1, s_2, \beta_1, \ldots, \beta_k \rangle)$ to be implemented by
a BMP module (see Section 2.2.5).
• An explicit synchronization mechanism is used to support the repetitive evaluation of
the transition function on input strings of variable length.
The transition function of a DPDA can be represented as a 3-dimensional table with current
state, current input symbol, and stack top symbol as indices. The operation of such a DPDA
involves repetitive lookup of the value for next state from the table using current state, current
input symbol and stack top symbol at each move until an error state or an accepting state is
reached. Such a repetitive table lookup process involves content-based pattern matching and
retrieval wherein the indices of the table are used as input patterns to retrieve the next state.
This process can exploit the massively parallel associative processing capabilities of the neural
associative memory proposed in Chapter 2.
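The explicit-stack-operation form of the transition function can be exercised with a small symbolic simulator. This Python sketch is a software illustration only: the three-index dictionary plays the role of the 3-dimensional table, `"Z"` is a hypothetical ASCII stand-in for the stack start symbol, the empty string stands for $\epsilon$, "empty stack" is interpreted as only the start symbol remaining, and the example machine for strings of the form a^n b^n (n ≥ 1) is invented for illustration. The loop assumes the machine has no cycle of $\epsilon$-moves.

```python
def run_dpda(delta, q0, finals, bottom, w):
    """Simulate a DPDA whose transitions carry an explicit stack operation.

    delta maps (state, symbol or "" for epsilon, stack top) to
    (next state, op, stack symbol) with op in {"push", "pop", "noop"}.
    """
    state, stack, i = q0, [bottom], 0
    while True:
        top = stack[-1]
        if i < len(w) and (state, w[i], top) in delta:
            key = (state, w[i], top)  # consume one input symbol
            i += 1
        elif (state, "", top) in delta:
            key = (state, "", top)    # epsilon move
        else:
            break                     # unspecified transition: halt
        state, op, sym = delta[key]
        if op == "push":
            stack.append(sym)
        elif op == "pop":
            if len(stack) == 1:
                return False          # underflow is treated as rejection
            stack.pop()
    return i == len(w) and state in finals and stack == [bottom]


# Hypothetical DPDA for { a^n b^n : n >= 1 }; "*" is the don't care value.
ANBN = {
    ("q0", "a", "Z"): ("q1", "push", "A"),
    ("q1", "a", "A"): ("q1", "push", "A"),
    ("q1", "b", "A"): ("q2", "pop", "*"),
    ("q2", "b", "A"): ("q2", "pop", "*"),
    ("q2", "", "Z"): ("qf", "noop", "*"),
}
```

With this table, `run_dpda(ANBN, "q0", {"qf"}, "Z", "aabb")` accepts, while `"aab"` and `"abb"` are rejected.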
Figure 5.2 shows the proposed modular neural network architecture for realizing a DPDA.
The current and next states are denoted by state(t) and state(t+1) respectively. This NN
DPDA module consists of three BMP modules, one AST module, one stack mechanism module
and four buffers. One buffer stores the current state state(t), one stores the current input symbol
input(t), another stores the stack top symbol stacktop, and the other stores the next state state(t+1)
which exists only logically but not physically. The first three buffers operate under
synchronization control which enforces discrete time 0, 1, ..., t. The reset link resets the NN DPDA
to the initial state.
BMP module 1, called NN DPDA transition module, realizes the transition function of a 
DPDA. Each state transition is coded as an ordered pair of binary mapping codes. There 
are two push/pop connections from the NN DPDA transition module to the stack mechanism 
module. These links inform the stack mechanism module whether to pop or push. The AST 
module is optional and can be implemented by a BMP module. It enables BMP module 2 to 
produce output only when the NN DPDA goes into an accepting state. A connection from 
the AST module to upper-layer control would be needed to alert it when the AST module 
traps a rejecting state, i.e., when this NN DPDA goes into a rejecting state. BMP module 2
is optional, and it allows the output from the NN DPDA to be remapped from the output of
the NN DPDA transition module. BMP module 3 is optional and provides remapping of the stack
symbol produced by the NN DPDA transition module. Note that any unspecified transition
will have the next state $\langle 0^m \rangle$ given our implementation of a BMP module.
Figure 5.2 The proposed modular neural network architecture for DPDA
5.4 Neural Network Design for Stack (NN Stack) 
This section first briefly discusses the symbolic computing model for a stack and then presents
a method to systematically design neural network architectures for realizing stacks [22].
5.4.1 Symbolic representation of stack 
A stack can be coded as a string over a stack alphabet, with its top element at one end
of the string and its bottom element at the other end. Pop and push are the main actions of
a stack. In the implementation of a stack, these actions can be performed by a DFA which
is augmented with memory to store stack symbols which are accessed sequentially using a
stack top pointer (SP) which points to the top symbol of the stack. The stack top pointer is
maintained by the current state of the DFA, and the current action of the stack by the input
to the DFA. Let $A = \{pop, push, noop\}$ be the set of possible stack actions, $C$ the set of
possible stack configurations (contents), $S$ the set of stack symbols, $P = \{0, 1, 2, \ldots, n\}$ the set
of possible positions of the stack top pointer, and $n$ the maximal depth (capacity) of a given stack.
Let $\bot$ be the stack bottom symbol and $c \cdot s$ denote the stack configuration after a stack symbol $s$ is
pushed onto the stack configuration $c$. Note that $C = \{\alpha \mid \alpha \in \bot \cdot S^* \text{ and } |\alpha| \le n\}$, where $|\alpha|$
denotes the number of stack symbols in the stack configuration $\alpha$. Assume that the value of the
stack top pointer doesn't change on a noop action, and it is incremented on a push action and
decremented on a pop action. The operation of a stack and the retrieval of the stack top symbol
from a stack can be characterized by the symbolic functions $f_{Stack} : A \times S \times C \times P \rightarrow C \times P$
and $f_{Top} : C \times P \rightarrow S \cup \{\bot\}$ respectively. They are defined as follows.
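$f_{Stack}(push, s, c, p) = \begin{cases} (c \cdot s, p + 1) & \text{if } s \in S,\ c \in C,\ p \in P, \text{ and } p \le n - 1 \\ error & \text{otherwise} \end{cases}$ (5.8)
$f_{Stack}(pop, *, c, p) = \begin{cases} (c', p - 1) & \text{if } c \in C \text{ and } c = c' \cdot s \text{ for some } s \in S \text{ and some } c' \in C; \text{ and } p \in P \text{ and } p \ge 1 \\ error & \text{otherwise} \end{cases}$ (5.9)
$f_{Stack}(noop, *, c, p) = \begin{cases} (c, p) & \text{if } c \in C \text{ and } p \in P \\ error & \text{otherwise} \end{cases}$ (5.10)
$f_{Top}(c, p) = \begin{cases} \bot & \text{if } c = \bot \text{ and } p = 0 \\ s & \text{if } c \in C \text{ and } c = c' \cdot s \text{ for some } s \in S \text{ and some } c' \in C; \text{ and } p \in P \text{ and } |c| = p \\ error & \text{otherwise} \end{cases}$ (5.11)
where $*$ stands for a don't care.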
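The symbolic functions (5.8) to (5.11) can be written down directly as a Python sketch. The representation is an assumption made for illustration: a configuration is a tuple whose first element is the bottom symbol, `ERROR` is a sentinel string, and the capacity `n` and symbol set are passed explicitly. The case analysis follows the definitions as stated and adds no extra consistency checks between the configuration and the pointer.

```python
BOTTOM = "⊥"     # stack bottom symbol
ERROR = "error"  # sentinel standing for the "error" result


def _valid(c, p, n, symbols):
    """c in C (bottom symbol followed by at most n stack symbols), p in P."""
    in_c = (len(c) >= 1 and c[0] == BOTTOM and len(c) - 1 <= n
            and all(x in symbols for x in c[1:]))
    return in_c and 0 <= p <= n


def f_stack(action, s, c, p, n, symbols):
    """Stack update following (5.8), (5.9), and (5.10)."""
    if not _valid(c, p, n, symbols):
        return ERROR
    if action == "push":
        return (c + (s,), p + 1) if s in symbols and p <= n - 1 else ERROR
    if action == "pop":
        return (c[:-1], p - 1) if len(c) > 1 and p >= 1 else ERROR
    if action == "noop":
        return (c, p)
    return ERROR


def f_top(c, p, n, symbols):
    """Stack top retrieval following (5.11)."""
    if c == (BOTTOM,) and p == 0:
        return BOTTOM
    if _valid(c, p, n, symbols) and len(c) > 1 and len(c) - 1 == p:
        return c[-1]
    return ERROR
```

For example, with `n = 3` and symbols `{"a", "b"}`, pushing `"a"` onto the empty configuration gives `((BOTTOM, "a"), 1)`, and a pop on the empty configuration yields `ERROR`.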
5.4.2 Architecture of NN Stack 
This subsection discusses the neural network realization of a stack in terms of symbolic
functions $f_{Stack}$ and $f_{Top}$. A design for NN Stack obtained by adding a write control module to
an NN DFA is shown in Figure 5.3. (The use of such a circuit might be considered by some to
be somewhat unconventional given the implicit assumption of lack of explicit control in many
neural network models. However, the operation of most existing neural networks implicitly
assumes at least some form of control. Given the rich panoply of controls found in biological
neural networks, there is no reason not to build a variety of control and coordination
structures into neural networks whenever it is beneficial to do so [71].) NN Stack has an n-bit
binary output corresponding to the element popped from the stack, and four sets of binary
inputs:
• Reset, which is a 1-bit signal that resets pointer(t) (current SP) to point to the bottom
of the stack at the beginning.
• Synchronization control, which is a 1-bit signal that synchronizes NN Stack with the
discrete time line, denoted by 0, 1, ..., t, t+1, ....
• Action, which is a 2-bit binary code so that
- 01 denotes push.
- 10 denotes pop.
- 00 denotes no action.
• Input stack symbol, which is an n-bit binary code for the symbol to be pushed onto the
stack during a stack operation.
Figure 5.3 The proposed neural network architecture for stack mechanism
An NN Stack consists of a pointer control module, a stack memory module, a write control
module and two buffers. The first buffer stores the current SP value (pointer(t)) and the
second stores the current stack action (push/pop). In Figure 5.3, the dotted box labeled
with pointer(t+1) exists only logically but not physically, and pointer(t) and pointer(t+1)
respectively denote SP before and after a stack action. SP is coded into an m-bit binary
number.
5.4.2.1 Pointer control module 
The pointer control module (BMP module 1) realizes a symbolic function $f_{PControl} : A \times P \rightarrow P$
and controls the movement of SP, which is incremented on a push and decremented on
a pop. The pointer control module uses $m + 2$ input, $3 \times 2^m$ hidden, and $m$ output neurons.
$m$ of the input neurons represent pointer(t) (current SP value), and the remaining 2 input
neurons encode the stack action. There are $2^m$ possible SP values. The $m$ output neurons
represent pointer(t+1) (the SP value after a stack action). Each change in SP value can be
realized by a binary mapping (with one hidden neuron per change). Since noop (no action) is
one of the legal stack actions, $3 \times 2^m$ hidden neurons are used in the pointer control module.
5.4.2.2 Stack memory module 
The stack memory module (BMP module 2) realizes the symbolic function $f_{Top}$. It uses
$m$ input neurons, $n$ output neurons, and $2^m$ hidden neurons which together allow storage of
$2^m$ stack symbols at $2^m$ SP positions. The stack symbols stored in the stack memory module are
accessed through pointer(t+1) (the output of the pointer control module). Note that
BMP module 2 uses its 2nd-layer connections associated with a hidden neuron to store a symbol (see
Chapter 2).
5.4.2.3 Write control module 
The write control module (plus stack memory module) realizes a symbolic function $f_{SWrite} :
A \times S \times C \times P \rightarrow C$. Physically, it receives $m$ binary inputs from the buffer labeled with
pointer(t) (denoting current SP), 1 binary input from the second output line of the buffer
labeled with push/pop (denoting current stack action), and $n$ binary inputs (denoting the
stack symbol to be pushed onto the stack) from the environment. The stack memory module is used
to store the current stack configuration. The write control module does nothing when a pop is performed. The
$n$ dotted output lines from the write control module write the n-bit binary-coded stack symbol
into $n$ of the 2nd-layer connections associated with a corresponding hidden neuron in the
stack memory module when a push is performed. The hidden neuron and its $n$ associated
connections are located by using the current SP value (pointer(t)). (The processing of stack
overflow and underflow is not discussed here. It has to be taken care of by appropriate error
handling mechanisms.)
5.4.2.4 Timing considerations 
The proposed design for NN Stack shown in Figure 5.3 is based on the assumption that
the write control module finishes updating the 2nd-layer connection weights associated with
a hidden neuron of the stack memory module before the signals from the pointer control module are
passed to the stack memory module during a push stack action. If this assumption fails to hold,
the original design needs to be modified by adding: $n$ links from the input stack symbol (buffer)
to the output stack symbol (buffer); an inhibition latch, which is activated by the leftmost output
line of the push/pop buffer, on these links to inhibit signal passing from the input stack symbol
(buffer) to the output stack symbol (buffer) at a pop operation; and a second inhibition latch, which is
activated by the rightmost output line of the push/pop buffer, between the pointer control module
and the stack memory module to inhibit signal transmission between these two modules at a push
operation.
5.4.3 NN Stack in action 
This subsection symbolically illustrates how the modules of NN Stack together realize a
stack by considering several successive stack actions. Symbolic function $f_{Stack}$ is a composition
of symbolic functions $f_{PControl}$ and $f_{SWrite}$ such that $\forall (a, s, c, p) \in A \times S \times C \times P$,
$f_{Stack}(a, s, c, p) = (f_{SWrite}(a, s, c, p), f_{PControl}(a, p))$. Consider the following sequence of stack
operations:
1. At time $t = t_1$, suppose the value of the stack top pointer (current SP value) is 4 and the
stack action to be performed is a push of a stack symbol $\mathtt{a}$. Let $c_{t_1}$ be the current stack
configuration. At this time step, NN Stack computes $f_{Stack}(push, \mathtt{a}, c_{t_1}, 4) = (c_{t_1} \cdot \mathtt{a}, 5)$
and $f_{Top}(c_{t_1} \cdot \mathtt{a}, 5) = \mathtt{a}$, i.e.,
• the pointer control module computes $f_{PControl}(push, 4) = 5$,
• the write control module (plus stack memory module) computes $f_{SWrite}(push, \mathtt{a}, c_{t_1}, 4) =
c_{t_1} \cdot \mathtt{a}$, and
• the stack memory module computes $f_{Top}(c_{t_1} \cdot \mathtt{a}, 5) = \mathtt{a}$.
2. At time $t = t_1 + 1$, suppose the stack action to be performed is a push of a stack symbol
$\mathtt{b}$. At this time step, NN Stack computes $f_{Stack}(push, \mathtt{b}, c_{t_1} \cdot \mathtt{a}, 5) = (c_{t_1} \cdot \mathtt{a} \cdot \mathtt{b}, 6)$ and
$f_{Top}(c_{t_1} \cdot \mathtt{a} \cdot \mathtt{b}, 6) = \mathtt{b}$, i.e.,
• the pointer control module computes $f_{PControl}(push, 5) = 6$,
• the write control module (plus stack memory module) computes $f_{SWrite}(push, \mathtt{b}, c_{t_1} \cdot
\mathtt{a}, 5) = c_{t_1} \cdot \mathtt{a} \cdot \mathtt{b}$, and
• the stack memory module computes $f_{Top}(c_{t_1} \cdot \mathtt{a} \cdot \mathtt{b}, 6) = \mathtt{b}$.
3. At time $t = t_1 + 2$, suppose the stack action to be performed is a pop. At this time step,
NN Stack computes $f_{Stack}(pop, *, c_{t_1} \cdot \mathtt{a} \cdot \mathtt{b}, 6) = (c_{t_1} \cdot \mathtt{a}, 5)$ and $f_{Top}(c_{t_1} \cdot \mathtt{a}, 5) = \mathtt{a}$, i.e.,
• the pointer control module computes $f_{PControl}(pop, 6) = 5$,
• the write control module does nothing, and
• the stack memory module computes $f_{Top}(c_{t_1} \cdot \mathtt{a}, 5) = \mathtt{a}$.
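The three-step trace above can be replayed with a small software model of the three modules. This Python sketch rests on several assumptions that are not in the text: the stack memory is modelled as a dictionary indexed by SP value, slot 0 holds the bottom symbol, a push at pointer value p writes into the slot that pointer(t+1) = p + 1 will select, and overflow or underflow raises an exception because their handling is left open in the design. The filler symbols w, x, y, z standing for the initial configuration are invented for the example.

```python
BOTTOM = "⊥"


def f_pcontrol(action, p, max_p):
    """Pointer control: the SP value after the stack action."""
    if action == "push" and p < max_p:
        return p + 1
    if action == "pop" and p >= 1:
        return p - 1
    if action == "noop":
        return p
    raise ValueError("overflow/underflow is outside the scope of this sketch")


def f_swrite(action, s, memory, p):
    """Write control: on a push, store s in the slot selected by p + 1."""
    if action == "push":
        memory = dict(memory)  # copy so the old configuration is untouched
        memory[p + 1] = s
    return memory              # pop and noop leave the memory unchanged


def f_top(memory, p):
    """Stack memory read-out, addressed by the pointer value."""
    return memory[p]


def step(action, s, memory, p, max_p):
    """One synchronized time step of the sketched NN Stack."""
    new_memory = f_swrite(action, s, memory, p)
    new_p = f_pcontrol(action, p, max_p)
    return new_memory, new_p, f_top(new_memory, new_p)


# Replay of the trace: SP = 4 at time t1, with m = 3 pointer bits.
MAX_P = 2 ** 3 - 1
mem0 = {0: BOTTOM, 1: "w", 2: "x", 3: "y", 4: "z"}
mem1, p1, top1 = step("push", "a", mem0, 4, MAX_P)
mem2, p2, top2 = step("push", "b", mem1, p1, MAX_P)
mem3, p3, top3 = step("pop", "*", mem2, p2, MAX_P)
```

After the pop, slot 6 still holds the stale symbol `b`; in this model the current configuration is simply the contents of slots 0 through the current pointer value.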
5.5 Neural Network Design for Nondeterministic Finite Automata (NN NFA)
This section explores how to exploit the inherent parallelism and versatile representation
of ANNs to reduce the operational and implementational time overhead of nondeterministic
finite automata (NFA), which are a basic model of symbolic computing in computer science
and provide a typical model suitable for the exploration of parallel symbolic computing via
ANNs. A recurrent neural network (RNN) is systematically synthesized to concurrently track
all the possible nondeterministic computations of a given NFA. Such concurrent breadth-first
tracking is facilitated by two types of parallel symbolic computations executed by the proposed
RNN. One type is parallel content-based pattern matching, and the other is parallel
union operations on sets. The RNN acts like a cost-effective SIMD computer system dedicated
to these two types of parallel symbolic computations. The proposed RNN is provably correctly
assembled from two kinds of neural assemblies. One kind computes a logical
AND, and the other computes a logical OR.
Although the concept of nondeterminism embedded in NFA provides an elegantly simple
and intuitive description for sequence processing, it results in much computational and
implementational overhead in single-CPU computer systems. Thus the concept of nondeterminism
in NFA, which plays a central role in both the theory of languages and the theory
of computation [74], provides a typical model suitable for the exploration of parallel symbolic
computing via neural networks. The reduced operation time complexity of NFA realized by
the proposed RNN is due to the parallel operations of the neural assemblies in the RNN.
It is well known that DFA and NFA are equivalent, and every NFA can be converted into
its equivalent DFA [74]. NFA may seem to be of no practical interest in direct application
implementations since they embody nondeterminism and don't correspond naturally
to deterministic algorithms. However, NFA have a variety of practical applications in computer
science, linguistics, systems modeling and control, and artificial intelligence; and NFA are simpler
and more intuitive to design than their equivalent DFA for a given task due to the powerful
concept of nondeterminism embedded in NFA, especially for pattern matching [195]. NFA
are rarely directly implemented in conventional computer systems because the nondeterminism
in NFA causes operational and implementational overhead. Usually, they are converted
into their equivalent DFA for implementation. So, for syntax analysis on regular languages,
an NFA could be constructed for a given language first, and then its equivalent DFA is
implemented to recognize the language. Using the proposed RNN, the direct construction of an NFA
is as simple as that of a DFA, the power of nondeterminism in NFA is retained,
and there is no need to convert an NFA into its equivalent DFA before its construction. Note
also that every DFA is an NFA. Therefore, the proposed RNN can be used as a general neural
architecture for realizing finite automata including DFA and NFA.
5.5.1 Nondeterministic finite automata (NFA) 
A nondeterministic finite automaton is a 5-tuple $M_{NFA} = (Q, \Gamma, \delta', q_0, F)$ [74], where $Q$, $\Gamma$,
$q_0$, and $F$ have the same meaning as for a DFA, but $\delta'$ is a mapping from $Q \times \Gamma$ to $2^Q$. Note that
$2^Q$ is the power set of $Q$, and $\delta'(q, a)$ is the set of all states $p$ such that there is a transition,
denoted as $(q, a, p)$, from $q$ to $p$ on an input symbol $a$. Also note that there could be more than
one transition which is applicable for each combination of state and input symbol in an NFA,
and $|\delta'(q, a)|$ is bounded by $|Q|$, where $|A|$ denotes the cardinality of set $A$. An input string is
accepted by $M_{NFA}$ if there is a computation on the input string by $M_{NFA}$ which processes the
entire input string and halts in an accepting state; otherwise it is rejected. The set of strings
accepted by $M_{NFA}$ in $\Gamma^*$ is denoted as $L(M_{NFA})$, called the language accepted by $M_{NFA}$.
5.5.1.1 Advantages of NFA for applications 
It is well known that NFA and DFA are equivalent [74]. Two automata are said to be equivalent
if they accept the same language. Any language accepted by an NFA can also be accepted
by a DFA, and every NFA can be converted into an equivalent DFA [74]. However, an NFA
is usually simpler and more intuitive to design than its equivalent DFA for a given language
due to the powerful concept of nondeterminism inherent in NFA. Figures 5.4 and 5.5
respectively show the state diagrams of an NFA and its equivalent DFA. Both of them accept
input strings that contain the sub-string abaa [195]. These two automata are equivalent, but
the language the NFA accepts is much easier to read off its state diagram. The state diagram of
an NFA or a DFA is a labeled directed graph in which the nodes denote the states of the NFA
or DFA, and the arcs are obtained from their transition functions. An arc from node $q_i$ to $q_j$
is labeled $a$ if $\delta(q_i, a) = q_j$ for a DFA or $q_j \in \delta'(q_i, a)$ for an NFA, and the transition $(q_i, a, q_j)$
is a fan-in transition for state $q_j$ on input symbol $a$. Note that if an NFA has $|Q|$ states, then
the number of possible states of its equivalent DFA could be as large as $2^{|Q|}$ and the number
of possible transitions in the DFA could also be of the same order.
5.5.2 Model for concurrently tracking all the possible nondeterministic moves in the operation of an NFA using RNN
The deterministic and linear-time operation of a given NFA which is realized by the proposed
RNN can be modeled conventionally by its equivalent DFA which is constructed according
to the Subset Construction algorithm [195]. The main idea of Subset Construction is to
concurrently track all the possible states that can be reached at each step of an NFA. In the
computation of an NFA, tracking all the reachable states at each step induces much overhead in
single-CPU computer systems, whereas the proposed RNN efficiently computes, in parallel, all
the reachable states at each step by exploiting the parallelism of ANNs. In Subset Construction,
for a given NFA $M_{NFA} = (Q, \Gamma, \delta', q_0, F)$, a DFA $M''_{DFA} = (2^Q, \Gamma, \delta'', Q_0, F'')$ is defined from
$M_{NFA}$ such that $L(M_{NFA}) = L(M''_{DFA})$, where $Q_0 = \{q_0\}$, $F'' = \{K \mid K \subseteq Q \text{ and } K \cap F \ne \emptyset\}$,
and $\delta'' : 2^Q \times \Gamma \rightarrow 2^Q$ is defined by
$\delta''(Q_i, a) = Q_j$, if $Q_j = \bigcup_{q \in Q_i} \delta'(q, a)$, for all $Q_i \subseteq Q$ and $a \in \Gamma$ (5.12)
Figure 5.4 The state diagram of an NFA that accepts any input string containing the sub-string abaa
Figure 5.5 The state diagram of a DFA that accepts any input string containing the sub-string abaa
One main problem with Subset Construction, which views every Qi as an individual state in 
implementation, is the exponential increase in the number of states ((9(21'^')) and the number 
of possible transitions defined in transition function S" (0(2^''^'x (r|)). This situation can often 
be somewhat alleviated by Iterative Subset Construction [195] which only includes the states 
that can be reached from initial state qo. Let = (Q'Sy5',QQ,F') be defined from 
MNFA according to Iterative Subset Construction such that L{MNPA) = ^{^DFA)- Since 
Iterative Subset Construction eliminates the states which can not be reached from initial state 
90t ^DFA smaller than M^pj^ in terms of the number of defined states and transitions. 
The major drawback of both subset algorithms is the bookkeeping overhead associated with
maintaining $\delta''$, $F''$, $Q'$, $\delta'$, and $F'$, which are derived from $M_{NFA}$. The direct realization of a
given NFA by the proposed RNN avoids this problem since the transition function module of
the proposed NN NFA captures the regularity of $\delta''$ in expression 5.12 via $\delta$ without actually
knowing in advance the legal transitions defined by $\delta''$. Such simplification is partly facilitated
by representationally viewing every $Q_i$ as an individual set of states denoted by localist neural
representation. The transition module of the proposed NN NFA realizes not only $\delta''$ but also
$\delta'$. Since the proposed RNN always starts from initial state $q_0$ for any input string, the states
which cannot be reached from $q_0$ will not appear in the transition module of the proposed NN
NFA during input processing, i.e., only the states in $Q'$ will appear in the proposed transition
module during input processing. Hereafter, we only discuss $M'_{DFA}$ instead of $M''_{DFA}$.
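As an illustration of the bookkeeping that the subset algorithms require, the following is a minimal Python sketch of Iterative Subset Construction. The function name, the dictionary encoding of the NFA transition function, and the numbering of states as 0 to 4 are assumptions made for this example only; the NFA is the one whose transitions are listed in Section 5.5.5 (Figure 5.4).

```python
from collections import deque

def iterative_subset_construction(delta, q0, accepting, alphabet):
    """Build only the DFA states reachable from {q0}.

    delta maps (state, symbol) to the set of next NFA states.
    DFA states are frozensets of NFA states.
    """
    start = frozenset([q0])
    dfa_states = {start}
    dfa_delta = {}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # Union of the NFA moves of every state in the current subset.
            target = frozenset(
                s for q in current for s in delta.get((q, symbol), ())
            )
            dfa_delta[(current, symbol)] = target
            if target not in dfa_states:
                dfa_states.add(target)
                queue.append(target)
    # A subset is accepting if it contains at least one accepting NFA state.
    dfa_accepting = {subset for subset in dfa_states if subset & set(accepting)}
    return dfa_states, dfa_delta, dfa_accepting

# NFA accepting strings that contain the sub-string abaa (states q0..q4 as 0..4).
DELTA = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "b"): {2},
    (2, "a"): {3},
    (3, "a"): {4},
    (4, "a"): {4}, (4, "b"): {4},
}
dfa_states, dfa_delta, dfa_accepting = iterative_subset_construction(
    DELTA, 0, {4}, "ab"
)
print(len(dfa_states), "reachable subsets out of", 2 ** 5, "possible subsets")
```

The tables `dfa_states`, `dfa_delta`, and `dfa_accepting` are exactly the derived structures whose maintenance the direct neural realization is meant to avoid.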
Let 0, 1, ..., t, t+1, ... denote a succession of points along the discrete time line. Then,
for an NFA, let us call $Q_{act}(0) = Q_0$ the initial set of active states, $Q_{act}(t)$ the current set of
active states, which corresponds to the set of states reachable from $q_0$ by $M_{NFA}$ ($M'_{DFA}$) at
time t (current time), and $Q_{act}(t+1)$ the next set of active states, which corresponds to the set
of states reachable from $q_0$ at time t+1. $Q_{act}(t)$ and $Q_{act}(t+1)$ are derived recursively from
$Q_{act}(0)$ by expression $Q_{act}(t+1) = \bigcup_{q \in Q_{act}(t)} \delta(q, a)$ during the processing of the input string,
where a is the input symbol at time t. $Q_{act}(t)$ is bounded in a way that $Q_{act}(t) \subseteq Q$ for $t \geq 0$,
i.e., all the sets of reachable states from the initial state during the processing of the input string
are bounded by the same set of states, the number of reachable states at each step does not
proliferate indefinitely or exponentially during the processing of the input string, and thus the
nondeterminism shown during the processing of the input string is globally bounded. The
proposed RNN directly constructs a given NFA $M_{NFA}$ without the need to convert the given
NFA into its equivalent DFA $M'_{DFA}$. It concurrently tracks all the possible moves during the
processing of the input string in the NFA by simulating the deterministic move of $M'_{DFA}$.
According to Iterative Subset Construction and expression 5.12, the transition function $\delta'$
and every move of $M'_{DFA}$ can be characterized by

$\forall a \in \Gamma \ \forall t \geq 0 \ [\, Q_{act}(t+1) = \delta'(Q_{act}(t), a) \,]$    (5.13)

where $Q_{act}(0) = \{q_0\}$ and $Q_{act}(t+1) = \bigcup_{q \in Q_{act}(t)} \delta(q, a)$.
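The recursion on the active set can be traced symbolically before any neural realization is considered. The sketch below is a plain sequential Python version of that recursion, not the proposed network; the function name and the dictionary encoding of the NFA are assumptions for this example, and the NFA is the one from Figure 5.4 with states numbered 0 to 4.

```python
def active_sets(delta, q0, string):
    """Return the list [Q_act(0), Q_act(1), ...] for the given input string."""
    q_act = {q0}
    trace = [set(q_act)]
    for symbol in string:
        # Next active set: union of the moves of every currently active state.
        q_act = set().union(*(delta.get((q, symbol), set()) for q in q_act))
        trace.append(set(q_act))
    return trace

# NFA accepting strings that contain the sub-string abaa.
DELTA = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "b"): {2},
    (2, "a"): {3},
    (3, "a"): {4},
    (4, "a"): {4}, (4, "b"): {4},
}
trace = active_sets(DELTA, 0, "abaa")
for t, q_act in enumerate(trace):
    print(t, sorted(q_act))
```

Every set in the trace is a subset of Q, so the per-step work is bounded by the size of the NFA regardless of the length of the input.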
For an input string, the recursive evaluation of $Q_{act}(t+1)$ along the moves of $M'_{DFA}$
($M_{NFA}$) involves two kinds of repetitive symbolic computations, one of which computes the
sets of reachable states from every state in $Q_{act}(t)$ and the other of which computes the union
of the sets of the reachable states. In the proposed NN NFA, the former is computed by the
first layer of the transition module by parallel content-based pattern matching, and the latter
by the second layer by parallel logical OR operations. In applications
of realizing an NFA by the proposed RNN, a special symbol $\$ \notin \Gamma$ might need to be appended
at the end of the input string to acknowledge the end of input. When \$ is encountered,
the RNN terminates input processing and tests the acceptance of the input string.
5.5.3 Architecture of NN NFA 
This subsection describes the symbolic and neural representations in the proposed NN
NFA, and presents a method for assembling the proposed NN NFA using the neural assemblies
proposed in Chapter 4.
Figure 5.6 shows the partially recurrent neural network architecture for concurrently tracking
all the nondeterministic computations of a given NFA. The entire architecture essentially
consists of one NN NFA transition module, one acceptance testing module, one end-of-input
testing module which is not shown in the figure, three buffers, and recurrent links from the
output neurons of the NN NFA transition module to the buffer storing $Q_{act}(t)$ (which could
be part of the input neurons of the NN NFA transition module, depending on the applications
implemented). One buffer stores the current set of active states $Q_{act}(t)$, another buffer (which
could also be part of the input neurons of the NN NFA transition module, depending on the
applications implemented) stores the current input symbol $a(t)$, and the other buffer (which exists
only logically but not physically) represents the next set of active states $Q_{act}(t+1)$. The first
two buffers are under centralized synchronization control which enforces discrete time 0, 1, ...,
t, t+1, .... The link "reset" resets the NN NFA to its initial set of active states.
Figure 5.6 The proposed recurrent neural network architecture for concurrently tracking all the
nondeterministic computations of a given NFA (diagram labels: input, reset, synchronization
control, $a(t)$, $Q_{act}$, NN NFA transition module, BMP module, acceptance testing module,
accepting/rejecting)
5.5.3.1 Symbolic representation in the NN NFA transition module 
The realization of the symbolic function $\delta'$ by the NN NFA transition module is central to
the construction of the proposed NN NFA. The notations used here follow those described in
previous subsections. The symbolic representations of the NN NFA transition module, which
is a 2-layer Perceptron, are described as follows.
• The output from every neuron and the input to every input neuron are binary values.
• Every transition defined by the transition function $\delta'$ (expression 5.13) is represented
as an ordered binary mapping pair $\langle Q_{act}(t) \times a(t),\ Q_{act}(t+1) \rangle$ stored in the NN NFA
transition module.
• The input neurons together denote $Q_{act}(t) \times a(t)$ and are divided into two groups. One
group uses a distributed representation and the other uses a local representation. The
former group has no recurrent connection and denotes the binary-coded current input
symbol $a(t)$. There are $\lceil \log_2(|\Gamma| + 1) \rceil$ such input neurons. The latter group has recurrent
connections and denotes the current set of active states $Q_{act}(t)$. There are $|Q|$ such
input neurons, the $i$th neuron of which denotes whether state $q_{i-1}$ is in $Q_{act}(t)$. If the
value at the $i$th neuron of this group is 1, then state $q_{i-1}$ is in $Q_{act}(t)$. Otherwise state
$q_{i-1}$ is not in $Q_{act}(t)$. In this group, the $i$th input neuron has a recurrent link from the
$i$th output neuron.
• The hidden neurons along with their associated 1st-layer connections are used to
concurrently recognize all the applicable transitions for the states in the current set of active
states on current input symbol. The hidden layer uses a local representation, and one
hidden neuron is used for one uniquely defined transition. The number of activated
hidden neurons at each step of the proposed NN NFA equals $\sum_{q \in Q_{act}(t)} |\delta(q, a(t))|$. The
activated hidden neurons in turn activate some of the output neurons which together
denote the next set of active states.
• The output layer uses a local representation, and the output neurons together denote
the next set of active states $Q_{act}(t+1)$. There are $|Q|$ output neurons, the $i$th neuron
of which denotes whether state $q_{i-1}$ is in $Q_{act}(t+1)$. If the value at the $i$th neuron is 1,
then state $q_{i-1}$ is in $Q_{act}(t+1)$. Otherwise state $q_{i-1}$ is not in $Q_{act}(t+1)$. The output
neurons along with their associated 2nd-layer connections (which get their input from
the hidden neurons) operate together to compute the next set of active states according
to expression $Q_{act}(t+1) = \bigcup_{q \in Q_{act}(t)} \delta(q, a)$ (the union of the sets of states
reached by the states in the current set of active states on current input symbol).
• The recurrent connections from the output neurons to part of the input neurons facilitate
the continuous execution of the proposed NN NFA.
5.5.3.2 Neural representation in the NN NFA transition module 
Let $n_T = \sum_{q \in Q} \sum_{a \in \Gamma} |\delta(q, a)|$, $n_I = \lceil \log_2(|\Gamma| + 1) \rceil$, and $n_A = |Q|$ be respectively
the total number of defined transitions of a given NFA, the number of input neurons used
for denoting current input symbol, and the number of input neurons used for denoting the
current set of active states in the NN NFA transition module. Then the NN NFA transition
module has $(n_A + n_I)$ input neurons, $n_T$ hidden neurons, and $n_A$ output neurons. The hidden
neurons along with their associated 1st-layer connections are used to identify all the transitions
applicable for the states in the current set of active states on current input symbol. One hidden
neuron is used for one uniquely defined transition in the given NFA. The NN NFA transition
module is constructed directly from the transition function $\delta$ of the given NFA $M_{NFA}$.
Let binary vectors $u = \langle u_1, \ldots, u_{n_A+n_I} \rangle$ and $v = \langle v_1, \ldots, v_{n_A} \rangle$ respectively denote the
ordered values at input neurons and output neurons in the NN NFA transition module. The
first $n_A$ components of vector u, being $\langle u_1, \ldots, u_{n_A} \rangle$, together represent $Q_{act}(t)$, and the
last $n_I$ components of vector u, being $\langle u_{n_A+1}, \ldots, u_{n_A+n_I} \rangle$, together represent current input
symbol a. The vectors u and v respectively represent $Q_{act}(t) \times a$ and $Q_{act}(t+1)$ for the given
NFA. Let $J^{i+1} = \{i+1, n_A+1, n_A+2, \ldots, n_A+n_I\}$ be an interest set for $0 \leq i \leq n_A - 1$.
Totally $n_A$ interest sets are defined. Let current input symbol a be
encoded as a binary vector $\langle a_1, \ldots, a_{n_I} \rangle$, where $a_k \in \{0, 1\}$ for $1 \leq k \leq n_I$. If $u_{i+1} = 1$, then
$q_i$ is in $Q_{act}(t)$ and $u(J^{i+1}) = \langle u_{i+1}, u_{n_A+1}, \ldots, u_{n_A+n_I} \rangle = \langle 1, a_1, \ldots, a_{n_I} \rangle$ denotes $\{q_i\} \times a$,
where $0 \leq i \leq n_A - 1$.
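To make the input representation concrete, the following Python sketch builds the input vector u for a given active set and input symbol, and extracts the partial pattern selected by an interest set. The function names and the particular symbol codes (a as 0,0 and b as 0,1, as in the example of Section 5.5.5) are assumptions for this illustration; interest-set indices are 1-based to match the text.

```python
import math

def symbol_neuron_count(alphabet_size):
    """Number of neurons for the binary-coded symbol, one code kept for end of input."""
    return math.ceil(math.log2(alphabet_size + 1))

def encode_input(active_states, symbol_code, n_a):
    """Local code for the active set followed by the binary symbol code."""
    state_bits = [1 if i in active_states else 0 for i in range(n_a)]
    return state_bits + list(symbol_code)

def interest_set(i, n_a, n_i):
    """1-based indices of the neuron for q_i and of all symbol neurons."""
    return [i + 1] + [n_a + k for k in range(1, n_i + 1)]

def partial_pattern(u, indices):
    """Components of u selected by a list of 1-based indices."""
    return [u[index - 1] for index in indices]

N_A = 5
N_I = symbol_neuron_count(2)             # two symbols, a and b
CODE = {"a": (0, 0), "b": (0, 1)}

u = encode_input({0, 2}, CODE["b"], N_A)  # Q_act(t) = {q0, q2}, input b
indices = interest_set(2, N_A, N_I)       # interest set for state q2
pattern = partial_pattern(u, indices)
print(u, indices, pattern)
```

The extracted pattern starts with 1 exactly when the state is active and is followed by the code of the current symbol, which is the sub-pattern a hidden neuron has to recognize.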
5.5.3.3 Parallel symbolic computations in the NN NFA transition module
The realization of every move (the move from $Q_{act}(t)$ to $Q_{act}(t+1)$) of $M'_{DFA}$ in the NN
NFA transition module can be reasoned in two steps. The first step is computed by the hidden
neurons which serve as parallel recognizers of multiple input sub-patterns, and the second step
is computed by the output layer which serves as a parallel union operator of sets.
1. The hidden layer consists of a fixed number of neural assemblies which operate in parallel
and independently of each other. Thus the hidden neurons serve as parallel recognizers
of multiple sub-patterns contained in the input. Such a neural assembly (equivalent
to an AND neural assembly) for partial pattern recognition is proposed in Section 4.
Each hidden neuron h and its associated 1st-layer connections serve as a neural assembly
for recognizing a certain $J^{i+1}$-set partial input pattern. Each such neural assembly
checks for a certain transition $(q_i, a, q_j)$ ($0 \leq i, j \leq n_A - 1$) whether current input
vector u contains the $J^{i+1}$-set partial pattern $u(J^{i+1}) = \langle 1, a_1, \ldots, a_{n_I} \rangle$ (denoting
$\{q_i\} \times a$ and $q_i \in Q_{act}(t)$) according to expression 4.2. If it does, the hidden neuron h is
activated by the partial input pattern $\{q_i\} \times a$ at time t and the transition $(q_i, a, q_j)$ is
applied. Totally there are $n_T$ such neural assemblies operating in parallel to identify all
the possible transitions which are applicable for the states in $Q_{act}(t)$ on current input
symbol, and thus they operate together like a simplified, cost-effective SIMD computer
system dedicated to parallel partial pattern matching.
2. The output layer consists of a fixed number of the monotone OR neural assemblies
proposed in Chapter 4. Each of the assemblies computes a logical OR operation in parallel with
the others on shared inputs. Each output neuron and its associated 2nd-layer connections
compose such a neural assembly. Each such neural assembly is used to check for a
certain state $q_j$ whether any of its fan-in transitions is applicable for the states in $Q_{act}(t)$
on current input symbol a. If one is, then output neuron j + 1 is activated and $q_j$ is in
$Q_{act}(t+1)$. Totally there are $n_A$ such neural assemblies (output neurons) which share
their input, and operate in parallel to compute the next set of active states $Q_{act}(t+1)$
according to expression $Q_{act}(t+1) = \bigcup_{q \in Q_{act}(t)} \delta(q, a)$. Such neural assemblies operate
together to compute $Q_{act}(t+1)$ like a simplified, cost-effective SIMD computer system
dedicated to parallel union computations of sets.
When the representations of $Q_{act}(t)$ and $Q_{act}(t+1)$ are viewed locally (i.e., each of them is
viewed as a set of states), the NN NFA transition module realizes the transition function $\delta$ of
the given NFA $M_{NFA}$ if it is restricted that $|Q_{act}(t)| = 1$. Note that $\delta$ maps from $Q \times \Gamma$ to $2^Q$.
Such local representations facilitate the parallel recognition of all the transitions applicable for
the states in $Q_{act}(t)$ on current input symbol even if $|Q_{act}(t)| > 1$. Thus the representations
facilitate the concurrent tracking of all the possible nondeterministic paths at each move of an
NFA. When the representations of $Q_{act}(t)$ and $Q_{act}(t+1)$ are viewed distributedly (i.e., each
of them is viewed as a state), the NN NFA transition module realizes the transition function
$\delta'$ of $M'_{DFA}$. It means that the NN NFA transition module concurrently realizes the transition
functions $\delta$ and $\delta'$. Such a concurrent realization facilitates not only the direct construction
of the NN NFA transition module from the transition function $\delta$ but also the linear operation
time complexity of the proposed NN NFA for the processing of input strings.
5.5.3.4 Settings of connection weights and thresholds in the NN NFA transition module
Note that every transition defined by expression 5.13 is represented as an ordered binary
mapping pair $\langle Q_{act}(t) \times a(t),\ Q_{act}(t+1) \rangle$ at the input and output layers of the NN NFA
transition module, and such mappings are achieved by capturing the regularity in expression
$Q_{act}(t+1) = \bigcup_{q \in Q_{act}(t)} \delta(q, a(t))$ using a 2-layer Perceptron.
Suppose $q_i, q_j \in Q$, $a \in \Gamma$, and $q_j \in \delta(q_i, a)$, where $0 \leq i, j \leq n_A - 1$. In order to identify the
transition $(q_i, a, q_j)$, which is applicable on an input containing the partial input pattern $\{q_i\} \times a$
in the NN NFA transition module, the interest set $J^{i+1} = \{i+1, n_A+1, n_A+2, \ldots, n_A+n_I\}$
is used for the identification of the $J^{i+1}$-set partial input vector $u(J^{i+1}) = \langle 1, a_1, \ldots, a_{n_I} \rangle$,
which denotes $\{q_i\} \times a$.
According to expressions 4.2, 4.16 and their corresponding Perceptron implementation, a
hidden neuron h is created, and its associated connection weights as well as threshold are set
for every transition $(q_i, a, q_j)$ of the given NFA $M_{NFA}$ in the NN NFA transition module as
follows:
1. In the 1st-layer connections, according to expression 4.2,
• the connection weight from the (i + 1)th input neuron to the hidden neuron h is set
to 1,
• the connection weight from the $(n_A + k)$th input neuron to the hidden neuron is set
to $2a_k - 1$ for $1 \leq k \leq n_I$, and
• the connection weights from other input neurons (which are not in $J^{i+1}$) to the
hidden neuron are set to 0.
2. The threshold of the hidden neuron is set to $\sum_{k=1}^{n_I} a_k + 1$.
3. In the 2nd-layer connections,
• the connection weight from the hidden neuron to the (j + 1)th output neuron is set
to 1, and
• the connection weights from the hidden neuron to other output neurons are set to
0.
4. The thresholds of all output neurons are set to 1 in the NN NFA transition module.
Therefore, if $q_i \in Q_{act}(t)$ and a is current input symbol, then $u_{i+1} = 1$ and $u_{n_A+k} = a_k$ for
$1 \leq k \leq n_I$ at time t, an input containing sub-pattern $u(J^{i+1}) = \langle 1, a_1, \ldots, a_{n_I} \rangle$ is identified
by hidden neuron h and its associated 1st-layer connections, the hidden neuron h is activated,
and in turn the (j + 1)th output neuron is activated (i.e., $v_{j+1} = 1$ at time t + 1, and thus
$q_j \in Q_{act}(t+1)$) according to the above settings. So, the transition $(q_i, a, q_j)$ is applied in the NN
NFA transition module.
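The settings for a single hidden neuron can be checked with a short Python sketch. It assumes that a neuron fires when its weighted input sum is at least its threshold (the exact form of expressions 4.2 and 4.16 is given in Chapter 4 and is not reproduced here), and that the hidden threshold is the number of 1 bits in the symbol code plus one, which agrees with the thresholds listed in the example of Section 5.5.5. Function names and the chosen transition are illustrative.

```python
def hidden_neuron(i, symbol_code, n_a):
    """Weights and threshold of the hidden neuron for a transition leaving q_i."""
    weights = [0] * (n_a + len(symbol_code))
    weights[i] = 1                        # from the (i+1)th input neuron
    for k, a_k in enumerate(symbol_code):
        weights[n_a + k] = 2 * a_k - 1    # +1 for a code bit of 1, -1 for 0
    threshold = sum(symbol_code) + 1      # assumed: number of 1 bits plus one
    return weights, threshold

def fires(weights, threshold, u):
    """Assumed activation rule: output 1 when the weighted sum reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, u)) >= threshold)

# Hidden neuron for the transition (q1, b, q2), with b coded as (0, 1) and 5 states.
w, theta = hidden_neuron(1, (0, 1), 5)

match = fires(w, theta, [1, 1, 0, 0, 0, 0, 1])         # q1 active, input b
wrong_symbol = fires(w, theta, [1, 1, 0, 0, 0, 0, 0])  # q1 active, input a
end_marker = fires(w, theta, [1, 1, 0, 0, 0, 1, 1])    # q1 active, input (1, 1)
inactive = fires(w, theta, [1, 0, 0, 0, 0, 0, 1])      # q1 not active, input b
print(w, theta, match, wrong_symbol, end_marker, inactive)
```

The negative weights on code bits that should be 0 are what prevent a symbol whose code is a superset of the expected bits from triggering the neuron.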
5.5.3.5 Settings of connection weights and thresholds in the acceptance testing module
The acceptance testing module of the proposed NN NFA tests whether an input string is
accepted by the NN NFA at the end of input processing. It is a 1-layer Perceptron which has
$n_A$ input neurons and an output neuron.
The output neuron tests whether $Q_{act}(t+1) \in F'$ by checking whether any state of F is
in $Q_{act}(t+1)$ at the end of input processing. Such a test can be characterized by a monotone
logical OR operation (expression 4.17) on the values of the neurons denoting accepting states,
and hence it can be realized by a Perceptron according to expression 4.16 with $w_i$ denoting
whether state $q_{i-1}$ is in F. The connection weights and threshold of the accepting neuron are
set as follows:
• If $q_i \in F$, then the connection weight from the (i + 1)th input neuron to the output
neuron is set to 1 for $0 \leq i \leq n_A - 1$. Otherwise it is set to 0.
• The threshold of the output neuron is set to 1.
5.5.3.6 The end-of-input testing module 
In the proposed NN NFA, an end-of-input testing module is used to test the end of input
string. The end-of-input testing module is a neural assembly (a 1-layer/1-output Perceptron)
which recognizes the end-of-input symbol \$ that is encoded as binary vector $\langle 1^{n_I} \rangle$. By
expression 4.1 and its corresponding Perceptron implementation, all connection weights are
set to 1 and the threshold at output neuron is set to $n_I$ in the 1-layer/1-output Perceptron to
recognize \$. The end-of-input testing module is not shown in Figure 5.6.
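Both single-neuron modules (acceptance testing and end-of-input testing) can be sketched in a few lines of Python. The sketch assumes the same firing rule as elsewhere in this illustration (output 1 when the weighted sum is at least the threshold); the function names and the five-state example with accepting state q4 and two symbol neurons are assumptions taken from the example NFA of Figure 5.4.

```python
def threshold_unit(weights, threshold, inputs):
    """Assumed rule: output 1 when the weighted input sum reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

def acceptance_weights(accepting, n_a):
    """Weight 1 from the neuron of every accepting state, 0 otherwise."""
    return [1 if i in accepting else 0 for i in range(n_a)]

def end_of_input_weights(n_i):
    """All weights 1; with threshold n_i only the all-ones code fires."""
    return [1] * n_i

N_A, N_I = 5, 2
acc_w = acceptance_weights({4}, N_A)
eoi_w = end_of_input_weights(N_I)

accepted = threshold_unit(acc_w, 1, [1, 1, 0, 0, 1])   # active set {q0, q1, q4}
rejected = threshold_unit(acc_w, 1, [1, 0, 1, 0, 0])   # active set {q0, q2}
saw_end = threshold_unit(eoi_w, N_I, [1, 1])           # end-of-input code
saw_b = threshold_unit(eoi_w, N_I, [0, 1])             # code assumed for symbol b
print(accepted, rejected, saw_end, saw_b)
```

The acceptance unit is a monotone OR over the accepting-state neurons, while the end-of-input unit is an AND over all symbol neurons.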
5.5.3.7 Operation time complexity of the proposed NN NFA 
The time complexity of processing an input string of length n by an NFA directly
implemented in single-processor computer systems is $O(m^2 n)$ [163], where m is the number of
states in the NFA. The proposed NN NFA concurrently tracks all the possible nondeterministic
transitions during the processing of an input string for a given NFA by exploiting the
inherent parallelism in ANN. In such a computation, the proposed NN NFA retains the powerful
concept of nondeterminism of NFA, and it also has the advantage of DFA, which run in
time linearly proportional to the length of the input string. Since the NN NFA transition module
realizes both the transition functions $\delta$ and $\delta'$, the time complexity of processing an input
string by such a parallel and deterministic computation in the proposed NN NFA is linearly
proportional to the length of the input string, i.e., for an input string of length n the processing
time complexity in the proposed NN NFA is $O(n)$. Therefore the computational overhead of
input processing due to the nondeterminism in NFA can be eliminated by taking advantage
of the inherent parallelism of ANN as shown by the proposed NN NFA.
5.5.4 Proof of correctness 
This subsection proves the correctness in the construction of the proposed NN NFA for a 
given NFA. 
Theorem 5.1: The proposed NN NFA can correctly realize a given NFA $M_{NFA} = (Q, \Gamma, \delta, q_0, F)$.
Proof: The theorem is proved by showing that the NN NFA transition module of the proposed
NN NFA correctly realizes the transition function $\delta'$ of $M'_{DFA}$, which is re-defined from the
given NFA $M_{NFA}$ according to Iterative Subset Construction (please refer to [195] for the proof
of equivalence between a given NFA $M_{NFA}$ and its equivalent DFA $M'_{DFA}$). Note that every
transition defined by the transition function $\delta'$ is represented as an ordered binary mapping
pair $\langle Q_{act}(t) \times a(t),\ Q_{act}(t+1) \rangle$ at the input and output layers of the NN NFA transition
module. The mappings are implemented in the NN NFA transition module without knowing
in advance every possible mapping pair (legal transition) in transition function $\delta'$. Instead,
they are realized by capturing the regularity in expression $Q_{act}(t+1) = \bigcup_{q \in Q_{act}(t)} \delta(q, a(t))$
using a 2-layer Perceptron, the first layer of which consists of AND neural assemblies and the
second layer of which consists of OR neural assemblies.
All the notations and representations here follow previous subsections. The transition
function $\delta'$ is defined by expression 5.13 as follows:

$\forall a \in \Gamma \ \forall t \geq 0 \ [\, Q_{act}(t+1) = \delta'(Q_{act}(t), a) \,]$

where $Q_{act}(0) = \{q_0\}$ and $Q_{act}(t+1) = \bigcup_{q \in Q_{act}(t)} \delta(q, a)$. Since $Q_{act}(t+1)$ is computed from
$Q_{act}(t)$ and a, the above expression can be denoted as follows:

$\forall a \in \Gamma \ \forall t \geq 0 \ [\, Q_{act}(t+1) = \bigcup_{q \in Q_{act}(t)} \delta(q, a) = \{ q_j \mid q_j \in \delta(q_i, a) \ \&\ q_i \in Q_{act}(t) \} \,]$    (5.14)

where $Q_{act}(0) = \{q_0\}$. Expression 5.14 is equivalent to

$\forall a \in \Gamma \ \forall t \geq 0 \ [\, \forall i \ \forall j \ \ q_i \in Q_{act}(t) \ \&\ q_j \in \delta(q_i, a) \Rightarrow q_j \in Q_{act}(t+1) \,]$    (5.15)

where $Q_{act}(0) = \{q_0\}$. We want to show that the NN NFA transition module realizes the
transition function $\delta'$ of $M'_{DFA}$ by proving that expression 5.15 is preserved by the NN NFA
transition module.
The NN NFA transition module is represented in such a way (see Section 5.5.3) that
the conditions $a \in \Gamma$ and $0 \leq i, j \leq n_A - 1$ always hold in expression 5.15. Let
$u = \langle u_1, \ldots, u_{n_A+n_I} \rangle$ and $v = \langle v_1, \ldots, v_{n_A} \rangle$ be respectively the binary input value and output
value of the NN NFA transition module, and $a \in \Gamma$ be current input symbol encoded as a
binary value $\langle a_1, \ldots, a_{n_I} \rangle$, where $a_k \in \{0, 1\}$ for $1 \leq k \leq n_I$. Let $w^1_{k,i}$, $w^2_{j,k}$, $\theta^1_k$ and $\theta^2_j$
respectively denote the 1st-layer connection weight from input neuron i to hidden neuron k,
the 2nd-layer connection weight from hidden neuron k to output neuron j, the threshold of
hidden neuron k, and the threshold of output neuron j in the NN NFA transition module,
where $1 \leq i \leq n_A + n_I$, $1 \leq k \leq n_T$ and $1 \leq j \leq n_A$.
For all i and j ($0 \leq i, j \leq n_A - 1$), if $q_j \in \delta(q_i, a)$, a hidden neuron h with an interest set
$J^{i+1} = \{i+1, n_A+1, n_A+2, \ldots, n_A+n_I\}$ should have been created by the proposed method
presented in Section 5.5.3 for the transition $(q_i, a, q_j)$ for recognizing the $J^{i+1}$-set partial input
value $u(J^{i+1}) = \langle 1, a_1, \ldots, a_{n_I} \rangle$, denoting $\{q_i\} \times a$, by the following settings:
• $w^1_{h,i+1} = 1$,
• $w^1_{h,k} = 0$ for $1 \leq k \leq n_A$ & $k \neq (i+1)$ (where $k \notin J^{i+1}$),
• $w^1_{h,k} = 2a_{k-n_A} - 1$ for $n_A + 1 \leq k \leq n_A + n_I$ (where $k \in J^{i+1}$),
• $\theta^1_h = \sum_{k=1}^{n_I} a_k + 1$,
• $w^2_{j+1,h} = 1$,
• $\theta^2_{j+1} = 1$,
• $w^2_{m,h} = 0$ for $1 \leq m \leq n_A$ & $m \neq (j+1)$.
For any moment $t \geq 0$, if $q_i \in Q_{act}(t)$ (i.e., $u_{i+1} = 1$), the $J^{i+1}$-set partial input vector
$u(J^{i+1}) = \langle 1, a_1, \ldots, a_{n_I} \rangle$, denoting $\{q_i\} \times a$, is recognized and hidden neuron h is activated
when the input u denoting $Q_{act}(t) \times a$ is fed into the NN NFA transition module. The hidden
neuron h in turn activates output neuron j + 1. So $v_{j+1} = 1$ at time t + 1, and thus
$q_j \in Q_{act}(t+1)$. Therefore expression 5.15, i.e., expression 5.13, is preserved by the NN NFA
transition module. ◇
5.5.5 NN NFA in Action 
This subsection constructs an NN NFA transition module of the proposed NN NFA for the
NFA defined in Figure 5.4 (see Section 5.5.1). In the NFA, $Q = \{q_0, q_1, q_2, q_3, q_4\}$, $q_0$ is the initial
state, $\Gamma = \{a, b\}$, and $F = \{q_4\}$. The transitions defined in the NFA are $(q_0, a, q_0)$, $(q_0, b, q_0)$,
$(q_0, a, q_1)$, $(q_1, b, q_2)$, $(q_2, a, q_3)$, $(q_3, a, q_4)$, $(q_4, a, q_4)$, and $(q_4, b, q_4)$. Then $n_T = 8$, $n_I = 2$, and
$n_A = 5$. The NN NFA transition module has 7 input, 8 hidden, and 5 output neurons. Let
$w^1_{k,i}$, $w^2_{j,k}$, $\theta^1_k$ and $\theta^2_j$ respectively denote the 1st-layer connection weight from input neuron i
to hidden neuron k, the 2nd-layer connection weight from hidden neuron k to output neuron
j, the threshold of hidden neuron k, and the threshold of output neuron j in the NN NFA
transition module, where $1 \leq i \leq 7$, $1 \leq k \leq 8$ and $1 \leq j \leq 5$. Let input symbol a be encoded
as $\langle 0, 0 \rangle$ and b as $\langle 0, 1 \rangle$. Then the connection weights and neuron thresholds of the NN
NFA transition module are set as follows:

$\theta^1_1 = 1$, $\langle w^1_{1,1}, \ldots, w^1_{1,7} \rangle = \langle 1, 0, 0, 0, 0, -1, -1 \rangle$,
$\theta^1_2 = 2$, $\langle w^1_{2,1}, \ldots, w^1_{2,7} \rangle = \langle 1, 0, 0, 0, 0, -1, 1 \rangle$,
$\theta^1_3 = 1$, $\langle w^1_{3,1}, \ldots, w^1_{3,7} \rangle = \langle 1, 0, 0, 0, 0, -1, -1 \rangle$,
$\theta^1_4 = 2$, $\langle w^1_{4,1}, \ldots, w^1_{4,7} \rangle = \langle 0, 1, 0, 0, 0, -1, 1 \rangle$,
$\theta^1_5 = 1$, $\langle w^1_{5,1}, \ldots, w^1_{5,7} \rangle = \langle 0, 0, 1, 0, 0, -1, -1 \rangle$,
$\theta^1_6 = 1$, $\langle w^1_{6,1}, \ldots, w^1_{6,7} \rangle = \langle 0, 0, 0, 1, 0, -1, -1 \rangle$,
$\theta^1_7 = 1$, $\langle w^1_{7,1}, \ldots, w^1_{7,7} \rangle = \langle 0, 0, 0, 0, 1, -1, -1 \rangle$,
$\theta^1_8 = 2$, $\langle w^1_{8,1}, \ldots, w^1_{8,7} \rangle = \langle 0, 0, 0, 0, 1, -1, 1 \rangle$,
$\theta^2_j = 1$ for $1 \leq j \leq 5$,
$\langle w^2_{1,1}, \ldots, w^2_{1,8} \rangle = \langle 1, 1, 0, 0, 0, 0, 0, 0 \rangle$,
$\langle w^2_{2,1}, \ldots, w^2_{2,8} \rangle = \langle 0, 0, 1, 0, 0, 0, 0, 0 \rangle$,
$\langle w^2_{3,1}, \ldots, w^2_{3,8} \rangle = \langle 0, 0, 0, 1, 0, 0, 0, 0 \rangle$,
$\langle w^2_{4,1}, \ldots, w^2_{4,8} \rangle = \langle 0, 0, 0, 0, 1, 0, 0, 0 \rangle$,
$\langle w^2_{5,1}, \ldots, w^2_{5,8} \rangle = \langle 0, 0, 0, 0, 0, 1, 1, 1 \rangle$.

Note that this NN NFA starts from initial state $Q_{act}(0) = \{q_0\}$, which is encoded as
$\langle 1, 0, 0, 0, 0 \rangle$.
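The construction for this example can be replayed in software. The Python sketch below builds both weight layers from the transition list and then steps the network over an input string. It is a sequential simulation written for illustration, not the parallel hardware realization, and it assumes that a neuron outputs 1 when its weighted sum is at least its threshold; all function and variable names are hypothetical.

```python
TRANSITIONS = [
    (0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2),
    (2, "a", 3), (3, "a", 4), (4, "a", 4), (4, "b", 4),
]
CODE = {"a": (0, 0), "b": (0, 1)}
N_A, N_I = 5, 2

def build(transitions):
    """One hidden neuron per transition; one output neuron per state."""
    w1, theta1 = [], []
    w2 = [[0] * len(transitions) for _ in range(N_A)]
    for h, (i, symbol, j) in enumerate(transitions):
        row = [0] * (N_A + N_I)
        row[i] = 1                              # source state neuron
        for k, a_k in enumerate(CODE[symbol]):
            row[N_A + k] = 2 * a_k - 1          # symbol neurons
        w1.append(row)
        theta1.append(sum(CODE[symbol]) + 1)
        w2[j][h] = 1                            # hidden neuron feeds target state
    return w1, theta1, w2

def step(w1, theta1, w2, q_act, symbol):
    """One move: AND layer finds applicable transitions, OR layer unions targets."""
    u = list(q_act) + list(CODE[symbol])
    hidden = [
        int(sum(w * x for w, x in zip(row, u)) >= th)
        for row, th in zip(w1, theta1)
    ]
    return [int(sum(w * h for w, h in zip(row, hidden)) >= 1) for row in w2]

def run(string):
    w1, theta1, w2 = build(TRANSITIONS)
    q_act = [1, 0, 0, 0, 0]                     # initial active set {q0}
    for symbol in string:
        q_act = step(w1, theta1, w2, q_act, symbol)
    return q_act

W1, THETA1, W2 = build(TRANSITIONS)
print(THETA1)
print(run("abaa"), run("abab"))
```

With these assumptions, the generated thresholds and weight rows coincide with the values listed above, and the accepting state q4 becomes active exactly when the input read so far contains abaa.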
5.6 Summary and Discussion 
In conventional computer systems, computer programs for real world applications are usu­
ally large and complex. They are typically built from a set of pre-defined modules (or ob­
jects/classes in object-oriented paradigms). Such modules allow code reuse and rapid imple­
mentation with fewer errors. We advocate a similar approach to the construction of complex 
neural networks. In this chapter, we have constructed a given DFA using an RNN which in
turn is assembled from BMP modules and recurrent links; a stack using an RNN which is
assembled from BMP modules, recurrent links and a write control module; a given DPDA using
an RNN which is assembled from BMP modules, an NN Stack and recurrent links; and a given
NFA using an RNN which is assembled from basic neural assemblies that realize logical AND
and OR operations on Boolean variables. The proposed NN NFA demonstrates the potential
benefits of ANN in the design of high performance systems for parallel symbolic computing
applications. Our other attempts at more complex symbolic processing include neural
networks designed respectively for simple database query processing (see Chapter 3) and syntax
analysis (see Chapter 6). We expect a similar approach to be applicable in the construction
of neural networks for a variety of important applications.
6 NEURAL ARCHITECTURES FOR SYNTAX ANALYSIS 
6.1 Introduction 
This chapter explores the synthesis of neural architectures for syntax analysis using pre-
specified grammars — a prototypical symbol processing task with applications in interactive 
programming environments (using interpreted languages such as LISP and JAVA), analysis 
of symbolic expressions (e.g., in real-time knowledge based systems and database query pro­
cessing), and high-performance compilers. This chapter does not address machine learning of 
unknown grammars (which finds applications in tasks such as natural language acquisition). 
A more general goal of this chapter is to explore the design of massively parallel archi­
tectures for symbol processing using the neural associative memories proposed in Chapter 2 
as key components. Pattern-directed associative inference is an essential part of most AI sys­
tems [54, 97, 181] and dominates the computational requirements of many AI applications 
[55, 97, 127]. 
The proposed high performance neural architectures for syntax analysis are systematically
(and provably correctly) synthesized through composition of the necessary symbolic functions
using a set of component symbolic functions, each of which is realized using a neural associative
processor (memory). They take advantage of the massively parallel pattern matching and retrieval
capabilities of neural associative processors (memories) to speed up syntax analysis for
real-time applications. The rest of the chapter is organized as follows:
• The remainder of Section 6.1 reviews related research on neural architectures for syntax 
analysis. 
• Sections 6.2 and 6.3 respectively develop modular neural network architectures for lexical
analysis and parsing.
• Section 6.4 compares the estimated performance of the proposed neural architectures for 
syntax analysis (based on current CMOS VLSI technology) with that of commonly used 
approaches to syntax analysis in conventional computer systems that rely on inherently 
sequential index or matrix structure for pattern matching. 
• Section 6.5 concludes with a summary and discussion. 
6.1.1 Review of related research on neural architectures for syntax analysis
To the best of our knowledge, to date, most of the research on neural architectures for syntax
analysis has focused on the investigation of neural networks that are designed to learn to
parse particular classes of syntactic structures (e.g., strings from deterministic context-free
languages (DCFL) or natural language sentences constructed using limited vocabulary). Notable
exceptions are: connectionist realizations of Turing Machines (wherein a stack is simulated
using binary representation of a fractional number) [143, 169]; a few neural architectures
designed for parsing based on a known grammar [34, 164]; and neural network realizations of finite
state automata [20, 134]. Nevertheless, it is informative to examine the various proposals for
neural architectures for syntax analysis (regardless of whether the grammar is preprogrammed
or learned). The remainder of this subsection explores some of the proposed architectures in
the literature for syntax analysis in terms of how each of them addresses the key subtasks of
syntax analysis.
[34] proposes a neural network to parse input strings of fixed maximum length for known 
context-free grammars (CFG). The whole input string is presented at one time to the neural 
parser which is a layered network of logical AND and OR nodes with connections set by an 
algorithm based on CYK algorithm [74]. 
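For readers unfamiliar with the CYK algorithm mentioned above, the following is a minimal Python sketch of the standard tabular recognizer for a grammar in Chomsky normal form. It is the textbook sequential algorithm, not the network of [34]; the grammar (for strings of the form a^n b^n with n at least 1) and all names are chosen only for this illustration.

```python
def cyk(word, unary_rules, binary_rules, start):
    """Recognize `word` with a grammar in Chomsky normal form.

    unary_rules maps a terminal to the nonterminals that produce it;
    binary_rules maps a pair (B, C) to the nonterminals A with A -> B C.
    """
    n = len(word)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive word[i..j].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, terminal in enumerate(word):
        table[i][i] = set(unary_rules.get(terminal, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):                     # split point
                for left in table[i][k]:
                    for right in table[k + 1][j]:
                        table[i][j] |= binary_rules.get((left, right), set())
    return start in table[0][n - 1]

# S -> A B | A X,  X -> S B,  A -> a,  B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}

results = {w: cyk(w, UNARY, BINARY, "S") for w in ["ab", "aabb", "aab", "ba"]}
print(results)
```

Each table entry is a disjunction over split points of conjunctions of two smaller entries, which is one way to see why a fixed maximum input length allows the computation to be laid out as layers of AND and OR nodes.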
PARSEC [80] is a modular neural parser consisting of six neural network modules. It 
transforms a semantically rich and therefore fairly complex English sentence into three output 
representations produced by its respective output modules. The three output modules are 
role labeler which associates case-role labels with each phrase block in each clause, interclause 
labeler which indicates subordinate and relative clause relationships, and mood labeler which 
indicates the overall sentence mood (declarative or interrogative). Each neural module is 
trained individually by a variation of the Backpropagation algorithm. The input is a sequence
of syntactically as well as semantically tagged words in the form of binary vectors and is
sequentially presented to PARSEC, one word at a time. PARSEC exploits generalization as
well as noise tolerance capabilities of neural networks to reportedly attain 78% correct labeling 
on a test set of 117 sentences when trained with a training set of 240 sentences. Both the test 
and training sets were based on conference registration dialogs from a vocabulary of about 400 
words. 
SPEC [116] is a modular neural parser which parses variable-length sentences with em­
bedded clauses and produces case-role representations as output. SPEC consists of a parser 
which is a simple recurrent network, a stack which is realized using a recursive auto-associative 
memory (RAAM) [144], and a segmenter which controls the push/pop operations of the stack 
using a 2-layer Perceptron. 
RAAM has been used by several researchers to implement stacks in connectionist designs 
for parsers [13, 66, 116]. A RAAM is a 2-layer Perceptron with recurrent links from hidden 
neurons to part of input neurons and from part of output neurons to hidden neurons. The 
performance of a RAAM stack is known to degrade substantially with increase in depth of the 
stack, and the number of hidden neurons needed for encoding a stack of a given depth has to 
be determined through a process of trial and error [116]. A RAAM stack has to be trained for 
each application. Other drawbacks associated with the use of RAAM as a stack are discussed 
in [174]. 
Each module of SPEC is trained individually using the Backpropagation algorithm to
approximate a mapping function as follows: Let Q be a finite non-empty set of states, $\Gamma$ a finite
non-empty input alphabet, $V_{CRV}$ a finite non-empty set of case-role vectors, A = {output,
push, pop} the set of stack actions, and $V_{stack}$ the set of compressed stack representations at
the hidden layer of a RAAM. Then the first and second connection layers of the parser
approximate the transition function of a DFA (see Section 5) $f_{p1} : \Gamma \times Q \to Q$ and a symbolic
mapping function $f_{p2} : Q \to V_{CRV}$ respectively; the segmenter approximates a symbolic
function $f_s : \Gamma \times Q \to Q \times A$; and the first and second connection layers of the RAAM approximate
the push function $f_{push} : V_{stack} \times Q \to V_{stack}$ and the pop function $f_{pop} : V_{stack} \to V_{stack} \times Q$ of
a RAAM stack respectively. The input string is sequentially presented to SPEC and is a
sequence of syntactically untagged English words represented as fixed-length distributive vectors
of gray-scale values between 0 and 1. The emphasis of SPEC was on exploring the generalization
as well as noise tolerance capabilities of a neural parser. SPEC uses implicit central control
to integrate its different modules and reportedly achieves 100% generalization performance on
a whole test set of 98100 English relative clause sentences with up to 4 clauses. Since the
words (terminals) in the CFG which generates the test sentences are not pre-translated by a
lexical analyzer into syntactically tagged tokens, the number of production rules and terminals
tends to increase linearly with the size of the vocabulary in the CFG. Augmenting SPEC with
a lexical analyzer offers a way around this problem.
[27, 174, 197] propose higher-order recurrent neural networks equipped with an external
stack to learn to recognize deterministic CFGs, i.e., to learn to simulate a deterministic pushdown
automaton (DPDA). [27, 174] use an analog network coupled with a continuous stack and
use a variant of a real time recurrent network learning algorithm to train the network. [197]
uses a discrete network coupled with a discrete stack and employs a pseudo-gradient learning
method to train the network. The input to the network is a sequentially presented, unary-coded
string of variable length. Let Q be a finite non-empty set of states, Γ a finite non-empty
input alphabet, Λ a finite non-empty stack alphabet, A = {push, pop, no-operation} the set of
stack actions, and Boolean the set {false, true}. These recurrent neural networks approximate
the transition function of a DPDA, i.e., f_DPDA : Q × Γ × Λ → Q × Λ × A. The networks
are trained to approximate a language recognizer function f_L : Γ* → Boolean. Strings generated
from CFGs including balanced parenthesis grammar, a^n b^n, a^n b^n c b^m a^m, postfix
grammar, and/or palindrome grammar were used to evaluate the generalization performance
of the proposed networks.
The proposed neural architecture for syntax analysis is composed of neural network mod­
ules for stack, lexical analysis, parsing, and parse tree construction. It differs from most of 
the neural network realizations of parsers in that it is systematically assembled using neural 
associative processors (memories) as primary building blocks. It is able to exploit massively 
parallel content-based pattern matching and retrieval capabilities of neural associative pro­
cessors (memories). This offers an opportunity to explore the potential benefits of massively 
parallel pattern matching in the design of high performance computing systems for real time 
symbol processing applications. 
6.2 Neural Network Design for a Lexical Analyzer (NNLexAn) 
A lexical analyzer is defined by a recursive symbolic function f_LexAn : Γ*$ → Δ*$, where Γ is
the input alphabet, $ is a special symbol denoting "end of input", and Δ is the set of lexical
tokens. Γ*$ (or Δ*$) denotes the set of strings obtained by adding the suffix $ to each of
the strings over the alphabet Γ (or Δ). The conventional approach to implementing a lexical
analyzer using a DFA (in particular, a Mealy machine) can be realized quite simply using an
NN DFA [20]. However, a major drawback of this approach is that all legal transitions have
to be exhaustively specified in the DFA. For example, Figure 6.1 shows a simplified state
diagram without all legal transitions specified for a lexical analyzer which recognizes keywords
of a programming language: begin, end, if, then, and else.
Suppose the lexical analyzer is in a state that corresponds to the end of a keyword. Then
its current state would be state 7, 11, 15, 18, or 23. If the next input character is b, there
should be legal transitions defined from those states to state 2. The same holds for transitions
to states 8, 16 and 19, which handle the next input characters e, i, and t. Thus, this extremely
simple lexical analyzer with 22 explicitly defined legal transitions has 5 × 4 = 20 unspecified transitions.
The realization of such a simple 5-word (23-state) lexical analyzer by an NN DFA requires 
20+22=42 hidden neurons. Additional transitions have to be defined in order to allow multiple 
blanks between two consecutive words in the input stream, and for error handling. These 
drawbacks are further exacerbated in applications involving languages with large vocabularies. 
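The counts quoted above can be checked with a small sketch. Since Figure 6.1 is not reproduced here, the sketch assumes that the DFA is a keyword trie in which every keyword is followed by one transition on blank, that state 1 is the start state, and that states are numbered in the order begin, end, else, if, then. The function name and the numbering scheme are illustrative assumptions, not part of the original design.

```python
def count_keyword_dfa(keywords):
    """Count states and transitions of a trie-shaped keyword DFA.

    Assumed layout: state 1 is the start state, each keyword is followed
    by one transition on blank, and shared prefixes share states.
    """
    delta = {}             # (state, symbol) -> next state
    next_free = 2          # next unused state number
    first_states = set()   # states reached on the first letter of a keyword
    end_states = []        # states reached after the trailing blank
    for word in keywords:
        state = 1
        for position, ch in enumerate(word + " "):
            if (state, ch) not in delta:
                delta[(state, ch)] = next_free
                next_free += 1
            state = delta[(state, ch)]
            if position == 0:
                first_states.add(state)
        end_states.append(state)
    explicit = len(delta)
    # Each end-of-keyword state needs one transition per possible first letter.
    unspecified = len(end_states) * len(first_states)
    return {
        "states": next_free - 1,
        "explicit": explicit,
        "unspecified": unspecified,
        "hidden_neurons": explicit + unspecified,
        "end_states": end_states,
        "first_states": sorted(first_states),
    }


counts = count_keyword_dfa(["begin", "end", "else", "if", "then"])
print(counts)
```

Under these assumptions the sketch reproduces the 23 states, the end-of-keyword states 7, 11, 15, 18 and 23, and the 22 + 20 = 42 hidden neurons mentioned in the text.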
Figure 6.1 The simplified state diagram of a DFA which recognizes keywords:
begin, end, if, then, and else
A better alternative is to use a Dictionary (or a database) to serve as a lexicon. The 
proposed design for NNLexAn consists of a word segmenter for carving an input stream of 
characters into a stream of words, and a word lookup table for translating the carved words 
of variable length into syntactically tagged tokens of fixed length. The syntactically tagged 
tokens of fixed length are to be used as single logical units in parsing. Such a translation can 
be realized by a simple query to a database using a key. Such database query processing can 
be efficiently implemented using neural associative memories (see Chapter 2). 
6.2.1 Neural network design for a word segmenter (NNSeg) 
In program translation, the primary function of a word segmenter is to identify illegal
words and to group the input stream into legal words including keywords, identifiers, constants,
operators, and punctuation symbols. A word segmenter can be defined by a recursive symbolic
function f_WordSeg : Γ*$ → Λ*$, where Γ is the input alphabet, $ is a special symbol denoting
"end of input", and Λ is the set of legal words. Γ*$ (or Λ*$) denotes the set of strings obtained
by adding the suffix $ to each of the strings over the alphabet Γ (or Λ).
Figure 6.2 shows the state diagram of a DFA simulating a simple word segmenter which 
carves continuous input stream of characters into integer constants, keywords, and identifiers. 
Figure 6.2 The state diagram of a DFA which simulates a simple word
segmenter carving continuous input stream of characters into
words including integer constants, keywords and identifiers
(edge labels: a ~ z, A ~ Z, 0 ~ 9, 1 ~ 9, blank)
Both the keywords and identifiers are defined as strings of English characters. For simplicity, 
the handling of end-of-input is not shown in the figure. The word segmenter terminates
processing upon encountering the end-of-input symbol $. Each time the word segmenter
goes into an accepting state, it instructs the word lookup table to look up a word that has
been extracted from the input stream and stored in a buffer. 
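The behaviour of such a word segmenter can be sketched in software as follows. This is a simplified illustration, not the DFA of Figure 6.2: it assumes that a legal word is either a run of English letters or a run of digits, that words are separated by blanks, and that any other word is illegal and silently dropped. The treatment of leading digits suggested by the 0 ~ 9 and 1 ~ 9 edge labels is not modelled, and the function names are hypothetical.

```python
def is_legal(word):
    """A word is legal if it consists only of letters or only of digits (assumption)."""
    return word.isascii() and (word.isalpha() or word.isdigit())


def segment(stream):
    """Carve a character stream terminated by '$' into a list of legal words."""
    words = []
    buffer = ""                      # holds the word currently being extracted
    for ch in stream:
        if ch == " " or ch == "$":   # a blank or end-of-input closes the current word
            if buffer and is_legal(buffer):
                words.append(buffer)  # would trigger a lookup in the word table
            buffer = ""               # illegal words are simply discarded
            if ch == "$":
                break                 # end of input: stop processing
        else:
            buffer += ch
    return words


print(segment("if x1 then 42 y$"))
```

In the sketch, appending a word to the result plays the role of instructing the word lookup table to look up the buffered word.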
Since syntax error handling is not discussed here, it may be assumed that any illegal word 
is discarded by the word segmenter and is also discarded from the buffer which temporarily 
stores the illegal word being extracted from the input stream. Such a word segmenter can 
also be realized by an NN DFA. Since any undefined (un-implemented) transition moves into 
a binary-coded state of all zeros automatically in an NN DFA, it would be expedient to en­
code the garbage state (state G in Figure 6.2) using a string of all zeros. Although the most 
straightforward implementation of NN DFA (see Section 5) uses one hidden neuron per transition,
one can do better. In Figure 6.2 the 10 transitions from state 4 on ASCII-coded input
symbols 0,1,...,9 can be realized by only two hidden neurons in an NN DFA using partial pattern
recognition (see Chapters 2 and 3). Other transitions on input symbols 0,1,...,9, a,b,...,z,
and A,B,...,Z can be handled in a similar fashion.
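To see why two hidden neurons can suffice for the ten digits, note that the 8-bit ASCII codes of 0 through 7 share the prefix 00110 and those of 8 and 9 share the prefix 0011100. The sketch below models each hidden neuron as a pattern with don't-care bits, written as a (mask, value) pair. This masking scheme is an assumption made for illustration; the partial pattern recognition mechanism itself is the one described in Chapters 2 and 3 and is not reproduced here.

```python
# Each pair is (mask, value): an input code c matches when (c & mask) == value.
# Bits cleared in the mask are "don't care" bits.
DIGIT_PATTERNS = [
    (0b11111000, 0b00110000),  # 0011 0xxx covers '0'..'7'
    (0b11111110, 0b00111000),  # 0011 100x covers '8' and '9'
]


def is_digit_code(code):
    """Return True if the 8-bit code matches one of the two partial patterns."""
    return any((code & mask) == value for mask, value in DIGIT_PATTERNS)


matched = [chr(code) for code in range(256) if is_digit_code(code)]
print("".join(matched))
```

Exactly the ten digit codes match, so one hidden neuron per pattern replaces ten one-per-transition neurons.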
6.2.2 Neural network design for a word lookup table (NNLTab) 
During lexical analysis in program compilation or similar applications, each word of variable 
length (extracted by the word segmenter) is translated into a token of fixed length. Each such 
token is treated as a single logical entity: an identifier, a keyword, a constant, an operator 
or a punctuation symbol. Such a translation can be defined by a simple symbolic function 
f_WordTran : Λ ∪ {$} → Δ ∪ {$}. Here, Λ (the set of legal words), $, and Δ (the set of lexical tokens) denote the same entities as in the definitions
of f_WordSeg and f_LexAn above. Note that f_WordTran can be realized by a BMP module. In
other lexical analysis applications, a word may be translated into a token having two sub­
parts: category code denoting the syntactic category of a word, and feature code denoting the 
syntactic features of a word. 
The conventional approach to such translation (dictionary lookup) is to perform a simple
query on a suitably organized database (with the segmented word being used as the key). This 
content-based pattern matching and retrieval process can be efficiently and effectively realized 
by neural associative memories. Database query processing using neural associative memories 
is discussed in detail in Chapter 3 and is summarized briefly in what follows. Each word and 
its corresponding token are stored as an association pair in a neural associative memory. Each 
such association is implemented by a hidden neuron and its associated connections. A query is 
processed in two steps: identification and recall. During the identification step, a given word is 
compared to all stored words in parallel by the hidden neurons and their associated 1st-layer
connections in the memory. Once a match is found, one of the hidden neurons is activated to 
recall the corresponding token using the 2nd-layer connections associated with the activated 
hidden neuron. The time required for processing such a query is of the order of 20 ns (at best) 
to 100 ns (at worst) given the current CMOS technology for implementation of artificial neural 
networks (see Section 3.2.1.1). 
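The two-step query (identification, then recall) can be illustrated with a minimal software sketch. It assumes a particular realization that is not spelled out in this section: words are padded to a fixed width and encoded as bipolar (+1/-1) vectors, each hidden neuron stores one word as its 1st-layer weights and fires only on an exact match, and the 2nd-layer weights of a hidden neuron are the bits of its token. The class name, encoding and thresholds are illustrative assumptions.

```python
def to_bipolar(word, width):
    """Encode an ASCII word as a fixed-width vector of +1/-1 values (8 bits per character)."""
    data = word.encode("ascii")
    if len(data) > width:
        raise ValueError("word longer than the key width")
    data = data.ljust(width, b"\0")
    return [1 if (byte >> bit) & 1 else -1 for byte in data for bit in range(7, -1, -1)]


class LookupMemory:
    """One hidden neuron per stored (word, token) association."""

    def __init__(self, pairs, width, token_bits):
        self.width = width
        self.token_bits = token_bits
        self.first_layer = [to_bipolar(word, width) for word, _ in pairs]
        self.second_layer = [
            [(token >> bit) & 1 for bit in range(token_bits - 1, -1, -1)]
            for _, token in pairs
        ]

    def query(self, word):
        x = to_bipolar(word, self.width)
        threshold = len(x)  # reached only when every bit agrees
        # Identification: all hidden neurons compare the key conceptually in parallel.
        hidden = [
            1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0
            for weights in self.first_layer
        ]
        # Recall: the active hidden neuron drives the token bits.
        return [
            1 if sum(h * row[k] for h, row in zip(hidden, self.second_layer)) >= 1 else 0
            for k in range(self.token_bits)
        ]


memory = LookupMemory([("if", 1), ("then", 2), ("else", 3)], width=8, token_bits=4)
print(memory.query("then"))
print(memory.query("thee"))
```

A word that is not stored activates no hidden neuron, so the recalled token is all zeros.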
6.3 A Modular Neural Architecture for LR Parser (NNLR Parser) 
LR(k) grammars generate the so-called deterministic context-free languages which can be
accepted by deterministic push-down automata [74]. Such grammars find extensive applications
in programming languages and compilers. LR parsing is a linear time table-driven algorithm
which is widely used for syntax analysis of computer programs [2, 19, 170]. This algorithm
involves extensive pattern matching which suggests the consideration of a neural network
implementation using associative memories. This section proposes a modular neural network
architecture for parsing LR(1) grammars. LR(k) parsers scan input from left to right and
produce a rightmost derivation tree by using lookahead of k unscanned input symbols. Since 
any LR(k) grammar for k > 1 can be transformed into an LR(1) grammar [170], LR(1) parsers 
are sufficient for practical applications [74]. 
An LR(1) grammar can be defined as G_LR(1) = (V, T, P, σ) [74], where V and T are finite
sets of variables (nonterminals) and terminals respectively, P is a finite set of production rules,
and σ ∈ V is a special variable called the start symbol. V and T are disjoint. Each production
rule is of the form A → α, where A ∈ V and α ∈ (V ∪ T)*. An LR(1) parser can be defined by
a recursive symbolic function f_LRParser : Δ*$ → P*, where Δ (the set of lexical tokens; Δ = T in
this context), $, and Δ* are as in f_LexAn, and P* denotes the set of all sequences of production
rules over the rule alphabet P. Although f_LRParser corresponds in form to the recursive symbolic
function f_LexAn in Section 6, it cannot be realized simply by a Mealy machine which implements
f_LexAn. This is because the one-to-one correspondence between each symbol of the input string
and the output symbol at the corresponding position of the output string, which holds in a Mealy
machine, does not hold for f_LRParser. A stack is required to store intermediate results of the
parsing process in order to realize an LR(1) parser which is characterized by f_LRParser.
6.3.1 Representation of parse table 
Logically, an LR parser consists of two parts: a driver routine which is the same for all LR
parsers and a parse table which is grammar-dependent [2]. The LR parsing algorithm pre-compiles
an LR grammar into a parse table which is consulted by the driver routine for deterministically
parsing an input string of lexical tokens by shift/reduce moves [2, 19]. Such a parsing mechanism
can be simulated by a DPDA (deterministic pushdown automaton) with ε-moves [74]. An ε-move
does not consume the input symbol, and the input head is not advanced after the move. This
enables a DPDA to manipulate a stack without reading input symbols. The neural network
architecture for DPDA (NN DPDA) proposed in Section 5, augmented with an NN Stack
(see Section 5), is able to parse DCFL. However, the proposed NN DPDA architecture cannot
efficiently handle ε-moves because of the need to check for the possibility of an ε-move at every
state. Therefore, a modified design for LR(1) parsing is discussed below.
The parse table and the stack are the two main components of an LR(1) parser. Access to the parse
table can be defined by the symbolic function f_ParseTable : Q × (Δ ∪ V ∪ {$}) → A × (Q ∪ {*}) ×
(P ∪ {*}) × (N ∪ {*}) × (V ∪ {*}) × Z in terms of binary mapping. Here, Q is the finite set of
states; Δ (the set of lexical tokens), V (the set of nonterminals), $, and P (the set of production
rules) have the same meaning as in the definitions of G_LR(1) and f_LRParser
given above; A = {shift, reduce} is the set of parsing actions; * denotes a don't care; N
is the set of natural numbers; and Z = {error, in-progress, accept} is the set of possible
parsing status values.
A parse table can be realized using a BMP module as described in Section 2.2.5 in terms
of binary mapping. The next move of the parsing automaton is determined by the current input
symbol a and the state q that is stored at the top of the stack. It is given by the parse
table entry corresponding to [q, a]. Each such 2-dimensional parse table entry action[q, a] is
implemented as a 6-tuple binary code ⟨action, state, rule, length, lhs, status⟩ in the BMP
module for parse table where
• action is a 2-bit binary code denoting one of two possible actions, 01 (shift) or 10
(reduce);
• state is an S-bit binary number denoting "the next state";
• rule is an R-bit binary number denoting the grammar production rule r to be applied if
the consulted action is a reduce;
• length is an L-bit binary number denoting the length of the right hand side of the grammar
production rule r to be applied if the consulted action is a reduce;
• lhs is an H-bit binary code encoding the grammar nonterminal symbol at the left hand
side of the grammar production rule r to be applied if the consulted action is a reduce;
and
• status is a 2-bit binary code denoting one of three possible parsing status codes, 00:
error, 01: in progress, or 10: accept (used by higher-level control to acknowledge the
success or failure of a parsing).
Note that the order of the tuple's elements arranged in Figure 6.3 is different from above. 
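The 6-tuple layout listed above can be made concrete with a small packing sketch. The field order follows the list in the text (not Figure 6.3), the widths S, R, L and H are parameters, and the helper names are hypothetical. The sketch only illustrates the binary layout of one table entry and is not a model of the BMP module itself.

```python
from typing import NamedTuple


class Widths(NamedTuple):
    S: int  # bits for the next state
    R: int  # bits for the production rule number
    L: int  # bits for the right-hand-side length
    H: int  # bits for the left-hand-side symbol code


def field_widths(w):
    """Bit widths in the order action, state, rule, length, lhs, status."""
    return (2, w.S, w.R, w.L, w.H, 2)


def pack(entry, w):
    """Concatenate the six fields of an entry into one bit string."""
    bits = ""
    for value, width in zip(entry, field_widths(w)):
        if not 0 <= value < 2 ** width:
            raise ValueError("field value does not fit its width")
        bits += format(value, "0{}b".format(width))
    return bits


def unpack(bits, w):
    """Split a bit string back into its six integer fields."""
    fields, pos = [], 0
    for width in field_widths(w):
        fields.append(int(bits[pos:pos + width], 2))
        pos += width
    return tuple(fields)


widths = Widths(S=4, R=3, L=2, H=4)
shift_to_5 = (0b01, 5, 0, 0, 0, 0b01)   # shift, next state 5, status in-progress
code = pack(shift_to_5, widths)
print(code, len(code))
```

The total length is 4 + S + R + L + H bits, which is the number of output neurons stated for the BMP module in Section 6.3.3.2. An entry of all zeros decodes to status 00, i.e. error.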
A canonical LR(1) parse table is relatively large and would typically have several thousand 
states for a programming language like C. SLR(1) and LALR(1) tables, which are far smaller
than the LR(1) table, typically have several hundred states for the same size of language, and
they always have the same number of states for a given grammar [3] (the differences among
LR, SLR, and LALR parsers are discussed in [19]). The number of states in the parse table
of LALR(1) parsers for most programming languages is between about 200 and 450, and the
number of symbols (lexical tokens) is around 50 [19], i.e., the number of table entries is between 
about 10000 and 22500. 
Typically a parse table is realized as a 2-dimensional array in current computer systems. 
Memory is allocated for every entry of the parse table, and the access of an entry is via its offset 
in the memory, which is computed efficiently by the size of the fixed memory space for each 
entry and the indices of an entry in the array. However, it is much more natural to retrieve an 
entry in a table using content-based pattern matching on the indices of the entry. As described 
in Section 2.2.5, a BMP module can effectively and efficiently realize such content-based table 
lookup. 
Figure 6.3 The proposed neural network architecture for LR(1) parser
(blocks shown: NN for lexical analyzer fed by the input stream; optional queue mechanism;
buffers input(t) and state(t); BMP (Binary Mapping Perceptron) parsing table with outputs
action, state, rule, length, lhs and status; NN shift/reduce stack for state; NN stack for
parse tree; next processing unit)

LR grammars used in practical applications typically produce parse tables with between
80% and 95% undefined error entries [19]. In conventional implementations the size of the table
is reduced by using lists, which can result in a significant performance penalty. The use of a BMP
for such table lookup helps overcome this problem since undefined mappings are naturally realized
by a BMP module without the need for extra space and without incurring any performance penalties.
Thus, an LALR(1) parse table (LALR(1) parsing is generally the technique of choice for parsing
computer programs) can be realized using at most about 22500 × 20% = 4500 hidden neurons.
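The estimate above follows from one hidden neuron per defined table entry. A short sketch of the arithmetic, using only the ranges quoted in the text (200 to 450 states, about 50 symbols, 5% to 20% defined entries), is given below. The function name is hypothetical, and the lower end of the range is an inference from the quoted figures rather than a number stated in the original.

```python
def hidden_neurons(states, symbols, defined_percent):
    """Number of defined table entries, i.e. hidden neurons in the BMP module."""
    return states * symbols * defined_percent // 100


# Largest table with the densest filling quoted in the text.
print(hidden_neurons(450, 50, 20))
# Smallest table with the sparsest filling (inferred lower end).
print(hidden_neurons(200, 50, 5))
```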
6.3.2 Representation of parsing moves and parse trees 
A configuration of an LR parser is an ordered pair whose first component corresponds to 
the stack contents and whose second component is the part of the input that remains to be 
parsed. A configuration can be denoted by (q_0 q_1 ... q_i, a_j a_{j+1} ... a_n $), where q_i is the state
on top of the stack, q_0 is the stack bottom symbol, a_j is the current input symbol, and $ is a
special symbol denoting "end of input". The initial configuration is (q_0, a_1 a_2 ... a_n $). Let 0^k
be a k-bit binary number (code) of all zeros denoting a value of don't care for k ≥ 1. In
the proposed NNLR Parser, the configurations resulting from one of four types of moves on
parsing an input lexical token are as follows:
• If action[q_i, a_j] = ⟨01, q, 0^R, 0^L, 0^H, 01⟩, the parser performs a shift move and enters
the configuration (q_0 q_1 ... q_i q, a_{j+1} ... a_n $). Such a shift move is realized in one
operation cycle in the proposed NNLR Parser.
• If action[q_i, a_j] = ⟨10, 0^S, r, l, h, 01⟩, the parser performs a reduce by producing a
binary number r (which denotes a grammar production rule A → β being applied,
where the grammar nonterminal A is denoted by the binary code h, and l is the number
of non-empty grammar symbols in β) as part of the parse tree, popping l symbols
off the stack, consulting parse table entry [q_{i-l}, h] and entering the configuration
(q_0 q_1 ... q_{i-l} q, a_j ... a_n $) where action[q_{i-l}, h] = ⟨01, q, 0^R, 0^L, 0^H, 01⟩. Such a reduce
move is realized in two operation cycles in the proposed NNLR Parser since the parse
table is consulted twice for simulating the move.
• If action[q_i, a_j] = ⟨0^2, 0^S, 0^R, 0^L, 0^H, 10⟩, parsing is completed.
• If action[q_i, a_j] = ⟨0^2, 0^S, 0^R, 0^L, 0^H, 00⟩, an error is discovered and the parser stops.
Note that such an entry is a binary code of all zeros. (We do not discuss error handling
any further in this chapter).
An LR parser scans input string from left to right and performs bottom-up parsing which 
results in a rightmost derivation tree in reverse. Thus, a stack can be used to store the parse 
tree (derivation tree) which is a sequence of grammar production rules (in reverse order) applied 
in the derivation of the scanned input string. The rule on top of the final stack which stores 
a successfully parsed derivation tree is a grammar production rule with the start symbol of
an LR grammar at its left hand side. Note that each rule is represented by an R-bit binary
number and the mapping from a binary-coded rule to the rule itself can be realized by a BMP 
module. 
6.3.3 Architecture of NNLR parser 
Figure 6.3 shows the architecture of a modular neural network design for an LR(1) parser 
which takes advantage of the efficient shift/reduce technique. The NNLR Parser uses an
optional queue handler module and an NN stack which stores the parse tree (derivation tree). 
The queue handler stores lexical tokens extracted by the NN lexical analyzer described in 
Section 6 and facilitates the operation of lexical analyzer and parser in parallel. To extract 
the binary-coded grammar production rules in derivation order sequentially out of the NN stack 
which stores parse tree, the next processing unit connected to the NN stack sends binary-coded 
stack pop actions to the stack in an appropriate order. 
6.3.3.1 Modules of NNLR Parser
The proposed NNLR Parser consists of a BMP module implementing the parse table, an
NN shift/reduce stack storing states during shift/reduce simulation, a buffer (state(t))
storing the current state (from the top of the NN shift/reduce stack), and a buffer (input(t))
storing either the current input lexical token or a grammar nonterminal symbol produced by the
last consulted parsing action which is a reduce. When the last consulted parsing action is a reduce
encoded as 10, the grammar production rule to be reduced is pushed onto the stack for parse
tree, the transmission of input(t) is from the latched buffer lhs, and the input from the queue
mechanism is inhibited by the leftmost bit of the binary-coded reduce action. When the last
consulted parsing action is a shift encoded as 01, the transmission of input(t) is from the
queue mechanism and the input from the latched buffer lhs is inhibited by the rightmost bit
of the binary-coded shift action.
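The inhibition scheme described above amounts to a two-way selector driven by the two action bits. The following sketch models it with bit lists. It assumes that an inhibited source contributes all zeros and that the two gated sources are merged by a bitwise OR, which is one plausible reading of Figure 6.3 and not a detail confirmed by the text. The function names are hypothetical.

```python
def gate(bits, inhibit):
    """Pass the bits through unchanged, or force them to zero when inhibited."""
    return [b & (1 - inhibit) for b in bits]


def select_input(action, queue_bits, lhs_bits):
    """Compute input(t) from the last action code (leftmost bit, rightmost bit)."""
    left, right = action
    gated_queue = gate(queue_bits, left)   # leftmost bit of reduce (10) inhibits the queue
    gated_lhs = gate(lhs_bits, right)      # rightmost bit of shift (01) inhibits lhs
    return [q | h for q, h in zip(gated_queue, gated_lhs)]


token_from_queue = [0, 0, 0, 1]
latched_lhs = [0, 1, 1, 0]
print(select_input((0, 1), token_from_queue, latched_lhs))  # after a shift
print(select_input((1, 0), token_from_queue, latched_lhs))  # after a reduce
```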
Parsing is initiated by reset signals to the NN shift/reduce stack and the NN stack storing
parse tree. The signals reset the SPs (stack pointers) of these two stacks to stack bottom and
hence state(t) is reset to initial state. To avoid clutter, the reset signal lines are not shown in
Figure 6.3. The current state buffer state(t) and the current input buffer input(t) need to be
synchronized but the necessary synchronization circuit is omitted from Figure 6.3.
The operations of an LR parser can be viewed in terms of a sequence of transitions from an 
initial configuration to a final configuration. The transition from one configuration to another 
can be divided into two steps: the first involves consulting the parse table for next action using 
current input symbol and current state on top of the stack; the second step involves execution 
of the action — either a shift or a reduce — as specified by the parse table. In the NNLR 
Parser, the first step is realized by a BMP module which implements the parse table lookup; 
and the second step is executed by a combination of an NN shift/reduce stack which stores 
states, and an NN stack which stores the parse tree (and the BMP module when the next 
action is a reduce). 
6.3.3.2 Complexity of the BMP module for parse table
Let M be the number of defined action entries in the parse table. All grammar symbols are
encoded into H-bit binary codes. The BMP module for parse table uses S + H input neurons,
M hidden neurons, and 4 + S + R + L + H output neurons. Note that the BMP module
produces a binary output of all zeros, denoting a parsing error (see previous description of
parsing status code in an action entry of the parse table), for any undefined action entry in
the parse table. The R-bit binary-coded grammar production rule is used as the stack symbol
for the NN stack which stores the parse tree.
6.3.3.3 Complexity of the NN Stack for parse tree 
Assume the pointer control module of the NN stack for parse tree uses m_p bits to encode
its SP values. Then the pointer control module of the NN stack for parse tree uses m_p + 2
input neurons, 3 × 2^(m_p) hidden neurons, and m_p output neurons. The stack memory module
uses m_p input neurons, 2^(m_p) hidden neurons, and R output neurons. The write control module
receives m_p + 1 binary inputs (the stack pointer + push/pop signal) and R binary inputs (the
grammar production rule).
6.3.3.4 Complexity of the Shift/Reduce NN Stack 
To efficiently implement the reduce action in LR parsing, the NN shift/reduce stack can
be slightly modified from the NN stack described in Section 5 to allow multiple stack pops
in one operation cycle of the NNLR Parser. The number of pops is coded as an L-bit binary
number and equals the number of non-empty grammar symbols at the right hand side of the
grammar production rule being reduced. It is used as input to the pointer control module
and write control module in the NN shift/reduce stack. Thus, each of the modules uses L
additional input neurons in the NN shift/reduce stack as compared to the NN stack proposed
in Section 5. The output from the NN parse table, namely, the S-bit binary code for state,
is used as the stack symbol to the NN shift/reduce stack. Let the maximum number of
non-empty grammar symbols that appear in the right hand side of a production rule in the LR
grammar being parsed be L_m. Then k multiple pops are implemented in the NN shift/reduce
stack in a manner similar to a single pop in the NN stack proposed in Section 5 except that
the SP value is decreased by k instead of 1, 1 ≤ k ≤ L_m. Hence for each SP value, L_m − 1
additional hidden neurons are required to allow multiple pops in the pointer control module.
6.3.4 NNLR Parser in action 
This subsection illustrates the operation of the proposed NNLR Parser for a given LR(1)
grammar. The example LR(1) grammar (G1) used here is taken from [3]. The BNF (Backus-Naur
Form) description of the grammar G1 is as follows:
expression → expression + term | term
term → term × factor | factor
factor → ( expression ) | identifier
Using E, T, F, and I to denote expression, term, factor, and identifier (respectively), these
rules can be rewritten in the form of production rules p1 through p6:
Production rule p1: E → E + T
Production rule p2: E → T
Production rule p3: T → T × F
Production rule p4: T → F
Production rule p5: F → ( E )
Production rule p6: F → I
Then { I, +, ×, (, ) } is the set of terminals (i.e. the set of possible lexical tokens from the
lexical analyzer), { E, T, F } is the set of nonterminals, { p1, p2, p3, p4, p5, p6 } is the set of
production rules, and E is the start symbol of the grammar G1.
The operation of the parser is shown in terms of symbolic codes (instead of the binary codes 
used by the NN implementation) to make it easy to understand. Note however that the trans­
formation of symbolic codes into binary form used by NNLR Parser is rather straightforward 
and has been explained in the preceding sections. 
Let s and r denote the parsing actions shift and reduce and a, e, and i the parsing
statuses accept, error, and in-progress respectively. The parse table of the LR(1) parser
(more specifically, SLR(1) parser) for grammar G1 is shown in Table 6.1.
The implementation and operations of the NN shift/reduce stack and the NN stack for
parse tree follow the discussion and examples in Section 5 and they are not discussed here.
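Table 6.1 The parse table of the LR(1) parser for grammar G1

State | I              | +              | ×              | (              | )
q0    | (s,q5,*,*,*,i) |                |                | (s,q4,*,*,*,i) |
q1    |                | (s,q6,*,*,*,i) |                |                |
q2    |                | (r,*,p2,1,E,i) | (s,q7,*,*,*,i) |                | (r,*,p2,1,E,i)
q3    |                | (r,*,p4,1,T,i) | (r,*,p4,1,T,i) |                | (r,*,p4,1,T,i)
q4    | (s,q5,*,*,*,i) |                |                | (s,q4,*,*,*,i) |
q5    |                | (r,*,p6,1,F,i) | (r,*,p6,1,F,i) |                | (r,*,p6,1,F,i)
q6    | (s,q5,*,*,*,i) |                |                | (s,q4,*,*,*,i) |
q7    | (s,q5,*,*,*,i) |                |                | (s,q4,*,*,*,i) |
q8    |                | (s,q6,*,*,*,i) |                |                | (s,q11,*,*,*,i)
q9    |                | (r,*,p1,3,E,i) | (s,q7,*,*,*,i) |                | (r,*,p1,3,E,i)
q10   |                | (r,*,p3,3,T,i) | (r,*,p3,3,T,i) |                | (r,*,p3,3,T,i)
q11   |                | (r,*,p5,3,F,i) | (r,*,p5,3,F,i) |                | (r,*,p5,3,F,i)

State | $              | E              | T              | F
q0    |                | (s,q1,*,*,*,i) | (s,q2,*,*,*,i) | (s,q3,*,*,*,i)
q1    | (*,*,*,*,*,a)  |                |                |
q2    | (r,*,p2,1,E,i) |                |                |
q3    | (r,*,p4,1,T,i) |                |                |
q4    |                | (s,q8,*,*,*,i) | (s,q2,*,*,*,i) | (s,q3,*,*,*,i)
q5    | (r,*,p6,1,F,i) |                |                |
q6    |                |                | (s,q9,*,*,*,i) | (s,q3,*,*,*,i)
q7    |                |                |                | (s,q10,*,*,*,i)
q8    |                |                |                |
q9    | (r,*,p1,3,E,i) |                |                |
q10   | (r,*,p3,3,T,i) |                |                |
q11   | (r,*,p5,3,F,i) |                |                |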
The parse table can be represented by a binary mapping which in turn can be easily realized by
a BMP module (see Section 2.2.5 for details). Following the notation introduced in Section 6
for the NN realization of the parse table, we have: M = 45 since there are 45 defined entries
in the parse table; S = ⌈log2 12⌉ = 4 since there are 12 states; H = ⌈log2 (8 + 2)⌉ = 4 since
there are 8 grammar symbols plus a null symbol ε and an additional end-of-string symbol $;
R = ⌈log2 6⌉ = 3 since there are 6 production rules; and L = ⌈log2 3⌉ = 2 since the maximum
number of non-empty grammar symbols that appear in the right hand side of a production
rule in the LR grammar G1 is 3. Therefore, the BMP for parse table of G1 has S + H = 8
input, 45 hidden, and 4 + S + R + L + H = 17 output neurons.
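The sizes above can be recomputed mechanically from the formulas of Section 6.3.3.2. The sketch below is only a check of the arithmetic. The function names are hypothetical, and the integer ceiling of log2 is computed with bit_length to avoid floating-point rounding.

```python
def ceil_log2(n):
    """Smallest b such that 2**b >= n, for n >= 1."""
    return (n - 1).bit_length()


def bmp_size(states, grammar_symbols, rules, max_rhs, defined_entries):
    """Neuron counts of the BMP module for a parse table."""
    S = ceil_log2(states)
    H = ceil_log2(grammar_symbols + 2)   # plus the null symbol and the end symbol $
    R = ceil_log2(rules)
    L = ceil_log2(max_rhs)
    return {
        "S": S, "H": H, "R": R, "L": L,
        "inputs": S + H,
        "hidden": defined_entries,
        "outputs": 4 + S + R + L + H,
    }


print(bmp_size(states=12, grammar_symbols=8, rules=6, max_rhs=3, defined_entries=45))
```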
Assume every identifier I is translated from a string of lower case English characters. The
lexical analyzer L1 which translates input strings of +, ×, (, ), $, blank, and lower case
English characters into strings of lexical tokens can be realized by an NN DFA. Figure 6.4
shows the state diagram of the DFA M_L1 for L1. Note that additional machinery needed for
error handling is not included in the DFA M_L1; and when the DFA sees a $, it stops the
processing of the input string and appends a $ at the end of the output string.
The transition function δ_L1 of the DFA M_L1 is shown in Table 6.2. This function can be
expressed as a binary mapping which in turn can be easily realized by a BMP module (see
Section 2.2.5 for details). In the NN DFA, BMP module 1 realizes the transition function
δ_L1 : Q × Γ → Q and BMP module 2 realizes a translation function λ' : Q → Δ s.t. λ'(q2) =
I, λ'(q4) = +, λ'(q6) = ×, λ'(q8) = (, λ'(q10) = ), and λ'(q) = ε (null symbol, which is
discarded) for other q ∈ Q, where Q = {q0, q1, q2, q3, q4, q5, q6, q7, q8, q9, q10} is the set of states,
Γ = {a, b, ..., z, +, ×, (, ), $, blank} is the input alphabet, and Δ = {I, +, ×, (, )} is the output
alphabet (i.e., the set of lexical tokens). The symbolic functions δ_L1 and λ' can be expressed as
binary mappings which in turn can be realized by BMP modules (see Section 2.2.5 for details).
Figure 6.4 The state diagram of the DFA M_L1 for the lexical analyzer L1

Table 6.2 The transition function of the DFA M_L1

State | a, b, ..., z | +  | ×  | (  | )  | blank | $
q0    | q1           | q3 | q5 | q7 | q9 | q0    | q0
q1    | q1           |    |    |    |    | q2    | q2
q2    | q1           | q3 | q5 | q7 | q9 | q0    | q0
q3    |              |    |    |    |    | q4    | q4
q4    | q1           | q3 | q5 | q7 | q9 | q0    | q0
q5    |              |    |    |    |    | q6    | q6
q6    | q1           | q3 | q5 | q7 | q9 | q0    | q0
q7    |              |    |    |    |    | q8    | q8
q8    | q1           | q3 | q5 | q7 | q9 | q0    | q0
q9    |              |    |    |    |    | q10   | q10
q10   | q1           | q3 | q5 | q7 | q9 | q0    | q0

Let us now consider the operation of the LR(1) parser when it is presented with the input
string aa × bb + cc. This string is first translated by the lexical analyzer L1 into a string
of lexical tokens I×I+I which is then provided to the LR(1) parser. This translation is quite
straightforward, given the state diagram (Figure 6.4) and transition function (Table 6.2) of M_L1
and its translation function λ'. Note that there is a space between each pair of consecutive words
in the input character string, and there is no space token between each pair of consecutive
lexical tokens in the string of lexical tokens.
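The translation performed by L1 can be traced with a short table-driven sketch. The transitions are transcribed from Table 6.2 and the output function from the definition of the translation function, with states written as integers 0 to 10. Two conventions are assumptions made here for convenience: the multiplication sign × is written as the ASCII character * (so that it cannot be confused with the letter x), and an undefined transition raises an exception instead of entering an error state.

```python
import string

WORD_START_STATES = (0, 2, 4, 6, 8, 10)             # rows of Table 6.2 with a full set of entries
OUTPUT = {2: "I", 4: "+", 6: "*", 8: "(", 10: ")"}   # translation function; other states emit nothing


def build_delta():
    """Transition function of the lexer DFA as a dictionary (state, symbol) -> state."""
    delta = {}
    for q in WORD_START_STATES:
        for ch in string.ascii_lowercase:
            delta[(q, ch)] = 1
        for symbol, target in (("+", 3), ("*", 5), ("(", 7), (")", 9), (" ", 0), ("$", 0)):
            delta[(q, symbol)] = target
    for ch in string.ascii_lowercase:
        delta[(1, ch)] = 1                           # stay inside an identifier
    for q in (1, 3, 5, 7, 9):
        delta[(q, " ")] = q + 1                      # a blank completes the current word
        delta[(q, "$")] = q + 1                      # so does the end of input
    return delta


def tokenize(text):
    """Translate a character string into a list of lexical tokens ending with '$'."""
    delta = build_delta()
    state, tokens = 0, []
    for ch in text + "$":
        if (state, ch) not in delta:
            raise ValueError("undefined transition from state %d on %r" % (state, ch))
        state = delta[(state, ch)]
        if state in OUTPUT:
            tokens.append(OUTPUT[state])
        if ch == "$":
            break
    tokens.append("$")
    return tokens


print(tokenize("aa * bb + cc"))
```

As in the text, consecutive words must be separated by a blank; an input such as aa*bb has no defined transition in Table 6.2.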
The string of lexical tokens is parsed by the LR(1) parser whose moves are shown in
Table 6.3. At step 1, the parse table entry corresponding to (q0, I) is consulted. Its value is
(s, q5, *, *, *, i). This results in shifting I and pushing state q5 onto the shift/reduce stack.
At step 2, the table entry corresponding to (q5, ×) is consulted first. Its value is (r, *, p6, 1, F, i)
which indicates a reduce on production rule p6. Therefore, state q5 is popped off the stack,
and the table entry corresponding to (q0, F) is consulted next. The entry is (s, q3, *, *, *, i) which
means shifting F and pushing state q3 onto the stack. The remaining steps are executed in
a similar fashion. At the end of the moves (step 14), the sequence of production rules stored
in the stack for parse tree can be applied in reverse order to derive the string I×I+I from
grammar start symbol E.
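The moves described above can be reproduced with a compact software driver. The sketch below is a conventional sequential simulation, not the neural architecture. It stores the 45 defined entries of the parse table for G1 in a dictionary, treats the entries for nonterminals as shifts (as the table does), and consults the table twice for every reduce. The symbol × is written as * and all function and variable names are hypothetical.

```python
RULES = {  # rule name -> (left-hand side, number of right-hand-side symbols)
    "p1": ("E", 3), "p2": ("E", 1), "p3": ("T", 3),
    "p4": ("T", 1), "p5": ("F", 3), "p6": ("F", 1),
}


def build_table():
    """Defined entries of the parse table for G1, keyed by (state, symbol)."""
    shifts = {
        0: {"I": 5, "(": 4, "E": 1, "T": 2, "F": 3},
        1: {"+": 6},
        2: {"*": 7},
        4: {"I": 5, "(": 4, "E": 8, "T": 2, "F": 3},
        6: {"I": 5, "(": 4, "T": 9, "F": 3},
        7: {"I": 5, "(": 4, "F": 10},
        8: {"+": 6, ")": 11},
        9: {"*": 7},
    }
    reduces = {
        2: ("p2", "+)$"), 3: ("p4", "+*)$"), 5: ("p6", "+*)$"),
        9: ("p1", "+)$"), 10: ("p3", "+*)$"), 11: ("p5", "+*)$"),
    }
    table = {}
    for state, row in shifts.items():
        for symbol, target in row.items():
            table[(state, symbol)] = ("s", target)
    for state, (rule, lookaheads) in reduces.items():
        for symbol in lookaheads:
            table[(state, symbol)] = ("r", rule)
    table[(1, "$")] = ("a", None)        # accept
    return table


def parse(tokens):
    """Return the rules applied, in the order they are pushed onto the parse tree stack."""
    table = build_table()
    stack = [0]                          # shift/reduce stack of states
    tree = []                            # parse tree stack
    position = 0
    while True:
        entry = table.get((stack[-1], tokens[position]))
        if entry is None:                # undefined entry means a parsing error
            raise SyntaxError("no entry for state %d on %r" % (stack[-1], tokens[position]))
        kind, argument = entry
        if kind == "a":
            return tree
        if kind == "s":
            stack.append(argument)
            position += 1
        else:
            lhs, length = RULES[argument]
            tree.append(argument)
            del stack[-length:]          # multiple pops in one move
            goto = table.get((stack[-1], lhs))
            if goto is None:
                raise SyntaxError("no entry for state %d on %r" % (stack[-1], lhs))
            stack.append(goto[1])        # second table consultation of the reduce


print(parse(["I", "*", "I", "+", "I", "$"]))
```

For the token string of the example, the returned sequence is p6 p4 p6 p3 p2 p6 p4 p1, the same sequence of rules that the parse tree stack holds at the end of the moves in Table 6.3.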
6.4 Performance Analysis 
This section explores potential performance advantages of the proposed neural network 
architecture for syntax analysis in comparison with that of current computer systems that 
employ inherently sequential index or matrix structure for information matching and retrieval. 
The performance estimates for the NNLR Parser assume hardware realization based on current 
CMOS VLSI technology. In the analysis that follows, it is assumed that the two systems have 
comparable I/O performance and error handling capabilities. 
To simplify the comparison, it is assumed that each instruction on a conventional computer
takes τ ns (nanoseconds) on average. For instance, on a relatively cost-effective 100 MIPS
processor, a typical instruction would take 10 ns to complete. (The MIPS measure for speed
combines clock speed, effect of caching, pipelining and superscalar design into a single figure
for the speed of a microprocessor.) Similarly, we will assume that a single identification and recall
operation by a neural associative memory takes α ns. Assuming hardware implementation
based on current CMOS VLSI technology, α = 20 ns (see Section 3.2.1.1).
Table 6.3 Moves of the LR(1) parser for grammar G_1 on input string I×I+I

Step  Content of             Remaining  Referred entries     Content of
      shift/reduce stack     input      of parse table       parse tree stack
(1)   q_0                    I×I+I$     (q_0,I)              ⊥
(2)   q_0 q_5                ×I+I$      (q_5,×), (q_0,F)     ⊥
(3)   q_0 q_3                ×I+I$      (q_3,×), (q_0,T)     ⊥ p_6
(4)   q_0 q_2                ×I+I$      (q_2,×)              ⊥ p_6 p_4
(5)   q_0 q_2 q_7            I+I$       (q_7,I)              ⊥ p_6 p_4
(6)   q_0 q_2 q_7 q_5        +I$        (q_5,+), (q_7,F)     ⊥ p_6 p_4
(7)   q_0 q_2 q_7 q_10       +I$        (q_10,+), (q_0,T)    ⊥ p_6 p_4 p_6
(8)   q_0 q_2                +I$        (q_2,+), (q_0,E)     ⊥ p_6 p_4 p_6 p_3
(9)   q_0 q_1                +I$        (q_1,+)              ⊥ p_6 p_4 p_6 p_3 p_2
(10)  q_0 q_1 q_6            I$         (q_6,I)              ⊥ p_6 p_4 p_6 p_3 p_2
(11)  q_0 q_1 q_6 q_5        $          (q_5,$), (q_6,F)     ⊥ p_6 p_4 p_6 p_3 p_2
(12)  q_0 q_1 q_6 q_3        $          (q_3,$), (q_6,T)     ⊥ p_6 p_4 p_6 p_3 p_2 p_6
(13)  q_0 q_1 q_6 q_9        $          (q_9,$), (q_0,E)     ⊥ p_6 p_4 p_6 p_3 p_2 p_6 p_4
(14)  q_0 q_1                $          (q_1,$)              ⊥ p_6 p_4 p_6 p_3 p_2 p_6 p_4 p_1
Syntax analysis in a conventional computer typically involves lexical analysis, grammar
parsing, parse tree construction and error handling. These four processes are generally coded
into two modules [2]. Error handling is usually embedded in both grammar parsing and lexical
analysis, and parse tree construction is often embedded in grammar parsing. The
procedure for grammar parsing is the main module. In single-CPU computer systems, even
assuming negligible overhead for parameter passing, a procedure call entails, at the very
minimum, (1) saving the context of the caller procedure and activating the callee procedure,
which typically requires 6 instructions [105]; and (2) restoring the context and resuming the
caller procedure upon the return (exit) of the callee procedure, which typically requires at
least 3 instructions [105]. Thus, a procedure call entails a penalty of 9 instructions or about
9τ ns.
6.4.1 Performance analysis of lexical analyzer
Lexical analysis can be performed by a DFA whose transition function can be represented as 
a 2-dimensional table with current state and current input symbol as indices. The continuous 
transition moves of such a DFA involve repetitive lookup of its next state from the table using 
current state and current input symbol at each move until an error state or an accepting state is 
reached. Such a repetitive table lookup involves content-based pattern matching and retrieval 
which can be performed potentially more efficiently by neural associative memories. 
Each entry of the DFA transition table implemented on conventional computers usually 
contains three parts: the next state; a code for whether the next state is an accepting state, 
an error state, or neither; and the lexical token to use if the next state is an accepting state. 
Implementing such a repetitive table lookup on conventional computers requires, at a minimum,
six instructions: one (or two) multiplication and one addition to compute the offset in the
transition table (to access the location where the next state is stored), one memory access to
fetch the next state from the table, one addition to compute the offset of the second part in the
transition table (based on the known offset of the first part), one memory access to fetch the
second part from the table, and one branch-on-comparison instruction to jump back to the first
instruction of the loop if the next state is neither an error state nor an accepting state. (Note
that this analysis ignores I/O processing requirements.) Thus, each state transition takes 6
instructions or 6τ ns.
In contrast, the proposed NN architecture for lexical analyzer computes the next state 
using associative (content-addressed) pattern matching-and-retrieval in a single identification-
and-recall cycle of a BMP module. In the 2-dimensional table, the values of the two indices 
for an entry provide a unique pattern - the index pattern, for accessing the table entry. In the 
BMP module, each index pattern and the corresponding entry are stored as an association pair 
by a hidden neuron and its associated connections. The BMP performs a table lookup in two 
steps: identification and recall. In the identification step, a given index pattern is compared
to all stored index patterns in parallel by the hidden neurons and their associated 1st-layer
connections. Once a match is found, one of the hidden neurons is activated to recall the
associated entry value using the 2nd-layer connections associated with the activated hidden
neuron.
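The two-step lookup can be illustrated with a small sequential simulation. This is a simplified sketch under stated assumptions, not the BMP design itself: each hypothetical "hidden neuron" holds a bipolar copy of one stored index pattern as its 1st-layer weights and fires only on an exact match, and the stored entry plays the role of its 2nd-layer connections. The class and method names are invented for the example, and the loop over neurons stands in for what the hardware would do in parallel.

```python
def to_bipolar(bits):
    """Map a binary pattern (0/1) to a bipolar pattern (-1/+1)."""
    return [1 if b else -1 for b in bits]

class AssociativeTable:
    """Toy exact-match associative memory: one hidden unit per stored pair."""

    def __init__(self, pairs):
        # 1st-layer weights: bipolar copy of each stored index pattern.
        self.weights = [to_bipolar(index) for index, _ in pairs]
        # 2nd-layer "weights": the entry recalled when the unit fires.
        self.entries = [entry for _, entry in pairs]

    def lookup(self, index):
        x = to_bipolar(index)
        # Identification: a unit fires only if every bit agrees, i.e. the
        # weighted sum reaches the pattern length (an assumed threshold).
        fired = [sum(w * xi for w, xi in zip(ws, x)) == len(x)
                 for ws in self.weights]
        # Recall: output the entry attached to the unit that fired, if any.
        for is_active, entry in zip(fired, self.entries):
            if is_active:
                return entry
        return None

# Index pattern = (state bits, symbol bits); entry = next-state bits.
table = AssociativeTable([((0, 0, 0, 1), (0, 1)),
                          ((0, 1, 1, 0), (1, 0))])
print(table.lookup((0, 1, 1, 0)))  # (1, 0)
print(table.lookup((1, 1, 1, 1)))  # None
```

In software the identification step costs time proportional to the number of stored pairs; the single-cycle cost claimed in the text relies on all hidden neurons evaluating their match simultaneously in hardware.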
In program compilation, a segmented word is translated into a syntactically tagged token 
when the DFA for lexical analysis enters an accepting state. On conventional computers, this 
translation step costs, at the very minimum, three instructions (or 3τ ns): one addition to
compute the offset of the third part in the transition table (based on the known offset of
the first part), one memory access to fetch the lexical token from the table, and one branch
instruction to jump back to the first instruction of the loop for carving the next word.
In other syntax analysis applications that involve large vocabularies, a database lookup is 
typically used to translate a word into a syntactically tagged token. In this case, depending on 
the size of the vocabulary and the organization of the database, it would generally take more 
than 10 instructions to perform this translation. (See Chapter 3 for a comparison of database 
query processing using neural associative memories as opposed to conventional computers). 
A BMP module is capable of translating a carved word into a token as described in Section
2.2.5 in a single cycle of identification-and-recall with a time delay of α ns. Note that this step
can be pipelined (see the NNLR Parser in action in Section 6).
In summary, if we assume that the average length of words in the input string is W symbols and
we ignore I/O, error handling and the overhead associated with procedure calls, it would take
(6W + 3)τ ns on average to perform lexical analysis of a word on a conventional computer.
In contrast, it would take (W + 1)α ns using the proposed NN lexical analyzer, again
ignoring I/O and error handling. For example, assuming a 100 MIPS conventional computer
(τ = 10 ns) and a current CMOS VLSI implementation of neural associative memories (α = 20
ns), with W = 5, the former takes 330 ns and the latter 120 ns.
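The arithmetic behind these two estimates can be checked with a short script. The function and parameter names are chosen for this illustration: `tau_ns` is the average time per conventional instruction, `alpha_ns` is the delay of one identification-and-recall cycle, and `word_length` is W.

```python
def conventional_lexical_ns(word_length, tau_ns):
    """6 instructions per symbol plus 3 for token translation."""
    return (6 * word_length + 3) * tau_ns

def neural_lexical_ns(word_length, alpha_ns):
    """One recall cycle per symbol plus one for token translation."""
    return (word_length + 1) * alpha_ns

print(conventional_lexical_ns(5, 10))  # 330
print(neural_lexical_ns(5, 20))        # 120
```

Under these assumptions the ratio between the two estimates depends on W, τ and α together, so the comparison should be recomputed for other technology parameters rather than read as a fixed speedup.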
6.4.2 Performance analysis of LR parser 
LR parsing also involves repetitive table lookup, which can be performed efficiently by neural
associative memories. An LR parser is driven by a 2-dimensional table (the parse table) with current
state and current input symbol as indices. Once a next state is retrieved, it is stored on a stack
and is used as the current state for the next move. Parsing involves repetitive application of
a sequence of shift and reduce moves. A shift move would take at least 6 instructions,
or equivalently 6τ ns, on a conventional computer. This includes 3 instructions to consult the
parse table, 1 instruction to push the next state onto the stack, 1 instruction to increment the
stack pointer, and 1 instruction to go back to the first instruction of the repetitive loop for the
next move. A typical reduce move involves a parse table lookup, a pop of the state stack, a
push to store a rule into the stack for the parse tree, and a shift move. Thus, a typical reduce
would take at least 3 + 1 + 2 + 6 = 12 instructions, or equivalently 12τ ns, on a conventional
computer.
In the proposed NNLR Parser, the computation delay consists of the delays contributed
by the operation of the two NN Stacks and the BMP which stores the parse table. An NN
Stack consists of two BMP modules, one of which is augmented with a write control module.
Assuming that the computation delay of an NN Stack is roughly equal to that of two sequentially
linked BMPs (2α ns), a shift move (which takes one operation cycle of the NNLR Parser) and
a typical reduce move (which takes two operation cycles of the NNLR Parser) would consume
3α ns and 6α ns respectively. (This analysis ignores the effect of queuing between the NNLR
Parser and the NN lexical analyzer.)
Assuming that the average length of words in the input string is W symbols, and ignoring
I/O, error handling and the overhead associated with procedure calls, parsing a word (which
first has to be translated into a lexical token by lexical analysis) by shift and reduce moves
would take (6W + 9)τ ns and (6W + 15)τ ns respectively on a conventional computer.
In contrast, because the NNLR Parser and NN lexical analyzer can operate in parallel,
shift and reduce moves take 3α ns or (W + 1)α ns (whichever is larger) and 6α ns or
(W + 1)α ns (whichever is larger) respectively on the NNLR Parser.
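These per-move estimates can be written out as a small cost model. The sketch below only restates the formulas in the text; the function names and the `move` argument are assumptions made for the illustration, and the example values are the same ones used in the lexical analysis example (W = 5, 10 ns per instruction, 20 ns per recall cycle).

```python
def conventional_move_ns(word_length, tau_ns, move):
    """Lexical analysis of one word plus a shift (6) or reduce (12) move."""
    extra_instructions = {"shift": 9, "reduce": 15}[move]
    return (6 * word_length + extra_instructions) * tau_ns

def nnlr_move_ns(word_length, alpha_ns, move):
    """Parser and lexer run in parallel, so the slower of the two dominates."""
    parser_delays = {"shift": 3, "reduce": 6}[move]
    return max(parser_delays * alpha_ns, (word_length + 1) * alpha_ns)

for move in ("shift", "reduce"):
    print(move, conventional_move_ns(5, 10, move), nnlr_move_ns(5, 20, move))
# shift 390 120
# reduce 450 120
```

For W = 5 the lexical analyzer's (W + 1)α delay is at least as large as the parser's delay for both kinds of move, so in this example the lexer sets the pace of the pipeline.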
Thus, as shown in Table 6.4, for typical values of a, r and W, the proposed NNLR Parser 
offers a potentially attractive alternative to conventional computers for syntax analysis. 
Table 6.4 A comparison of the estimated performance of the proposed
NNLR Parser with that of conventional computer systems for
syntax analysis

Type of overhead                      NNLR Parser           Conventional computers
time for lexical analysis of a word   (W + 1)α              (6W + 3)τ
time for a shift move of parsing      max[3α, (W + 1)α]     (6W + 9)τ
time for a reduce move of parsing     max[6α, (W + 1)α]     (6W + 15)τ

It should be noted that the preceding performance comparison has not considered alternative
hardware realizations of syntax analyzers. These include hardware implementations of
parsers using the conventional building blocks of today's serial computers. We are
not aware of any such implementations, although clearly they can be built. In this context
it is worth noting that the neural architecture for syntax analysis proposed in this chapter
makes extensive use of the massively parallel processing capabilities of neural associative
processors (memories). It is quite possible that other parallel (possibly non neural network)
hardware realizations of syntax analyzers offer performance that compares favorably with that of
the proposed neural network realization. We can only speculate as to why there appears to
have been little research on parallel architectures for syntax analysis. Historically, research in
high performance computing has focused primarily on speeding up the execution of numeric
computations, typically performed by programs written in compiled languages such as C and
FORTRAN. In such applications, syntax analysis is done during program compilation, which
occurs relatively infrequently compared to program execution. The situation is quite different in
symbol processing (e.g., knowledge based systems of AI, analysis of mathematical expressions in
software designed for symbolic integration, algebraic simplification, theorem proving) and
interactive programming environments based on interpreted programming languages (e.g., LISP,
JAVA). Massively parallel architectures for such tasks are only beginning to be explored.
6.5 Summary and Discussion 
This chapter has explored the design of a neural architecture for syntax analysis of
languages with known (a-priori specified) grammars. Syntax analysis is a prototypical symbol
processing task with a diverse range of applications in artificial intelligence, cognitive
modelling, and computer science. Examples of such applications include: language interpreters for
interactive programming environments using interpreted languages (e.g., LISP, JAVA), parsing
of symbolic expressions (e.g., in real-time knowledge based systems, database query processing,
and mathematical problem solving environments), syntactic or structural analysis of large
collections of data (e.g., molecular structures, engineering drawings, etc.), and high-performance
compilers for program compilation and behavior-based robotics. Indeed, one would be hard-
pressed to find a computing application that does not rely on syntax analysis at some level. 
The need for syntax analysis in real time calls for novel solutions that can deliver the desired 
performance at an affordable cost. Artificial neural networks, due to their potential advantages 
for real-time applications on account of their inherent parallelism, offer an attractive approach 
to the design of high performance syntax analyzers. 
The proposed neural architecture for syntax analysis is obtained through systematic and 
provably correct composition of a suitable set of component symbolic functions which are
ultimately realized using neural associative processor (memory) modules. The neural associative
processor (memory) is essentially a 2-layer perceptron which can store and retrieve arbitrary 
binary pattern associations [21]. It is a cost-effective SIMD (single instruction, multiple data) 
computer system for massively parallel pattern matching and retrieval. Since each component 
in the proposed neural architecture computes a well-defined symbolic function, it facilitates 
the systematic synthesis as well as analysis of the desired computation at a fairly abstract 
(symbolic) level. Realization of the component symbolic functions using neural associative 
processors (memories) allows one to exploit massive parallelism to support applications that 
require syntax analysis to be performed in real time. 
The proposed neural network for syntax analysis is capable of handling sequentially
presented character strings of variable length, and it is assembled from neural network modules
for lexical analysis, stack processing, parsing, and parse tree construction. The neural network 
stack can realize stacks of arbitrary depths (limited only by the number of neurons available). 
The parser outputs the parse tree resulting from syntax analysis of strings from widely used 
subsets of deterministic context-free languages (i.e., those generated by LR grammars). Since 
logically an LR parser consists of two parts, a driver routine which is the same for all LR 
parsers, and a parse table which varies from one application to the next [3], the proposed 
NNLR Parser can be used as a general-purpose neural architecture for LR parsing. 
It is relatively straightforward to estimate the cost and performance of the proposed
neural architecture for syntax analysis based on the known computation delays associated with
the component modules (using known facts or a suitable set of assumptions regarding current
VLSI technology for implementing the component modules). Our estimates suggest that the
proposed system offers a systematic and provably correct approach to designing cost-effective
high-performance syntax analyzers for real-time syntax analysis using known (a-priori
specified) grammars.
The choice of the neural associative processors (memories) as the primary building blocks
for the synthesis of the proposed neural architecture for syntax analysis was influenced, among
other things, by the fact that they find use in a wide range of systems in computer science,
artificial intelligence, and cognitive modelling. This is because associative pattern matching
and recall is central to pattern-directed processing which is at the heart of many problem
solving paradigms in AI (e.g., knowledge based expert systems, case based reasoning) as well
as computer science (e.g., database query processing, information retrieval). As a result, design,
VLSI implementation, and applications of associative processors have been studied extensively
in the literature [21, 23, 68, 78, 88, 97, 110, 124, 127, 151, 153]. The neural network architecture
proposed in this chapter for syntax analysis demonstrates the versatility of neural associative 
processors (memories) as generic building blocks for systematic synthesis of modular massively 
parallel architectures for symbol processing applications. 
It should be noted that the primary focus of this chapter was on taking advantage of massive 
parallelism and associative pattern storage, matching, and recall properties of a particular class 
of neural associative memories in designing high performance syntax analyzers for a-priori 
specified grammars. Consequently, it has not addressed several other potential advantages 
of neural network architectures for intelligent systems. Notable among these are inductive 
learning and fault tolerance. 
Machine learning of grammars, or grammar inference, is a major research topic which has
been, and continues to be, the subject of investigation by a large number of researchers in
artificial intelligence, machine learning, syntactic pattern recognition, neural networks,
computational learning theory, natural language processing, and related areas. Surveys of
grammar inference in general can be found in [69, 96, 115, 137], and recent results on
grammar inference using neural networks can be found in [6, 13, 27, 38, 44, 45, 66, 77, 80, 116,
122, 123, 129, 159, 161, 166, 174, 192, 194, 197].
Fault tolerance capabilities of neural architectures under different fault models (neuron
faults, connection faults, etc.) have been the topic of considerable research [21, 165, 180] and are
beyond the scope of this chapter. However, it is worth noting that the proposed neural network 
design for syntax analysis inherits some of the fault tolerance capabilities of its primary building 
block, the neural associative processor (memory) (see Section 2.3.3 for details). 
7 CONCLUSION
Traditional symbol processing models of AI and ANN have been viewed by many as radically
(and perhaps even irreconcilably) different paradigms for the design of intelligent systems.
But given the fact that they are essentially equivalent in terms of their computing capabilities,
a more reasonable view is that they each represent different architectural commitments and
hence different cost-performance tradeoffs within the space of possible designs for intelligent
systems. This latter viewpoint argues for a somewhat systematic exploration of this design
space in search of novel and efficient computational architectures for such systems. This
dissertation takes a few small steps in this direction and adds to the growing body of literature
[47, 72, 99, 179] that demonstrates the potential benefits of integrated neural-symbolic
architectures that overcome some of the limitations of today's ANN and AI systems.
More specifically, this dissertation develops the theory and implementation of a neural
architecture for associative memory which is capable of massively parallel best match, exact
match, and partial match. It also demonstrates systematic, provably correct synthesis of
efficient neural architectures for information retrieval and database query processing,
elementary logical inference, sequence processing, and syntax analysis, using neural associative
memories as the primary building blocks for massively parallel pattern-directed symbol
processing. This facilitates the systematic analysis of the computation performed by the
resulting neural systems at a fairly abstract (symbolic) level.
APPENDIX. ACRONYMS 
AI: Artificial Intelligence 
AM: Associative Memory 
ANN: Artificial Neural Networks
BiCMOS: Bipolar Complementary Metal Oxide Semiconductor 
BMP: Binary Mapping Perceptron 
BNF: Backus-Naur Form 
CFG: Context-Free Grammars 
CFL: Context-Free Languages 
CMOS: Complementary Metal Oxide Semiconductor 
DCFL: Deterministic Context-Free Languages 
DFA: Deterministic Finite Automata 
DNF: Disjunctive Normal Form 
DPDA: Deterministic Pushdown Automata 
MIPS: Million Instructions Per Second 
NFA: Nondeterministic Finite Automata 
NLP: Natural Language Processing 
NN DFA: Neural Network for Deterministic Finite Automata 
NN DPDA: Neural Network for Deterministic Pushdown Automata 
NN NFA: Neural Network for Nondeterministic Finite Automata 
NN Stack: Neural Network for Stack 
NNLR Parser: Neural Network for LR(1) Parser 
PLA: Programmable Logic Array 
RAAM: Recursive Auto-Associative Memory 
RNN: Recurrent Neural Networks 
SIMD: Single Instruction Multiple Data 
VLSI: Very Large Scale Integration 
BIBLIOGRAPHY 
[1] Ackley, D. H., Hinton, G. E. and Sejnowski, T. J., A Learning Algorithm for Boltzmann
Machines, Cognitive Science, vol. 9, pp. 147-169, 1985.
[2] Aho, A. V., Sethi, R. and Ullman, J. D., Compilers: Principles, Techniques, and Tools,
Addison-Wesley, Reading, MA, 1986.
[3] Aho, A. V. and Ullman, J. D., Principles of Compiler Design, Addison-Wesley, Reading, 
MA, 1977. 
[4] Aitchison, J., Words in the Mind, in: An Introduction to the Mental Lexicon, Basil 
Blackwell, Oxford, 1987. 
[5] Ajjanagadde, V. and Shastri, L., Efficient Inference with Multi-Place Predicates and
Variables in a Connectionist System, Proceedings of 11th Cognitive Science Society
Conference, pp. 396-403, Erlbaum, Hillsdale, NJ, 1989.
[6] Allen, R. B., Connectionist Language Users, Connection Science, vol. 2, no. 4, p. 279, 
1990. 
[7] Amari, S., Characteristics of Random Nets of Analog Neuron-like Elements, IEEE
Transactions on Systems, Man, and Cybernetics, vol. 2, pp. 643-657, 1972.
[8] Amari, S., Learning Patterns and Pattern Sequences by Self-Organizing Nets of
Threshold Elements, IEEE Transactions on Computers, vol. 21, no. 11, pp. 1197-1206, 1972.
[9] Amari, S., Neural Theory of Association and Concept-Formation, Biological Cybernetics, 
vol. 26, pp. 175-185, 1977. 
[10] Anderson, J. A., Silverstein, J. W., Ritz, S. A. and Jones, R. S., Distinctive Feature,
Categorical Perception, and Probability Learning: Some Applications of a Neural Model,
Psychological Review, vol. 84, pp. 413-451, 1977.
[11] Arbib, M., Schema Theory: Cooperative Computation for Brain Theory and Distributed
AI, in: Artificial Intelligence and Neural Networks: Steps Toward Principled Integration,
Honavar, V. and Uhr, L. (Ed.), pp. 51-74, Academic Press, San Diego, CA, 1994.
[12] Bentley, J. L., Multidimensional Binary Search Trees Used for Associative Searching,
Communications of the ACM, vol. 18, no. 9, pp. 507-517, 1975.
[13] Berg, G., A Connectionist Parser with Recursive Sentence Structure and Lexical
Disambiguation, Proceedings of the Tenth National Conference on Artificial Intelligence, pp.
32-37, MIT Press, Cambridge, MA, 1992.
[14] Bookman, L. A., A Framework for Integrating Relational and Associational Knowledge
for Comprehension, in: Computational Architectures Integrating Neural and Symbolic
Processes: A Perspective on the State of the Art, Sun, R. and Bookman, L. (Ed.), Chapter
9, pp. 283-318, Kluwer Academic Publishers, Norwell, MA, 1995.
[15] Butterworth, B., Lexical Representation, in: Language Production Volume 2:
Development, Writing and Other Language Processes, Butterworth, B. (Ed.), pp. 257-294,
Academic Press, London, 1983.
[16] Carpenter, G. and Grossberg, S., Adaptive Resonance Theory: Stable Self-Organization
of Neural Recognition Codes in Response to Arbitrary Lists of Input Patterns, 8th Annual
Conference of the Cognitive Science Society, pp. 45-62, Lawrence Erlbaum Associates,
Hillsdale, NJ, 1986.
[17] Carpenter, G. and Grossberg, S., A Massively Parallel Architecture for a Self-Organizing
Neural Pattern Recognition Machine, Computer Vision, Graphics, and Image
Understanding, vol. 37, pp. 54-116, 1987.
[18] Carpenter, G. and Grossberg, S., ART2: Self-Organization of Stable Category
Recognition Codes for Analog Input Patterns, Applied Optics, vol. 26, pp. 4919-4930, 1987.
[19] Chapman, N. P., LR Parsing: Theory and Practice, Cambridge University Press,
Cambridge, MA, 1987.
[20] Chen, C. and Honavar, V., Neural Network Automata, Proc. of World Congress on 
Neural Networks, vol. 4, pp. 470-477, San Diego, June 1994. 
[21] Chen, C. and Honavar, V., A Neural Architecture for Content as well as Address-Based 
Storage and Recall: Theory and Applications, Connection Science, vol. 7, no. 3 & 4, pp. 
281-300, 1995a. 
[22] Chen, C. and Honavar, V., A Neural Network Architecture for Syntax Analysis. Accepted 
by IEEE Transactions on Neural Networks. Preliminary version available as Iowa State 
University Dept. of Computer Science Tech. Rep. ISU-CS-TR 95-18, 1995b. 
[23] Chen, C. and Honavar, V., A Neural Network Architecture for High-Speed Database 
Query Processing System, Microcomputer Applications vol. 15, no. 1, pp. 7-13, 1996. 
[24] Chen, C. and Honavar, V., Neural Architectures for Information Retrieval and Query 
Processing, in: Handbook of Natural Language Processing: Techniques and Applications
for the Processing of Language as Text, Moisl, H., Dale, R. and Somers, H. (Ed.), Marcel 
Dekker, New York, 1998. 
[25] Chen, C. and Honavar, V., A Neural Architecture for Parallel Set Operations. Paper in 
preparation. 
[26] Cohen, D., Introduction to Computer Theory, Wiley, New York, 1986. 
[27] Das, S., Giles, C. L. and Sun, G. Z., Using Prior Knowledge in a NNPDA to Learn
Context-Free Languages, in: Advances in Neural Information Processing Systems 5,
Hanson, S. J., Cowan, J. D. and Giles, C. L. (Ed.), pp. 65-72, Morgan Kaufmann, San Mateo,
CA, 1993.
[28] D'Autrechy, C. L. and Reggia, J. A., An Overview of Sequence Processing by
Connectionist Models, Technical Report UMIACS-TR-89-82, University of Maryland, College
Park, MD, 1989.
[29] Dayhoff, J., Neural Network Architectures: An Introduction, Van Nostrand Reinhold,
New York, 1990.
[30] Defiore, C. and Berra, P. B., A Data Management System Utilizing an Associative
Memory, AFIPS, Proceedings of the National Computer Conference, vol. 42, pp. 181-185, 1973.
[31] Dolan, C. P. and Smolensky, P., Tensor Product Production System: A Modular
Architecture and Representation, Connection Science, vol. 1, pp. 53-58, 1989.
[32] Dyer, M. G., Connectionist Natural Language Processing: A Status Report, in:
Computational Architectures Integrating Neural and Symbolic Processes: A Perspective on
the State of the Art, Sun, R. and Bookman, L. (Ed.), Chapter 12, pp. 389-429, Kluwer
Academic Publishers, Norwell, MA, 1995.
[33] Elman, J. L., Finding Structure in Time, Cognitive Science, vol. 14, pp. 179-211, 1990.
[34] Fanty, M. A., Context-free Parsing with Connectionist Networks, Proceedings of AIP
Neural Networks for Computing, Conference No. 151, pp. 140-145, Snowbird, UT, 1986.
[35] Fodor, J. and Pylyshyn, Z., Connectionism and Cognitive Architecture: A Critical
Analysis, in: Connections and Symbols, Pinker, S. and Mehler, J. (Ed.), MIT Press, Cambridge,
MA, 1988.
[36] Forster, K. I., Accessing the Mental Lexicon, in: New Approaches to Language
Mechanisms, Walker, R. and Wales, R. J. (Ed.), pp. 257-287, North-Holland, Amsterdam,
1976.
[37] Frakes, W. B. and Baeza-Yates, R. (Ed.), Information Retrieval: Data Structures &
Algorithms, Prentice Hall, Englewood Cliffs, NJ, 1992.
[38] Frasconi, P., Gori, M., Maggini, M. and Soda, G., Unified Integration of Explicit Rules 
and Learning by Example in Recurrent Networks, IEEE Transactions on Knowledge and 
Data Engineering, vol. 7, no. 2, pp. 340-346, 1995. 
[39] Fukushima, K., Cognitron: A Self-Organizing Multilayered Neural Network, Biological 
Cybernetics, vol. 20, pp. 121-136, Nov. 1975. 
[40] Fukushima, K., Miyake, S. and Ito, T., Neocognitron: A Neural Network Model for 
a Mechanism of Visual Pattern Recognition, IEEE Transactions on System, Man, and 
Cybernetics, SMC-13, no. 5, pp. 826-834, Sep./Oct., 1983. 
[41] Fukushima, K., Neocognitron: A Hierarchical Neural Network Capable of Visual Pattern 
Recognition, Neural Networks, vol. 1, pp. 119-130, 1988.
[42] Gallant, S. I., Connectionist Expert Systems, Communications of the ACM, vol. 31, pp. 
152-169, February 1988. 
[43] Gallant, S. I., Neural Network Learning and Expert Systems, MIT Press, Cambridge, 
MA, 1993. 
[44] Giles, C. L., Horne, B. W. and Lin, T., Learning a Class of Large Finite State Machines
With a Recurrent Neural Network, Neural Networks, vol. 8, no. 9, pp. 1359-1365, 1995.
[45] Giles, C. L., Miller, C. B., Chen, D., Sun, G. Z. and Lee, Y. C., Learning and
Extracting Finite State Automata with Second-Order Recurrent Neural Networks, Neural
Computation, vol. 4, no. 3, p. 380, 1992.
[46] Goldfarb, L. and Nigam, S., The Unified Learning Paradigm: A Foundation for AI, 
in: Artificial Intelligence and Neural Networks: Steps Toward Principled Integration, 
Honavar, V. and Uhr, L. (Ed.), pp. 533-559, Academic Press, San Diego, CA, 1994. 
[47] Goonatilake, S. and Khebbal, S. (Ed.), Intelligent Hybrid Systems, Wiley, London, 1995. 
[48] Gowda, S. M. et al., Design and Characterization of Analog VLSI Neural Network
Modules, IEEE Journal of Solid-State Circuits, vol. 28, no. 3, pp. 301-313, 1993.
[49] Graf, H. P. and Henderson, D., A Reconfigurable CMOS Neural Network, ISSCC Dig. 
Tech. Papers, pp. 144-145, San Francisco, CA, 1990. 
[50] Grant, D. et al., Design, Implementation and Evaluation of a High-Speed Integrated
Hamming Neural Classifier, IEEE Journal of Solid-State Circuits, vol. 29, no. 9, pp. 
1154-1157, Sep. 1994. 
[51] Grossberg, S., Some Networks That Can Learn, Remember, and Reproduce Any Number 
of Space-Time Patterns II, Studies in Applied Mathematics, vol. 49, pp. 135-166, 1970. 
[52] Grossberg, S., Contour Enhancement, Short-Term Memory, and Constancies in
Reverberating Networks, Studies in Applied Mathematics, vol. 52, pp. 217-257, 1973.
[53] Grossberg, S., Adaptive Pattern Classification and Universal Recording II: Feedback,
Oscillation, Olfaction, and Illusions, Biological Cybernetics, vol. 23, pp. 187-207, 1976.
[54] Grosspietsch, K. E., Intelligent Systems by Means of Associative Processing, in: Fuzzy, 
Holographic, and Parallel Intelligence, Soucek, B. and the IRIS Group (Ed.), pp. 179-214, 
John Wiley & Sons, New York, 1992. 
[55] Gupta, A., Parallelism in Production Systems, Ph.D. Thesis, Carnegie-Mellon University, 
Pittsburgh, Mar. 1986. 
[56] Gupta, M. M. and Knopf, G. K., Neuro-Vision Systems: A Tutorial, in: Neuro-Vision 
Systems: Principles and Applications, Gupta, M. and Knopf, G. (Ed.), pp. 1-34, IEEE 
Press, New York, 1994. 
[57] Hamilton, A. et al., Integrated Pulse Stream Neural Networks: Results, Issues, and
Pointers, IEEE Transactions on Neural Networks, vol. 3, no. 3, pp. 385-393, May 1992. 
[58] Handke, I., The Structure of Lexicon: Human versus Machine, Mouton de Gruyter, 
Berlin, 1995. 
[59] Hao, J. and Vandewalle, J., A New Model of Neural Associative Memories, International 
Journal of Neural Systems, vol. 5, no. 1, pp. 39-47, Mar. 1994. 
[60] Hassoun, M., Fundamentals of Artificial Neural Networks, MIT Press, Cambridge, MA, 
1995. 
[61] Hayes-Roth, F., The Role of Partial and Best Matches in Knowledge Systems, in: 
Pattern-Directed Inference Systems, Waterman, D. A. and Hayes-Roth, F. (Ed.), pp. 
557-574, Academic Press, New York, NY, 1978. 
[62] Haykin, S., Neural Networks, MacMillan, New York, 1994. 
[63] Hebb, D. O., The Organization of Behavior, John Wiley & Sons, New York, 1949. 
[64] Hecht-Nielsen, R., Counterpropagation Networks, Proceedings of IEEE First
International Conference on Neural Networks, vol. II, pp. 19-32, 1987.
[65] Hendler, J., Beyond the Fifth Generation: Parallel AI Research in Japan, IEEE Expert, 
pp. 2-7, Feb. 1994. 
[66] Hester, K. A. et al.. The Predictive RAAM: A RAAM That Can Learn to Distinguish 
Sequences from a Continuous Input Stream, Proceedings of World Congress on Neural 
Networks, vol. 4, pp. 97-103, San Diego, CA, June 1994. 
[67] Hinton, G. E. and Sejnowski, T. J., Learning and Relearning in Boltzmann Machines,
in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 
I: Foundations, Rumelhart, D. E., McClelland, J. L. and the PDP Research Group, pp. 
282-318, MIT Press, Cambridge, MA, 1986. 
[68] Hinton, G. E., Implementing Semantic Networks in Parallel Hardware, in: Parallel Models
of Associative Memory, Hinton, G. E. and Anderson, J. A. (updated Ed.), Lawrence
Erlbaum Associates, Hillsdale, NJ, 1989. 
[69] Honavar, V., Toward Learning Systems That Integrate Different Strategies and
Representations, in: Artificial Intelligence and Neural Networks: Steps Toward Principled
Integration, Honavar, V. and Uhr, L. (Ed.), pp. 615-644, Academic Press, San Diego, 
CA, 1994. 
[70] Honavar, V., Symbolic Artificial Intelligence and Numeric Artificial Neural Networks: 
Toward A Resolution of the Dichotomy, in: Computational Architectures Integrating 
Symbolic and Neural Processes, Sun, R. and Bookman, L. (Ed.), pp. 351-388, Kluwer
Academic Publishers, Norwell, MA, 1995. 
[71] Honavar, V. and Uhr, L., Coordination and Control structures and Processes: Possi­
bilities for Connectionist Networks, Journal of Experimental and Theoretical Artificial 
Intelligence 2: 277-302, 1990. 
[72] Honavar, V. and Uhr, L. (Ed.), Artificial Intelligence and Neural Networks: Steps Toward 
Principled Integration, Academic Press, New York, NY, 1994. 
[73] Honavar, V. and Uhr, L., Integrating Symbol Processing Systems and Connectionist 
Networks, in: Intelligent Hybrid Systems, Goonatilake, S. and Khebbal, S. (Ed.), pp.
177-208, Wiley, London, 1995.
[74] Hopcroft, J. E. and Ullman, J. D., Introduction to Automata Theory, Languages, and 
Computation, Addison-Wesley, 1979. 
[75] Hopfield, J. J., Neural Networks and Physical Systems with Emergent Collective Com­
putational Abilities, Proc. Natl. Acad. Sci. USA, vol. 79, pp. 2554-2558, Apr. 1982. 
[76] Hopfield, J. J. and Tank, D., Neural Computation of Decision in Optimization Problems,
Biological Cybernetics, vol. 52, pp. 141-152, 1985. 
[77] Horne, B., Hush, D. R. and Abdallah, C., A State Space Recurrent Neural Network
with Application to Regular Grammatical Inference, UNM Tech. Rep. No. EECE 92-
002, Department of Electrical and Computer Engineering, University of New Mexico, 
Albuquerque, NM, 1992. 
[78] Howe, D. B. and Asanovic, K., SPACE: Symbolic Processing in Associative Computing 
Elements, in: VLSI for Neural Networks and Artificial Intelligence, Delgado-Frias, J. G. 
(Ed.), pp. 243-252, Plenum Press, New York, 1994.
[79] Jackson, T. and Austin, J., The Representation of Knowledge and Rules in Hierarchi­
cal Neural Networks, in: Neural Networks for Knowledge Representation and Inference, 
Levine, D. S. and Aparicio IV, M.(Ed.), Chapter 8, pp. 206-238, Lawrence Erlbaum
Associates, Hillsdale, NJ, 1994.
[80] Jain, A. N., Waibel, A. and Touretzky, D. S., PARSEC: A Structured Connectionist
Parsing System for Spoken Language, IEEE Proceedings of the International Conference 
on Acoustics, Speech, and Signal Processing, pp. 205-208, San Francisco, CA, Mar. 1992. 
[81] Jordan, M., Attractor Dynamics and Parallelism in a Connectionist Sequential Machine,
Program of the Eighth Annual Conference of the Cognitive Science Society, pp. 531-546, 
Lawrence Erlbaum Associates, Hillsdale, NJ, 1986. 
[82] Kitano, H., Speech-to-Speech Translation: A Massively Parallel Memory Based Approach, 
Kluwer Academic Publishers, Norwell, MA, 1994. 
[83] Kleene, S. C., Representation of Events in Nerve Nets and Finite Automata, in: Au­
tomata Studies, Shannon, C. E. and McCarthy, J. (Ed.), pp. 3-42, Princeton University 
Press, Princeton, NJ, 1956. 
[84] Kohonen, T., Correlation Matrix Memories, IEEE Transactions on Computers, vol. C-21,
no. 4, pp. 353-359, Apr. 1972.
[85] Kohonen, T., Associative Memory: A System-Theoretical Approach, Springer, New York, 
1977. 
[86] Kohonen, T., Self-Organization and Associative Memory, Springer-Verlag, Berlin, 1984. 
[87] Kohonen, T., Content-Addressable Memories, 2nd ed., Springer-Verlag, Berlin, 1987.
[88] Kogge, P., Oldfield, J., Brule, M. and Stormon, C., VLSI and Rule-based Systems, in: 
VLSI for Artificial Intelligence, Delgado-Frias, J. G. and Moore, W. R. (Ed.), pp. 95-108, 
Kluwer Academic Publishers, Norwell, MA, 1989. 
[89] Kosko, B., Adaptive Bidirectional Associative Memories, Applied Optics, vol. 26, no. 23,
pp. 4947-4960, Dec. 1987. 
[90] Kosko, B., Bidirectional Associative Memories, IEEE Transactions on Systems, Man, and
Cybernetics, vol. 18, no. 1, pp. 49-60, Jan./Feb. 1988. 
[91] Kumagai, Y., Kamruzzaman, J. and Hikita, H., Further Cross Talk Reduction of
Associative Memory and Exact Data Retrieval, Proc. of IJCNN, vol. 3, pp. 1371-1378, San
Francisco, 1993. 
[92] Kumar, R., NCMOS: A High Performance CMOS Logic, IEEE Journal of Solid-State 
Circuits, vol. 29, no. 5, pp. 631-633, 1994. 
[93] Kung, S. Y., Digital Neural Networks, Prentice Hall, Englewood Cliffs, NJ, 1993. 
[94] Lacher, R. C. and Nguyen, K. D., Hierarchical Architectures for Reasoning, in: Com­
putational Architectures Integrating Neural and Symbolic Processes: A Perspective on 
the State of the Art, Sun, R. and Bookman, L. (Ed.), Chapter 4, pp. 117-150, Kluwer 
Academic Publishers, Norwell, MA, 1995. 
[95] Lange, T. and Dyer, M., Frame Selection in a Connectionist Model, Proceedings of the
11th Cognitive Science Conference, pp. 706-713, Erlbaum, Hillsdale, NJ, 1989.
[96] Langley, P., Elements of Machine Learning, Morgan Kaufmann, Palo Alto, CA, 1995. 
[97] Lavington, S. H., Wang, C. J., Kasabov, N. and Lin, S., Hardware Support for Data
Parallelism in Production Systems, in: VLSI for Neural Networks and Artificial Intelligence,
Delgado-Frias, J. G. (Ed.), pp. 231-242, Plenum Press, New York, 1994. 
[98] LeCun, Y., Une Procedure D'apprentissage Pour Reseau a Seuil Assymetrique, Proceed­
ings of Cognitiva, pp. 599-604, Paris, 1985. 
[99] Levine, D. S. and Aparicio IV, M.(Ed.), Neural Networks for Knowledge Representation
and Inference, Lawrence Erlbaum Associates, Hillsdale, NJ, 1994.
[100] Lewis, H. R. and Papadimitriou, C. H., Elements of the Theory of Computation, Prentice 
Hall, Englewood Cliffs, NJ, 1981.
[101] Linde, R., Gates, R. and Peng, T., Associative Processor Application to Real-time Data 
Management, AFIPS, Proceedings of the National Computer Conference, vol. 42, pp. 
187-195, 1973. 
[102] Lippmann, R. P., An Introduction to Computing with Neural Nets, IEEE ASSP Maga­
zine, pp. 4-22, Apr. 1987. 
[103] Lont, J. B. and Guggenbühl, W., Analog CMOS Implementation of a Multilayer
Perceptron with Nonlinear Synapses, IEEE Transactions on Neural Networks, vol. 3, no. 3, pp.
457-465, 1992. 
[104] Lu, F. and Samueli, H., A 200-MHz CMOS Pipelined Multiplier-Accumulator Using a
Quasi-Domino Dynamic Full-Adder Cell Design, IEEE Journal of Solid-State Circuits, 
vol. 28, no. 2, pp. 123-132, 1993. 
[105] MacLennan, B. J., Principles of Programming Languages: Design, Evaluation, and Im­
plementation, 2nd edition, CBS College Publishing, New York, NY, 1987. 
[106] Masa, P., Hoen, K. and Wallinga, H., 70 Input, 20 Nanosecond Pattern Classifier, IEEE 
International Joint Conference on Neural Networks, vol. 3, Orlando, FL, 1994. 
[107] Massengill, L. W. and Mundie, D. B., An Analog Neural Network Hardware Imple­
mentation Using Charge-Injection Multipliers and Neuron-Specific Gain Control, IEEE 
Transactions on Neural Networks, vol. 3, no. 3, pp. 354-362, May 1992. 
[108] McCulloch, W. S. and Pitts, W., A Logical Calculus of Ideas Immanent in Nervous 
Activity, Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133, 1943. 
[109] McEliece, R. J., Posner, E. C., Rodemich, E. R. and Venkatesh, S. S., The Capacity of 
the Hopfield Associative Memory, IEEE Trans. Inform. Theory, vol. IT-33, no. 4, pp. 
461-482, July 1987. 
[110] McGregor, D., McInnes, S. and Henning, M., An Architecture for Associative Processing
of Large Knowledge Bases (LKBs), Computer Journal, vol. 30, no. 5, pp. 404-412, Oct.
1987. 
[111] McKenna, T. M., The Role of Interdisciplinary Research Involving Neuroscience in the 
Development of Intelligent Systems, in: Artificial Intelligence and Neural Networks: 
Steps Toward Principled Integration, Honavar, V. and Uhr, L. (Ed.), Academic Press,
New York, NY, 1994. 
[112] Medsker, L. R., Hybrid Neural Network and Expert Systems, Chapter 1, Kluwer Academic 
Publishers, Norwell, MA, 1994. 
[113] Meyerowitz, A. L., Neural Networks: A Computer Science Perspective, Naval Research 
Review, vol. 43, no. 2, pp. 13-18.
[114] Micchelli, C. A., Interpolation of Scattered Data: Distance Matrices and Conditionally 
Positive Definite Functions, Constructive Approximation, pp. 11-22, 1986. 
[115] Miclet, L., Structural Methods in Pattern Recognition, Springer-Verlag, New York, 1986. 
[116] Miikkulainen, R., Subsymbolic Parsing of Embedded Structures, in: Computational
Architectures Integrating Neural and Symbolic Processes: A Perspective on the State of
the Art, Sun, R. and Bookman, L. (Ed.), Chapter 5, pp. 153-186, Kluwer Academic 
Publishers, Norwell, MA, 1995. 
[117] Minsky, M., Computation: Finite and Infinite Machines, Prentice Hall, Englewood Cliffs, 
NJ, 1967. 
[118] Minsky, M. and Papert, S., Perceptrons: An Introduction to Computational Geometry, 
MIT Press, Cambridge, MA, 1969. 
[119] Mjolsness, E., Connectionist Grammars for High-Level Vision, in: Artificial Intelligence 
and Neural Networks: Steps Toward Principled Integration, Honavar, V. and Uhr, L. 
(Ed.), pp. 423-451, Academic Press, San Diego, CA, 1994. 
[120] Moon, G. et al., VLSI Implementation of Synaptic Weighting and Summing in Pulse 
Coded Neural-Type Cells, IEEE Transactions on Neural Networks, vol. 3, no. 3, pp. 
394-403, May 1992. 
[121] Moulder, R., An Implementation of a Data Management System on Associative Processor,
AFIPS, Proceedings of the National Computer Conference, vol. 42, pp. 171-179, 1973. 
[122] Mozer, M. C. and Bachrach, J., Discovering the Structure of a Reactive Environment by 
Exploration, Neural Computation, vol. 2, no. 4, p. 447, 1990.
[123] Mozer, M. C. and Das, S., A Connectionist Symbol Manipulator that Discovers the Struc­
ture of Context-Free Languages, Advances in Neural Information Processing Systems 5, 
p. 863, Morgan Kaufmann, San Mateo, CA, 1993. 
[124] Naganuma, J., Ogura, T., Yamada, S. I. and Kimura, T., High-Speed CAM-Based
Architecture for a Prolog Machine (ASCA), IEEE Transactions on Computers, vol. 37, no.
11, pp. 1375-1383, Nov. 1988. 
[125] Nakano, K., Associatron - A Model of Associative Memory, IEEE Transactions on Sys­
tems, Man, and Cybernetics, vol. SMC-2, no. 3, pp. 380-388, July 1972. 
[126] Newell, A., Symbol Systems, Cognitive Science, vol. 4, pp. 135-183, 1980.
[127] Ng, Y. H., Glover, R. J. and Chng, C. L., Unify with Active Memory, in: VLSI for
Artificial Intelligence, Delgado-Frias, J. G. and Moore, W. R. (Ed.), pp. 109-118, Kluwer 
Academic Publishers, Norwell, MA, 1989. 
[128] Niranjan, M. and Fallside, F., Neural Networks and Radial Basis Functions in
Classifying Static Speech Patterns, Report CUED/F-INFENG/TR 22, University Engineering
Department, Cambridge University, England, 1988. 
[129] Noda, I. and Nagao, M., A Learning Method for Recurrent Neural Networks Based 
on Minimization of Finite Automata, Proceedings of International Joint Conference on 
Neural Networks, vol. 1, pp. 27-32, IEEE Press, Piscataway, NJ, 1992. 
[130] Norman, D. A., Reflections on Cognition and Parallel Distributed Processing, in: Parallel
Distributed Processing, McClelland, J., Rumelhart, D. and the PDP Research Group
(Ed.), MIT Press, Cambridge, MA, 1986.
[131] Oh, H., The Relaxation Method for Learning in Artificial Neural Networks, Ph.D. Dis­
sertation, Iowa State University, 1992. 
[132] Omlin, C. W. and Giles, C. L., Constructing Deterministic Finite-State Automata in
Sparse Recurrent Neural Networks, IEEE International Conference on Neural Networks, 
vol. 3, pp. 1732-1737, Orlando, FL, June 1994.
[133] Omlin, C. and Giles, C. L., Extraction and Insertion of Symbolic Information in Re­
current Neural Networks, in: Artificial Intelligence and Neural Networks: Steps Toward 
Principled Integration, Honavar, V. and Uhr, L. (Ed.), pp. 271-299, Academic Press,
New York, NY, 1994. 
[134] Omlin, C. and Giles, C. L., Stable Encoding of Large Finite-State Automata in Recurrent 
Neural Networks with Sigmoid Discriminants, Neural Computation, vol. 8, no. 4, pp. 675-
696, May 1996. 
[135] Ozkarahan, E. A., Evolution and Implementation of the RAP Database Machine, New
Generation Computing, vol. 3, pp. 237-271, 1985.
[136] Palm, G. et al., Knowledge Processing in Neural Architecture, in: VLSI for Neural
Networks and Artificial Intelligence, Delgado-Frias, J. G. (Ed.), pp. 207-216, Plenum
Press, New York, 1994.
[137] Parekh, R. G. and Honavar, V., Automata Induction, Grammar Inference, and Lan­
guage Acquisition, in: Handbook of Natural Language Processing, Moisl, H., Dale, R. 
and Somers, H. (Ed.), Marcel Dekker, New York, 1997. 
[138] Parker, D. B., Learning Logic, Invention Report, S81-64, File 1, Office of Technology 
Licensing, Stanford University, 1982. 
[139] Parker, D. B., Learning Logic, Technical Report TR-47, Center for Computational Re­
search in Economics and Management Science, MIT, Apr. 1985. 
[140] Peter, R., Recursive Functions in Computer Theory, Halsted Press, New York, 1981. 
[141] Pinkas, G., A Fault-Tolerant Connectionist Architecture for Construction of Logic 
Proofs, in: Artificial Intelligence and Neural Networks: Steps Toward Principled
Integration, Honavar, V. and Uhr, L. (Ed.), pp. 321-340, Academic Press, San Diego, CA,
1994. 
[142] Pinkas, G., Propositional Logic, Nonmonotonic Reasoning, and Symmetric Networks -
On Bridging the Gap Between Symbolic and Connectionist Knowledge Representation, 
in: Neural Networks for Knowledge Representation and Inference, Levine, D. S. and 
Aparicio IV, M.(Ed.), Chapter 7, pp. 175-203, Lawrence Erlbaum Associates, Hillsdale, 
NJ, 1994. 
[143] Pollack, J. B., On Connectionist Models of Language Processing, Ph.D. Dissertation, 
Computer Science Department, University of Illinois, Urbana-Champaign, IL, 1987.
[144] Pollack, J. B., Recursive Distributed Representations, Artificial Intelligence, vol. 46, pp. 
77-105, 1990. 
[145] Popescu, I., Hierarchical Neural Networks for Rules Control in Knowledge-Based Expert 
Systems, Neural, Parallel & Scientific Computations, vol. 3, pp. 379-392, 1995. 
[146] Powell, M. J. D., Radial Basis Function for Multi-variable Interpolation: A Review, 
IMA Conference on Algorithms for the Approximation of Functions and Data, RMCS, 
Shrivenham, England. Also Report DAMTP/NA12, Department of Applied Mathematics 
and Theoretical Physics, University of Cambridge, 1985. 
[147] Powell, M. J. D., Radial Basis Function for Multi-variable Interpolation: A Review, 
Algorithms for Approximation, Mason, J. C. and Cox, M. G. (Ed.), pp. 143-167, Oxford:
Clarendon Press, 1987. 
[148] Raghupathi, "HP" W. et al., Toward Connectionist Representation of Legal Knowledge, 
in: Neural Networks for Knowledge Representation and Inference, Levine, D. S. and 
Aparicio IV, M.(Ed.), Chapter 10, pp. 269-282, Lawrence Erlbaum Associates, Hillsdale,
NJ, 1994. 
[149] Renals, S. and Rohwer, R., Phoneme Classification Experiments Using Radial Basis
Functions, Proceedings of the IEEE/INNS International Joint Conference on Neural Net­
works, vol. I, pp. 461-467, Washington, D. C., June 1989. 
[150] Ripley, B. D., Pattern Recognition and Neural Networks, Cambridge University Press,
New York, 1996. 
[151] Robinson, I., The Pattern Addressable Memory: Hardware for Associative Processing,
in: VLSI for Artificial Intelligence, Delgado-Frias, J. G. and Moore, W. R. (Ed.), pp. 
119-129, Kluwer Academic Publishers, Norwell, MA, 1989.
[152] Robinson, M. E. et al., A Modular CMOS Design of a Hamming Network, IEEE Trans­
actions on Neural Networks, vol. 3, no. 3, pp. 444-456, 1992. 
[153] Rodohan, D. and Glover, R., A Distributed Parallel Associative Processor (DPAP) for the 
Execution of Logic Programs, in: VLSI for Neural Networks and Artificial Intelligence, 
Delgado-Frias, J. G. (Ed.), pp. 265-273, Plenum Press, New York, 1994. 
[154] Rogers, Jr., H., Theory of Recursive Functions and Effective Computability, MIT Press, 
Cambridge, MA, 1987. 
[155] Rosenblatt, F., Principles of Neurodynamics: Perceptrons and the Theory of Brain
Mechanisms, Spartan Books, Washington, D.C., 1962.
[156] Rumelhart, D. E., McClelland, J. L. and the PDP Research Group, Parallel Distributed 
Processing: Explorations in the Microstructure of Cognition, vol. 1: Foundations, MIT 
Press, Cambridge, MA, 1986.
[157] Salton, G. and McGill, M. J., Introduction to Modern Information Retrieval,
McGraw-Hill, New York, 1983.
[158] Salton, G., Automatic Text Processing: The Transformation, Analysis, and Retrieval of 
Information by Computer, Addison-Wesley, Reading, MA, 1989.
[159] Sanfeliu, A. and Alquezar, R., Understanding Neural Networks for Grammatical Infer­
ence and Recognition, in: Advances in Structural and Syntactic Pattern Recognition, 
Bunke, H. (Ed.), World Scientific, Singapore, 1992. 
[160] Schneider, W., Connectionism: Is it a Paradigm Shift for Psychology?, Behavior Research 
Methods, Instruments, and Computers, vol. 19, pp. 73-83, 1987. 
[161] Schulenburg, D., Sentence Processing with Realistic Feedback, IEEE/INNS International 
Joint Conference on Neural Networks, vol. IV, pp. 661-666, Baltimore, MD, 1992. 
[162] Schuster, S. A., Nguyen, H. B., Ozkarahan, E. A. and Smith, K. C., RAP.2 - An
Associative Processor for Databases and Its Applications, IEEE Transactions on Computers,
vol. C-28, no. 6, pp. 446-458, 1979. 
[163] Sedgewick, R., Algorithms, 2nd ed., Addison-Wesley, Reading, MA, 1988. 
[164] Selman, B. and Hirst, G., A Rule-based Connectionist Parsing System, Proceedings of 
the Seventh Annual Conference of the Cognitive Science Society, Irvine, CA, 1985. 
[165] Séquin, C. H. and Clay, R. D., Fault Tolerance in Artificial Neural Networks, Proc.
IJCNN, vol. 1, pp. 703-708, San Diego, 1990. 
[166] Servan-Schreiber, D., Cleeremans, A. and McClelland, J. L., Graded State Machines:
The Representation of Temporal Contingencies in Simple Recurrent Neural Networks, 
in: Artificial Intelligence and Neural Networks: Steps Toward Principled Integration, 
Honavar, V. and Uhr, L. (Ed.), pp. 241-269, Academic Press, New York, NY, 1994. 
[167] Shastri, L., A Connectionist Approach to Knowledge Representation and Limited Infer­
ence, Cognitive Science, 12, pp. 331-392, 1988. 
[168] Shavlik, J. W., A Framework for Combining Symbolic and Neural Learning, in: Artificial 
Intelligence and Neural Networks: Steps Toward Principled Integration, Honavar, V. and 
Uhr, L. (Ed.), pp. 561-580, Academic Press, San Diego, CA, 1994. 
[169] Siegelmann, H. T. and Sontag, E. D., Turing-Computability with Neural Nets, Applied
Mathematics Letters, vol. 4, no. 6, pp. 77-80, 1991. 
[170] Sippu, S. and Soisalon-Soininen, E., Parsing Theory, vol. II: LR(k) and LL(k) Parsing,
Springer-Verlag, Berlin, 1990. 
[171] Sloan, M. E., Computer Hardware and Organization, Science Research Associates, 
Chicago, 1976. 
[172] Soucek, B. and the IRIS Group (Ed.), Neural and Intelligent Systems Integrations: Fifth
and Sixth Generation Integrated Reasoning Information Systems, John Wiley & Sons, 
New York, 1992. 
[173] Stanfill, C. and Waltz, D., Toward Memory-Based Reasoning, Communications of the 
ACM, vol. 29, pp. 1213-1228, 1986. 
[174] Sun, G. Z., Giles, C. L., Chen, H. H. and Lee, Y. C., The Neural Network Pushdown 
Automaton: Model, Stack and Learning Simulations, Technical Report UMIACS-TR-93-77,
Aug. 1993.
[175] Sun, R., On Variable Binding in Connectionist Networks, Connection Science, vol. 4, no. 
2, pp. 93-124, 1992. 
[176] Sun, R., Logics and Variables in Connectionist Models: A Brief Overview, Symbolic 
Processors and Connectionist Networks for Artificial Intelligence and Cognitive Modeling, 
Academic Press, New York, NY, 1994. 
[177] Sun, R., Connectionist Models of Commonsense Reasoning, in: Neural Networks for 
Knowledge Representation and Inference, Levine, D. S. and Aparicio IV, M.(Ed.), Chap­
ter 9, pp. 241-268, Lawrence Erlbaum Associates, Hillsdale, NJ, 1994. 
[178] Sun, R., A Two-Level Hybrid Architecture for Structuring Knowledge for Commonsense 
Reasoning, in: Computational Architectures Integrating Neural and Symbolic Processes: 
A Perspective on the State of the Art, Sun, R. and Bookman, L. (Ed.), Chapter 8, pp. 
247-281, Kluwer Academic Publishers, Norwell, MA, 1995. 
[179] Sun, R. and Bookman, L. (Ed.), Computational Architectures Integrating Symbolic and
Neural Processes, Kluwer Academic Publishers, Norwell, MA, 1995. 
[180] Swaminathan, G., Srinivasan, S., Mitra, S., Minnix, J., Johnson, B. and Iñigo, R., Fault
Tolerance of Neural Networks, Proc. of IJCNN, vol. 2, pp. 699-702, Washington DC,
1990. 
[181] Thurber, K. J. and Wald, L. D., Associative and Parallel Processors, Computing Surveys, 
vol. 7, no. 4, pp. 215-255, 1975. 
[182] Touretzky, D. S. and Hinton, G. E., Symbols among the Neurons: Details of a Connec-
tionist Inference Architecture, Proceedings of the 9th International Joint Conference on 
Artificial Intelligence, pp. 238-243, Morgan Kaufmann, 1985.
[183] Turing, A. M., On Computable Numbers with an Application to the
Entscheidungsproblem, Proceedings of the London Mathematical Society, Ser. 2, vol. 42, pp. 230-265,
1936. 
[184] Uchimura, K. et al., An 8G Connection-per-second 54mW Digital Neural Network with
Low-power Chain-Reaction Architecture, ISSCC Dig. Tech. Papers, pp. 134-135, San 
Francisco, CA, 1992. 
[185] Uhr, L. and Honavar, V., Artificial Intelligence and Neural Networks: Steps Toward 
Principled Integration, in: Artificial Intelligence and Neural Networks: Steps Toward 
Principled Integration, Honavar, V. and Uhr, L. (Ed.), pp. xvii-xxxii, Academic Press,
New York, NY, 1994. 
[186] Ullman, J. D., Principles of Databases and Knowledge-base Systems, vol. 1, Chapter 6,
Computer Science Press, Maryland, 1988. 
[187] Van der Velde, F., Symbol Manipulation with Neural Networks: Production of a
Context-free Language Using a Modifiable Working Memory, Connection Science, vol. 7, no. 3 &
4, pp. 247-280, 1995.
[188] Veezhinathan, J. and McCormick, B. H., Connectionist Plan Reminding in a Hybrid 
Planning Model, Proceedings of IEEE International Conference on Neural Networks, 
vol. II, pp. 515-523, San Diego, CA, 1988. 
[189] Wasserman, P. D., NeuralSource: The Bibliographic Guide to Artificial Neural Networks, 
Van Nostrand Reinhold, New York, 1990. 
[190] Watanabe, T. et al., A Single 1.5-V Digital Chip for a 10^6 Synapse Neural Network,
IEEE Transactions on Neural Networks, vol. 4, no. 3, pp. 387-393, May 1993. 
[191] Waterman, D. A. and Hayes-Roth, F. (Ed.), Pattern-Directed Inference Systems,
Academic Press, New York, NY, 1978.
[192] Watrous, R. L. and Kuhn, G. M., Induction of Finite-State Languages Using Second-
Order Recurrent Neural Networks, Neural Computation, vol. 4, No. 3, p. 406, 1992. 
[193] Werbos, P. J., Beyond Regression: New Tools for Prediction and Analysis in the Behav­
ioral Sciences, Ph.D. Dissertation, Harvard University, 1974. 
[194] Williams, R. J. and Zipser, D., A Learning Algorithm for Continually Running Fully 
Recurrent Neural Networks, Neural Computation, vol. 1, pp. 270-280, 1989. 
[195] Wood, D., Theory of Computation, John Wiley & Sons, New York, 1987. 
[196] Yau, S. S. and Fung, H. S., Associative Processor Architecture - A Survey, ACM Com­
puting Surveys, vol. 9, pp. 3-27, 1977. 
[197] Zeng, Z., Goodman, R. M. and Smyth, P., Discrete Recurrent Neural Networks for 
Grammatical Inference, IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 320-
330, Mar. 1994. 
