5 research outputs found

    Embedding speech into virtual realities

    In this work, a speaker-independent speech recognition system is presented that is suitable for implementation in Virtual Reality applications. The use of an artificial neural network in combination with a special compression of the acoustic input leads to a system that is robust, fast, easy to use, and needs no additional hardware besides common VR equipment.
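
    The abstract gives no implementation details, so the following is only a minimal sketch of the kind of pipeline it describes: a short audio frame is compressed into a handful of spectral band energies and fed to a small feed-forward network that scores a fixed set of VR commands. The band-energy compression, the network shape and the command set are assumptions made for the example, not the system from the paper.

```python
# Hypothetical sketch (not the paper's implementation): compress a short audio
# frame into a few spectral band energies and classify it with a small MLP.
import numpy as np

COMMANDS = ["rotate", "zoom", "select", "stop"]   # assumed VR command set

def compress(frame, n_bands=16):
    """Reduce a raw audio frame to n_bands log band energies (assumed scheme)."""
    spectrum = np.abs(np.fft.rfft(frame))
    bands = np.array_split(spectrum, n_bands)
    return np.log1p([b.mean() for b in bands])

def mlp_forward(x, w1, b1, w2, b2):
    """One hidden layer with tanh, softmax output over the command set."""
    h = np.tanh(x @ w1 + b1)
    logits = h @ w2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(16, 32)) * 0.1, np.zeros(32)            # untrained toy weights
w2, b2 = rng.normal(size=(32, len(COMMANDS))) * 0.1, np.zeros(len(COMMANDS))

frame = rng.normal(size=1024)                  # stand-in for one audio frame
probs = mlp_forward(compress(frame), w1, b1, w2, b2)
print(COMMANDS[int(np.argmax(probs))], probs.round(3))
```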

    Applying Levenberg-Marquardt algorithm with block-diagonal Hessian approximation to recurrent neural network training.

    by Chi-cheong Szeto. Thesis (M.Phil.), Chinese University of Hong Kong, 1999. Includes bibliographical references (leaves 162-165). Abstracts in English and Chinese.
    Table of contents:
    Abstract; Acknowledgment; Table of Contents
    Chapter 1: Introduction
      1.1 Time series prediction
      1.2 Forecasting models
        1.2.1 Networks using time delays
          1.2.1.1 Model description
          1.2.1.2 Limitation
        1.2.2 Networks using context units
          1.2.2.1 Model description
          1.2.2.2 Limitation
        1.2.3 Layered fully recurrent networks
          1.2.3.1 Model description
          1.2.3.2 Our selection and motivation
        1.2.4 Other models
      1.3 Learning methods
        1.3.1 First order and second order methods
        1.3.2 Nonlinear least squares methods
          1.3.2.1 Levenberg-Marquardt method - our selection and motivation
          1.3.2.2 Levenberg-Marquardt method - algorithm
        1.3.3 Batch mode, semi-sequential mode and sequential mode of updating
      1.4 Jacobian matrix calculations in recurrent networks
        1.4.1 RTBPTT-like Jacobian matrix calculation
        1.4.2 RTRL-like Jacobian matrix calculation
        1.4.3 Comparison between RTBPTT-like and RTRL-like calculations
      1.5 Computation complexity reduction techniques in recurrent networks
        1.5.1 Architectural approach
          1.5.1.1 Recurrent connection reduction method
          1.5.1.2 Treating the feedback signals as additional inputs method
          1.5.1.3 Growing network method
        1.5.2 Algorithmic approach
          1.5.2.1 History cutoff method
          1.5.2.2 Changing the updating frequency from sequential mode to semi-sequential mode method
      1.6 Motivation for using block-diagonal Hessian matrix
      1.7 Objective
      1.8 Organization of the thesis
    Chapter 2: Learning with the block-diagonal Hessian matrix
      2.1 Introduction
      2.2 General form and factors of block-diagonal Hessian matrices
        2.2.1 General form of block-diagonal Hessian matrices
        2.2.2 Factors of block-diagonal Hessian matrices
      2.3 Four particular block-diagonal Hessian matrices
        2.3.1 Correlation block-diagonal Hessian matrix
        2.3.2 One-unit block-diagonal Hessian matrix
        2.3.3 Sub-network block-diagonal Hessian matrix
        2.3.4 Layer block-diagonal Hessian matrix
      2.4 Updating methods
    Chapter 3: Data set and setup of experiments
      3.1 Introduction
      3.2 Data set
        3.2.1 Single sine
        3.2.2 Composite sine
        3.2.3 Sunspot
      3.3 Choices of recurrent neural network parameters and initialization methods
        3.3.1 Choices of numbers of input, hidden and output units
        3.3.2 Initial hidden states
        3.3.3 Weight initialization method
      3.4 Method of dealing with over-fitting
    Chapter 4: Updating methods
      4.1 Introduction
      4.2 Asynchronous updating method
        4.2.1 Algorithm
        4.2.2 Method of study
        4.2.3 Performance
        4.2.4 Investigation on poor generalization
          4.2.4.1 Hidden states
          4.2.4.2 Incoming weight magnitudes of the hidden units
          4.2.4.3 Weight change against time
      4.3 Asynchronous updating with constraint method
        4.3.1 Algorithm
        4.3.2 Method of study
        4.3.3 Performance
          4.3.3.1 Generalization performance
          4.3.3.2 Training time performance
        4.3.4 Hidden states and incoming weight magnitudes of the hidden units
          4.3.4.1 Hidden states
          4.3.4.2 Incoming weight magnitudes of the hidden units
      4.4 Synchronous updating methods
        4.4.1 Single λ and multiple λ's synchronous updating methods
          4.4.1.1 Algorithm of single λ synchronous updating method
          4.4.1.2 Algorithm of multiple λ's synchronous updating method
          4.4.1.3 Method of study
          4.4.1.4 Performance
          4.4.1.5 Investigation on long training time: analysis of λ
        4.4.2 Multiple λ's with line search synchronous updating method
          4.4.2.1 Algorithm
          4.4.2.2 Performance
          4.4.2.3 Comparison of λ
      4.5 Comparison between asynchronous and synchronous updating methods
        4.5.1 Final training time
        4.5.2 Computation load per complete weight update
        4.5.3 Convergence speed
      4.6 Comparison between our proposed methods and the gradient descent method with adaptive learning rate and momentum
    Chapter 5: Number and sizes of the blocks
      5.1 Introduction
      5.2 Performance
        5.2.1 Method of study
        5.2.2 Trend of performance
          5.2.2.1 Asynchronous updating method
          5.2.2.2 Synchronous updating method
      5.3 Computation load per complete weight update
      5.4 Convergence speed
        5.4.1 Trend of inverse of convergence speed
        5.4.2 Factors affecting the convergence speed
    Chapter 6: Weight-grouping methods
      6.1 Introduction
      6.2 Training time and generalization performance of different weight-grouping methods
        6.2.1 Method of study
        6.2.2 Performance
      6.3 Degree of approximation of block-diagonal Hessian matrix with different weight-grouping methods
        6.3.1 Method of study
        6.3.2 Performance
    Chapter 7: Discussion
      7.1 Advantages and disadvantages of using block-diagonal Hessian matrix
        7.1.1 Advantages
        7.1.2 Disadvantages
      7.2 Analysis of computation complexity
        7.2.1 Trend of computation complexity of each calculation
        7.2.2 Batch mode of updating
        7.2.3 Sequential mode of updating
      7.3 Analysis of storage complexity
        7.3.1 Trend of storage complexity of each set of variables
        7.3.2 Trend of overall storage complexity
      7.4 Parallel implementation
      7.5 Alternative implementation of weight change constraint
    Chapter 8: Conclusions
    References
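
    As a rough illustration of the technique named in the title, the sketch below applies the idea to a small curve-fitting problem rather than to a recurrent network: the Gauss-Newton approximation H ≈ JᵀJ is formed only within assumed parameter blocks, each block is solved independently with Levenberg-Marquardt damping, and the damping factor λ is adapted by a simple accept/reject rule. The toy model, the parameter grouping and the update schedule are assumptions for illustration, not the thesis's algorithms.

```python
# Minimal sketch of a Levenberg-Marquardt step with a block-diagonal
# Gauss-Newton Hessian approximation (J^T J kept only within parameter blocks).
import numpy as np

def lm_block_step(jac, residuals, blocks, lam):
    """One LM update: solve (J_b^T J_b + lam*I) dw_b = -J_b^T r per block."""
    delta = np.zeros(jac.shape[1])
    for idx in blocks:                        # idx: parameter indices of one block
        jb = jac[:, idx]
        h_b = jb.T @ jb + lam * np.eye(len(idx))
        g_b = jb.T @ residuals
        delta[idx] = np.linalg.solve(h_b, -g_b)
    return delta

# Toy problem (assumed for the example): fit y = a*sin(b*t) + c.
t = np.linspace(0, 2 * np.pi, 50)
y = 1.5 * np.sin(0.8 * t) + 0.3

def model(p):
    return p[0] * np.sin(p[1] * t) + p[2]

def jacobian(p):
    return np.stack([np.sin(p[1] * t), p[0] * t * np.cos(p[1] * t), np.ones_like(t)], axis=1)

p = np.array([1.0, 1.0, 0.0])
lam = 1e-2
blocks = [np.array([0, 1]), np.array([2])]    # assumed grouping of the 3 parameters
for _ in range(100):
    r = model(p) - y
    step = lm_block_step(jacobian(p), r, blocks, lam)
    if np.sum((model(p + step) - y) ** 2) < np.sum(r ** 2):
        p, lam = p + step, lam * 0.7          # accept step, relax damping
    else:
        lam *= 2.0                            # reject step, increase damping
print(p.round(3))                             # should approach [1.5, 0.8, 0.3]
```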

    Proceedings of the 1993 Conference on Intelligent Computer-Aided Training and Virtual Environment Technology, Volume 1

    These proceedings are organized in the same manner as the conference's contributed sessions, with the papers grouped by topic area. These areas are as follows: VE (virtual environment) training for Space Flight, Virtual Environment Hardware, Knowledge Acquisition for ICAT (Intelligent Computer-Aided Training) & VE, Multimedia in ICAT Systems, VE in Training & Education (1 & 2), Virtual Environment Software (1 & 2), Models in ICAT Systems, ICAT Commercial Applications, ICAT Architectures & Authoring Systems, ICAT Education & Medical Applications, Assessing VE for Training, VE & Human Systems (1 & 2), ICAT Theory & Natural Language, ICAT Applications in the Military, VE Applications in Engineering, Knowledge Acquisition for ICAT, and ICAT Applications in Aerospace.

    Evolving systems for connectionist-based speech recognition

    xv, 519 p.; 30 cm. Includes bibliographical references. University of Otago department: Information Science. "June 18, 2003". Although studied for many years, speech recognition is still a developing field. Recently, several prominent researchers have pointed out areas within the field that need to be addressed: robustness to varied environments, large or expandable vocabularies, user-friendliness, high recognition accuracy and the ability to recognise continuous speech. The ability to adapt is an important component of a speech recognition system. New users should still receive the benefits listed above, and the system should also handle different speaking rates. Likewise, novel environments may cause a drop in performance if the system lacks robustness or the ability to adapt. A common target for speech recognition algorithms is to detect the presence of speech units, commonly phonemes; this involves grouping speech sounds, or phones, into abstract classes that carry meaning. Artificial neural networks have recently been applied to this task, but uncertainty and ambiguity are inherent in the neural network recognition process. Several novel techniques are proposed to aid recognition and to help meet the requirements of a successful speech recognition system. The goal of this research is to investigate theories of speech and language processing relevant to speech recognition and spoken language understanding. These theories have their foundations in fields such as engineering, computer science, linguistics, natural language processing, psycholinguistics and psychology. An adaptive system is implemented to test the validity and usefulness of such work for speech recognition and spoken language understanding. For example, abstract models of the human auditory system and the auditory cortex are investigated and applied towards better engineering methods for building adaptive speech and language systems. For the implementation of an adaptive speech recognition system, parameters are introduced that can be adjusted either manually or automatically; in this manner, the system can adapt to new speakers and environments. The architecture of the system is modular and hierarchical, with different methods applied at different levels; artificial neural networks, for example, are best suited to low-level processing. A discussion of how errors and uncertainty may be resolved in an unsupervised manner concludes the work. Ideally, the system will adapt to the situation, and future occurrences of such errors may be reduced or eliminated. Unpublished.
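
    The description above is high-level, so the following hypothetical sketch only illustrates the kind of modular, hierarchical arrangement it mentions: a low-level network produces per-frame phoneme scores, a higher-level module matches phoneme sequences against a small lexicon, and an adjustable speaker bias stands in for the parameters that "can be adjusted either manually or automatically". The phoneme set, lexicon, untrained weights and crude uniform alignment are invented for the example and are not the system described in the thesis.

```python
# Hypothetical illustration (not the thesis system): a modular, hierarchical
# recogniser with a low-level frame scorer and a high-level word matcher.
import numpy as np

PHONEMES = ["a", "k", "t", "s"]                   # assumed toy phoneme set
LEXICON = {"cat": ["k", "a", "t"], "act": ["a", "k", "t"], "task": ["t", "a", "s", "k"]}

rng = np.random.default_rng(1)
W = rng.normal(size=(8, len(PHONEMES))) * 0.1     # untrained low-level network

def phoneme_scores(frames, speaker_bias):
    """Low level: per-frame phoneme scores from a linear layer plus an
    adaptable per-phoneme bias (the adjustable adaptation parameter)."""
    return frames @ W + speaker_bias

def decode_word(frames, speaker_bias):
    """High level: pick the lexicon word whose phoneme string scores best
    when each phoneme claims an equal share of the frames (crude alignment)."""
    scores = phoneme_scores(frames, speaker_bias)
    best, best_score = None, -np.inf
    for word, phones in LEXICON.items():
        segs = np.array_split(scores, len(phones))
        total = sum(seg[:, PHONEMES.index(p)].mean() for seg, p in zip(segs, phones))
        if total > best_score:
            best, best_score = word, total
    return best

frames = rng.normal(size=(20, 8))                 # stand-in for acoustic frames
print(decode_word(frames, speaker_bias=np.zeros(len(PHONEMES))))
```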
