    Acoustic analysis of Polish nasals produced by deaf children

    The development of computer-based diagnostic and teaching aids used in the rehabilitation of hearing-impaired children's speech, and the construction of automatic speech recognition systems intended for deaf people, require exhaustive knowledge of the acoustic parameters of deaf speech. In the current research, Polish nasals produced by profoundly deaf children were analyzed acoustically, and the parameter values obtained were then analyzed statistically. The results of the discriminant analysis show that nasals produced by deaf people pose identification difficulty similar to that of their realizations by normally hearing speakers.
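
    As a rough illustration of the statistical step mentioned above, the following is a minimal sketch of a discriminant analysis over acoustic parameters; the feature matrix and labels are random placeholders, not the paper's measurements, and the scikit-learn estimator is a generic stand-in for whatever software the authors used.

        # Minimal sketch: cross-validated discriminant analysis of acoustic
        # parameters (hypothetical data, not the paper's measurements).
        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        # Assume rows are nasal tokens and columns are acoustic parameters
        # (e.g. formant frequencies and bandwidths); labels are nasal classes.
        X = rng.normal(size=(120, 4))
        y = rng.integers(0, 3, size=120)

        lda = LinearDiscriminantAnalysis()
        # The cross-validated identification rate indicates how separable
        # the nasal classes are in this parameter space.
        scores = cross_val_score(lda, X, y, cv=5)
        print("mean identification rate:", scores.mean())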

    The role of speech technology in biometrics, forensics and man-machine interface

    Optimism is growing day by day that in the near future our society will witness the Man-Machine Interface (MMI) using voice technology. Computer manufacturers are building voice recognition sub-systems into their new product lines. Although speech-technology-based MMI techniques have been widely used before, machine-based interaction still requires gathering and applying deep knowledge of spoken language and its performance. Biometric recognition refers to a system that is able to identify individuals based on their behavioral and biological characteristics. Given the success of fingerprints in forensic science and law enforcement applications, and with growing concerns relating to border control, banking access fraud, machine access control and IT security, there has been great interest in the use of fingerprints and other biological traits for automatic recognition. It is not surprising that biometric systems play an important role in all areas of our society. Biometric applications include smartphone security, mobile payment, international border control, national citizen registers and reserved facilities. MMI through speech technology, which includes automated speech/speaker recognition and natural language processing, has a significant impact on all existing businesses based on personal computer applications. With the help of powerful and affordable microprocessors and artificial intelligence algorithms, human beings can talk to machines to drive and control all computer-based applications. Today's applications show a small preview of a rich future for MMI based on voice technology, which will ultimately replace the keyboard and mouse with the microphone for easier access and make machines more intelligent.

    Fuzzy Logic Based Segmentation for Myanmar Continuous Speech Recognition System

    Speech recognition is one of the next-generation technologies for human-computer interaction. Automatic Speech Recognition (ASR) is a technology that allows a computer to recognize the words spoken by a person through a telephone, microphone or other device. The stages of a speech recognition system are pre-processing, segmentation of the speech signal, feature extraction and word recognition. Among speech recognition systems, continuous speech recognition is especially important and popular. This paper proposes time-domain and frequency-domain features based on fuzzy knowledge for the continuous speech segmentation task via nonlinear speech analysis. Short-time Energy and Zero-crossing Rate are the time-domain features, and Spectral Centroid is the frequency-domain feature, which the system calculates at each point of the speech signal in order to exploit relevant information for generating the significant segments. The Fuzzy Logic technique is used not only to fuzzify the calculated features into three complementary sets, namely low, middle and high, but also to perform a matching phase using a set of fuzzy rules. The outputs of the fuzzy logic stage are phonemes, syllables and disyllables of the Myanmar language. The system then recognizes the continuous words of the input speech.
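
    To make the feature stage concrete, here is a minimal sketch of the three named measures computed frame by frame, plus a simple low/middle/high fuzzification; the frame sizes, sampling rate and triangular membership shapes are illustrative assumptions, not the paper's actual rules.

        # Minimal sketch of per-frame Short-time Energy, Zero-crossing Rate
        # and Spectral Centroid (frame/hop sizes are illustrative assumptions).
        import numpy as np

        def frame_features(signal, frame_len=400, hop=160, sr=16000):
            feats = []
            for start in range(0, len(signal) - frame_len, hop):
                frame = signal[start:start + frame_len]
                energy = np.mean(frame ** 2)                        # Short-time Energy
                zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2  # Zero-crossing Rate
                spectrum = np.abs(np.fft.rfft(frame))
                freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
                centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
                feats.append((energy, zcr, centroid))
            return np.array(feats)

        def fuzzify(x, lo, hi):
            # Map a feature value to (low, middle, high) memberships using
            # triangular shapes; real fuzzy rules would then match on these.
            x = float(np.clip((x - lo) / (hi - lo + 1e-12), 0.0, 1.0))
            return (max(0.0, 1 - 2 * x), 1 - abs(2 * x - 1), max(0.0, 2 * x - 1))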

    Articulatory-WaveNet: Deep Autoregressive Model for Acoustic-to-Articulatory Inversion

    Acoustic-to-Articulatory Inversion, the estimation of articulatory kinematics from speech, is an important problem which has received significant attention in recent years. Estimated articulatory movements from such models can be used for many applications, including speech synthesis, automatic speech recognition, and facial kinematics for talking-head animation devices. Knowledge about the position of the articulators can also be extremely useful in speech therapy systems and in Computer-Aided Language Learning (CALL) and Computer-Aided Pronunciation Training (CAPT) systems for second-language learners. Acoustic-to-Articulatory Inversion is a challenging problem due to the complexity of articulation patterns and significant inter-speaker differences, and it is even more challenging when applied to non-native speakers without any kinematic training data. This dissertation attempts to address these problems through the development of upgraded architectures for Articulatory Inversion. The proposed Articulatory-WaveNet architecture is based on a dilated causal convolutional layer structure that improves the Acoustic-to-Articulatory Inversion results for both speaker-dependent and speaker-independent scenarios. The system has been evaluated on the ElectroMagnetic Articulography corpus of Mandarin Accented English (EMA-MAE), consisting of 39 speakers including both native English speakers and Mandarin-accented English speakers. Results show that Articulatory-WaveNet significantly improves the performance of speaker-dependent and speaker-independent Acoustic-to-Articulatory Inversion systems compared to previously reported results.
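
    For readers unfamiliar with the layer structure named above, here is a minimal PyTorch sketch of a stack of dilated causal 1-D convolutions of the kind used in WaveNet-style models; the channel count, kernel size and depth are illustrative assumptions, not the Articulatory-WaveNet configuration.

        # Minimal sketch of a WaveNet-style dilated causal convolution stack.
        import torch
        import torch.nn as nn

        class DilatedCausalStack(nn.Module):
            def __init__(self, channels=64, kernel_size=2, n_layers=6):
                super().__init__()
                self.layers = nn.ModuleList()
                self.pads = []
                for i in range(n_layers):
                    dilation = 2 ** i  # doubling dilation widens the receptive field
                    self.pads.append((kernel_size - 1) * dilation)
                    self.layers.append(
                        nn.Conv1d(channels, channels, kernel_size, dilation=dilation))

            def forward(self, x):  # x: (batch, channels, time)
                for pad, conv in zip(self.pads, self.layers):
                    # Left-pad only, so each output depends on past samples alone.
                    x = torch.relu(conv(nn.functional.pad(x, (pad, 0))))
                return x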

    The VEX-93 environment as a hybrid tool for developing knowledge systems with different problem solving techniques

    The paper describes VEX-93, a hybrid environment for developing knowledge-based and problem-solver systems. It integrates methods and techniques from artificial intelligence, image and signal processing, and data analysis, which can be mixed. Two hierarchical levels of reasoning form an intelligent toolbox, with one upper strategic inference engine and four lower ones containing specific reasoning models: truth-functional (rule-based), probabilistic (causal networks), fuzzy (rule-based) and case-based (frames). Image/signal processing and analysis capabilities are provided in the form of programming languages with more than one hundred primitive functions. User-made programs are embeddable within knowledge bases, allowing the combination of perception and reasoning. The data analyzer toolbox contains a collection of numerical classification, pattern recognition and ordination methods, with neural network tools and a database query language at the inference engines' disposal. VEX-93 is an open system able to communicate with external computer programs relevant to a particular application. Metaknowledge can be used to elaborate conclusions, and man-machine interaction includes, besides windows and graphical interfaces, acceptance of voice commands and production of speech output. The system was conceived for real-world applications in general domains; a concrete medical diagnostic support system, at present under completion as a Cuban-Spanish project, is mentioned as an example. The present version of VEX-93 is a large system composed of about one and a half million lines of C code and runs on microcomputers under Windows 3.1.
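
    As an illustration of just the truth-functional (rule-based) reasoning level described above, here is a toy forward-chaining inference loop; the facts and rules are invented examples, not VEX-93 content.

        # Toy forward-chaining rule engine: fire rules until no new facts appear.
        facts = {"fever", "cough"}
        rules = [
            ({"fever", "cough"}, "respiratory_infection"),
            ({"respiratory_infection", "wheezing"}, "refer_to_specialist"),
        ]

        changed = True
        while changed:
            changed = False
            for premises, conclusion in rules:
                if premises <= facts and conclusion not in facts:
                    facts.add(conclusion)
                    changed = True
        print(facts)  # {'fever', 'cough', 'respiratory_infection'}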

    An Introduction to Social Semantic Web Mining & Big Data Analytics for Political Attitudes and Mentalities Research

    The social web has become a major repository of social and behavioral data that is of exceptional interest to the social science and humanities research community. Computer science has only recently developed the technologies and techniques that allow for harvesting, organizing and analyzing such data and that provide knowledge and insights into the structure and behavior of people online. These techniques include social web mining, conceptual and social network analysis and modeling, tag clouds, topic maps, folksonomies, complex network visualizations, modeling of processes on networks, agent-based models of social network emergence, speech recognition, computer vision, natural language processing, opinion mining and sentiment analysis, recommender systems, user profiling and semantic wikis. All of these techniques are briefly introduced, example studies are given, and ideas as well as possible research directions in the field of political attitudes and mentalities are outlined. Finally, challenges for future studies are discussed.
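
    As a taste of one of the listed techniques, here is a minimal lexicon-based sentiment scoring sketch; the tiny lexicon is an invented example and far simpler than the opinion mining methods the paper surveys.

        # Minimal lexicon-based sentiment scoring (invented toy lexicon).
        LEXICON = {"good": 1, "great": 2, "bad": -1, "awful": -2, "trust": 1}

        def sentiment(text):
            # Sum the valence of every known word; unknown words score zero.
            return sum(LEXICON.get(w, 0) for w in text.lower().split())

        print(sentiment("the policy was good but the debate was awful"))  # -1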

    Neural Collaborative Ranking

    Recommender systems are aimed at generating a personalized ranked list of items that an end user might be interested in. With the unprecedented success of deep learning in computer vision and speech recognition, bridging the gap between recommender systems and deep neural networks has recently become a hot topic, and deep learning methods have been shown to achieve state-of-the-art results on many recommendation tasks. For example, a recent model, NeuMF, first projects users and items into a shared low-dimensional latent feature space, and then employs neural nets to model the interaction between the user and item latent features, obtaining state-of-the-art performance on recommendation tasks. NeuMF assumes that non-interacted items are inherently negative and uses negative sampling to relax this assumption. In this paper, we examine an alternative approach which does not assume that non-interacted items are necessarily negative, only that they are less preferred than interacted items. Specifically, we develop a new classification strategy based on the widely used pairwise ranking assumption. We combine our classification strategy with the recently proposed neural collaborative filtering framework and propose a general collaborative ranking framework called Neural Network based Collaborative Ranking (NCR). We resort to a neural network architecture to model a user's pairwise preference between items, with the belief that a neural network will effectively capture the latent structure of the latent factors. The experimental results on two real-world datasets show the superior performance of our models in comparison with several state-of-the-art approaches. Comment: Proceedings of the 2018 ACM Conference on Information and Knowledge Management.
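
    To make the pairwise ranking idea concrete, here is a minimal PyTorch sketch of a model that scores (user, item) pairs with a small neural net over embeddings and trains on a logistic pairwise loss; the dimensions and the exact loss are illustrative assumptions, not the NCR architecture itself.

        # Minimal pairwise neural ranking sketch: an interacted item should
        # outscore a less-preferred (non-interacted) one for the same user.
        import torch
        import torch.nn as nn

        class PairwiseRanker(nn.Module):
            def __init__(self, n_users, n_items, dim=32):
                super().__init__()
                self.user = nn.Embedding(n_users, dim)
                self.item = nn.Embedding(n_items, dim)
                self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                         nn.Linear(dim, 1))

            def score(self, u, i):
                return self.mlp(torch.cat([self.user(u), self.item(i)], dim=-1))

            def forward(self, u, pos, neg):
                diff = self.score(u, pos) - self.score(u, neg)
                return -torch.log(torch.sigmoid(diff)).mean()  # pairwise logistic loss

        model = PairwiseRanker(n_users=1000, n_items=5000)
        u, pos, neg = (torch.randint(0, 1000, (64,)),
                       torch.randint(0, 5000, (64,)),
                       torch.randint(0, 5000, (64,)))
        model(u, pos, neg).backward()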

    Articulatory features for robust visual speech recognition

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004, by Ekaterina Saenko. Includes bibliographical references (p. 99-105).

    This thesis explores a novel approach to visual speech modeling. Visual speech, or a sequence of images of the speaker's face, is traditionally viewed as a single stream of contiguous units, each corresponding to a phonetic segment. These units are defined heuristically by mapping several visually similar phonemes to one visual phoneme, sometimes referred to as a viseme. However, experimental evidence shows that phonetic models trained from visual data are not synchronous in time with acoustic phonetic models, indicating that visemes may not be the most natural building blocks of visual speech. Instead, we propose to model the visual signal in terms of the underlying articulatory features. This approach is a natural extension of feature-based modeling of acoustic speech, which has been shown to increase the robustness of audio-based speech recognition systems. We start by exploring ways of defining visual articulatory features: first in a data-driven manner, using a large, multi-speaker visual speech corpus, and then in a knowledge-driven manner, using the rules of speech production. Based on these studies, we propose a set of articulatory features and describe a computational framework for feature-based visual speech recognition. Multiple feature streams are detected in the input image sequence using Support Vector Machines, and then incorporated in a Dynamic Bayesian Network to obtain the final word hypothesis. Preliminary experiments show that our approach increases viseme classification rates in visually noisy conditions, and improves visual word recognition through feature-based context modeling.
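
    To illustrate the per-stream detection step, here is a minimal scikit-learn sketch that trains one SVM per articulatory feature stream and produces per-frame posteriors of the kind a Dynamic Bayesian Network could then combine; the stream names and data are placeholders, not the thesis's actual setup.

        # Minimal sketch: one SVM detector per articulatory feature stream.
        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        frames = rng.normal(size=(500, 40))        # placeholder visual features
        streams = ["lip-opening", "lip-rounding", "labio-dental"]

        detectors = {}
        for name in streams:
            labels = rng.integers(0, 2, size=500)  # placeholder per-frame labels
            detectors[name] = SVC(probability=True).fit(frames, labels)

        # Per-frame posteriors for each stream, to be combined downstream.
        posteriors = {name: clf.predict_proba(frames)[:, 1]
                      for name, clf in detectors.items()}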