Evaluation of preprocessors for neural network speaker verification

Author: Salleh Sheikh-Hussain
Publication venue: The University of Edinburgh
Publication date: 01/01/1997

Analysis of derived features for the motion classification of a passive lower limb exoskeleton

Author: Costa I Ruiz Albert
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2019

The recognition of human motion intentions is a fundamental requirement for controlling an exoskeleton system efficiently. If the current intended motion is known, the exoskeleton control can be enhanced or subsequent motions can be predicted. At H2T, research has been carried out on a classification system based on Hidden Markov Models (HMMs) to classify the multi-modal sensor data acquired from a unilateral passive lower-limb exoskeleton. The training data consists of force vectors, linear accelerations and Euler angles provided by 7 3D force sensors and 3 IMUs. The recordings contain data from 10 subjects performing 14 different types of daily activities, each carried out 10 times. This master's thesis attempts to improve the motion classification by using physically meaningful features derived from this raw data. The derived features considered were the knee moment vector and the knee and ankle joint angles, which give a dynamic and a kinematic description of a motion, respectively. First, these new features are analysed to study their patterns, and the resemblance of the data among different subjects is quantified to check their consistency. Afterwards, the derived features are evaluated in the motion classification system to check their performance. Various configurations of the classifier were tested, including different preprocessors of the data and different structures of the HMMs used to represent each motion. Some setups combining derived features and raw data led to good results (e.g. the norm of the moment vector plus IMUs reached 89.39% accuracy), but did not improve on the best results of previous work (e.g. 2 IMUs and 1 force sensor reached 90.73% accuracy).
Although the classification results are not improved, the thesis shows that these derived features are a good representation of their primary features and a suitable option if dimensionality reduction of the data is pursued. Finally, possible directions for improving the motion classification are suggested, based on the results obtained in the thesis.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Eye Detection Using Wavelets and ANN

Author: Panda Deepti Ranjan
Publication date: 01/01/2007

A biometric system provides perfect identification of an individual based on a unique biological feature or characteristic possessed by a person, such as a fingerprint, handwriting, heartbeat, face or eye. Among them, eye detection is a better approach since the human eye does not change throughout the life of an individual. It is regarded as the most reliable and accurate biometric identification system available. In this project we develop a system for ‘eye detection using wavelets and ANN’ with a software simulation package such as the MATLAB 7.0 toolbox, in order to verify the uniqueness of human eyes and their performance as a biometric. Eye detection involves first extracting the eye from a digital face image, and then encoding the unique patterns of the eye in such a way that they can be compared with preregistered eye patterns. The eye detection system consists of an automatic segmentation system based on the wavelet transform; wavelet analysis is then used as a pre-processor for a back-propagation neural network with conjugate gradient learning. The inputs to the neural network are the wavelet maxima neighborhood coefficients of face images at a particular scale. The output of the neural network is the classification of the input as an eye or non-eye region. An accuracy of 81% is observed for test images under different environmental conditions not included during training.

ethesis@nitr

Speech and neural network dynamics

Author: Renals Stephen John
Publication venue: The University of Edinburgh
Publication date: 01/01/1990

Edinburgh Research Archive

The 5th Conference of PhD Students in Computer Science

Publication date: 01/01/2006

University of Szeged

Tone classification of syllable-segmented Thai speech based on multilayer perceptron

Author: Satravaha Nuttavudh
Publication venue: The Research Repository @ WVU
Publication date: 01/05/2002

Thai is a monosyllabic and tonal language. Thai makes use of tone to convey lexical information about the meaning of a syllable. Thai has five distinctive tones, and each tone is well represented by a single F0 contour pattern. In general, a Thai syllable with a different tone has a different lexical meaning. Thus, to completely recognize a spoken Thai syllable, a speech recognition system has not only to recognize a base syllable but also to correctly identify its tone. Hence, tone classification of Thai speech is an essential part of a Thai speech recognition system. In this study, a tone classification system for syllable-segmented Thai speech, which incorporates the effects of tonal coarticulation, stress and intonation, was developed. Automatic syllable segmentation, which segments the training and test utterances into syllable units, was also developed. Acoustic features including fundamental frequency (F0), duration, and energy, extracted from the syllable being processed and its neighboring syllables, were used as the main discriminating features. A multilayer perceptron (MLP) trained by the backpropagation method was employed to classify these features. The proposed system was evaluated on 920 test utterances spoken by five male and three female Thai speakers who also uttered the training speech. The proposed system achieved an average accuracy rate of 91.36%.

The Research Repository @ WVU (West Virginia University)

Auditory-based processing of communication sounds

Author: Walters Thomas C.
Publication venue: University of Cambridge
Publication date: 07/06/2011

This thesis examines the possible benefits of adapting a biologically-inspired model of human auditory processing as part of a machine-hearing system. Features were generated by an auditory model, and used as input to machine learning systems to determine the content of the sound. Features were generated using the auditory image model (AIM) and were used for speech recognition and audio search. AIM comprises processing to simulate the human cochlea, and a ‘strobed temporal integration’ process which generates a stabilised auditory image (SAI) from the input sound. The communication sounds which are produced by humans, other animals, and many musical instruments take the form of a pulse-resonance signal: pulses excite resonances in the body, and the resonance following each pulse contains information both about the type of object producing the sound and its size. In the case of humans, vocal tract length (VTL) determines the size properties of the resonance. In the speech recognition experiments, an auditory filterbank was combined with a Gaussian fitting procedure to produce features which are invariant to changes in speaker VTL. These features were compared against standard mel-frequency cepstral coefficients (MFCCs) in a size-invariant syllable recognition task. The VTL-invariant representation was found to produce better results than MFCCs when the system was trained on syllables from simulated talkers of one range of VTLs and tested on those from simulated talkers with a different range of VTLs. The image stabilisation process of strobed temporal integration was analysed. Based on the properties of the auditory filterbank being used, theoretical constraints were placed on the properties of the dynamic thresholding function used to perform strobe detection. These constraints were used to specify a simple, yet robust, strobe detection algorithm. 
The syllable recognition system described above was then extended to produce features from profiles of the SAI and tested with the same syllable database as before. For clean speech, performance of the features was comparable to that of those generated from the filterbank output. However, when pink noise was added to the stimuli, performance dropped more slowly as a function of signal-to-noise ratio when using the SAI-based AIM features than when using either the filterbank-based features or the MFCCs, demonstrating the noise-robustness properties of the SAI representation. The properties of the auditory filterbank in AIM were also analysed. Three models of the cochlea were considered: the static gammatone filterbank, the dynamic compressive gammachirp (dcGC) and the pole-zero filter cascade (PZFC). The dcGC and gammatone are standard filterbank models, whereas the PZFC is a filter cascade, which more accurately models signal propagation in the cochlea. However, while the architectures of the filterbanks are different, they have all been successfully fitted to psychophysical masking data from humans. The abilities of the filterbanks to measure pitch strength were assessed, using stimuli which evoke a weak pitch percept in humans, in order to ascertain whether there is any benefit in the use of the more computationally efficient PZFC. Finally, a complete sound effects search system using auditory features was constructed in collaboration with Google Research. Features were computed from the SAI by sampling the SAI space with boxes of different scales. Vector quantization (VQ) was used to convert this multi-scale representation to a sparse code. The ‘passive-aggressive model for image retrieval’ (PAMIR) was used to learn the relationships between dictionary words and these auditory codewords. These auditory sparse codes were compared against sparse codes generated from MFCCs, and the best performance was found when using the auditory features.

Apollo (Cambridge)

Model and structure of multiagent system for collection and processing sound information

Author: Shaya B. H. (Шайя Б. Х.), Vishniakou U. A. (Вишняков В. А.)
Publication venue: БГУИР
Publication date: 01/01/2019

A symbolic model of a multi-agent system (MAS) for collecting and processing sound information (CPSI) from the environment is proposed. The model includes agents for the input and encoding of local sound information, database agents, knowledge base agents, agents for calculating integral estimates of the sound situation, and a decision-making agent. On the basis of this model, the structure of the MAS for CPSI was developed, consisting of preprocessors for audio input and encoding, wireless communication channels, a server and an operator console.

Belarusian State University of Informatics and Radioelectronics Repository

Development of a ROS environment for researching machine learning techniques applied to drones

Author: Millán Romera José Andrés
Publication date: 01/01/2019

The first part of this dissertation presents ROS-MAGNA, a general framework for the definition and management of cooperative missions for multiple Unmanned Aircraft Systems (UAS) based on the Robot Operating System (ROS) [42]. This framework makes the type of on-board autopilot transparent and creates the state machines that control the behaviour of the different UAS from the specification of the multi-UAS mission. In addition, it integrates a virtual world generation tool to manage the information about the environment and visualize the geometrical objects of interest, so that the progress of the mission can be followed properly. The framework supports the coexistence of software-in-the-loop, hardware-in-the-loop and real UAS cooperating in the same arena, making it a very useful testing tool for developers of advanced UAS functionalities. To the best of our knowledge, it is the first framework that provides all these capabilities. The document also includes simulations and real experiments which show the main features of the framework. ROS-MAGNA is used to develop and test a machine learning tool. The information generated during a mission is used to train neural networks of different architectures for navigation purposes. The data treatment and training processes are carried out in a testbench to select the best solution from different datasets. TensorFlow is the framework selected to implement every deep learning algorithm, along with its TensorBoard tool for understanding the training. Furthermore, an API with the pre-trained model is used during a real mission in real time. The third part of this dissertation is the design and integration of a voice control assistant inside ROS-MAGNA. Employing diverse online and offline tools, oral commands are processed to change the mission state and performance and to retrieve information. Universidad de Sevilla. Máster en Ingeniería Industria

idUS. Depósito de Investigación Universidad de Sevilla