
    Towards an Online Fuzzy Modeling for Human Internal States Detection

    In human-robot interaction, a socially intelligent robot should be capable of understanding the emotional internal state of the interacting human so as to behave in a proper manner. The main difficulty is that the full range of human internal states cannot be covered by offline training, so the robot should be able to learn and classify emotional states online. This research paper focuses on developing a novel online incremental learning of human emotional states using the Takagi-Sugeno (TS) fuzzy model. When new data arrives, a decisive criterion determines whether the new elements constitute a new cluster or confirm one of the previously existing clusters. If the new data is attributed to an existing cluster, the evolving fuzzy rules of the TS model may be updated, either by adding a new rule or by modifying existing rules, according to the descriptive potential of the new data elements with respect to the existing cluster centers. If a new cluster is formed instead, a corresponding new TS fuzzy model is created and then updated as new data elements are attributed to it. The subtractive clustering algorithm is used to calculate the cluster centers that represent the rules of the TS models. Experimental results show the effectiveness of the proposed method.
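    The subtractive clustering step used to obtain the rule centers can be sketched generically. The following is a simplified, illustrative implementation, not the authors' code; the function name, the radii `ra` and `rb`, and the stopping ratio `eps` are conventional defaults, not values from the paper:

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, rb=0.75, eps=0.15):
    """Subtractive clustering: each point's 'potential' measures how
    densely other points surround it; high-potential points are chosen
    as cluster centers (candidate rule centers for a TS fuzzy model)."""
    alpha, beta = 4.0 / ra**2, 4.0 / rb**2
    # Pairwise squared distances between all points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    P = np.exp(-alpha * d2).sum(axis=1)
    centers = []
    p_first = P.max()
    while True:
        c = int(P.argmax())
        if P[c] < eps * p_first:
            break
        centers.append(X[c])
        # Suppress potential near the new center so the next center
        # emerges in a different dense region.
        P = P - P[c] * np.exp(-beta * d2[:, c])
    return np.array(centers)
```

    In the online setting described above, the potentials would be updated incrementally as new samples arrive rather than recomputed from the full batch.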

    Single-Microphone Speech Enhancement and Separation Using Deep Learning
    The cocktail party problem comprises the challenging task of understanding a speech signal in a complex acoustic environment, where multiple speakers and background noise signals simultaneously interfere with the speech signal of interest. A signal processing algorithm that can effectively increase the speech intelligibility and quality of speech signals in such complicated acoustic situations is highly desirable, especially for applications involving mobile communication devices and hearing assistive devices. Due to the re-emergence of machine learning techniques, today known as deep learning, the challenges involved with such algorithms might be overcome. In this PhD thesis, we study and develop deep learning-based techniques for two sub-disciplines of the cocktail party problem: single-microphone speech enhancement and single-microphone multi-talker speech separation. Specifically, we conduct an in-depth empirical analysis of the generalization capability of modern deep learning-based single-microphone speech enhancement algorithms. We show that the performance of such algorithms is closely linked to the training data, and that good generalizability can be achieved with carefully designed training data. Furthermore, we propose uPIT, a deep learning-based algorithm for single-microphone speech separation, and report state-of-the-art results on a speaker-independent multi-talker speech separation task. Additionally, we show that uPIT works well for joint speech separation and enhancement without explicit prior knowledge about the noise type or number of speakers. Finally, we show that deep learning-based speech enhancement algorithms designed to minimize the classical short-time spectral amplitude mean squared error lead to enhanced speech signals which are essentially optimal in terms of STOI, a state-of-the-art speech intelligibility estimator.
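    The core idea behind a permutation-invariant objective such as uPIT can be illustrated at its simplest: score every assignment of estimated sources to reference sources and keep the best one, resolving the label-permutation ambiguity of multi-talker separation. The sketch below is an illustrative, utterance-level NumPy version (the function name and plain MSE criterion are assumptions; the actual uPIT method applies this idea inside a deep network's training loop):

```python
import itertools
import numpy as np

def pit_mse(estimates, targets):
    """Permutation-invariant MSE: evaluate the error under every
    assignment of estimated sources to reference sources and return
    the minimum loss together with the winning permutation."""
    S = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(S)):
        loss = float(np.mean([(estimates[i] - targets[p]) ** 2
                              for i, p in enumerate(perm)]))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

    Exhaustive search over permutations is feasible here because the number of speakers is small; the per-utterance (rather than per-frame) choice of permutation is what distinguishes uPIT from frame-level PIT.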

    SEGREGATION OF SPEECH SIGNALS IN NOISY ENVIRONMENTS

    Automatic segregation of overlapping speech signals from single-channel recordings is a challenging problem in speech processing. Similarly, the extraction of speech signals from noisy speech has attracted substantial research for several years but remains unsolved. Speech extraction from noisy speech mixtures, where the background interference could be either speech or noise, is especially difficult when the task is to preserve perceptually salient properties of the recovered acoustic signals for use in human communication. In this work, we propose a speech segregation algorithm that can simultaneously deal with both background noise and interfering speech. We propose a feature-based, bottom-up algorithm which makes no assumptions about the nature of the interference and does not rely on any pre-trained source models for speech extraction. As such, the algorithm should be applicable to a wide variety of problems, and also be useful for human communication, since an aim of the system is to recover the target speech signals in the acoustic domain. The proposed algorithm can be compartmentalized into (1) a multi-pitch detection stage which extracts the pitch of the participating speakers, (2) a segregation stage which teases apart the harmonics of the participating sources, (3) a reliability and add-back stage which scales the estimates based on their reliability and adds back appropriate amounts of aperiodic energy for the unvoiced regions of speech, and (4) a speaker assignment stage which assigns the extracted speech signals to their respective sources. The pitch of two overlapping speakers is extracted using a novel feature, the 2-D Average Magnitude Difference Function, which is also capable of giving a single pitch estimate when the input contains only one speaker.
The segregation algorithm is based on a least-squares framework relying on the estimated pitch values to give estimates of each speaker's contributions to the mixture. The reliability block is based on a non-linear function of the energy of the estimates; this function was learnt from a variety of speech and noise data but is generic in nature and applicable to different databases. With both single- and multiple-pitch extraction and segregation capabilities, the proposed algorithm is amenable to both speech-in-speech and speech-in-noise conditions. The algorithm is evaluated on several objective and subjective tests using both speech and noise interference from different databases. The proposed speech segregation system demonstrates performance comparable to or better than the state of the art on most of the objective tasks. Subjective tests on the speech signals reconstructed by the algorithm, with normal-hearing listeners as well as users of hearing aids, indicate a significant improvement in the perceptual quality of the speech signal after processing, and suggest that the proposed segregation algorithm can be used as a pre-processing block within the signal processing of communication devices. The utility of the algorithm for both perceptual and automatic tasks, based on a single-channel solution, makes it a unique speech extraction tool and a first of its kind in contemporary technology.
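    The pitch-extraction building block can be illustrated with the classic one-dimensional AMDF, which dips at lags equal to the pitch period; the thesis's 2-D extension evaluates pairs of lags to handle two simultaneous speakers. The sketch below shows only the single-speaker 1-D case (function names and the search range are illustrative, not taken from the thesis):

```python
import numpy as np

def amdf(x, tau_min, tau_max):
    """Average Magnitude Difference Function: for a periodic signal,
    the mean |x(n) - x(n + tau)| dips near multiples of the period."""
    taus = np.arange(tau_min, tau_max + 1)
    vals = np.array([np.mean(np.abs(x[:-t] - x[t:])) for t in taus])
    return taus, vals

def amdf_pitch(x, fs, fmin=60.0, fmax=400.0):
    """Estimate a single pitch by locating the deepest AMDF valley
    within a plausible lag range for speech."""
    taus, vals = amdf(x, int(fs / fmax), int(fs / fmin))
    return fs / taus[int(np.argmin(vals))]
```

    For two overlapping speakers, the 2-D version evaluates the difference function over lag pairs, so that a valley appears at the combination of both pitch periods.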

    Fuzzy machine vision based inspection

    Machine vision systems have been developed to solve many real-world problems in various fields, and their role in achieving superior quality and productivity is of paramount importance. For such a system to be attractive, however, it needs to be fast, accurate and cost-effective. This dissertation is based on a number of practical machine vision based inspection projects obtained from the automotive industry. It presents a collection of efficient fuzzy machine vision approaches endorsed with experimental results. It also covers the conceptual design, development and testing of various fuzzy machine vision based inspection approaches for different industrial applications. To assist in developing and evaluating the performance of the proposed approaches, several parts are tested under varying lighting conditions. This research deals with two important aspects of machine vision based inspection. The first part concentrates on component detection and component orientation identification. The components used in this part are metal clips mounted on a dash panel frame that is installed in the door of trucks. We therefore propose a fuzzy machine vision based clip detection model and a fuzzy machine vision based clip orientation identification model to inspect the proper placement of clips on dash panels. Both models are efficient and fast in terms of accuracy and processing time. The second part of the research deals with machined part defects such as broken edges, porosity and tool marks. These defects occur on the surface of die-cast aluminum automotive pump housings. As a result, an automated fuzzy machine vision based broken edge detection method, an efficient fuzzy machine vision based porosity detection technique and a neuro-fuzzy part classification model based on tool marks are developed. Computational results show that the proposed approaches yield satisfactory results on the tested image databases.
There are four main contributions in this work. The first is the development of the concept of composite matrices in conjunction with an XOR feature extractor using fuzzy subtractive clustering for clip detection. The second is a proposed model based on grouping and counting pixels in pre-selected areas, which tracks pixel colors in separate RGB channels to determine whether the orientation of the clip is acceptable. The construction of three novel edge-based features embedded in fuzzy C-means clustering for broken edge detection marks the third contribution. Finally, the fourth contribution presents the core-of-porosity-candidates concept and its correlation with twelve developed matrices, which in turn results in the development of five different features used in our fuzzy machine vision based porosity detection approach.
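    The fuzzy C-means clustering mentioned in the third contribution alternates between updating soft cluster memberships and cluster centers. Below is a minimal generic sketch of the standard algorithm, not the dissertation's implementation; the function name and the conventional fuzzifier default `m=2` are choices made here for illustration:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    """Fuzzy C-means: each sample belongs to every cluster with a
    membership in [0, 1]; memberships and centers are alternately
    updated until the partition stabilises."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))       # membership matrix, clusters x samples
    U /= U.sum(axis=0)                # columns sum to 1
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.maximum(d, 1e-12)      # avoid division by zero at a center
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=0)     # standard FCM membership update
    return centers, U
```

    In the broken-edge application, the inputs to such a clustering would be the edge-based feature vectors rather than raw pixels.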

    Remaining discharge energy estimation for lithium-ion batteries using pattern recognition and power prediction

    The remaining discharge energy (RDE) of a battery is an important quantity for estimating the remaining range of a vehicle. Prediction-based methods for calculating RDE have been proven suitable for improving energy estimation accuracy. This paper aims to further improve the estimation accuracy by incorporating novel load prediction techniques with pattern recognition into the RDE calculation. For the pattern recognition, driving segment data was categorised into different usage patterns, and a rule-based logic was designed to recognise these based on features from each pattern. For the power prediction, a clustering and Markov modelling approach was used to group and define power levels from the data as states and to find the probability of each state-to-state transition. This data was defined for each pattern, so that the logic could inform which data should be used to predict the future power profile. From the predicted power profile, the RDE was calculated from the product of the predicted load and the predicted voltage, which was obtained from a first-order battery model. The proposed algorithm was tested in simulation and in real time using battery cycler data, and compared against other prediction-based methods. The proposed method was shown to have desirable accuracy and robustness to modelling errors. The primary conclusion from this research is that using pattern recognition can improve the accuracy of RDE estimation.
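    The Markov power-prediction step can be sketched generically: quantise the measured power trace into discrete states, estimate a transition matrix from counts, and propagate a state distribution forward to obtain an expected load profile. The code below is an illustrative simplification (function names, the number of states, and the expected-value readout are assumptions, not the paper's exact formulation):

```python
import numpy as np

def build_transition_matrix(power, n_states=5):
    """Quantise a power trace into discrete levels and count
    state-to-state transitions to estimate a Markov chain."""
    edges = np.linspace(power.min(), power.max(), n_states + 1)
    states = np.clip(np.digitize(power, edges[1:-1]), 0, n_states - 1)
    T = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        T[a, b] += 1
    T /= np.maximum(T.sum(axis=1, keepdims=True), 1)  # row-normalise counts
    levels = 0.5 * (edges[:-1] + edges[1:])           # representative power per state
    return T, states, levels

def predict_expected_power(T, levels, s0, horizon):
    """Expected power over a horizon: propagate the state
    distribution through the chain and read out its mean level."""
    p = np.zeros(len(levels))
    p[s0] = 1.0
    out = []
    for _ in range(horizon):
        p = p @ T
        out.append(float(p @ levels))
    return out
```

    In the paper's setting, a separate chain would be estimated per recognised usage pattern, and the predicted profile would feed the first-order battery model to obtain the voltage and hence the RDE.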

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.
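    The spline-based envelope approximation at the heart of the PSOS model can be illustrated by fitting a smoothing spline to a magnitude spectrum; the spline's knots and coefficients then serve as a compact, resynthesisable parameter set. A minimal sketch assuming SciPy is available (the function name and smoothing parameter are illustrative, not the thesis's formulation):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def spline_envelope(freqs, magnitudes, smooth=1e-3, degree=3):
    """Approximate a magnitude spectrum with a smoothing spline.
    The spline chooses as few knots as it can while keeping the sum
    of squared residuals below `smooth`, giving a compact envelope."""
    spl = UnivariateSpline(freqs, magnitudes, k=degree, s=smooth)
    return spl  # callable: spl(f) evaluates the envelope at frequency f
```

    In the full model, a similar spline parameterisation would also describe how the envelope parameters evolve over the duration of the sound object, giving a direct mapping between sounds of different length.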