8 research outputs found

    Sparsity-Based Algorithms for Blind Separation of Convolutive Mixtures with Application to EMG Signals

    In this paper we propose two iterative algorithms for the blind separation of convolutive mixtures of sparse signals. The first, called Iterative Sparse Blind Separation (ISBS), minimizes a sparsity cost function using an approximate Newton technique. The second, referred to as Givens-based Sparse Blind Separation (GSBS), computes the separation matrix as the product of a whitening matrix and a unitary matrix; the latter is estimated, via a Jacobi-like process, as a product of Givens rotations that minimize the sparsity cost function. As illustrated by the simulation results and the comparative study at the end of the paper, the two sparsity-based algorithms significantly outperform the time-coherence-based SOBI algorithm.
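    The Jacobi-like step can be illustrated for two whitened mixtures: search over Givens angles for the rotation that minimizes a sparsity cost. This is a minimal sketch, assuming an L1-norm cost and a grid search in place of the paper's actual update rule:

```python
import numpy as np

def givens(theta):
    """2x2 Givens rotation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def sparsest_rotation(x, n_grid=360):
    """Grid-search the Givens angle minimizing an L1 sparsity cost.

    x: (2, T) whitened mixtures. A stand-in for the paper's
    Jacobi-like minimization, shown here for the 2-source case."""
    thetas = np.linspace(0.0, np.pi / 2, n_grid)
    costs = [np.abs(givens(t) @ x).sum() for t in thetas]
    return thetas[int(np.argmin(costs))]

# Toy demo: two sparse (Laplacian, ~10% active) sources mixed by a rotation.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 2000)) * (rng.random((2, 2000)) < 0.1)
mix = givens(0.6) @ s              # orthogonal mixing keeps the data white
theta = sparsest_rotation(mix)
y = givens(theta) @ mix            # separated sources (up to sign/order)
```

    Up to the usual permutation and sign ambiguity, the estimated angle undoes the mixing rotation, i.e. theta + 0.6 lands on a multiple of pi/2.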

    Transfer Learning for Improved Audio-Based Human Activity Recognition

    Human activities are accompanied by characteristic sound events, whose processing can provide valuable information for automated human activity recognition. This paper presents a novel approach for the case where one or more human activities are associated with limited audio data, resulting in a potentially highly imbalanced dataset. Data augmentation is based on transfer learning; more specifically, the proposed method (a) identifies the classes that are statistically close to the ones associated with limited data; (b) learns a multiple-input, multiple-output transformation; and (c) transforms the data of the closest classes so that it can be used for modeling the ones associated with limited data. Furthermore, the proposed framework includes a feature set extracted from signal representations of diverse domains, i.e., temporal, spectral, and wavelet. Extensive experiments demonstrate the relevance of the proposed data augmentation approach under a variety of generative recognition schemes.
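    A hedged sketch of steps (a)-(c): class closeness is measured here by the distance between feature means, and the learned transformation is a linear least-squares map. Both are illustrative stand-ins, since the abstract does not fix the exact criterion or model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy feature sets: three well-populated classes plus one rare class
# (all distributions are made up for illustration).
classes = {c: rng.normal(loc=c, size=(200, 8)) for c in (0.0, 1.0, 3.0)}
rare = rng.normal(loc=1.2, size=(10, 8))   # activity with limited data

# (a) find the statistically closest class (here: distance of means).
means = {c: x.mean(axis=0) for c, x in classes.items()}
closest = min(means, key=lambda c: np.linalg.norm(means[c] - rare.mean(axis=0)))

# (b) learn a multiple-input, multiple-output map (linear least squares
#     on a handful of paired rows, standing in for the learned model).
src = classes[closest][: len(rare)]
Wmap, *_ = np.linalg.lstsq(src, rare, rcond=None)

# (c) transform the closest class to synthesize extra rare-class data.
augmented = classes[closest] @ Wmap
```

    The augmented rows can then be pooled with the rare class's own data when training its model.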

    Adaptive Sparsity Non-Negative Matrix Factorization for Single-Channel Source Separation

    A novel method for adaptive sparsity non-negative matrix factorization is proposed. The factorization decomposes an information-bearing matrix into the two-dimensional convolution of factor matrices representing the spectral dictionary and the temporal codes. We derive a variational Bayesian approach to compute the sparsity parameters that optimize the factorization. The method is demonstrated on separating audio mixtures recorded from a single channel. In addition, we show that extracting the spectral dictionary and temporal codes is significantly more efficient with adaptive sparsity, which in turn leads to better source separation performance. Experimental tests and comparisons with other sparse factorization methods verify the efficacy of the proposed method.
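    The role of the sparsity parameter can be sketched with plain multiplicative-update NMF carrying an L1 penalty on the temporal codes. This replaces the paper's two-dimensional convolutive model and variational Bayesian adaptation with a fixed penalty weight, purely for illustration:

```python
import numpy as np

def sparse_nmf(V, rank, lam=0.1, n_iter=200, seed=0):
    """NMF with an L1 sparsity penalty (weight lam) on the codes H.

    Multiplicative updates for the Euclidean cost; the paper instead
    adapts the sparsity weight via variational Bayes and factorizes a
    2-D convolutive model, so lam is fixed here purely for illustration."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 1e-3
    H = rng.random((rank, V.shape[1])) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + lam + 1e-9)  # lam shrinks H
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

V = np.abs(np.random.default_rng(2).normal(size=(30, 40)))
W, H = sparse_nmf(V, rank=5)
```

    Raising lam drives more entries of H toward zero, trading reconstruction error for sparser temporal codes.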

    Eddy current pulsed thermography for non-destructive evaluation of carbon fibre reinforced plastic for wind turbine blades

    PhD thesis. The use of renewable energy such as wind power has grown rapidly over the past ten years. However, poor reliability and high lifecycle costs can limit wind power generation: wind turbine blades suffer from relatively high failure rates, resulting in long downtimes. The motivation of this research is to improve the reliability of wind turbine blades via non-destructive evaluation (NDE) for the early warning of faults and condition-based maintenance. Blade failures can be attributed to three major defect types in carbon fibre reinforced plastic (CFRP): cracks, delaminations and impact damage. To detect and characterise these defects at an early stage, this thesis proposes an eddy current pulsed thermography (ECPT) NDE method for CFRP-based wind turbine blades. The ECPT system is a redesigned extension of previous work. Directional excitation is applied to overcome the non-homogeneous and anisotropic properties of composites in both numerical and experimental studies. Through investigation of the multi-physical phenomena of electromagnetic-thermal interaction, defects can be detected, classified and characterised via numerical simulation and experiments. The integrated multi-physical ECPT system provides transient thermal responses under eddy current heating inside a sample, and is applied to the measurement and characterisation of different samples. Surface defects such as cracks are detected as hot spots in thermal images, whereas internal defects, such as delamination and impact damage, are detected through thermal or heat-flow patterns. For quantitative NDE, defect detection, characterisation and classification are carried out at different levels to deal with various defect locations and fibre textures. Different approaches for different applications are tested and compared on samples with cracks, delamination and impact damage.
    Comprehensive transient feature extraction at three levels (pixel, local area and pattern) is developed and implemented with respect to defect location, in terms of thickness and the complexity of the fibre texture. The three defect types are detected and classified at these levels: transient responses at the pixel level, flow patterns at the local-area level, and principal or independent components at the pattern level are derived for defect classification. Features at the pixel and local-area levels are extracted to gain quantitative information about the defects. Comparing the performance of the three levels, the pixel level is good at evaluating surface defects, particularly within uni-directional fibres; the local-area level has advantages for detecting deeper defects such as delamination and impact damage; and, in specimens with multiple fibre orientations, the pattern level is useful for separating defective patterns from fibre texture and for distinguishing multiple defects. Funded by the Engineering and Physical Sciences Research Council (EPSRC) and Framework Programme 7 (FP7).
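    The pattern-level idea (principal components separating a defect signature from the global response) can be sketched on a synthetic thermal sequence; the data below, uniform cooling plus a faster-decaying hot spot, is an assumption standing in for real ECPT frames:

```python
import numpy as np

rng = np.random.default_rng(3)
T, H, W = 50, 16, 16
t = np.arange(T, dtype=float)[:, None, None]

# Synthetic sequence: uniform cooling everywhere, plus a small region
# (the "defect") with an extra, faster-decaying thermal response.
frames = np.exp(-t / 20.0) * np.ones((T, H, W))
frames[:, 4:8, 4:8] += 0.5 * np.exp(-t / 5.0)
frames += 0.01 * rng.normal(size=frames.shape)

# Centre each pixel's transient, then take spatial principal components.
X = frames.reshape(T, -1)
X = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
pc2 = np.abs(Vt[1]).reshape(H, W)   # second spatial component
```

    Here the first component carries the shared cooling curve, while the second concentrates on the defect region, mirroring the separation of defective patterns from the background response.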

    Automatic singer identification in polyphonic music

    The application of digital technology to music production and distribution has brought about a genuine revolution, easing artists' access to recording studios and generating exponential growth in the number of phonographic recordings. As a result, classification and recommendation systems based on signal processing and machine learning tools have become key to managing the musical catalogue. In this context, automating tasks such as identifying the singer from an audio file is especially relevant. The singing voice is undoubtedly the oldest musical instrument and the one most familiar to our auditory system. Moreover, the voice usually conveys much of the information in music, since it generally carries the main melody, delivers the lyrics and contains expressive features. Several aspects, however, make automatic singer recognition difficult: in particular, unlike speaker identification, the musical accompaniment is a signal with an energy level similar to that of the voice and cannot be modelled as independent random noise. This work surveys existing techniques for singer identification in polyphonic music audio. Owing to the difficulties that source separation entails, many works address the problem without it, so the classification algorithms learn to recognise the singer together with the musical accompaniment. The choice of instrumentation, audio effects, mixing and mastering plays an important role in the final sound of the songs on an album; in previous work, the effects tied to these aspects of phonographic production have been little explored.
    To expose and quantify these effects, this work introduces the VoicesUy database, in which popular Río de la Plata songs are sung by professional artists and recorded in multitrack format. The singers perform the same songs, enabling voice identification across files whose only difference is the voice. This database supports the evaluation of both source separation and voice classification algorithms. Since the participating singers have their own discographies, it also allows assessing how the different stages of music production affect singer identification. VoicesUy is the first database of popular music in Spanish for singer identification and source separation. Experiments show that, although the musical accompaniment hinders singer identification, an artist performing their own compositions with their band is easier to identify than one performing covers; we call this behaviour the "band effect". Classification of the performer is shown to improve when source separation techniques are applied. A masking technique over a non-traditional time-frequency representation is tested and compared against classical representations such as the spectrogram; these techniques exploit the fundamental frequency of the voice. The singer identification results obtained are comparable to reference works: voice classification on VoicesUy with source separation reaches 95.1% accuracy.
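    The F0-informed masking idea can be sketched on a single DFT frame: keep only the bins near harmonics of the voice's fundamental frequency. The signal frequencies, mask width and plain-spectrum representation are illustrative assumptions (the thesis uses a non-traditional time-frequency representation):

```python
import numpy as np

def harmonic_mask(freqs, f0, width=20.0, n_harm=10):
    """Boolean mask keeping frequency bins near multiples of f0."""
    mask = np.zeros_like(freqs, dtype=bool)
    for k in range(1, n_harm + 1):
        mask |= np.abs(freqs - k * f0) < width
    return mask

sr = 8000
t = np.arange(0, 1.0, 1.0 / sr)
# Made-up "voice" (harmonics of 200 Hz) and "accompaniment"
# (inharmonic partials at 310 and 470 Hz).
voice = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 5))
accomp = np.sin(2 * np.pi * 310 * t) + np.sin(2 * np.pi * 470 * t)
mix = voice + accomp

spec = np.fft.rfft(mix)
freqs = np.fft.rfftfreq(len(mix), 1.0 / sr)
voice_est = np.fft.irfft(spec * harmonic_mask(freqs, f0=200.0), n=len(mix))
```

    In practice the F0 track varies over time, so the mask is recomputed per analysis frame of the time-frequency representation.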

    Single channel overlapped-speech detection and separation of spontaneous conversations

    PhD thesis. This thesis considers spontaneous conversation containing both speech mixtures and speech dialogue. A speech mixture refers to speakers speaking simultaneously (i.e. overlapped speech); a speech dialogue refers to one speaker actively speaking while the other is silent. The input conversation is first processed by overlapped-speech detection, and the two output streams are segregated into dialogue and mixture segments. The dialogue is processed by speaker diarization, whose outputs are the individual speech of each speaker. The mixture is processed by speech separation, whose outputs are the separated speech signals of the individual speakers. When the separation input contains only the mixture, blind speech separation is used; when the separation is assisted by the outputs of the speaker diarization, it is informed speech separation. The research presents a novel overlapped-speech detection algorithm and two novel speech separation algorithms. The proposed overlapped-speech detection algorithm estimates the switching instants of the input; an optimization loop, based on pattern recognition principles and k-means clustering, selects the best encapsulated audio features and discards the worst. Over 300 simulated conversations, the average false-alarm error is 1.9%, the missed-speech error 0.4%, and the overlap-speaker error 1%, approximately matching the errors reported by the best recent speaker diarization systems on reliable corpora. The proposed blind speech separation algorithm consists of four sequential stages: filter-bank analysis, non-negative matrix factorization (NMF), speaker clustering and filter-bank synthesis. Instead of the usually required speaker segmentation, an effective standard framing is contributed. Average objective scores (SAR, SDR and SIR) over 51 simulated conversations are 5.06 dB, 4.87 dB and 12.47 dB respectively.
    For the proposed informed speech separation algorithm, the outputs of the speaker diarization form a generated database. This database assists the separation by creating virtual target-speech and mixture signals; the contributed virtual signals are trained to facilitate separation by homogenising them with the NMF matrix elements of the real mixture, and a contributed masking step refines the resulting speech. Average SAR, SDR and SIR over 341 simulated conversations are 9.55 dB, 1.12 dB and 2.97 dB respectively. On these objective tests, the two speech separation algorithms are in the mid-range of well-known NMF-based audio and speech separation methods.
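    The NMF-plus-masking core of the separation stage can be sketched on a toy magnitude spectrogram with one component per speaker; filter-bank analysis/synthesis and speaker clustering are omitted, and the spectral shapes are made up:

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0):
    """Euclidean multiplicative-update NMF."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 1e-3
    H = rng.random((rank, V.shape[1])) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy magnitude "spectrogram": two speakers with nearly disjoint
# spectral shapes (made-up numbers; one NMF component per speaker).
rng = np.random.default_rng(4)
A = np.outer([1.0, 0.8, 0.1, 0.0, 0.0], rng.random(60))
B = np.outer([0.0, 0.1, 0.2, 0.9, 1.0], rng.random(60))
V = A + B
W, H = nmf(V, rank=2)

# Wiener-style masks rebuild each speaker from its component.
R = W @ H + 1e-9
est = [np.outer(W[:, k], H[k]) * V / R for k in range(2)]
```

    The mask form guarantees the estimates sum back to the mixture; real conversations need many components per speaker, grouped by the clustering stage.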