694 research outputs found

    A Classification Scheme Based on Directed Acyclic Graphs for Acoustic Farm Monitoring

    Intelligent farming, as part of the green revolution, is advancing agriculture so that farms keep evolving, with the aim of optimizing animal production in an eco-friendly way. In this direction, we propose exploiting the acoustic modality for farm monitoring. Such information could be used in a stand-alone or complementary mode to constantly monitor animal population and behavior. To this end, we designed a scheme for classifying the vocalizations produced by farm animals. More precisely, we propose a directed acyclic graph in which each node carries out a binary classification task using hidden Markov models. The topological ordering follows a criterion derived from the Kullback-Leibler divergence. In the experimental phase, we employed a publicly available dataset of vocalizations from seven animals typically encountered on farms, and we report promising recognition rates that outperform state-of-the-art classifiers.
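
    As a rough illustration of the Kullback-Leibler divergence mentioned in this abstract, the sketch below computes it for small discrete distributions and ranks class pairs by a symmetrised version. The paper applies its criterion to hidden Markov models and does not state the exact rule here, so the class names, the toy distributions and the helper `rank_pairs` are all hypothetical.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """Return D_KL(p || q) in nats for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def symmetric_kl(p, q):
    """Symmetrised divergence: D_KL(p || q) + D_KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def rank_pairs(models):
    """Hypothetical helper: sort class pairs from most to least separable."""
    pairs = [
        (symmetric_kl(models[a], models[b]), a, b)
        for a, b in combinations(sorted(models), 2)
    ]
    return sorted(pairs, reverse=True)


# Toy distributions standing in for per-class models (assumed values).
class_models = {
    "cow": [0.7, 0.2, 0.1],
    "sheep": [0.1, 0.3, 0.6],
    "pig": [0.6, 0.3, 0.1],
}

for score, a, b in rank_pairs(class_models):
    print(f"{a} vs {b}: {score:.3f}")
```

    A ranking of this kind is one plausible way to decide which binary decisions a graph should make first; the actual ordering criterion is described only in the paper itself.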

    Text-Independent Speaker Identification using Statistical Learning

    The proliferation of voice-activated devices and systems and of over-the-phone bank transactions has made our daily affairs much easier in recent times. The ease that these systems offer also calls for them to be fail-safe against impersonators. Because sensitive information may be shared on these systems, security must be a primary concern during the development stages. Vital systems like these should be able to discriminate between the actual speaker and impersonators. That functionality is the focus of this thesis. Several methods have been proposed for this task, and some success has been recorded so far. However, because of the vital role such a system plays in securing critical information, efforts continue to be made to reduce its probability of error. Statistical learning methods are therefore used in this thesis, because they have proven to have high accuracy and efficiency in various other applications. The statistical methods used are Gaussian Mixture Models (GMMs) and Support Vector Machines. These methods have become the de facto techniques for designing speaker identification systems. The effectiveness of a support vector machine depends on the type of kernel used. Several kernels have been proposed for achieving better results, and this thesis introduces a kernel that serves as an alternative to those already defined. Other factors, including the number of components used in the GMM, affect the performance of the system; these factors are considered in this thesis, and exciting results were obtained.

    Spectral discontinuity in concatenative speech synthesis – perception, join costs and feature transformations

    This thesis explores the problem of determining an objective measure to represent human perception of spectral discontinuity in concatenative speech synthesis. Such measures are used as join costs to quantify the compatibility of speech units for concatenation in unit selection synthesis. No previous study has reported a spectral measure that satisfactorily correlates with human perception of discontinuity. An analysis of the limitations of existing measures and our understanding of the human auditory system were used to guide the strategies adopted to advance a solution to this problem. A listening experiment was conducted using a database of concatenated speech, with results indicating the perceived continuity of each concatenation. The results of this experiment were used to correlate proposed measures of spectral continuity with the perceptual results. A number of standard speech parametrisations and distance measures were tested as measures of spectral continuity and analysed to identify their limitations. Time-frequency resolution was found to limit the performance of standard speech parametrisations. As a solution to this problem, measures of continuity based on the wavelet transform were proposed and tested, as wavelets offer superior time-frequency resolution to standard spectral measures. A further limitation of standard speech parametrisations is that they are typically computed from the magnitude spectrum. However, the auditory system combines information relating to the magnitude spectrum, phase spectrum and spectral dynamics. The potential of phase and spectral dynamics as measures of spectral continuity was investigated. One widely adopted approach to detecting discontinuities is to compute the Euclidean distance between feature vectors about the join in concatenated speech. The detection of an auditory event, such as the detection of a discontinuity, involves processing high up the auditory pathway in the central auditory system. 
The basic Euclidean distance cannot model such behaviour. A study was conducted to investigate feature transformations with sufficient processing complexity to mimic high-level auditory processing. Neural networks and principal component analysis were investigated as feature transformations. Wavelet-based measures were found to outperform all measures of continuity based on standard speech parametrisations. Phase and spectral dynamics based measures were found to correlate with human perception of discontinuity in the test database, although neither measure was found to contribute a significant increase in performance when combined with standard measures of continuity. Neural network feature transformations were found to significantly outperform all other measures tested in this study, producing correlations with perceptual results in excess of 90%.
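
    The baseline join cost that this abstract describes, a Euclidean distance between the feature vectors on either side of a join, can be sketched as follows. The function name and the three-dimensional feature values are hypothetical; the thesis does not specify the features at this point.

```python
import math


def euclidean_join_cost(left_frame, right_frame):
    """Euclidean distance between the feature vectors on each side of a join."""
    if len(left_frame) != len(right_frame):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(left_frame, right_frame)))


# Assumed example: last frame of one unit and first frame of the next.
end_of_unit_a = [1.2, -0.4, 0.3]
start_of_unit_b = [1.0, -0.1, 0.5]
print(euclidean_join_cost(end_of_unit_a, start_of_unit_b))
```

    A cost of zero means the two frames are identical in the chosen feature space; as the abstract notes, a small distance of this kind does not by itself guarantee that listeners hear a smooth join.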

    Algoritmos avanzados para detección del síndrome de apnea-hipopnea obstructiva del sueño

    Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS; SAHOS by its Spanish acronym) is a sleep disorder that is highly prevalent in the general population and affects multiple organs. It is estimated that this condition affects between 3% and 5% of the adult population worldwide and that its prevalence increases with age. Although OSAHS is more common in adults, it also affects children, with a prevalence close to 3%. The respiratory events associated with OSAHS during sleep occur as a consequence of an anatomical-functional alteration of the upper airway that produces its partial narrowing (hypopnea) or its total blockage (apnea). To establish the degree of severity of OSAHS, the Apnea-Hypopnea Index is defined. This index represents the number of apnea-hypopnea events per hour of sleep. The reference study for the correct diagnosis of OSAHS is nocturnal polysomnography. Because this type of study requires not only the simultaneous measurement of a large number of physiological signals but also special infrastructure and qualified personnel, it is very difficult to access and very costly in terms of time and money. This thesis addresses the design, development, implementation and evaluation of three methods for the automatic recognition of apnea-hypopnea events from the analysis and processing of blood oxygen saturation (SaO2) signals. In particular, two feature selection methods called MDAS and MDCS are presented, which are based on sparse representations of SaO2 signals for the recognition of apnea-hypopnea events. In addition, this thesis introduces a new binary discriminability measure, denoted DCAF, which is able to detect discriminative atoms in a dictionary. This measure also makes it possible to efficiently quantify their corresponding degrees of discriminability, which is useful for classification. 
The MDAS and MDCS methods use the DCAF measure to detect the most discriminative atoms of a given dictionary and, from them, perform feature selection. In particular, the MDCS method uses the DCAF measure to select the most discriminative atoms and, from them, build a sub-dictionary. In the experiments carried out in this thesis, the performance of the new DCAF measure was compared with that of several other state-of-the-art information measures. The results show that DCAF achieved very good performance. In addition, the new MDCS method was compared with three other state-of-the-art methods and significantly outperformed all of them. This thesis also introduces an extension of the binary classification problem to a multi-class one. In this context, a generalization of the DCAF measure (which takes into account only two classes in the data) to more than two classes is proposed. In particular, the new combined discriminability measure not only takes into account the conditional probability of activation of the atoms in a dictionary given the class and the value of the corresponding activation coefficient, but also incorporates the effect that it has on the total representation error. Likewise, a new iterative method called DAS-KSVD, which uses this measure, is presented for learning structured dictionaries in the context of multi-class classification problems. The new method makes it possible to detect the most discriminative atoms for each of the classes. Using a database of handwritten digits widely used in the literature, the performance of the DAS-KSVD method was analyzed, yielding recognition rates higher than those obtained by similar state-of-the-art techniques. The new DAS-KSVD method was also applied to a multi-class classification problem associated with OSAHS. 
The results obtained show that this method performs very well in detecting the pathology. Affiliation: Rolon, Roman Emanuel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentin
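
    The Apnea-Hypopnea Index defined in this abstract is simply the number of apnea-hypopnea events divided by the hours of sleep. A minimal sketch, with a hypothetical function name and made-up example values:

```python
def apnea_hypopnea_index(num_events, sleep_hours):
    """Apnea-hypopnea events per hour of sleep."""
    if num_events < 0:
        raise ValueError("number of events cannot be negative")
    if sleep_hours <= 0:
        raise ValueError("sleep duration must be positive")
    return num_events / sleep_hours


# Assumed example: 90 detected events over 7.5 hours of sleep.
print(apnea_hypopnea_index(90, 7.5))
```

    In an automated pipeline of the kind the thesis describes, `num_events` would come from the event detector rather than from manual scoring.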

    Electromagnetic models for ultrasound image processing

    Speckle noise appears when coherent illumination is employed, for example in laser, Synthetic Aperture Radar (SAR), sonar, magnetic resonance, X-ray and ultrasound imagery. Backscattered echoes from the randomly distributed scatterers in the microscopic structure of the medium are the origin of the speckle phenomenon, which gives coherent imaging its granular appearance. It can be shown that speckle noise is multiplicative in nature, strongly correlated and, more importantly, non-Gaussian in its statistics. These characteristics differ greatly from the traditional assumption of additive white Gaussian noise often made in image segmentation, filtering and image processing in general, which reduces the effectiveness of those methods for extracting information from the final image; this kind of noise therefore severely impairs human and machine image interpretation. Statistical modeling is of particular relevance when dealing with speckled data in order to obtain efficient image processing algorithms. In addition, clinical ultrasound imaging systems employ nonlinear signal processing to reduce the dynamic range of the input echo signal to match the smaller dynamic range of the display device and to emphasize objects with weak backscatter. This reduction in dynamic range is normally achieved through a logarithmic amplifier, i.e. logarithmic compression, which selectively compresses large input signals. This kind of nonlinear compression completely changes the statistics of the input envelope signal, and a closed-form expression for the density function of the logarithmically transformed data is usually hard to derive. This thesis is concerned with the statistical distributions of the log-compressed amplitude signal in coherent imagery, and its main objective is to develop a general statistical model for log-compressed ultrasound B-scan images. 
The developed model is adapted, making the pertinent physical analogies, from the multiplicative model used in the Synthetic Aperture Radar (SAR) context. It is shown that the proposed model can successfully describe log-compressed data generated from different models proposed in the specialized ultrasound image processing literature. The model is also successfully applied to in-vivo echocardiographic (ultrasound) B-scan images. The necessary theorems are established to give a rigorous mathematical proof of the validity and generality of the model. Additionally, a physical interpretation of the parameters is given, and the connections between the generalized central limit theorems, the multiplicative model and the compound representation approaches for the different models proposed to date are established. It is shown that the log-amplifier parameters are included as model parameters, and all the model parameters are estimated using moments and maximum likelihood methods. Finally, three applications are developed: speckle noise identification and filtering; segmentation of in-vivo echocardiographic (ultrasound) B-scan images; and a novel approach for heart ejection fraction evaluation. Postprint (published version
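
    To make the idea of logarithmic compression in this abstract concrete, the sketch below uses one common textbook form of a log amplifier, `gain * ln(envelope) + offset`. The abstract does not give the amplifier's formula, so this form, the parameter names `gain` and `offset`, and the numeric values are assumptions for illustration only.

```python
import math


def log_compress(envelope, gain=20.0, offset=0.0):
    """Map a positive envelope amplitude to gain * ln(envelope) + offset."""
    if envelope <= 0.0:
        raise ValueError("envelope amplitude must be positive")
    return gain * math.log(envelope) + offset


# A wide input dynamic range (a factor of 1000 between weak and strong
# echoes) becomes a modest additive span after compression.
weak_echo, strong_echo = 1.0, 1000.0
print("input ratio:", strong_echo / weak_echo)
print("output span:", log_compress(strong_echo) - log_compress(weak_echo))
```

    Under this assumed form, equal amplitude ratios map to equal output steps, which is why strong echoes are compressed far more than weak ones in absolute terms.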

    World Modeling for Intelligent Autonomous Systems

    The functioning of intelligent autonomous systems requires constant situation awareness and cognition analysis. Such a system thus needs a memory structure that contains a description of the surrounding environment (world model) and serves as a central information hub. This book presents a series of theoretical and experimental results in the field of world modeling. These cover dynamic and prior knowledge modeling, information fusion, management and qualitative/quantitative information analysis.

    The use of spectral information in the development of novel techniques for speech-based cognitive load classification

    The cognitive load of a user refers to the amount of mental demand imposed on the user when performing a particular task. Estimating the cognitive load (CL) level of users is necessary to adjust the workload imposed on them and so improve task performance. Current speech-based CL classification systems are not adequate for commercial use because of their low performance, particularly in noisy environments. This thesis proposes several techniques to improve the performance of speech-based cognitive load classification in both clean and noisy conditions. It analyses the effectiveness of speech features such as spectral centroid frequency (SCF) and spectral centroid amplitude (SCA) for CL classification. Sub-systems based on SCF and SCA features were developed and fused with the traditional Mel frequency cepstral coefficient (MFCC) based system, producing relative error rate reductions of 8.9% and 31.5% respectively compared with the MFCC-based system alone. The Stroop test corpus was used in these experiments. An investigation of how cognitive load information is distributed across spectral subbands shows that significantly more information lies in the low-frequency subband than in the high-frequency subband. Two methods are proposed to exploit this finding. The first, called the multi-band approach, uses a weighting scheme to emphasize the speech features in low-frequency subbands. The cognitive load classification accuracy of this approach is shown to be higher than that of a system without weighting. The second is to design an effective filterbank based on the spectral distribution of cognitive load information using the Kullback-Leibler distance measure. The designed filterbank is shown to consistently provide higher classification accuracies than existing filterbanks such as mel, Bark and equivalent rectangular bandwidth. 
A discrete cosine transform based speech enhancement technique is proposed to increase the robustness of the CL classification system and is found to be more suitable than the other methods investigated. The proposed method provides a 3.0% average relative error rate reduction over the seven types of noise and five SNR levels used. In particular, it provides a maximum relative error rate reduction of 7.5% for the F16 noise (from the NOISEX-92 database) at 20 dB SNR.

    Détection, localisation, caractérisation de transitoires acoustiques sous-marins

    The underwater environment is insonified by a wide variety of acoustic sources that can be monitored by autonomous passive acoustic recorders. A large number of the recorded sounds are transient signals (short, finite-duration signals), among them the pulse signals that we study in this thesis. Pulse signals have specific properties, such as a very short duration (<1 ms), few oscillations and high directivity, which make them difficult to study with classical signal processing tools (Fourier transform, autocorrelation). In the first part of this study, we develop a method to detect sound sources emitting rhythmic pulse trains (dolphins, sperm whales, beluga whales). This detector uses only the times of arrival of pulses at the hydrophone to perform a rhythm analysis based on a complex autocorrelation and a time-rhythm representation. This makes it possible: i) to detect rhythmic pulse trains, ii) to determine the beginning and ending times of pulse trains, iii) to determine the value of the rhythm. In the second part of this thesis, we study the potential of a method called Recurrence Plot Analysis to characterize the waveforms of pulse signals. After a general presentation of this method, we develop three signal processing architectures based on it to perform the following tasks: i) transient detection, ii) transient characterization and pattern recognition, iii) estimation of the time difference of arrival of the transient on two hydrophones. All the methods developed in this thesis are validated on simulated data and on real data recorded at sea.