59 research outputs found

    Detection of Glottal Closure Instants based on the Microcanonical Multiscale Formalism

    Get PDF
    International audienceThis paper presents a novel algorithm for automatic detection of Glottal Closure Instants (GCI) from the speech signal. Our approach is based on a novel multiscale method that relies on precise estimation of a multiscale parameter at each time instant in the signal domain. This parameter quantifies the degree of signal singularity at each sample from a multi-scale point of view and thus its value can be used to classify signal samples accordingly. We use this property to develop a simple algorithm for detection of GCIs and we show that for the case of clean speech, our algorithm performs almost as well as a recent state-of-the-art method. Next, by performing a comprehensive comparison in presence of 14 different types of noises, we show that our method is more accurate (particularly for very low SNRs). Our method has lower computational times compared to others and does not rely on an estimate of pitch period or any critical choice of parameters

    Detection of abnormalities in ECG using Deep Learning

    Get PDF
    A significant part of healthcare is focused on the information that the physiological signals offer about the health state of an individual. The Electrocardiogram (ECG) cyclic behaviour gives insight on a subject’s emotional, behavioral and cardiovascular state. These signals often present abnormal events that affects their analysis. Two examples are the noise, that occurs during the acquisition, and symptomatic patterns, that are produced by pathologies. This thesis proposes a Deep Neural Networks framework that learns the normal behaviour of an ECG while detecting abnormal events, tested in two different settings: detection of different types of noise, and; symptomatic events caused by different pathologies. Two algorithms were developed for noise detection, using an autoencoder and Convolutional Neural Networks (CNN), reaching accuracies of 98,18% for the binary class model and 70,74% for the multi-class model, which is able to discern between base wandering, muscle artifact and electrode motion noise. As for the arrhythmia detection algorithm was developed using an autoencoder and Recurrent Neural Networks with Gated Recurrent Units (GRU) architecture. With an accuracy of 56,85% and an average sensitivity of 61.13%, compared to an average sensitivity of 75.22% for a 12 class model developed by Hannun et al. The model detects 7 classes: normal sinus rhythm, paced rhythm, ventricular bigeminy, sinus bradycardia, atrial fibrillation, atrial flutter and pre-excitation. It was concluded that the process of learning the machine learned features of the normal ECG signal, currently sacrifices the accuracy for higher generalization. It performs better at discriminating the presence of abnormal events in ECG than classifying different types of events. In the future, these algorithms could represent a huge contribution in signal acquisition for wearables and the study of pathologies visible in not only in ECG, but also EMG and respiratory signals, especially applied to active learning

    Multi-texture image segmentation

    Get PDF
    Visual perception of images is closely related to the recognition of the different texture areas within an image. Identifying the boundaries of these regions is an important step in image analysis and image understanding. This thesis presents supervised and unsupervised methods which allow an efficient segmentation of the texture regions within multi-texture images. The features used by the methods are based on a measure of the fractal dimension of surfaces in several directions, which allows the transformation of the image into a set of feature images, however no direct measurement of the fractal dimension is made. Using this set of features, supervised and unsupervised, statistical processing schemes are presented which produce low classification error rates. Natural texture images are examined with particular application to the analysis of sonar images of the seabed. A number of processes based on fractal models for texture synthesis are also presented. These are used to produce realistic images of natural textures, again with particular reference to sonar images of the seabed, and which show the importance of phase and directionality in our perception of texture. A further extension is shown to give possible uses for image coding and object identification

    Fractals in Geoscience and Remote Sensing

    Get PDF
    Abstract not availableNA-NOT AVAILABL

    Use of multifractal parameters for speaker recognition

    Get PDF
    Orientadores: Lee Luan Ling, Fábio ViolaroDissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de ComputaçãoResumo: Esta dissertação apresenta a implementação de um sistema de Reconhecimento Automático de Locutor (ASR). Este sistema emprega um novo parâmetro de características de locutor baseado no modelo multifractal "VVGM" (Variable Variance Gaussian Multiplier). A metodologia adotada para o desenvolvimento deste sistema foi formulada em duas etapas. Inicialmente foi implementado um sistema ASR tradicional, usando como vetor de características os MFCCs (Mel-Frequency Cepstral Coefficients) e modelo de mistura gaussiana (GMM) como classificador, uma vez que é uma configuração clássica, adotada como referência na literatura. Este procedimento permite ter um conhecimento amplo sobre a produção de sinais de voz, além de um sistema de referência para comparar o desempenho do novo parâmetro VVGM. A segunda etapa foi dedicada ao estudo de processos multifractais em sinais de fala, já que eles enfatizam-se na análise das informações contidas nas partes não estacionárias do sinal avaliado. Aproveitando essa característica, sinais de fala são modelados usando o modelo VVGM. Este modelo é baseado no processo de cascata multiplicativa binomial, e usa as variâncias dos multiplicadores de cada estágio como um novo vetor de característica. As informações obtidas pelos dois métodos são diferentes e complementares. Portanto, é interessante combinar os parâmetros clássicos com os parâmetros multifractais, a fim de melhorar o desempenho dos sistemas de reconhecimento de locutor. Os sistemas propostos foram avaliados por meio de três bases de dados de fala com diferentes configurações, tais como taxas de amostragem, número de falantes e frases e duração do treinamento e teste. Estas diferentes configurações permitem determinar as características do sinal de fala requeridas pelo sistema. Do resultado dos experimentos foi observado que o sistema de identificação de locutor usando os parâmetros VVGM alcançou taxas de acerto significativas, o que mostra que este modelo multifractal contém informações relevantes sobre a identidade de cada locutor. Por exemplo, a segunda base de dados é composta de sinais de fala de 71 locutores (50 homens e 21 mulheres) digitalizados a 22,05 kHz com 16 bits/amostra. O treinamento foi feito com 20 frases para cada locutor, com uma duração total de cerca de 70 s. Avaliando o sistema ASR baseado em VVGM, com locuções de teste de 3 s de comprimento, foi obtida uma taxa de reconhecimento de 91,30%. Usando estas mesmas condições, o sistema ASR baseado em MFCCs atingiu uma taxa de reconhecimento de 98,76%. No entanto, quando os dois parâmetros foram combinados, a taxa de reconhecimento aumentou para 99,43%, mostrando que a nova característica acrescenta informações importantes para o sistema de reconhecimento de locutorAbstract: This dissertation presents an Automatic Speaker Recognition (ASR) system, which employs a new parameter based on the ¿VVGM? (Variable Variance Gaussian Multiplier) multifractal model. The methodology adopted for the development of this system is formulated in two stages. Initially, a traditional ASR system was implemented, based on the use of Mel-Frequency Cepstral Coefficients (MFCCs) and the Gaussian mixture models (GMMs) as the classifier, since it is the method with the best results in the literature. This procedure allows having a broad knowledge about the production of speech signals and a reference system to compare the performance of the new VVGM parameter. The second stage was dedicated to the study of the multifractal processes for speech signals, given that with them, it is possible to analyze information contained in non-stationary parts of the evaluated signal. Taking advantage of this characteristic, speech signals are modeled using the VVGM model, which is based on the binomial multiplicative cascade process, and uses the variances of multipliers for each state as a new speech feature. The information obtained by the two methods is different and complementary. Therefore, it is interesting to combine the classic parameters with the multifractal parameters in order to improve the performance of speaker recognition systems. The proposed systems were evaluated using three databases with different settings, such as sampling rates, number of speakers and phrases, duration of training and testing. These different configurations allow the determination of characteristics of the speech signal required by the system. With the experiments, the speaker identification system based on the VVGM parameters achieved significant success rates, which shows that this multifractal model contains relevant information of the identity of each speaker. For example, the second database is composed of speech signals of 71 speakers (50 men and 21 women) digitized at 22.05 kHz with 16 bits/sample. The training was done with 20 phrases for each speaker, with an approximately total duration of 70 s. Evaluating the ASR system based on VVGM, with this database and using test locutions with 3s of duration, it was obtained a recognition rate of 91.3%. Using these same conditions, the ASR system based on MFCCs reached a recognition rate of 98.76%. However, when the two parameters are combined, the recognition rate increased to 99.43%, showing that the new feature adds substantial information to the speaker recognition systemMestradoTelecomunicações e TelemáticaMestre em Engenharia Elétric

    Acoustical measurements on stages of nine U.S. concert halls

    Get PDF

    Telecommunications Networks

    Get PDF
    This book guides readers through the basics of rapidly emerging networks to more advanced concepts and future expectations of Telecommunications Networks. It identifies and examines the most pressing research issues in Telecommunications and it contains chapters written by leading researchers, academics and industry professionals. Telecommunications Networks - Current Status and Future Trends covers surveys of recent publications that investigate key areas of interest such as: IMS, eTOM, 3G/4G, optimization problems, modeling, simulation, quality of service, etc. This book, that is suitable for both PhD and master students, is organized into six sections: New Generation Networks, Quality of Services, Sensor Networks, Telecommunications, Traffic Engineering and Routing
    corecore