9 research outputs found

    Pembuatan Perangkat Basis Data untuk Sintesis Ucapan (Natural Speech Synthesis) Berbahasa Indonesia Berbasis Hidden Markov Model (HMM)

    One speech synthesis technique is statistical parametric speech synthesis based on Hidden Markov Models (HMM). Speech synthesis for Indonesian using HTS had not previously been developed (the language is under-resourced). This research began with the construction of an Indonesian speech database through a recording process, followed by phonetic-symbol segmentation and labeling. The research produced an Indonesian database of 1,529 sentences satisfying the phonetically balanced criterion, covering all 33 phoneme types. In addition, a segmented and labeled dataset of 100 sentences recorded by a male speaker and 100 sentences recorded by a female speaker was obtained. Software for running an HMM-based English speech synthesis system was prepared by applying HTS. Based on subjective voice-quality tests involving 20 respondents, the naturalness Mean Opinion Score (MOS) was 3.4 for the speaker-dependent (SD) training demo and 3.2 for the speaker-adaptation/adaptive (SAD) training demo. The synthetic speech can therefore be categorized as good, and the software used is suitable for designing an Indonesian speech synthesis system.
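The MOS figures above are arithmetic means of listeners' 1-5 naturalness ratings. A minimal sketch of the computation (the individual ratings below are hypothetical, chosen only to illustrate a score of 3.4 from 20 respondents):

```python
def mean_opinion_score(ratings):
    """Mean Opinion Score: the average of listeners' 1-5 quality ratings."""
    if not all(1 <= r <= 5 for r in ratings):
        raise ValueError("MOS ratings must lie on the 1-5 scale")
    return sum(ratings) / len(ratings)

# Hypothetical naturalness ratings from 20 respondents for one demo.
ratings = [4, 3, 4, 3, 3, 4, 4, 3, 3, 4, 3, 4, 3, 3, 4, 3, 4, 3, 4, 2]
print(round(mean_opinion_score(ratings), 1))  # → 3.4
```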

    Evaluation of Digital Speckle Filters for Ultrasound Images

    Ultrasound (US) images are inherently corrupted by speckle noise, causing inaccuracy in medical diagnoses made with this technique. Hence, numerous despeckling filters are used to denoise US images. However, most despeckling techniques blur the US images. In this work, four filters, namely the Lee, Wavelet Linear Minimum Mean Square Error (LMMSE), Speckle Reducing Anisotropic Diffusion (SRAD), and Non-Local Means (NLM) filters, are evaluated in terms of their ability to remove noise and to preserve image contrast. This is done by calculating four performance metrics: Peak Signal-to-Noise Ratio (PSNR), Ultrasound Despeckling Assessment Index (USDSAI), Normalized Variance, and Mean Preservation. The experiments were conducted on three types of images: simulated noisy images, a computer-generated image, and real US images. The evaluation in terms of PSNR, USDSAI, Normalized Variance, and Mean Preservation shows that the NLM filter is the best in all scenarios, considering both speckle-noise suppression and image restoration, although its processing time is quite slow; it may not be the best option if speed is the priority. The Wavelet LMMSE filter is the next best performing filter after NLM, with faster speed.
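PSNR, the first of the four metrics, compares a despeckled image against a clean reference via the mean squared error. A minimal pure-Python sketch for 8-bit images (the tiny example arrays are made up):

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-sized images."""
    flat_ref = [p for row in reference for p in row]
    flat_tst = [p for row in test for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_tst)) / len(flat_ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

clean    = [[100, 100], [100, 100]]
denoised = [[101,  99], [102, 100]]
print(round(psnr(clean, denoised), 2))
```

Higher PSNR means a denoised output closer to the reference; note that a reference is only available for the simulated and computer-generated test images, not the real US images.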

    Un arbre de Markov sélectif en fréquence pour la détection de signaux transitoires à faible rapport signal à bruit

    We deal in this paper with the extraction of multiresolution statistical signatures for the characterization of transient signals in strongly noisy contexts. These short-time signals have sharp and highly variable frequency components. The time-frequency analysis window to adopt is then a major issue. Thus we have chosen the wavelet packet domain due to its natural ability to provide multiple time-frequency resolutions. We propose a new oriented Markov model dedicated to the wavelet packet transform, which offers sharp analysis of frequency variations in a signal, locally in time and at several resolutions. We show its efficiency on synthetic signals and we then illustrate its applicative relevance in a biomedical context related to the detection of transient signals in pulmonary sounds.
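Unlike the plain wavelet transform, the wavelet packet transform splits both the low-pass and the high-pass branch at every level, yielding a full binary tree of subbands and hence multiple time-frequency resolutions. A minimal orthonormal Haar sketch of that recursion (the depth and test signal are illustrative, not from the paper):

```python
def haar_split(x):
    """One orthonormal Haar analysis step: (approximation, detail)."""
    s = 0.5 ** 0.5
    lo = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    hi = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return lo, hi

def wavelet_packets(x, depth):
    """Full wavelet packet tree: a list of 2**depth leaf subbands."""
    nodes = [x]
    for _ in range(depth):
        nodes = [band for node in nodes for band in haar_split(node)]
    return nodes

signal = [4.0, 2.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
leaves = wavelet_packets(signal, 2)  # 4 subbands of length 2
# The orthonormal transform preserves energy across the tree:
energy_in  = sum(v * v for v in signal)
energy_out = sum(v * v for band in leaves for v in band)
print(len(leaves), round(energy_in, 6) == round(energy_out, 6))
```

The paper's Markov tree attaches hidden states to these tree nodes so that statistical information propagates from scale to scale; the sketch shows only the underlying decomposition.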

    Multi-Modal Enhancement Techniques for Visibility Improvement of Digital Images

    Image enhancement techniques for visibility improvement of 8-bit color digital images based on spatial-domain, wavelet-transform-domain, and multiple-image-fusion approaches are investigated in this dissertation research. In the spatial-domain category, two enhancement algorithms are developed to deal with problems associated with images captured from scenes with high dynamic ranges. The first technique is based on an illuminance-reflectance (I-R) model of the scene irradiance. Dynamic range compression of the input image is achieved by a nonlinear transformation of the estimated illuminance based on a windowed inverse sigmoid transfer function. A single-scale, neighborhood-dependent contrast enhancement process is proposed to enhance the high-frequency components of the illuminance, which compensates for the contrast degradation of the mid-tone frequency components caused by dynamic range compression. The intensity image obtained by integrating the enhanced illuminance and the extracted reflectance is then converted to an RGB color image through linear color restoration utilizing the color components of the original image. The second technique, named AINDANE, is a two-step approach comprising adaptive luminance enhancement and adaptive contrast enhancement. An image-dependent nonlinear transfer function is designed for dynamic range compression, and a multiscale, image-dependent neighborhood approach is developed for contrast enhancement. Real-time processing of video streams is realized with the I-R model based technique due to its high processing speed, while AINDANE produces higher-quality enhanced images due to its multi-scale contrast enhancement. Both algorithms exhibit balanced luminance and contrast enhancement, higher robustness, and better color consistency than conventional techniques.
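One way to realize a "windowed inverse sigmoid" tone curve is to restrict the inverse sigmoid (logit) to a finite window [-a, a] and renormalize it so that normalized luminance in [0, 1] maps back onto [0, 1]. The exact parameterization in the dissertation may differ; this is an illustrative construction only:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def windowed_inverse_sigmoid(y, a=3.0):
    """Inverse sigmoid restricted to the window [-a, a] and renormalized
    so that [0, 1] maps onto [0, 1] (illustrative form; the dissertation's
    exact transfer function is not specified in the abstract)."""
    lo, hi = sigmoid(-a), sigmoid(a)
    y = lo + y * (hi - lo)       # map input into the sigmoid's output range
    t = math.log(y / (1.0 - y))  # inverse sigmoid (logit)
    return (t + a) / (2.0 * a)   # map the window [-a, a] back to [0, 1]

# Endpoints are preserved and midtones pass through 0.5.
print(round(windowed_inverse_sigmoid(0.0), 6),
      round(windowed_inverse_sigmoid(0.5), 6),
      round(windowed_inverse_sigmoid(1.0), 6))
```

The window parameter `a` controls the curve's steepness; applying such a monotone curve to the estimated illuminance leaves the reflectance untouched, which is what preserves scene detail under compression.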
In the transform-domain approach, wavelet-transform-based image denoising and contrast enhancement algorithms are developed. Denoising is treated as a maximum a posteriori (MAP) estimation problem; a bivariate probability density function model is introduced to exploit the interlevel dependency among the wavelet coefficients. In addition, an approximate solution to the MAP estimation problem is proposed to avoid complex iterative computations for finding a numerical solution. This relatively low-complexity image denoising algorithm, implemented with the dual-tree complex wavelet transform (DT-CWT), produces high-quality denoised images.
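A well-known closed-form approximate MAP rule of this kind is the Sendur-Selesnick bivariate shrinkage function, which estimates a coefficient jointly from its noisy value and its parent at the next coarser scale. Whether the dissertation uses this exact form is not stated in the abstract, so treat it as an illustration of the interlevel-dependency idea:

```python
import math

def bivariate_shrink(y_child, y_parent, sigma_noise, sigma_signal):
    """Approximate MAP estimate of a wavelet coefficient from the noisy
    coefficient and its coarser-scale parent (bivariate shrinkage)."""
    r = math.sqrt(y_child ** 2 + y_parent ** 2)
    if r == 0.0:
        return 0.0
    gain = max(r - math.sqrt(3.0) * sigma_noise ** 2 / sigma_signal, 0.0) / r
    return gain * y_child

# A small coefficient with a small parent is suppressed as noise;
# a large coefficient with a large parent survives, lightly shrunk.
print(round(bivariate_shrink(0.5, 0.2, 1.0, 1.0), 3),
      round(bivariate_shrink(8.0, 6.0, 1.0, 1.0), 3))
```

The key point matching the abstract: because the parent enters the rule, edges whose energy persists across scales are kept, while isolated noise spikes are removed without iteration.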

    Communication dans le bruit : perception de sa propre voix et rehaussement de la parole

    Communication in noise is an everyday problem for workers in noisy industrial environments. Many workers complain that their hearing protectors prevent them from communicating easily with their colleagues. They therefore tend to remove their protectors, putting their hearing at risk. This communication problem is in fact twofold: the protectors modify both the wearer's perception of their own voice and the intelligibility of other people's speech. Both issues are addressed in this thesis. The modification of the wearer's perception of their own voice is partly due to the occlusion effect that occurs when the ear canal is blocked by an earplug. This occlusion effect essentially amounts to an enhanced perception of low-frequency sounds internal to the human body (physiological noises) and to a modified perception of the person's own voice. To better understand this phenomenon, following a thorough review of the existing literature, a new method for quantifying the occlusion effect was developed. Instead of exciting the subject's skull with a shaker or having the subject speak, as is classically done in the literature, it was decided to excite the subjects' oral cavity with a sound wave. The experiment was designed so that the sound wave exciting the oral cavity does not directly excite the outer ear or the rest of the body. Determining hearing thresholds with open and with occluded ears thus made it possible to quantify a subjective occlusion effect for a sound wave in the oral cavity.
These results, together with the other quantifications of the occlusion effect reported in the literature, led to a better understanding of the occlusion effect and allowed the influence of the different transmission paths between the sound source and the inner ear to be evaluated. The intelligibility of other people's speech is degraded both by the high sound levels present in noisy industrial environments and by the attenuation of the speech signal by the hearing protectors. One possible remedy is to denoise the speech signal and transmit it under the hearing protector. Many denoising techniques exist and are used in particular for speech denoising in telecommunications. In this thesis, denoising by wavelet thresholding is considered. A first study of "classical" wavelet denoising techniques was carried out to evaluate their performance in a noisy industrial environment. The test speech signals were corrupted by industrial noises over a wide range of signal-to-noise ratios, and the denoised signals were evaluated using four criteria. A large database was thus obtained and analyzed with a selection algorithm designed specifically for this task. This first study highlighted the influence of the various wavelet-denoising parameters on denoising quality and thus identified the "classical" method giving the best performance. It also provided guidelines for designing a new thresholding rule adapted to wavelet-based speech denoising in a noisy industrial environment. This new thresholding rule is presented and evaluated in a second study.
Its performance proved superior to the "classical" method identified in the first study for speech signals with signal-to-noise ratios between −10 dB and 15 dB.
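Classical wavelet denoising applies a thresholding rule to the detail coefficients; the most common "classical" baseline is soft thresholding with the universal threshold σ√(2 ln N). A minimal sketch of that rule (the coefficient values below are made up, and the thesis's new rule is a refinement of this kind of function, not this exact one):

```python
import math

def soft_threshold(coeffs, sigma):
    """Soft thresholding with the universal threshold sigma*sqrt(2*ln N):
    coefficients below the threshold are zeroed, the rest are shrunk
    toward zero by the threshold amount."""
    n = len(coeffs)
    thr = sigma * math.sqrt(2.0 * math.log(n))
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

# Made-up detail coefficients: small ones (mostly noise) are zeroed,
# large ones (speech structure) are kept but shrunk.
coeffs = [0.3, -0.8, 5.0, -4.2, 0.1, 2.9, -0.4, 0.2]
print([round(c, 3) for c in soft_threshold(coeffs, sigma=1.0)])
```

In practice the signal is wavelet-transformed, the rule is applied to each detail subband, and the inverse transform yields the denoised speech; the four evaluation criteria from the study are then computed on that reconstruction.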

    Wavelet Domain Watermark Detection and Extraction using the Vector-based Hidden Markov Model

    Multimedia data piracy is a growing problem in view of the ease and simplicity provided by the internet in transmitting and receiving such data. A possible solution to preclude unauthorized duplication or distribution of digital data is watermarking: embedding an identifiable piece of information that provides security against multimedia piracy. This thesis is concerned with the investigation of various image watermarking schemes in the wavelet domain using the statistical properties of the wavelet coefficients. The wavelet subband coefficients of natural images have significantly non-Gaussian, heavy-tailed features that are best described by heavy-tailed distributions. Moreover, the wavelet coefficients of images have strong inter-scale and inter-orientation dependencies. In view of this, the vector-based hidden Markov model is found to be best suited to characterizing the wavelet coefficients, and in this thesis it is used to develop new digital image watermarking schemes. Additive and multiplicative watermarking schemes in the wavelet domain are developed in order to provide improved detection and extraction of the watermark. Blind watermark detectors using a log-likelihood ratio test, and watermark decoders using the maximum likelihood criterion to blindly extract the embedded watermark bits from the observation data, are designed. Extensive experiments are conducted throughout this thesis using a number of databases selected from a wide variety of natural images. Simulation results are presented to demonstrate the effectiveness of the proposed image watermarking schemes and their superiority over some state-of-the-art techniques. It is shown that, owing to the use of the hidden Markov model to characterize the distributions of the wavelet coefficients of images, the proposed watermarking algorithms achieve higher detection and decoding rates both before and after subjecting the watermarked image to various kinds of attacks.
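Under a deliberately simplified i.i.d. Gaussian host model (rather than the vector-based hidden Markov model of the thesis), the log-likelihood ratio test for an additive watermark reduces to a correlation-style statistic compared against a threshold. A toy sketch with made-up data, only to show the shape of such a blind detector:

```python
import math, random

def llr_additive_gaussian(x, w, sigma):
    """Log-likelihood ratio for H1: x = c + w versus H0: x = c, with host
    coefficients c modeled i.i.d. N(0, sigma^2). A simplification of the
    thesis's vector-HMM-based detector, shown for illustration only."""
    return sum((xi * wi - 0.5 * wi * wi) / sigma ** 2 for xi, wi in zip(x, w))

random.seed(0)
sigma = 1.0
w = [random.choice((-0.5, 0.5)) for _ in range(256)]   # spread-spectrum mark
host = [random.gauss(0.0, sigma) for _ in range(256)]  # unmarked coefficients
marked = [c + wi for c, wi in zip(host, w)]

# Decide "watermark present" when the LLR exceeds 0 (equal priors).
print(llr_additive_gaussian(marked, w, sigma) > 0.0,
      llr_additive_gaussian(host, w, sigma) > 0.0)
```

Replacing the Gaussian densities in the ratio with the heavy-tailed, dependency-aware HMM densities is precisely what gives the thesis's detectors their improved rates.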

    INFORMATION THEORETIC CRITERIA FOR IMAGE QUALITY ASSESSMENT BASED ON NATURAL SCENE STATISTICS

    Measurement of visual quality is crucial for various image and video processing applications. It is widely applied in image acquisition, media transmission, video compression, image/video restoration, etc. The goal of image quality assessment (QA) is to develop a computable quality metric that properly evaluates image quality. The primary criterion is consistency with human judgment; computational complexity and resource limitations are also concerns in a successful QA design. Many methods have been proposed to date. Initially, quality measurements were taken directly from simple distance measures, which reflect mathematical signal fidelity, such as mean squared error or the Minkowski distance. Later, QA was extended to color spaces and the Fourier domain, in which images are better represented. Some existing methods also consider the adaptive ability of human vision. Unfortunately, the Video Quality Experts Group indicated that none of the more sophisticated metrics showed any great advantage over other existing metrics. This thesis proposes a general approach to the QA problem based on evaluating image information entropy. An information-theoretic model for the human visual system is proposed, and an information-theoretic solution is presented to derive the proper settings. The quality metric is validated on five subjective databases from different research labs, and the key factors for a successful quality metric are investigated. In testing, our quality metric exhibits excellent consistency with human judgments and compatibility across the databases. Besides the full-reference quality assessment metric, blind quality assessment metrics are also proposed. In order to predict quality without a reference image, two concepts are introduced that quantitatively describe the inter-scale dependency under a multi-resolution framework.
    Based on the success of the full-reference quality metric, several blind quality metrics are proposed for the five distortion types in the subjective databases. Our blind metrics outperform all existing blind metrics and can also deal with some distortions that had not previously been investigated.
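The building block of such an entropy-based metric is the Shannon entropy of image statistics. A minimal sketch computing the gray-level histogram entropy of an image (the toy pixel data are made up; the thesis's actual metric layers a human-visual-system model on top of entropies like this):

```python
import math
from collections import Counter

def histogram_entropy(pixels):
    """Shannon entropy (bits/pixel) of an image's gray-level histogram."""
    counts = Counter(pixels)
    n = len(pixels)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return max(0.0, h)  # clamp the float -0.0 case for a constant image

flat_uniform = [0, 64, 128, 192] * 16  # four equally likely gray levels
flat_const   = [128] * 64              # a constant image carries no information
print(histogram_entropy(flat_uniform), histogram_entropy(flat_const))  # → 2.0 0.0
```

Intuitively, distortions such as blur or compression alter how much information the image carries at each scale, which is what the full-reference and blind metrics quantify.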