9 research outputs found
Construction of a Speech Database for Indonesian Natural Speech Synthesis Based on Hidden Markov Models (HMM)
One speech synthesis technique is statistical parametric speech synthesis using Hidden Markov Models (HMM). Speech synthesis for Indonesian using HTS had not previously been developed (the language is under-resourced). This research began with the construction of an Indonesian speech database through a recording process, followed by phonetic-symbol segmentation and labeling. The study produced an Indonesian database of 1,529 sentences satisfying the phonetically balanced criterion, i.e., covering all 33 phoneme types. In addition, a segmented and labeled dataset of 100 sentences recorded by a male speaker and 100 sentences recorded by a female speaker was obtained. The software for running an English HMM-based speech synthesis system was prepared by applying the HTS toolkit. Based on a subjective voice-quality test involving 20 respondents, the naturalness Mean Opinion Score (MOS) was 3.4 for the speaker-dependent (SD) training demo and 3.2 for the speaker-adaptation/adaptive (SAD) training demo. The resulting synthetic speech can therefore be categorized as good, and the software used is suitable for designing an Indonesian speech synthesis system.
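The subjective evaluation above rests on the Mean Opinion Score, which is simply the average of the listeners' 1-5 ratings. A minimal sketch (the function name and the sample ratings below are illustrative, not the study's actual data):

```python
# Minimal MOS sketch: each listener rates a stimulus from 1 (bad) to 5 (excellent),
# and the MOS is the arithmetic mean over the panel. Ratings here are hypothetical.
def mean_opinion_score(ratings):
    """Average the 1-5 opinion scores given by all respondents."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    return sum(ratings) / len(ratings)

# Hypothetical scores from a small listening panel.
print(mean_opinion_score([4, 3, 4, 3, 3]))  # 3.4
```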
Evaluation of Digital Speckle Filters for Ultrasound Images
Ultrasound (US) images are inherently corrupted by speckle noise, which reduces the accuracy of medical diagnosis based on this technique. Hence, numerous despeckling filters are used to denoise US images. However, most despeckling techniques blur the US images. In this work, four filters, namely the Lee, Wavelet Linear Minimum Mean Square Error (LMMSE), Speckle Reducing Anisotropic Diffusion (SRAD), and Non-Local Means (NLM) filters, are evaluated in terms of their ability to remove noise and to preserve image contrast. This is done by calculating four performance metrics: Peak Signal-to-Noise Ratio (PSNR), Ultrasound Despeckling Assessment Index (USDSAI), Normalized Variance, and Mean Preservation. The experiments were conducted on three different types of images: simulated noisy images, a computer-generated image, and real US images. The evaluation in terms of PSNR, USDSAI, Normalized Variance, and Mean Preservation shows that the NLM filter is the best in all scenarios, considering both speckle noise suppression and image restoration, although its processing time is rather slow; it may not be the best option if speed is the priority. The Wavelet LMMSE filter is the next best performing filter after the NLM filter, with faster processing.
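Of the four metrics above, PSNR is the simplest to reproduce. A minimal sketch, assuming 8-bit images stored as NumPy arrays (the function and variable names are illustrative):

```python
import numpy as np

# PSNR between a reference image and a processed image, in dB,
# assuming an 8-bit peak value of 255.
def psnr(reference, processed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB."""
    mse = np.mean((reference.astype(np.float64) - processed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: a constant image with a single off-by-one pixel.
ref = np.full((8, 8), 128, dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 129
print(round(psnr(ref, noisy), 2))
```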
A Frequency-Selective Markov Tree for the Detection of Transient Signals at Low Signal-to-Noise Ratio
We deal in this paper with the extraction of multiresolution statistical signatures for
the characterization of transient signals in strongly noisy contexts. These short-time signals
have sharp and highly variable frequency components. The Time-Frequency analysis window
to adopt is then a major issue. Thus we have chosen the wavelet packet domain due to its natural
ability to provide multiple time-frequency resolutions. We propose a new oriented Markov
model dedicated to the wavelet packet transform, which offers sharp analysis of frequency
variations in a signal, locally in time and at several resolutions. We show its efficiency on synthetic
signals and we then illustrate its applicative relevance in a biomedical context related
to the detection of transient signals in pulmonary sounds.
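The wavelet packet transform at the heart of the model differs from the plain wavelet transform in that both the approximation and the detail branches are split again at every level, producing a uniform time-frequency tiling. A minimal sketch using the Haar filter (the filter choice is an assumption made for simplicity, not taken from the article):

```python
import numpy as np

def haar_split(x):
    """One Haar analysis step: (approximation, detail) at half the length."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def wavelet_packet(x, levels):
    """Full wavelet packet tree: splits every node, giving 2**levels subbands."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        nodes = [band for node in nodes for band in haar_split(node)]
    return nodes

leaves = wavelet_packet(np.arange(8.0), levels=2)
print(len(leaves), leaves[0])  # 4 subbands of length 2 each
```

Because the Haar filter bank is orthonormal, the total energy of the leaves equals the energy of the input signal, which makes the decomposition a convenient domain for statistical modeling.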
Multi-Modal Enhancement Techniques for Visibility Improvement of Digital Images
Image enhancement techniques for visibility improvement of 8-bit color digital images based on spatial domain, wavelet transform domain, and multiple image fusion approaches are investigated in this dissertation research.
In the category of spatial domain approaches, two enhancement algorithms are developed to deal with problems associated with images captured from scenes with high dynamic ranges. The first technique is based on an illuminance-reflectance (I-R) model of the scene irradiance. The dynamic range compression of the input image is achieved by a nonlinear transformation of the estimated illuminance based on a windowed inverse sigmoid transfer function. A single-scale, neighborhood-dependent contrast enhancement process is proposed to enhance the high-frequency components of the illuminance, which compensates for the contrast degradation of the mid-tone frequency components caused by dynamic range compression. The intensity image obtained by integrating the enhanced illuminance and the extracted reflectance is then converted to an RGB color image through linear color restoration utilizing the color components of the original image. The second technique, named AINDANE, is a two-step approach comprising adaptive luminance enhancement and adaptive contrast enhancement. An image-dependent nonlinear transfer function is designed for dynamic range compression, and a multiscale, image-dependent neighborhood approach is developed for contrast enhancement. Real-time processing of video streams is realized with the I-R model based technique thanks to its high-speed processing capability, while AINDANE produces higher-quality enhanced images owing to its multi-scale contrast enhancement. Both algorithms exhibit balanced luminance and contrast enhancement, higher robustness, and better color consistency when compared with conventional techniques.
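As an illustration of the dynamic range compression step, the sketch below applies a windowed inverse sigmoid to normalized illuminance. The window bounds `a` and `b` are illustrative defaults, not the dissertation's actual parameters, and the function is a generic stand-in rather than the exact published transfer function:

```python
import numpy as np

# Hedged sketch: clip a logistic sigmoid to the window [a, b], take its inverse
# (logit) on a linearly placed input, and renormalize to [0, 1]. With a
# symmetric window this lifts dark illuminance values and pulls highlights
# down, i.e., it compresses the dynamic range toward the midtones.
def windowed_inverse_sigmoid(lum, a=-4.0, b=4.0):
    """Map normalized illuminance in [0, 1] through an inverse-sigmoid window."""
    lum = np.clip(np.asarray(lum, dtype=float), 0.0, 1.0)
    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    # Place the input linearly inside the sigmoid's value range over [a, b] ...
    y = sig(a) + lum * (sig(b) - sig(a))
    # ... invert the sigmoid (logit) and renormalize back to [0, 1].
    t = np.log(y / (1.0 - y))
    return (t - a) / (b - a)

print(windowed_inverse_sigmoid(np.array([0.1, 0.5, 0.9])))
```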
In the transform domain approach, wavelet-transform-based image denoising and contrast enhancement algorithms are developed. The denoising is treated as a maximum a posteriori (MAP) estimation problem; a bivariate probability density function model is introduced to exploit the interlevel dependency among the wavelet coefficients. In addition, an approximate solution to the MAP estimation problem is proposed to avoid the complex iterative computations needed to find a numerical solution. This relatively low-complexity image denoising algorithm, implemented with the dual-tree complex wavelet transform (DT-CWT), produces high-quality denoised images.
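A bivariate parent-child MAP estimator of this kind admits a well-known closed-form solution, the bivariate shrinkage rule of Şendur and Selesnick; whether the dissertation uses exactly this parameterization is an assumption, and the parameter estimates below are illustrative:

```python
import numpy as np

# Bivariate shrinkage: the MAP estimate of a wavelet coefficient w given its
# parent coefficient, a noise standard deviation sigma_n, and a (local)
# signal standard deviation sigma.
def bivariate_shrink(w, w_parent, sigma_n, sigma):
    """Closed-form bivariate MAP shrinkage of wavelet coefficients."""
    r = np.sqrt(w ** 2 + w_parent ** 2)
    gain = np.maximum(r - np.sqrt(3.0) * sigma_n ** 2 / sigma, 0.0) / np.maximum(r, 1e-12)
    return gain * w

# Small coefficients with small parents are zeroed; large ones are kept
# (slightly shrunk toward zero).
print(bivariate_shrink(np.array([0.1, 5.0]), np.array([0.1, 4.0]), sigma_n=1.0, sigma=2.0))
```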
Communication in Noise: Perception of One's Own Voice and Speech Enhancement
Communication in noise is an everyday problem for workers in noisy industrial environments. Many workers complain that their hearing protectors prevent them from communicating easily with their colleagues. They therefore tend to remove their protectors, putting their hearing at risk. This communication problem is in fact twofold: the protectors alter both the perception of the wearer's own voice and the intelligibility of other people's speech. Both issues are addressed in this thesis.
The altered perception of the wearer's own voice is partly due to the occlusion effect that occurs when the ear canal is blocked by an earplug. This occlusion effect essentially amplifies the perception of low-frequency sounds generated inside the human body (physiological noise) and modifies the perception of the person's own voice. To better understand this phenomenon, following a thorough review of the existing literature, a new method for quantifying the occlusion effect was developed. Instead of exciting the subject's skull with a shaker or having the subject speak, as is classically done in the literature, it was decided to excite the subjects' oral cavity with a sound wave. The experiment was designed so that the sound wave exciting the oral cavity does not directly excite the outer ear or the rest of the body. Measuring hearing thresholds with open and occluded ears thus made it possible to quantify a subjective occlusion effect for a sound wave in the oral cavity. These results, together with the other quantifications of the occlusion effect reported in the literature, provided a better understanding of the occlusion phenomenon and allowed the influence of the different transmission paths between the sound source and the inner ear to be evaluated.
The intelligibility of other people's speech is degraded both by the high sound levels present in noisy industrial environments and by the attenuation of the speech signal caused by the hearing protectors. One possible remedy is to denoise the speech signal and then transmit it under the hearing protector. Many denoising techniques exist and are used in particular to denoise speech in telecommunications. In this thesis, denoising by wavelet thresholding is considered. A first study of "classical" wavelet denoising techniques is carried out to evaluate their performance in a noisy industrial environment. The tested speech signals are corrupted by industrial noise over a wide range of signal-to-noise ratios, and the denoised signals are evaluated using four criteria. A large database is thus obtained and analyzed with a selection algorithm designed specifically for this task. This first study highlighted the influence of the various wavelet denoising parameters on denoising quality and identified the "classical" method that yields the best denoising quality. It also provided guidelines for designing a new thresholding rule adapted to wavelet-based speech denoising in a noisy industrial environment. This new thresholding rule is presented and evaluated in a second study. Its performance proved superior to the best "classical" method identified in the first study for speech signals with signal-to-noise ratios between -10 dB and 15 dB.
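The "classical" wavelet thresholding evaluated in the first study can be sketched as follows. This shows only the thresholding step (soft thresholding with Donoho's universal threshold) on a given vector of detail coefficients, omitting the wavelet analysis/synthesis chain, and it is not the thesis's new thresholding rule:

```python
import numpy as np

def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero by `threshold`, zeroing the small ones."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)

def universal_threshold(coeffs):
    """sigma * sqrt(2 log N), with sigma estimated via the median absolute deviation."""
    sigma = np.median(np.abs(coeffs)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(len(coeffs)))

# Toy detail coefficients: large entries carry the speech transients,
# small ones are treated as noise and removed.
d = np.array([0.1, -0.2, 4.0, 0.05, -3.0, 0.15])
print(soft_threshold(d, 0.5))
```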
Wavelet Domain Watermark Detection and Extraction using the Vector-based Hidden Markov Model
Multimedia data piracy is a growing problem in view of the ease and simplicity provided by the Internet in transmitting and receiving such data. A possible solution to preclude unauthorized duplication or distribution of digital data is watermarking. A watermark is an identifiable piece of information embedded in the host data that provides security against multimedia piracy. This thesis is concerned with the investigation of various image watermarking schemes in the wavelet domain using the statistical properties of the wavelet coefficients. The wavelet subband coefficients of natural images have significantly non-Gaussian features that are best described by heavy-tailed distributions. Moreover, the wavelet coefficients of images have strong inter-scale and inter-orientation dependencies. In view of this, the vector-based hidden Markov model is found to be best suited to characterize the wavelet coefficients. In this thesis, this model is used to develop new digital image watermarking schemes. Additive and multiplicative watermarking schemes in the wavelet domain are developed in order to provide improved detection and extraction of the watermark. Blind watermark detectors using the log-likelihood ratio test, and watermark decoders using the maximum likelihood criterion to blindly extract the embedded watermark bits from the observation data, are designed.
Extensive experiments are conducted throughout this thesis using a number of databases selected from a wide variety of natural images. Simulation results are presented to demonstrate the effectiveness of the proposed image watermarking schemes and their superiority over some of the state-of-the-art techniques. It is shown that, owing to the use of the hidden Markov model to characterize the distributions of the wavelet coefficients of images, the proposed watermarking algorithms result in higher detection and decoding rates both before and after subjecting the watermarked image to various kinds of attacks.
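As a greatly simplified stand-in for the blind detector, the sketch below computes the log-likelihood ratio under an i.i.d. Gaussian model of the host coefficients rather than the thesis's vector hidden Markov model; all names and parameters are illustrative:

```python
import numpy as np

# H0: y = x (no watermark), H1: y = x + alpha * w, with host coefficients
# x ~ N(0, sigma^2) and w a known +/-1 watermark sequence. Under the Gaussian
# model the LLR reduces to a correlation-style statistic.
def llr_detect(y, w, alpha, sigma, threshold=0.0):
    """Return (LLR statistic, decision) for additive watermark presence."""
    llr = np.sum((2.0 * alpha * w * y - (alpha * w) ** 2) / (2.0 * sigma ** 2))
    return llr, llr > threshold

rng = np.random.default_rng(0)
w = rng.choice([-1.0, 1.0], size=4096)   # watermark sequence
x = rng.normal(0.0, 1.0, size=4096)      # host "coefficients"
stat_marked, _ = llr_detect(x + 0.2 * w, w, alpha=0.2, sigma=1.0)
stat_clean, _ = llr_detect(x, w, alpha=0.2, sigma=1.0)
print(stat_marked > stat_clean)  # watermarked data yields the larger statistic
```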
Information Theoretic Criteria for Image Quality Assessment Based on Natural Scene Statistics
Measurement of visual quality is crucial
for various image and video processing applications. It is widely
applied in image acquisition, media transmission, video compression,
image/video restoration, etc.
The goal of image quality assessment (QA) is to develop a computable
quality metric which is able to properly evaluate image quality. The
primary criterion is better QA consistency with human judgment.
Computational complexity and resource limitations are also concerns
in a successful QA design. Many methods have been proposed to date.
Early quality measurements were taken directly from simple distance
measures reflecting mathematical signal fidelity, such as the mean
squared error or the Minkowski distance. Later, QA was extended to
color spaces and the Fourier domain, in which images are better
represented. Some existing methods also consider the adaptive ability
of human vision. Unfortunately, the Video Quality Experts Group found
that none of the more sophisticated metrics showed any great advantage
over the other existing metrics.
This thesis proposes a general approach to the QA problem by
evaluating image information entropy. An information theoretic model
for the human visual system is proposed and an information theoretic
solution is presented to derive the proper settings. The quality
metric is validated by five subjective databases from different
research labs. The key points for a successful quality metric are
investigated. During the testing, our quality metric exhibits
excellent consistency with the human judgments and compatibility
with different databases. Beyond the full-reference quality
assessment metric, blind quality assessment metrics are also
proposed. In order to predict quality without a reference image, two
concepts are introduced that quantitatively describe the inter-scale
dependency under a multi-resolution framework. Building on the
success of the full-reference quality metric, several blind quality
metrics are proposed for five different types of distortions in the
subjective databases. Our blind metrics outperform all existing blind
metrics and can also handle some distortions that had not been
investigated before.
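The core information measure behind such an entropy-based approach can be illustrated with the Shannon entropy of an image histogram. The thesis's actual metric involves a full model of the human visual system; this sketch shows only the basic entropy computation, and the image names are illustrative:

```python
import numpy as np

def image_entropy(img):
    """Shannon entropy (bits per pixel) of an 8-bit grayscale image histogram."""
    hist = np.bincount(np.asarray(img, dtype=np.uint8).ravel(), minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log2 stays finite
    return -np.sum(p * np.log2(p))

flat = np.zeros((16, 16), dtype=np.uint8)              # constant image: 0 bits
rich = np.arange(256, dtype=np.uint8).reshape(16, 16)  # all 256 levels equally likely: 8 bits
print(image_entropy(flat), image_entropy(rich))
```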