9 research outputs found
Construction of a Speech Database for Indonesian Natural Speech Synthesis Based on Hidden Markov Models (HMM)
One speech synthesis technique is statistical parametric speech synthesis using Hidden Markov Models (HMM). Speech synthesis for Indonesian using HTS had not previously been developed (the language is under-resourced). This research began with the construction of an Indonesian speech database through a recording process, followed by phonetic-symbol segmentation and labeling. The study produced an Indonesian database of 1,529 sentences satisfying the phonetically balanced criterion, i.e., covering all 33 phoneme types. In addition, a segmented and labeled dataset of 100 sentences recorded by a male speaker and 100 sentences recorded by a female speaker was obtained. The software for running an English HMM-based speech synthesis system was prepared by applying the HTS toolkit. Based on a subjective voice-quality test involving 20 respondents, the naturalness Mean Opinion Score (MOS) was 3.4 for the speaker-dependent (SD) training demo and 3.2 for the speaker-adaptation/adaptive (SAD) training demo. The resulting synthetic speech can therefore be categorized as good, and the software used is suitable for designing an Indonesian speech synthesis system.
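The subjective evaluation above rests on the Mean Opinion Score, which is simply the average of the listeners' 1-5 ratings. A minimal sketch (the function name and the sample ratings below are illustrative, not the study's actual data):

```python
# Minimal MOS sketch: each listener rates a stimulus from 1 (bad) to 5 (excellent),
# and the MOS is the arithmetic mean over the panel. Ratings here are hypothetical.
def mean_opinion_score(ratings):
    """Average the 1-5 opinion scores given by all respondents."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    return sum(ratings) / len(ratings)

# Hypothetical scores from a small listening panel.
print(mean_opinion_score([4, 3, 4, 3, 3]))  # 3.4
```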
Evaluation of Digital Speckle Filters for Ultrasound Images
Ultrasound (US) images are inherently corrupted by speckle noise, which reduces the accuracy of medical diagnosis based on this technique. Hence, numerous despeckling filters are used to denoise US images. However, most despeckling techniques blur the US images. In this work, four filters, namely the Lee, Wavelet Linear Minimum Mean Square Error (LMMSE), Speckle Reducing Anisotropic Diffusion (SRAD), and Non-Local Means (NLM) filters, are evaluated in terms of their ability to remove noise and to preserve image contrast. This is done by calculating four performance metrics: Peak Signal-to-Noise Ratio (PSNR), Ultrasound Despeckling Assessment Index (USDSAI), Normalized Variance, and Mean Preservation. The experiments were conducted on three different types of images: simulated noisy images, a computer-generated image, and real US images. The evaluation in terms of PSNR, USDSAI, Normalized Variance, and Mean Preservation shows that the NLM filter is the best in all scenarios, considering both speckle noise suppression and image restoration, although its processing time is rather slow; it may not be the best option if speed is the priority. The Wavelet LMMSE filter is the next best performing filter after the NLM filter, with faster processing.
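Of the four metrics above, PSNR is the simplest to reproduce. A minimal sketch, assuming 8-bit images stored as NumPy arrays (the function and variable names are illustrative):

```python
import numpy as np

# PSNR between a reference image and a processed image, in dB,
# assuming an 8-bit peak value of 255.
def psnr(reference, processed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB."""
    mse = np.mean((reference.astype(np.float64) - processed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: a constant image with a single off-by-one pixel.
ref = np.full((8, 8), 128, dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 129
print(round(psnr(ref, noisy), 2))
```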
A Frequency-Selective Markov Tree for the Detection of Transient Signals at Low Signal-to-Noise Ratio
We deal in this paper with the extraction of multiresolution statistical signatures for
the characterization of transient signals in strongly noisy contexts. These short-time signals
have sharp and highly variable frequency components. The Time-Frequency analysis window
to adopt is then a major issue. Thus we have chosen the wavelet packet domain due to its natural
ability to provide multiple time-frequency resolutions. We propose a new oriented Markov
model dedicated to the wavelet packet transform, which offers sharp analysis of frequency
variations in a signal, locally in time and at several resolutions. We show its efficiency on synthetic
signals and we then illustrate its applicative relevance in a biomedical context related
to the detection of transient signals in pulmonary sounds.
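The wavelet packet transform at the heart of the model differs from the plain wavelet transform in that both the approximation and the detail branches are split again at every level, producing a uniform time-frequency tiling. A minimal sketch using the Haar filter (the filter choice is an assumption made for simplicity, not taken from the article):

```python
import numpy as np

def haar_split(x):
    """One Haar analysis step: (approximation, detail) at half the length."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def wavelet_packet(x, levels):
    """Full wavelet packet tree: splits every node, giving 2**levels subbands."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        nodes = [band for node in nodes for band in haar_split(node)]
    return nodes

leaves = wavelet_packet(np.arange(8.0), levels=2)
print(len(leaves), leaves[0])  # 4 subbands of length 2 each
```

Because the Haar filter bank is orthonormal, the total energy of the leaves equals the energy of the input signal, which makes the decomposition a convenient domain for statistical modeling.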
Multi-Modal Enhancement Techniques for Visibility Improvement of Digital Images
Image enhancement techniques for visibility improvement of 8-bit color digital images based on spatial domain, wavelet transform domain, and multiple image fusion approaches are investigated in this dissertation research.
In the category of spatial domain approaches, two enhancement algorithms are developed to deal with problems associated with images captured from scenes with high dynamic ranges. The first technique is based on an illuminance-reflectance (I-R) model of the scene irradiance. The dynamic range compression of the input image is achieved by a nonlinear transformation of the estimated illuminance based on a windowed inverse sigmoid transfer function. A single-scale, neighborhood-dependent contrast enhancement process is proposed to enhance the high-frequency components of the illuminance, which compensates for the contrast degradation of the mid-tone frequency components caused by dynamic range compression. The intensity image obtained by integrating the enhanced illuminance and the extracted reflectance is then converted to an RGB color image through linear color restoration utilizing the color components of the original image. The second technique, named AINDANE, is a two-step approach comprising adaptive luminance enhancement and adaptive contrast enhancement. An image-dependent nonlinear transfer function is designed for dynamic range compression, and a multiscale, image-dependent neighborhood approach is developed for contrast enhancement. Real-time processing of video streams is realized with the I-R model based technique thanks to its high-speed processing capability, while AINDANE produces higher-quality enhanced images owing to its multi-scale contrast enhancement. Both algorithms exhibit balanced luminance and contrast enhancement, higher robustness, and better color consistency when compared with conventional techniques.
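As an illustration of the dynamic range compression step, the sketch below applies a windowed inverse sigmoid to normalized illuminance. The window bounds `a` and `b` are illustrative defaults, not the dissertation's actual parameters, and the function is a generic stand-in rather than the exact published transfer function:

```python
import numpy as np

# Hedged sketch: clip a logistic sigmoid to the window [a, b], take its inverse
# (logit) on a linearly placed input, and renormalize to [0, 1]. With a
# symmetric window this lifts dark illuminance values and pulls highlights
# down, i.e., it compresses the dynamic range toward the midtones.
def windowed_inverse_sigmoid(lum, a=-4.0, b=4.0):
    """Map normalized illuminance in [0, 1] through an inverse-sigmoid window."""
    lum = np.clip(np.asarray(lum, dtype=float), 0.0, 1.0)
    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    # Place the input linearly inside the sigmoid's value range over [a, b] ...
    y = sig(a) + lum * (sig(b) - sig(a))
    # ... invert the sigmoid (logit) and renormalize back to [0, 1].
    t = np.log(y / (1.0 - y))
    return (t - a) / (b - a)

print(windowed_inverse_sigmoid(np.array([0.1, 0.5, 0.9])))
```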
In the transform domain approach, wavelet-transform-based image denoising and contrast enhancement algorithms are developed. The denoising is treated as a maximum a posteriori (MAP) estimation problem; a bivariate probability density function model is introduced to exploit the interlevel dependency among the wavelet coefficients. In addition, an approximate solution to the MAP estimation problem is proposed to avoid the complex iterative computations needed to find a numerical solution. This relatively low-complexity image denoising algorithm, implemented with the dual-tree complex wavelet transform (DT-CWT), produces high-quality denoised images.
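A bivariate parent-child MAP estimator of this kind admits a well-known closed-form solution, the bivariate shrinkage rule of Şendur and Selesnick; whether the dissertation uses exactly this parameterization is an assumption, and the parameter estimates below are illustrative:

```python
import numpy as np

# Bivariate shrinkage: the MAP estimate of a wavelet coefficient w given its
# parent coefficient, a noise standard deviation sigma_n, and a (local)
# signal standard deviation sigma.
def bivariate_shrink(w, w_parent, sigma_n, sigma):
    """Closed-form bivariate MAP shrinkage of wavelet coefficients."""
    r = np.sqrt(w ** 2 + w_parent ** 2)
    gain = np.maximum(r - np.sqrt(3.0) * sigma_n ** 2 / sigma, 0.0) / np.maximum(r, 1e-12)
    return gain * w

# Small coefficients with small parents are zeroed; large ones are kept
# (slightly shrunk toward zero).
print(bivariate_shrink(np.array([0.1, 5.0]), np.array([0.1, 4.0]), sigma_n=1.0, sigma=2.0))
```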
Communication in Noise: Perception of One's Own Voice and Speech Enhancement
Communication in noise is an everyday problem for workers in noisy industrial environments. Many workers complain that their hearing protectors prevent them from communicating easily with their colleagues. They therefore tend to remove their protectors, putting their hearing at risk. This communication problem is in fact twofold: the protectors alter both the perception of the wearer's own voice and the intelligibility of other people's speech. Both issues are addressed in this thesis.
The altered perception of the wearer's own voice is partly due to the occlusion effect that occurs when the ear canal is blocked by an earplug. This occlusion effect essentially amplifies the perception of low-frequency sounds generated inside the human body (physiological noise) and modifies the perception of the person's own voice. To better understand this phenomenon, following a thorough review of the existing literature, a new method for quantifying the occlusion effect was developed. Instead of exciting the subject's skull with a shaker or having the subject speak, as is classically done in the literature, it was decided to excite the subjects' oral cavity with a sound wave. The experiment was designed so that the sound wave exciting the oral cavity does not directly excite the outer ear or the rest of the body. Measuring hearing thresholds with open and occluded ears thus made it possible to quantify a subjective occlusion effect for a sound wave in the oral cavity. These results, together with the other quantifications of the occlusion effect reported in the literature, provided a better understanding of the occlusion phenomenon and allowed the influence of the different transmission paths between the sound source and the inner ear to be evaluated.
The intelligibility of other people's speech is degraded both by the high sound levels present in noisy industrial environments and by the attenuation of the speech signal caused by the hearing protectors. One possible remedy is to denoise the speech signal and then transmit it under the hearing protector. Many denoising techniques exist and are used in particular to denoise speech in telecommunications. In this thesis, denoising by wavelet thresholding is considered. A first study of "classical" wavelet denoising techniques is carried out to evaluate their performance in a noisy industrial environment. The tested speech signals are corrupted by industrial noise over a wide range of signal-to-noise ratios, and the denoised signals are evaluated using four criteria. A large database is thus obtained and analyzed with a selection algorithm designed specifically for this task. This first study highlighted the influence of the various wavelet denoising parameters on denoising quality and identified the "classical" method that yields the best denoising quality. It also provided guidelines for designing a new thresholding rule adapted to wavelet-based speech denoising in a noisy industrial environment. This new thresholding rule is presented and evaluated in a second study. Its performance proved superior to the best "classical" method identified in the first study for speech signals with signal-to-noise ratios between -10 dB and 15 dB.
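The "classical" wavelet thresholding evaluated in the first study can be sketched as follows. This shows only the thresholding step (soft thresholding with Donoho's universal threshold) on a given vector of detail coefficients, omitting the wavelet analysis/synthesis chain, and it is not the thesis's new thresholding rule:

```python
import numpy as np

def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero by `threshold`, zeroing the small ones."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)

def universal_threshold(coeffs):
    """sigma * sqrt(2 log N), with sigma estimated via the median absolute deviation."""
    sigma = np.median(np.abs(coeffs)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(len(coeffs)))

# Toy detail coefficients: large entries carry the speech transients,
# small ones are treated as noise and removed.
d = np.array([0.1, -0.2, 4.0, 0.05, -3.0, 0.15])
print(soft_threshold(d, 0.5))
```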
Wavelet Domain Watermark Detection and Extraction using the Vector-based Hidden Markov Model
Multimedia data piracy is a growing problem in view of the ease and simplicity provided by the Internet in transmitting and receiving such data. A possible solution to preclude unauthorized duplication or distribution of digital data is watermarking. A watermark is an identifiable piece of information embedded in the host data that provides security against multimedia piracy. This thesis is concerned with the investigation of various image watermarking schemes in the wavelet domain using the statistical properties of the wavelet coefficients. The wavelet subband coefficients of natural images have significantly non-Gaussian features that are best described by heavy-tailed distributions. Moreover, the wavelet coefficients of images have strong inter-scale and inter-orientation dependencies. In view of this, the vector-based hidden Markov model is found to be best suited to characterize the wavelet coefficients. In this thesis, this model is used to develop new digital image watermarking schemes. Additive and multiplicative watermarking schemes in the wavelet domain are developed in order to provide improved detection and extraction of the watermark. Blind watermark detectors using the log-likelihood ratio test, and watermark decoders using the maximum likelihood criterion to blindly extract the embedded watermark bits from the observation data, are designed.
Extensive experiments are conducted throughout this thesis using a number of databases selected from a wide variety of natural images. Simulation results are presented to demonstrate the effectiveness of the proposed image watermarking schemes and their superiority over some of the state-of-the-art techniques. It is shown that, owing to the use of the hidden Markov model to characterize the distributions of the wavelet coefficients of images, the proposed watermarking algorithms result in higher detection and decoding rates both before and after subjecting the watermarked image to various kinds of attacks.
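As a greatly simplified stand-in for the blind detector, the sketch below computes the log-likelihood ratio under an i.i.d. Gaussian model of the host coefficients rather than the thesis's vector hidden Markov model; all names and parameters are illustrative:

```python
import numpy as np

# H0: y = x (no watermark), H1: y = x + alpha * w, with host coefficients
# x ~ N(0, sigma^2) and w a known +/-1 watermark sequence. Under the Gaussian
# model the LLR reduces to a correlation-style statistic.
def llr_detect(y, w, alpha, sigma, threshold=0.0):
    """Return (LLR statistic, decision) for additive watermark presence."""
    llr = np.sum((2.0 * alpha * w * y - (alpha * w) ** 2) / (2.0 * sigma ** 2))
    return llr, llr > threshold

rng = np.random.default_rng(0)
w = rng.choice([-1.0, 1.0], size=4096)   # watermark sequence
x = rng.normal(0.0, 1.0, size=4096)      # host "coefficients"
stat_marked, _ = llr_detect(x + 0.2 * w, w, alpha=0.2, sigma=1.0)
stat_clean, _ = llr_detect(x, w, alpha=0.2, sigma=1.0)
print(stat_marked > stat_clean)  # watermarked data yields the larger statistic
```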
Information Theoretic Criteria for Image Quality Assessment Based on Natural Scene Statistics
Measurement of visual quality is crucial
for various image and video processing applications. It is widely
applied in image acquisition, media transmission, video compression,
image/video restoration, etc.
The goal of image quality assessment (QA) is to develop a computable
quality metric which is able to properly evaluate image quality. The
primary criterion is better QA consistency with human judgment.
Computational complexity and resource limitations are also concerns
in a successful QA design. Many methods have been proposed to date.
Early quality measurements were taken directly from simple distance
measures reflecting mathematical signal fidelity, such as the mean
squared error or the Minkowski distance. Later, QA was extended to
color spaces and the Fourier domain, in which images are better
represented. Some existing methods also consider the adaptive ability
of human vision. Unfortunately, the Video Quality Experts Group found
that none of the more sophisticated metrics showed any great advantage
over the other existing metrics.
This thesis proposes a general approach to the QA problem by
evaluating image information entropy. An information theoretic model
for the human visual system is proposed and an information theoretic
solution is presented to derive the proper settings. The quality
metric is validated by five subjective databases from different
research labs. The key points for a successful quality metric are
investigated. During the testing, our quality metric exhibits
excellent consistency with the human judgments and compatibility
with different databases. Beyond the full-reference quality
assessment metric, blind quality assessment metrics are also
proposed. In order to predict quality without a reference image, two
concepts are introduced that quantitatively describe the inter-scale
dependency under a multi-resolution framework. Building on the
success of the full-reference quality metric, several blind quality
metrics are proposed for five different types of distortions in the
subjective databases. Our blind metrics outperform all existing blind
metrics and can also handle some distortions that had not been
investigated before.
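The core information measure behind such an entropy-based approach can be illustrated with the Shannon entropy of an image histogram. The thesis's actual metric involves a full model of the human visual system; this sketch shows only the basic entropy computation, and the image names are illustrative:

```python
import numpy as np

def image_entropy(img):
    """Shannon entropy (bits per pixel) of an 8-bit grayscale image histogram."""
    hist = np.bincount(np.asarray(img, dtype=np.uint8).ravel(), minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log2 stays finite
    return -np.sum(p * np.log2(p))

flat = np.zeros((16, 16), dtype=np.uint8)              # constant image: 0 bits
rich = np.arange(256, dtype=np.uint8).reshape(16, 16)  # all 256 levels equally likely: 8 bits
print(image_entropy(flat), image_entropy(rich))
```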