
    Real-Time Feature Extraction and Recognition of Indonesian Vowel Sounds Based on Gender

    The human voice has highly varied characteristics, which makes it an effective medium of communication. Consequently, much research on the human voice has been carried out to improve speech recognition. Here, speech recognition is developed to run in real time and by gender, so as to achieve accurate results within a specified time limit. A level-3 Discrete Wavelet Transform (DWT) is used for feature extraction and Dynamic Time Warping (DTW) for recognition. The level-3 DWT yields 8 features. DTW-based recognition computes the smallest discriminating distance between two different feature sets, without any prior training. Recognition was tested on 6 male speakers and 6 female speakers in turn, each with 900 pairs of measurement data. The best average recognition accuracy was 54.6% for the 6 male speakers tested in turn and 54.17% for the 6 female speakers tested in turn, computed over each set of data pairs obtained in real time.
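
    The DTW matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the absolute-difference local cost, the function names `dtw_distance` and `nearest_template`, and the template dictionary are all assumptions made for illustration. It only shows how a nearest-distance decision can be made without a training phase.

```python
from math import inf


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW between two 1-D sequences.

    The local cost is the absolute difference (an assumption; the
    abstract does not state which local distance was used).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def nearest_template(features, templates):
    """Return the label whose reference sequence has the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))


if __name__ == "__main__":
    # Hypothetical reference feature sequences, for illustration only.
    templates = {"a": [1, 2, 2, 3], "i": [5, 5, 5]}
    print(nearest_template([1, 2, 3], templates))
```

    Because the decision is a plain minimum over stored reference sequences, adding a speaker or vowel only requires adding a template, which matches the abstract's remark that no prior training is performed.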

    Split-screen single-camera stereoscopic PIV application to a turbulent confined swirling layer with free surface

    An annular liquid wall jet, or vortex tube, generated by helical injection inside a tube is studied experimentally as a possible means of fusion reactor shielding. The hollow confined vortex/swirling layer simultaneously exhibits all the complexities of swirling turbulence, a free surface, droplet formation, and bubble entrapment, all of which pose challenging diagnostic issues. The construction of the flow apparatus and the choice of working liquid and seeding particles allow unimpeded optical access to the flow field. A split-screen, single-camera stereoscopic particle image velocimetry (SPIV) scheme is employed for flow field characterization. Image calibration and free surface identification issues are discussed. Interference in the measurements caused by laser beam reflection at the interface is identified and discussed. Selected velocity measurements and turbulence statistics are presented at Re_λ = 70 (Re = 3500 based on mean layer thickness).

    Motion Segmentation Aided Super Resolution Image Reconstruction

    This dissertation addresses Super Resolution (SR) Image Reconstruction with a focus on motion segmentation. The main thrust is Information Complexity guided Gaussian Mixture Models (GMMs) for Statistical Background Modeling. In developing our framework we also address two other topics: motion trajectory estimation for global and local scene change detection, and image reconstruction to obtain high resolution (HR) representations of the moving regions. Such a framework is used for dynamic scene understanding and for recognition of individuals and threats from image sequences recorded with either stationary or non-stationary camera systems. We introduce a new technique called Information Complexity guided Statistical Background Modeling, in which we employ GMMs that are optimal with respect to information complexity criteria. Moving objects are segmented out through background subtraction, which uses the computed background model. This technique produces results superior to those of competing background modeling strategies. State-of-the-art SR Image Reconstruction studies combine the information from a set of only slightly different low resolution (LR) images of a static scene to construct an HR representation. The crucial challenge not handled in these studies is accumulating the corresponding information from highly displaced moving objects. To address this, a framework for SR Image Reconstruction of moving objects with such high levels of displacement is developed. Our assumption is that LR images differ from each other because of the local motion of the objects and the global motion of the scene imposed by a non-stationary imaging system. Contrary to traditional SR approaches, we employ several steps: suppression of the global motion; motion segmentation accompanied by background subtraction to extract moving objects; suppression of the local motion of the segmented-out regions; and super-resolving the accumulated information coming from the moving objects rather than the whole scene. This results in a reliable offline SR Image Reconstruction tool which handles several types of dynamic scene changes, compensates for the impacts of camera systems, and provides data redundancy through removing the background. The framework proved to be superior to state-of-the-art algorithms that put no significant effort toward dynamic scene representation with non-stationary camera systems.
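
    The background subtraction step can be sketched in a deliberately simplified form. The sketch below swaps in a single running Gaussian per pixel instead of the dissertation's information-complexity-guided GMMs, so it illustrates only the general idea of statistical background modeling. The function name `update_background`, the learning rate `alpha`, and the threshold `k` are hypothetical choices, not values from the source.

```python
def update_background(mean, var, frame, alpha=0.05, k=2.5):
    """One update of a per-pixel running-Gaussian background model.

    mean, var, frame are flat lists of equal length (one entry per pixel).
    A pixel is flagged as foreground when it lies more than k standard
    deviations from the background mean. Only background pixels update
    the model, so moving objects do not contaminate it.
    Returns (new_mean, new_var, mask), where mask[i] is True for foreground.
    """
    new_mean, new_var, mask = [], [], []
    for mu, v, x in zip(mean, var, frame):
        diff = x - mu
        is_foreground = diff * diff > (k * k) * v
        mask.append(is_foreground)
        if is_foreground:
            # Keep the model unchanged for pixels covered by a moving object.
            new_mean.append(mu)
            new_var.append(v)
        else:
            # Exponential moving average of mean and variance.
            new_mean.append(mu + alpha * diff)
            new_var.append((1 - alpha) * v + alpha * diff * diff)
    return new_mean, new_var, mask


if __name__ == "__main__":
    mean, var = [10.0, 10.0], [4.0, 4.0]
    mean, var, mask = update_background(mean, var, [11.0, 50.0])
    print(mask)  # the second pixel deviates strongly from the background
```

    In a mixture model each pixel would carry several such Gaussians with weights, and the dissertation selects the mixture using information complexity criteria; that selection step is not reproduced here.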

    Deep learning for time series classification: a review

    Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising, as deep learning has seen very successful applications in recent years. DNNs have indeed revolutionized the field of computer vision, especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-of-the-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date. Comment: Accepted at Data Mining and Knowledge Discovery

    Methods for speaking style conversion from normal speech to high vocal effort speech

    This thesis deals with vocal-effort-focused speaking style conversion (SSC). Specifically, we studied two topics on the conversion of normal speech to high vocal effort speech. The first topic involves the conversion of normal speech to shouted speech. We employed this conversion in a speaker recognition system with vocal effort mismatch between test and enrollment utterances (shouted speech vs. normal speech). The mismatch degrades the system's speaker identification performance. As a solution, we proposed an SSC system that included a novel spectral mapping, used along with a statistical mapping technique, to transform the mel-frequency spectral energies of normal speech enrollment utterances towards their counterparts in shouted speech. We evaluated the proposed solution by comparing speaker identification rates for a state-of-the-art i-vector-based speaker recognition system, with and without applying SSC to the enrollment utterances. Our results showed that applying the proposed SSC pre-processing to the enrollment data considerably improves the speaker identification rates. The second topic involves normal-to-Lombard speech conversion. We proposed a vocoder-based parametric SSC system to perform the conversion. This system first extracts speech features using the vocoder. Next, a mapping technique that is robust to data scarcity maps the features. Finally, the vocoder synthesizes the mapped features into speech. For comparison, we used two vocoders in the conversion system: a glottal vocoder and the widely used STRAIGHT. We assessed the converted speech from the two vocoder cases with two subjective listening tests that measured similarity to Lombard speech and naturalness. The similarity test showed that, for both vocoder cases, our proposed SSC system was able to convert normal speech to Lombard speech. The naturalness test showed that the samples converted using the glottal vocoder were clearly more natural than those obtained with STRAIGHT.