21 research outputs found

    An HMM-Like Dynamic Time Warping Scheme for Automatic Speech Recognition

    In the past, the kernel of automatic speech recognition (ASR) was dynamic time warping (DTW), a feature-based template matching technique that belongs to the category of dynamic programming (DP). Although DTW is an early ASR technique, it remains popular in many applications and now plays an important role in Kinect-based gesture recognition. This paper proposes an intelligent speech recognition system using an improved DTW approach for multimedia and home automation services. The improved DTW presented in this work, called HMM-like DTW, is essentially a hidden Markov model (HMM)-like method in which the concept of the typical HMM statistical model is brought into the design of DTW. The developed HMM-like DTW method transforms feature-based DTW recognition into model-based DTW recognition, so that it behaves like the HMM recognition technique; with its HMM-like recognition model, the proposed method can therefore also perform model adaptation (also known as speaker adaptation). A series of experiments in home automation-based multimedia access service environments demonstrated the superiority and effectiveness of the smart speech recognition system developed with HMM-like DTW.
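The abstract above describes DTW as dynamic-programming template matching. As a rough illustration only, the following sketch shows classical DTW on one-dimensional sequences, not the paper's HMM-like variant. The function names `dtw_distance` and `recognize`, the absolute-difference local cost, and the toy templates are all assumptions made for this example.

```python
from math import inf


def dtw_distance(a, b):
    """Classical DTW distance between two 1-D sequences (assumed local cost: |x - y|)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed predecessor moves: insertion, deletion, match
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(query, templates):
    """Return the label of the stored template closest to the query under DTW."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical toy templates standing in for per-word feature sequences
templates = {"on": [1, 2, 3], "off": [3, 2, 1]}
label = recognize([1, 2, 2, 3], templates)
```

Real ASR front ends would compare sequences of feature vectors (for example cepstral frames) rather than scalars, which only changes the local cost function in this sketch.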

    A Wireless Sensor Network-Speech Recognition Scheme Using Deployments of Multiple Kinect Microphone Array-Sensors

    Speech recognition has recently been used successfully in many applications. With the development of the Kinect sensor device from Microsoft, speech recognition can be extended to a ubiquitous environment in which a wireless sensor network of Kinect sensors is deployed. This study develops a wireless sensor network (WSN) speech recognition scheme using deployments of multiple Kinect microphone-array sensors. The presented Kinect-WSN speech recognition can effectively capture the acoustic data produced by the speaker and then perform the corresponding voice command control on a given target. In this study, different strategies for deploying multiple Kinect microphone-array sensors to construct a ubiquitous Kinect-WSN speech recognition environment are investigated. Several acoustic sensing data fusion methods are also explored to achieve superior Kinect-WSN speech recognition performance. The efficiency and effectiveness of the presented method are evaluated in a 5 m × 5 m laboratory environment in which any of four test speakers may issue a voice command from anywhere. The Kinect microphone-array sensor-deployed WSN speech recognition developed in this work can be used in a variety of control applications.

    Optical music recognition of the singer using formant frequency estimation of vocal fold vibration and lip motion with interpolated GMM classifiers

    The main work of this paper is to identify the musical genre of a singer through optical detection of lip motion. Optical music recognition has recently attracted much attention. In this study, optical music recognition is a type of automatic technique in information engineering that can be used to determine the musical style of the singer. This paper proposes a method for optical music recognition in which acoustic formant analysis of both vocal fold vibration and lip motion is employed with interpolated Gaussian mixture model (GMM) estimation to classify the musical genre of the singer. The developed approach is called GMM-Formant. Since humming and voiced speech sounds cause periodic vibrations of the vocal folds and, in turn, the corresponding motion of the lips, the proposed GMM-Formant first acquires the required formant information. Formant information is an important acoustic feature for recognition and classification. The proposed GMM-Formant method then uses linear interpolation to combine GMM likelihood estimates and formant evaluation results appropriately. GMM-Formant adjusts the estimated formant feature evaluation outcomes by referring, to a certain degree, to the likelihood score derived from GMM calculations. The superiority and effectiveness of the presented GMM-Formant are demonstrated by a series of experiments on musical genre classification of the singer.
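The linear interpolation mentioned in the abstract above can be pictured with a small sketch. The abstract does not give the weighting or the score scales, so the `weight` parameter, the assumption that both scores are already on a comparable scale, and the names `interpolate_scores` and `classify_genre` are hypothetical choices for this example rather than the paper's formulation.

```python
def interpolate_scores(gmm_score, formant_score, weight):
    """Linearly interpolate two scores; `weight` is the share given to the GMM score."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * gmm_score + (1.0 - weight) * formant_score


def classify_genre(gmm_scores, formant_scores, weight=0.5):
    """Pick the genre with the highest interpolated score (scores keyed by genre)."""
    combined = {
        genre: interpolate_scores(gmm_scores[genre], formant_scores[genre], weight)
        for genre in gmm_scores
    }
    return max(combined, key=combined.get)


# Hypothetical per-genre scores, assumed normalized to [0, 1]
gmm_scores = {"rock": 0.9, "jazz": 0.2}
formant_scores = {"rock": 0.1, "jazz": 0.6}
```

With `weight=0.0` the decision rests on the formant evaluation alone, and with `weight=1.0` on the GMM likelihood alone; intermediate values blend the two.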

    A Method to Integrate GMM, SVM and DTW for Speaker Recognition

    This paper develops an effective and efficient scheme to integrate the Gaussian mixture model (GMM), support vector machine (SVM), and dynamic time warping (DTW) for automatic speaker recognition. GMM and SVM are two popular classifiers for speaker recognition applications. DTW is a fast and simple template matching method that is frequently seen in speech recognition applications. In this work, DTW is not used to perform speech recognition; instead, it is employed as a verifier of valid speakers. The proposed combination of GMM, SVM and DTW for speaker recognition, called SVMGMM-DTW, is a two-phase verification process consisting of GMM-SVM verification in the first phase and DTW verification in the second phase. By providing a double check of a speaker's identity, the scheme makes it difficult for impostors to pass the security protection, and the security of speaker recognition systems is therefore greatly increased. A series of experiments on door access control applications demonstrated the superiority of the developed SVMGMM-DTW in speaker recognition accuracy.
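The two-phase check described above amounts to accepting a speaker only when both verifiers agree. The sketch below illustrates that control flow under stated assumptions: the scores are taken as precomputed inputs, the thresholds are hypothetical, and `two_phase_verify` is a name invented for this example, not the paper's implementation.

```python
def two_phase_verify(gmm_svm_score, dtw_distance, score_threshold, distance_threshold):
    """Accept a claimed speaker only if both verification phases pass.

    Phase 1: the GMM-SVM score must reach `score_threshold` (higher is better).
    Phase 2: the DTW distance to the enrolled template must not exceed
             `distance_threshold` (lower is better).
    """
    if gmm_svm_score < score_threshold:
        return False  # rejected in the first phase; the DTW check is skipped
    return dtw_distance <= distance_threshold


# Hypothetical values for a genuine speaker and an impostor
genuine = two_phase_verify(0.8, 3.0, score_threshold=0.5, distance_threshold=5.0)
impostor = two_phase_verify(0.8, 9.0, score_threshold=0.5, distance_threshold=5.0)
```

Requiring both phases to pass trades some convenience for security: an impostor who slips past the first classifier can still be rejected by the template match.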

    Cost-effectiveness of human papillomavirus vaccination for prevention of cervical cancer in Taiwan

    Background: Human papillomavirus (HPV) infection has been shown to be a major risk factor for cervical cancer. Vaccines against HPV-16 and HPV-18 are highly effective in preventing type-specific HPV infections and related cervical lesions. There is, however, limited data available describing the health and economic impacts of HPV vaccination in Taiwan. The objective of this study was to assess the cost-effectiveness of prophylactic HPV vaccination for the prevention of cervical cancer in Taiwan.
    Methods: We developed a Markov model to compare the health and economic outcomes of vaccinating preadolescent girls (at the age of 12 years) for the prevention of cervical cancer with current practice, including cervical cytological screening. Data were synthesized from published papers or reports, and whenever possible, those specific to Taiwan were used. Sensitivity analyses were performed to account for important uncertainties and different vaccination scenarios.
    Results: Under the assumption that the HPV vaccine could provide lifelong protection, the massive vaccination among preadolescent girls in Taiwan would lead to reduction in 73.3% of the total incident cervical cancer cases and would result in a life expectancy gain of 4.9 days or 8.7 quality-adjusted life days at a cost of US$324 as compared to the current practice. The incremental cost-effectiveness ratio (ICER) was US$23,939 per life year gained or US$13,674 per quality-adjusted life year (QALY) gained given the discount rate of 3%. Sensitivity analyses showed that this ICER would remain below US$30,000 per QALY under most conditions, even when vaccine efficacy was suboptimal or when vaccine-induced immunity required booster shots every 13 years.
    Conclusions: Although gains in life expectancy may be modest at the individual level, the results indicate that prophylactic HPV vaccination of preadolescent girls in Taiwan would result in substantial population benefits with a favorable cost-effectiveness ratio. Nevertheless, we should not overlook the urgency to improve the compliance rate of cervical screening, particularly for older individuals.

    CNN Deep Learning with Wavelet Image Fusion of CCD RGB-IR and Depth-Grayscale Sensor Data for Hand Gesture Intention Recognition

    Pixel-based images captured by a charge-coupled device (CCD) with infrared (IR) LEDs around the image sensor are the well-known CCD Red–Green–Blue IR (so-called CCD RGB-IR) data. CCD RGB-IR data are generally acquired for video surveillance applications. More recently, CCD RGB-IR information has also been used to perform human gesture recognition in surveillance. Gesture recognition, including hand gesture intention recognition, is attracting great attention in the field of deep neural network (DNN) calculations. To further enhance conventional DNN-based CCD RGB-IR gesture recognition, this work proposes a deep learning framework in which a convolutional neural network (CNN) is combined with wavelet image fusion of CCD RGB-IR images and additional depth-grayscale images (captured from the depth sensors of the Microsoft Kinect device) for gesture intention recognition. In the proposed CNN with wavelet image fusion, a five-level discrete wavelet transformation (DWT) with three different wavelet decomposition merge strategies, namely max-min, min-max and mean-mean, is employed; the visual geometry group (VGG)-16 CNN is used for deep learning and recognition of the wavelet-fused gesture images. Experiments on the classification of ten hand gesture intention actions (specified in a scenario of laboratory interactions) show that additionally incorporating depth-grayscale data into CCD RGB-IR gesture recognition increases the averaged recognition accuracy to 83.88% for the VGG-16 CNN with min-max wavelet image fusion of the CCD RGB-IR and depth-grayscale data, which is clearly superior to the 75.33% of the VGG-16 CNN with only CCD RGB-IR.
