A Measure of Smoothness in Synthesized Speech
The articulators typically move smoothly during speech production, so the features of natural speech are generally smooth. However, over-smoothness causes "muffleness" and hence reduces the ability to identify emotions, expressions, and styles in synthesized speech, which can affect its perceived naturalness. In the literature, statistical variances of static spectral features have been used as a measure of smoothness in synthesized speech, but they are not sufficient. This paper proposes another measure of smoothness that can be efficiently applied to evaluate synthesized speech. Experiments showed that the proposed measure is reliable and efficient for measuring the smoothness of different kinds of synthesized speech.
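As a rough illustration of the variance-based baseline mentioned in this abstract (not the measure the paper proposes, which the abstract does not specify), the sketch below computes the per-dimension variance of a feature matrix over time. The random features, the moving-average smoother, and the function names are all assumptions made for the example.

```python
import numpy as np

def global_variance(features):
    """Per-dimension variance over time of a (frames, dims) feature matrix."""
    return np.var(np.asarray(features, dtype=float), axis=0)

def moving_average(features, width):
    """Smooth each dimension with a moving average (a stand-in for over-smoothing)."""
    kernel = np.ones(width) / width
    columns = np.asarray(features, dtype=float).T
    return np.stack(
        [np.convolve(col, kernel, mode="valid") for col in columns], axis=1
    )

# Hypothetical data: random "spectral" features, 200 frames x 4 dimensions.
rng = np.random.default_rng(0)
natural = rng.normal(size=(200, 4))
smoothed = moving_average(natural, width=5)

gv_natural = global_variance(natural)
gv_smoothed = global_variance(smoothed)
```

Under these assumptions the smoothed trajectories have a lower variance in every dimension, which is why variance can serve as a coarse indicator of over-smoothing.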
Classification of cow’s behaviors based on 3-DoF accelerations from cow’s movements
Classifying cow behavior helps people monitor cow activities, so that the health and physiological periods of cows can be tracked well. To classify the behavior of cows, data from a 3-axis acceleration sensor mounted on the neck is often used. The device must acquire and preprocess the sensor data. We acquire data from the 3-axis acceleration sensor mounted on the cow’s neck and send it to the microcontroller. At the microcontroller, a proposed decision tree is applied in real time to classify four important activities of the cows (standing, lying, feeding, and walking). Finally, the results can be sent to the server through the wireless transmission module. The test results confirm the reliability of the proposed device.
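A minimal sketch of how a threshold-based decision tree over a window of 3-axis acceleration samples might look. The abstract does not give the tree's features or thresholds, so the features (spread of the acceleration magnitude, mean of one axis), every threshold value, and the function name are hypothetical.

```python
from statistics import mean, pstdev

def classify_window(samples, lying_y=0.5, walk_dyn=0.30, feed_dyn=0.10):
    """Classify one window of (x, y, z) accelerations given in g.

    All thresholds are hypothetical placeholders, not values from the paper.
    """
    # Spread of the acceleration magnitude as a crude measure of activity.
    magnitudes = [(x * x + y * y + z * z) ** 0.5 for x, y, z in samples]
    dynamic = pstdev(magnitudes)
    if dynamic >= walk_dyn:
        return "walking"
    if dynamic >= feed_dyn:
        return "feeding"
    # Low activity: use the mean of one axis (gravity direction) for posture.
    mean_y = mean(y for _, y, _ in samples)
    return "lying" if mean_y < lying_y else "standing"

standing = classify_window([(0.0, 1.0, 0.0)] * 10)
lying = classify_window([(1.0, 0.0, 0.0)] * 10)
feeding = classify_window([(0.0, 0.85, 0.0), (0.0, 1.15, 0.0)] * 5)
walking = classify_window([(0.0, 0.5, 0.0), (0.0, 1.5, 0.0)] * 5)
```

A fixed-threshold tree of this kind needs only a few comparisons per window, which is one reason such classifiers suit real-time use on a microcontroller.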
Prevalence, antimicrobial resistance and genomic comparison of non-typhoidal salmonella isolated from pig farms with different levels of intensification in Yangon Region, Myanmar
In Myanmar, where backyard, semi-intensive, and intensive pig (Sus scrofa domesticus) farming coexist, there is limited understanding of the zoonotic risks and antimicrobial resistance (AMR) associated with these farming practices. This study was conducted to investigate the prevalence, AMR and genomic features of Salmonella in pig farms in the Yangon region and the impact of farm intensification, to provide evidence to support future risk-based management approaches. Twenty-three farms with different production scales were sampled over two periods with three sampling visits each. Antimicrobial susceptibility tests and whole-genome sequencing were performed on the isolates. The prevalence of Salmonella was 44.5% in samples collected from backyard farms, followed by intensive (39.5%) and semi-intensive farms (19.5%). The prevalence of multi-drug resistant isolates from intensive farms (45/84, 53.6%) was higher than that from backyard (32/171, 18.7%) and semi-intensive farms (25/161, 15.5%). Among 28 different serovars identified, S. Weltevreden (40; 14.5%), S. Kentucky (38; 13.8%), S. Stanley (35; 12.7%), S. Typhimurium (22; 8.0%) and S. Brancaster (20; 7.3%) were the most prevalent and accounted for 56.3% of the genome-sequenced strains. The diversity of Salmonella serovars was highest in semi-intensive and backyard farms (21 and 19 different serovars, respectively). A high prevalence of the globally emerging S. Kentucky ST198 was detected on backyard farms. The invasive-infection-linked typhoid-toxin gene (cdtB) was found in S. Typhimurium isolated from backyard farms; these isolates, relatively enriched in virulence and AMR genes, present an important target for future surveillance. While intensification, in terms of semi-intensive versus backyard production, may be a mitigator of zoonotic risk through a lower prevalence of Salmonella, intensive production appears to enhance AMR-associated risks. Therefore, it remains crucial to closely monitor the AMR and virulence potential of this pathogen at all scales of production. The results underscored the complex relationship between intensification of animal production and the prevalence, diversity and AMR of Salmonella from pig farms in Myanmar.
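The multi-drug resistance percentages quoted in this abstract can be recomputed from the reported counts. The short check below uses only figures stated in the abstract; the variable and function names are illustrative.

```python
# (multi-drug resistant isolates, total isolates) per farm type, as reported.
mdr_counts = {
    "intensive": (45, 84),
    "backyard": (32, 171),
    "semi-intensive": (25, 161),
}

def percent(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

mdr_percent = {farm: percent(*counts) for farm, counts in mdr_counts.items()}

# Reported shares of the five most prevalent serovars (percent of sequenced strains).
serovar_shares = [14.5, 13.8, 12.7, 8.0, 7.3]
top_five_share = round(sum(serovar_shares), 1)
```

The recomputed values match the reported 53.6%, 18.7% and 15.5%, and the five serovar shares sum to the stated 56.3%.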
Urinary catecholamine excretion, cardiovascular variability, and outcomes in tetanus
Severe tetanus is characterized by muscle spasm and cardiovascular system disturbance. The pathophysiology of muscle spasm is relatively well understood and involves inhibition of central inhibitory synapses by tetanus toxin. That of cardiovascular disturbance is less clear, but is believed to relate to disinhibition of the autonomic nervous system. The clinical syndrome of autonomic nervous system dysfunction (ANSD) seen in severe tetanus is characterized principally by changes in heart rate and blood pressure, which have been linked to increased circulating catecholamines. Previous studies have described varying relationships between catecholamines and signs of ANSD in tetanus, but are limited by confounders and the assays used. In this study, we aimed to characterize in detail the relationship between catecholamines (adrenaline and noradrenaline), cardiovascular parameters (heart rate and blood pressure) and clinical outcomes (ANSD, need for mechanical ventilation, and length of intensive care unit (ICU) stay) in adults with tetanus, and to examine whether intrathecal antitoxin administration affected subsequent catecholamine excretion. Noradrenaline and adrenaline were measured by ELISA from 24-h urine collections taken on day 5 of hospitalization in 272 patients enrolled in a 2 × 2 factorial, blinded, randomized controlled trial in a Vietnamese hospital. Catecholamine results from 263 patients were available for analysis. After adjustment for potential confounders (i.e., age, sex, intervention treatment, and medications), there were indications of non-linear relationships between urinary catecholamines and heart rate. Adrenaline and noradrenaline were associated with subsequent development of ANSD and with length of ICU stay.
Reducing over-smoothness in HMM-based speech synthesis using exemplar-based voice conversion
Speech synthesis has been applied in many kinds of practical applications. Currently, state-of-the-art speech synthesis uses statistical methods based on hidden Markov models (HMMs). Speech synthesized by statistical methods tends to be over-smooth because of the averaging inherent in statistical processing. In the literature, there have been many studies attempting to solve over-smoothness in HMM-synthesized speech, but they are still limited. In this paper, a hybrid synthesis method combining HMM-based synthesis and exemplar-based voice conversion is proposed. The experimental results show that the proposed method outperforms state-of-the-art HMM synthesis using global variance.
A study on restoration of bone-conducted speech in noisy environments with LP-based model and Gaussian mixture model
The restoration of bone-conducted speech is a very important issue that enables robust speech communication in extremely noisy environments. In our previous studies, we proposed a method of blind restoration based on a linear prediction scheme, with training and prediction based on a simple recurrent neural network. However, prediction based on neural networks is not suitable for training with large corpora, which is necessary for real applications. The over-training problem with simple recurrent neural networks makes it difficult to train on various kinds of bone-conducted speech in one session. In addition, it is difficult to adapt the neural network model to bone-conducted speech in unknown noisy environments to build an open-dataset restoration of bone-conducted speech. Thus, a method of training and prediction based on the Gaussian mixture model was used in this research instead of a neural network. A method of re-estimating the residual ratio in the linear prediction scheme is also proposed. We also investigated how the proposed method works to restore bone-conducted speech in extremely noisy environments. Objective and subjective evaluations were carried out to evaluate the improvements in sound quality and the intelligibility of restored speech. The results revealed that our proposed method outperformed previous methods in both human hearing and automatic speech recognition systems, even in extremely noisy environments.
Transformation of F0 contours for lexical tones in concatenative speech synthesis of tonal languages
Concatenative speech synthesis (CSS) provides the greatest naturalness. However, it requires a huge stored database, resulting in a large footprint. Reducing the size of the stored database while preserving the quality of CSS, that is, improving the quality-to-size ratio (QSr), is still a challenge. In this paper, we propose a method of transforming the fundamental frequency (F0) contours of lexical tones. The method is developed from the TD-GMM framework, which was successfully applied to transforming spectral sequences in previous research. The aim is to improve the QSr of CSS for tonal languages, so that CSS can be built with limited data at the offline stage and stored with a small online footprint while preserving perceptual quality. The experimental results show that the proposed F0 transformation outperforms conventional and state-of-the-art F0 contour transformations for lexical tones in terms of speech quality. When the proposed F0 contour transformation is applied to lexical tones in CSS of tonal languages, the QSr is enhanced compared with the method of simple F0 exchange, while the quality of synthetic speech is preserved.
A concatenative speech synthesis for monosyllabic languages with limited data
The quality of unit-based concatenative speech synthesis is low, while corpus-based concatenative speech synthesis with unit selection sounds highly natural. However, unit selection requires a huge amount of data for concatenation, which reduces the range of its applications. In this paper, by using temporal decomposition to model contextual effects within and between syllables, we propose a context-fitting unit modification method and a context-matching unit selection method. The two proposed context-specific methods are used in our proposed syllable-based concatenative speech synthesis for monosyllabic languages. The experimental results with Vietnamese speech synthesis using a small corpus indicate that the proposed methods are effective. As a consequence, the naturalness and intelligibility of the proposed speech synthesis are high even when only limited data are available for concatenation.
A Hybrid TTS between Unit Selection and HMM-based TTS under limited data conditions
The intelligibility of HMM-based TTS can reach that of the original speech. However, HMM-based TTS is far from natural. In contrast, unit selection TTS is currently the most natural-sounding TTS. However, its intelligibility and its naturalness in segmental duration and timing are not stable. Additionally, unit selection needs to store a huge amount of data for concatenation. Recently, hybrid approaches between these two kinds of TTS, e.g. the HMM trajectory tiling TTS (HTT), have been studied to take advantage of both unit selection and HMM-based TTS. However, such methods still require a huge amount of data for rendering. In this paper, a hybrid TTS named HTD, which combines unit selection, HMM-based TTS, and the Modified Restricted Temporal Decomposition (MRTD), is proposed with the aim of taking advantage of both unit selection and HMM-based TTS under limited data conditions. Here, temporal decomposition (TD) is a sparse representation of speech that decomposes a spectral or prosodic sequence into two mutually independent components: static event targets and corresponding dynamic event functions, and MRTD is a compact but efficient version of TD. Previous studies show that the dynamic event functions of MRTD are related to the perception of speech intelligibility, a core part of linguistic (content) information, while the static event targets of MRTD convey non-linguistic (style) information. Therefore, by borrowing the concepts of unit selection to render the event targets of the spectral sequence, and directly borrowing the prosodic sequences and the dynamic event functions of the spectral sequence generated by HMM-based TTS, the proposed HTD can reach the naturalness of unit selection and the intelligibility of HMM-based TTS. Due to the smoothness of the event functions of MRTD, an appropriate smoothness in synthesized speech can still be ensured when it is rendered from a small amount of data, which makes the proposed HTD usable under limited data conditions. The experimental results with a small Vietnamese dataset, simulating a “limited data condition”, show that the proposed HTD outperformed HMM-based TTS, unit selection, and HTT under that condition.
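To make the decomposition described in this abstract concrete, the sketch below rebuilds a feature trajectory as a weighted sum of static event targets and dynamic event functions, then swaps in a different set of targets while keeping the event functions fixed. It is only an illustration under stated assumptions: the piecewise-linear event functions, the event locations, the target values, and the names `hmm_targets` and `unit_targets` are hypothetical and are not taken from the paper.

```python
import numpy as np

def event_functions(event_frames, n_frames):
    """Piecewise-linear event functions, shape (n_events, n_frames).

    Each function peaks at its own event frame and falls to zero at the
    neighbouring event frames, so the functions sum to one at every frame.
    """
    frames = np.arange(n_frames)
    phi = np.zeros((len(event_frames), n_frames))
    for k in range(len(event_frames)):
        indicator = np.zeros(len(event_frames))
        indicator[k] = 1.0
        phi[k] = np.interp(frames, event_frames, indicator)
    return phi

def reconstruct(targets, phi):
    """Trajectory (n_frames, dims) as a weighted sum of event targets."""
    return phi.T @ targets

# Hypothetical example: three events over 21 frames, 2-dimensional features.
event_frames = [0, 10, 20]
phi = event_functions(event_frames, n_frames=21)

hmm_targets = np.array([[0.0, 0.0], [0.5, 0.4], [0.0, 0.0]])   # stand-in for HMM output
unit_targets = np.array([[0.0, 0.1], [1.0, 0.9], [0.1, 0.0]])  # stand-in for selected units

# Keep the event functions (timing), replace only the event targets.
hybrid = reconstruct(unit_targets, phi)
```

In this toy setup the hybrid trajectory passes through the swapped-in targets at the event frames while the transitions between them stay as smooth as the event functions allow.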