156 research outputs found

    Analysing the importance of different visual feature coefficients

    A study is presented to determine the relative importance of different visual features for speech recognition, including pixel-based, model-based, contour-based and physical features. Analysis to determine the discriminability of features is performed through F-ratio and J-measures for both static and temporal derivatives, the results of which were found to correlate highly with speech recognition accuracy (r = 0.97). Principal component analysis is then used to combine all visual features into a single feature vector, and further analysis is performed on the resulting basis functions. An optimal feature vector is obtained which outperforms the best individual feature (AAM) with 93.5% word accuracy.
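
As a rough illustration of the kind of discriminability score mentioned above, the sketch below computes an F-ratio for one feature as the variance of the class means divided by the mean within-class variance. This is a common textbook form and is an assumption here; the abstract does not give the exact definition used in the study, and the function name `f_ratio` and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def f_ratio(values_by_class):
    """F-ratio of a single feature: between-class variance / within-class variance.

    values_by_class maps a class label to the list of feature values observed
    for that class. This textbook form is an assumption, not the paper's
    stated definition.
    """
    class_means = [mean(values) for values in values_by_class.values()]
    between = pvariance(class_means)  # spread of the class means
    within = mean([pvariance(values) for values in values_by_class.values()])
    return between / within


# Hypothetical toy data: two classes with well separated means.
toy = {"a": [1.0, 3.0], "b": [5.0, 7.0]}
print(f_ratio(toy))
```

A larger value indicates that the feature separates the classes more cleanly relative to its spread within each class.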

    Objective measures for predicting the intelligibility of spectrally smoothed speech with artificial excitation

    A study is presented on how well objective measures of speech quality and intelligibility can predict the subjective intelligibility of speech that has undergone spectral envelope smoothing and simplification of its excitation. Speech modifications are made by resynthesising speech that has been spectrally smoothed. Objective measures are applied to the modified speech and include measures of speech quality, signal-to-noise ratio and intelligibility; a normalised frequency-weighted spectral distortion (NFD) measure is also proposed. The measures are compared to subjective intelligibility scores, where it is found that several have high correlation (|r| ≥ 0.7), with NFD achieving the highest correlation (r = −0.81).

    Country Trade Costs, Comparative Advantage and the Pattern of Trade: Multi-Country and Product Panel Evidence

    This paper investigates whether differences across countries in overall country-specific trade costs affect comparative advantage. It does so by examining whether the commodity composition of countries’ trade is driven by differences in countries’ trade costs, as well as by differences in traditional factor endowments. Industry export shares across up to 71 countries and 158 manufacturing industries for five-year periods over the period 1972 to 1992 are shown to be greater in trade-cost-sensitive industries for countries with relatively low national trade costs. This is after controlling for factor-intensity differences across industries and for endowment differences (physical and human capital) between countries. Further, these relationships are more evident in exporting to global markets than to local or regional markets. Keywords: trade costs, comparative advantage

    A Comparison of Perceptually Motivated Loss Functions for Binary Mask Estimation in Speech Separation

    This work proposes and compares perceptually motivated loss functions for deep learning based binary mask estimation for speech separation. Previous loss functions have focused on maximising classification accuracy of mask estimation but we now propose loss functions that aim to maximise the hit minus false-alarm (HIT-FA) rate which is known to correlate more closely to speech intelligibility. The baseline loss function is binary cross-entropy (CE), a standard loss function used in binary mask estimation, which maximises classification accuracy. We propose first a loss function that maximises the HIT-FA rate instead of classification accuracy. We then propose a second loss function that is a hybrid between CE and HIT-FA, providing a balance between classification accuracy and HIT-FA rate. Evaluations of the perceptually motivated loss functions with the GRID database show improvements to HIT-FA rate and ESTOI across babble and factory noises. Further tests then explore application of the perceptually motivated loss functions to a larger vocabulary dataset.
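
To make the HIT-FA rate concrete, here is a minimal sketch of how it can be computed from an ideal and an estimated binary mask. It uses the usual definition (HIT is the fraction of target-dominant units labelled 1, FA is the fraction of noise-dominant units wrongly labelled 1). It is only the evaluation metric, not the differentiable loss functions proposed in the paper, and the function name `hit_fa` and the toy masks are hypothetical.

```python
def hit_fa(ideal_mask, estimated_mask):
    """Return (HIT, FA, HIT - FA) for two flat binary masks of equal length.

    HIT: share of units with ideal mask 1 that are also estimated as 1.
    FA:  share of units with ideal mask 0 that are wrongly estimated as 1.
    """
    if len(ideal_mask) != len(estimated_mask):
        raise ValueError("masks must have the same length")
    pairs = list(zip(ideal_mask, estimated_mask))
    target_units = sum(1 for ideal, _ in pairs if ideal == 1)
    noise_units = len(pairs) - target_units
    hits = sum(1 for ideal, est in pairs if ideal == 1 and est == 1)
    false_alarms = sum(1 for ideal, est in pairs if ideal == 0 and est == 1)
    hit = hits / target_units if target_units else 0.0
    fa = false_alarms / noise_units if noise_units else 0.0
    return hit, fa, hit - fa


# Hypothetical toy masks: 4 target-dominant units, 4 noise-dominant units.
ideal = [1, 1, 1, 1, 0, 0, 0, 0]
estimate = [1, 1, 1, 0, 1, 0, 0, 0]
print(hit_fa(ideal, estimate))
```

Unlike plain classification accuracy, this score penalises an estimator that labels every unit as target, since FA then rises to 1 and cancels the perfect HIT.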

    The Effect of Real-Time Constraints on Automatic Speech Animation

    Machine learning has previously been applied successfully to speech-driven facial animation. To account for carry-over and anticipatory coarticulation, a common approach is to predict the facial pose using a symmetric window of acoustic speech that includes both past and future context. Using future context limits this approach for animating the faces of characters in real-time and networked applications, such as online gaming. An acceptable latency for conversational speech is 200 ms and typically network transmission times will consume a significant part of this. Consequently, we consider asymmetric windows by investigating the extent to which decreasing the future context affects the quality of predicted animation using both deep neural networks (DNNs) and bi-directional LSTM recurrent neural networks (BiLSTMs). Specifically, we investigate future contexts from 170 ms (fully-symmetric) to 0 ms (fully-asymmetric)
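
The difference between symmetric and asymmetric windows can be sketched as follows. The example assumes a 10 ms frame step, so 170 ms corresponds to 17 frames; the frame step, the edge-padding strategy and the helper names `frames_for_ms` and `context_window` are all assumptions made for illustration and are not taken from the paper.

```python
def frames_for_ms(context_ms, frame_ms=10):
    """Convert a context length in milliseconds to whole frames (assumed 10 ms step)."""
    return context_ms // frame_ms


def context_window(frames, t, past, future):
    """Return the frames from t - past to t + future inclusive.

    Indices outside the sequence are clamped, so edge frames are repeated.
    """
    last = len(frames) - 1
    return [frames[min(max(i, 0), last)] for i in range(t - past, t + future + 1)]


frames = list(range(100))   # stand-in for 100 acoustic feature frames
past = frames_for_ms(170)   # 17 frames of past context

symmetric = context_window(frames, 50, past, frames_for_ms(170))
asymmetric = context_window(frames, 50, past, frames_for_ms(0))
print(len(symmetric), len(asymmetric))
```

With zero future context the window ends at the current frame, so the predictor adds no look-ahead latency beyond the frame itself.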

    Surgical pathology in sub-Saharan Africa—volunteering in Malawi

    The breadth of material found in surgical pathology services in African countries differs from the common spectrum of "the West". We report our experience of voluntary work in the pathology departments of Blantyre and Lilongwe, Malawi. During a 6-week period, 405 cases (378 histology and 27 cytology cases) were processed. The vast majority showed significant pathological findings (n = 369; 91.1%): 175 cases (47.4%) were non-tumoral conditions with predominance of inflammatory lesions, e.g., schistosomiasis (n = 11) and tuberculosis (n = 11). There were 39 (10.6%) benign tumors or tumor-like lesions. Intraepithelial neoplasia of the cervix uteri dominated among premalignant conditions (n = 15; 4.1%). The large group of malignancies (n = 140; 37.9%) comprised 11 pediatric tumors (e.g., rhabdomyosarcoma, small blue round cell tumors) and 129 adult tumors. Among women (n = 76), squamous cell carcinomas (SCCs) of the cervix uteri predominated (n = 25; 32.9%), followed by breast carcinomas (n = 12; 15.8%) and esophageal SCC (n = 9; 11.8%). Males (n = 53) most often showed SCC of the esophagus (n = 9; 17.0%) and of the urinary bladder (n = 7; 13.2%). Lymphomas (n = 7) and Kaposi's sarcomas (n = 6) were less frequent. Differences compared to the western world include the character of the conditions in general, the spectrum of inflammatory lesions, and the young age of adult tumor patients (median 45 years; range 18-87 years). Providing pathology service in a low-resource country may be handicapped by lack of personnel, inadequate material resources, or insufficient infrastructure. Rotating volunteers offer a bridge for capacity building of both personnel and the local medical service; in addition, the volunteer's horizons are broadened professionally and personally.

    Discordance between clinical and immunological ART eligibility criteria for children in Malawi

    Background: Since May 2014, all HIV-positive children aged less than five years in Malawi are eligible for ART. Children older than five years are eligible if they are in WHO stage III/IV, if stage I/II, if their CD4 750. Conclusion: Most children are correctly started on treatment using recent guidelines. 41% more children <5 years will be started on ART.

    Speaker-independent speech animation using perceptual loss functions and synthetic data

    We propose a real-time speaker-independent speech-to-facial animation system that predicts lip and jaw movements on a reference face for audio speech taken from any speaker. Our approach is motivated by two key observations: 1) Speaker-independent facial animation can be generated from phoneme labels, but to perform this automatically a speech recogniser is needed which, due to contextual look-ahead, introduces too much time lag. 2) Audio-driven speech animation can be performed in real-time but requires large, multi-speaker audio-visual speech datasets, of which there are few. We adopt a novel three-stage training procedure that leverages the advantages of each approach. First, we train a phoneme-to-visual speech model from a large single-speaker audio-visual dataset. Next, we use this model to generate the synthetic visual component of a large multi-speaker audio dataset for which video is not available. Finally, we learn an audio-to-visual speech mapping using the synthetic visual features as the target. Furthermore, we increase the realism of the predicted facial animation by introducing two perceptually-based loss functions that aim to improve mouth closures and openings. The proposed method and loss functions are evaluated objectively using mean square error, global variance and a new metric that measures the extent of mouth opening. Subjective tests show that our approach produces facial animation comparable to that produced from phoneme sequences and that improved mouth closures, particularly for bilabial closures, are achieved.