72 research outputs found

    Video-driven speech reconstruction using generative adversarial networks

    Speech is a means of communication which relies on both audio and visual information. The absence of one modality can often lead to confusion or misinterpretation of information. In this paper we present an end-to-end temporal model capable of directly synthesising audio from silent video, without needing to transform to and from intermediate features. Our proposed approach, based on GANs, is capable of producing natural-sounding, intelligible speech which is synchronised with the video. The performance of our model is evaluated on the GRID dataset for both speaker-dependent and speaker-independent scenarios. To the best of our knowledge, this is the first method that maps video directly to raw audio and the first to produce intelligible speech when tested on previously unseen speakers. We evaluate the synthesised audio not only based on the sound quality but also on the accuracy of the spoken words.

    Realistic speech-driven facial animation with GANs

    Speech-driven facial animation is the process that automatically synthesizes talking characters based on speech signals. The majority of work in this domain creates a mapping from audio features to visual features. This approach often requires post-processing using computer graphics techniques to produce realistic, albeit subject-dependent, results. We present an end-to-end system that generates videos of a talking head, using only a still image of a person and an audio clip containing speech, without relying on handcrafted intermediate features. Our method generates videos which have (a) lip movements that are in sync with the audio and (b) natural facial expressions such as blinks and eyebrow movements. Our temporal GAN uses 3 discriminators focused on achieving detailed frames, audio-visual synchronization, and realistic expressions. We quantify the contribution of each component in our model using an ablation study and we provide insights into the latent representation of the model. The generated videos are evaluated based on sharpness, reconstruction quality, lip-reading accuracy, and synchronization, as well as their ability to generate natural blinks.

    Speech-driven facial animations improve speech-in-noise comprehension of humans

    Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker’s face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have made it possible to synthesize photorealistic talking faces from a speech recording and a still image of a person’s face in an end-to-end manner. However, it has remained unknown whether such facial animations improve speech-in-noise comprehension. Here we consider facial animations produced by a recently introduced generative adversarial network (GAN), and show that humans cannot distinguish between the synthesized and the natural videos. Importantly, we then show that the end-to-end synthesized videos significantly aid humans in understanding speech in noise, although the natural facial motions yield an even higher audiovisual benefit. We further find that an audiovisual speech recognizer (AVSR) benefits from the synthesized facial animations as well. Our results suggest that synthesizing facial motions from speech can be used to aid speech comprehension in difficult listening environments.

    Application of strut-and-tie models for assessing RC half-joints not complying with current code specifications

    This work investigates the effectiveness of strut-and-tie models for the structural assessment of half-joints. Such elements form part of many existing bridges which, although not complying with current code specifications, have not yet displayed any significant signs of distress in spite of the increase in traffic volume and loads over the years. The work is based on a comparative study of the predicted and experimentally established values of load-carrying capacity, and of the location and causes of failure, of half-jointed beams with reinforcement layouts that replicate those found in structures designed in accordance with previous code specifications. The results show significant shortcomings of the assessment method: it is found not only to underestimate load-carrying capacity by a margin ranging between 40% and 65%, but also often to fail to identify the location and causes of failure. Therefore, there is a need for an alternative assessment method based on concepts capable of both providing a realistic description of structural-concrete behaviour and identifying the causes of failure leading to the loss of load-carrying capacity.

    Mucopolysaccharidosis VI

    Mucopolysaccharidosis VI (MPS VI) is a lysosomal storage disease with progressive multisystem involvement, associated with a deficiency of arylsulfatase B leading to the accumulation of dermatan sulfate. Birth prevalence is between 1 in 43,261 and 1 in 1,505,160 live births. The disorder shows a wide spectrum of symptoms from slowly to rapidly progressing forms. The characteristic skeletal dysplasia includes short stature, dysostosis multiplex and degenerative joint disease. Rapidly progressing forms may have onset from birth, elevated urinary glycosaminoglycans (generally >100 μg/mg creatinine), severe dysostosis multiplex, short stature, and death before the 2nd or 3rd decades. A more slowly progressing form has been described as having later onset, mildly elevated glycosaminoglycans (generally <100 μg/mg creatinine), mild dysostosis multiplex, with death in the 4th or 5th decades. Other clinical findings may include cardiac valve disease, reduced pulmonary function, hepatosplenomegaly, sinusitis, otitis media, hearing loss, sleep apnea, corneal clouding, carpal tunnel disease, and inguinal or umbilical hernia. Although intellectual deficit is generally absent in MPS VI, central nervous system findings may include cervical cord compression caused by cervical spinal instability, meningeal thickening and/or bony stenosis, communicating hydrocephalus, optic nerve atrophy and blindness. The disorder is transmitted in an autosomal recessive manner and is caused by mutations in the ARSB gene, located in chromosome 5 (5q13-5q14). Over 130 ARSB mutations have been reported, causing absent or reduced arylsulfatase B (N-acetylgalactosamine 4-sulfatase) activity and interrupted dermatan sulfate and chondroitin sulfate degradation. 
    Diagnosis generally requires evidence of clinical phenotype, arylsulfatase B enzyme activity <10% of the lower limit of normal in cultured fibroblasts or isolated leukocytes, and demonstration of a normal activity of a different sulfatase enzyme (to exclude multiple sulfatase deficiency). The finding of elevated urinary dermatan sulfate with the absence of heparan sulfate is supportive. In addition to multiple sulfatase deficiency, the differential diagnosis should also include other forms of MPS (MPS I, II, IVA, VII), sialidosis and mucolipidosis. Before enzyme replacement therapy (ERT) with galsulfase (Naglazyme®), clinical management was limited to supportive care and hematopoietic stem cell transplantation. Galsulfase is now widely available and is a specific therapy providing improved endurance with an acceptable safety profile. Prognosis is variable depending on the age of onset, rate of disease progression, age at initiation of ERT and on the quality of the medical care provided.

    End-to-End Speech-Driven Facial Animation with Temporal GANs

    Speech-driven facial animation is the process which uses speech signals to automatically synthesize a talking character. The majority of work in this domain creates a mapping from audio features to visual features. This often requires post-processing using computer graphics techniques to produce realistic, albeit subject-dependent, results. We present a system for generating videos of a talking head, using a still image of a person and an audio clip containing speech, that doesn't rely on any handcrafted intermediate features. To the best of our knowledge, this is the first method capable of generating subject-independent realistic videos directly from raw audio. Our method can generate videos which have (a) lip movements that are in sync with the audio and (b) natural facial expressions such as blinks and eyebrow movements. We achieve this by using a temporal GAN with 2 discriminators, which are capable of capturing different aspects of the video. The effect of each component in our system is quantified through an ablation study. The generated videos are evaluated based on their sharpness, reconstruction quality, and lip-reading accuracy. Finally, a user study is conducted, confirming that temporal GANs lead to more natural sequences than a static GAN-based approach.