Vid2speech: Speech Reconstruction from Silent Video
Speechreading is a notoriously difficult task for humans to perform. In this
paper we present an end-to-end model based on a convolutional neural network
(CNN) for generating an intelligible acoustic speech signal from silent video
frames of a speaking person. The proposed CNN generates sound features for each
frame based on its neighboring frames. Waveforms are then synthesized from the
learned speech features to produce intelligible speech. We show that by
leveraging the automatic feature learning capabilities of a CNN, we can obtain
state-of-the-art word intelligibility on the GRID dataset, and show promising
results for learning out-of-vocabulary (OOV) words. Comment: Accepted for publication at ICASSP 201
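The abstract says that sound features for each frame are generated from its neighboring frames. As a minimal sketch of that windowing idea only, the snippet below pairs every frame with its `k` neighbors on each side. The function name `context_windows`, the window size `k`, and the edge handling (repeating the first and last frame) are assumptions made for illustration and are not taken from the paper.

```python
def context_windows(frames, k):
    """Pair each frame with its k neighbours on either side.

    frames: a list of per-frame items (for example, images).
    Returns one list of 2*k + 1 items per input frame.
    Edges are padded by repeating the first or last frame,
    which is an assumption of this sketch, not the paper's method.
    """
    if not frames:
        return []
    padded = [frames[0]] * k + list(frames) + [frames[-1]] * k
    return [padded[t:t + 2 * k + 1] for t in range(len(frames))]


# Hypothetical stand-ins for video frames.
frames = ["f0", "f1", "f2", "f3", "f4"]
windows = context_windows(frames, k=1)
```

In a model of the kind described, each window would be the input from which the sound features of its center frame are predicted.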
Listening in/To Germany, Pale Mother
A newly restored version of Helma Sanders-Brahms’ 1980 film, Deutschland, bleiche Mutter (Germany, Pale Mother), premiered in 2014 as a “Berlinale Classic”. This article reveals a complex composition of archival and (re)constructed sound that amplifies the film’s problematisation of the relationship between public history and private memory and the competing claims to authenticity and authority in telling the stories of the past.
Archaeologies of Sound: Reconstructing Louis MacNeice’s Wartime Radio Publics
This article approaches the problem of reconstructing the culturally situated audience experience of radio programming through the example of Louis MacNeice's wartime radio broadcasts, notably "Alexander Nevsky" and "Christopher Columbus". The article draws on audience research reports, internal correspondence, and close analysis of the broadcasts themselves in order to triangulate a listening experience that, though it ultimately cannot be recovered, can be better understood through its proximate cultural traces.
An overview of Old Tibetan synchronic phonology
Despite the importance of Old Tibetan in the Tibeto-Burman language family, little research has treated Old Tibetan synchronic phonology. This article gives a complete overview of the Old Tibetan phonemic system by associating sound values with the letters of the Tibetan alphabet and exploring the distribution of these sounds in syllable structure.
Lip2AudSpec: Speech reconstruction from silent lip movements video
In this study, we propose a deep neural network for reconstructing
intelligible speech from silent lip movement videos. We use the auditory
spectrogram as the spectral representation of speech, together with its
corresponding sound generation method, resulting in more natural-sounding
reconstructed speech. Our proposed network consists of an autoencoder that
extracts bottleneck features from the auditory spectrogram, which are then
used as targets for our main lip reading network, comprising CNN, LSTM, and
fully connected layers. Our
experiments show that the autoencoder is able to reconstruct the original
auditory spectrogram with a 98% correlation and also improves the quality of
reconstructed speech from the main lip reading network. Our model, trained
jointly on different speakers, is able to extract individual speaker
characteristics and gives promising results in reconstructing intelligible
speech with superior word recognition accuracy.
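The abstract reports a 98% correlation between the original and reconstructed auditory spectrograms but does not say how it is computed. One plausible reading is a Pearson correlation over all time-frequency bins, sketched below. The function name `pearson_correlation`, the flattening of the spectrogram, and the toy values are assumptions for illustration, not the authors' evaluation code.

```python
import math


def pearson_correlation(original, reconstructed):
    """Pearson correlation between two equally shaped 2-D spectrograms.

    Both inputs are lists of rows; all bins are flattened into one
    sequence before correlating (an assumption of this sketch).
    """
    a = [x for row in original for x in row]
    b = [x for row in reconstructed for x in row]
    if len(a) != len(b) or not a:
        raise ValueError("spectrograms must be non-empty and equally sized")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    norm = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(x * x for x in db))
    if norm == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return sum(x * y for x, y in zip(da, db)) / norm


# Toy 2x2 "spectrogram" and a scaled, shifted copy of it.
spec = [[0.0, 1.0], [2.0, 3.0]]
scaled = [[2.0 * x + 1.0 for x in row] for row in spec]
r = pearson_correlation(spec, scaled)
```

Because Pearson correlation ignores overall gain and offset, a high value indicates matching spectral shape rather than identical amplitudes.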
Bantu lexical reconstruction
Lexical reconstruction has been an important enterprise in Bantu historical linguistics since the earliest days of the discipline. This chapter provides a historical overview of the principal scholarly contributions to that field of study. It also explains how the Comparative Method has been and can be applied to reconstruct ancestral Bantu vocabulary via the intermediate step of phonological reconstruction, and how the study of sound change needs to be complemented by diachronic semantics in order to correctly reconstruct both the form and the meaning of etymons. Finally, some issues complicating this type of historical linguistic research, such as “osculance” due to prehistoric language contact, are addressed, as well as the relationship between reconstruction and classification.