AudioViewer: Learning to Visualize Sounds
A long-standing goal in the field of sensory substitution is to enable sound
perception for deaf and hard-of-hearing (DHH) people by visualizing audio
content. Unlike existing models that translate to hand sign language,
between speech and text, or between text and images, we target immediate, low-level
audio-to-video translation that applies to generic environment sounds as well
as human speech. Since such a substitution is artificial, without labels for
supervised learning, our core contribution is to build a mapping from audio to
video that learns from unpaired examples via high-level constraints. For
speech, we additionally disentangle content from style, such as gender and
dialect. Qualitative and quantitative results, including a human study,
demonstrate that our unpaired translation approach maintains important audio
features in the generated video and that videos of faces and numbers are well
suited for visualizing high-dimensional audio features that can be parsed by
humans to match and distinguish between sounds and words. Code and models are
available at https://chunjinsong.github.io/audioviewe
AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect Transfer for Speech Synthesis
Affect is an emotional characteristic encompassing valence, arousal, and
intensity, and is a crucial attribute for enabling authentic conversations.
While existing text-to-speech (TTS) and speech-to-speech systems rely on
strength embedding vectors and global style tokens to capture emotions, these
models represent emotions as a component of style or represent them in discrete
categories. We propose AffectEcho, an emotion translation model that uses a
Vector Quantized codebook to model emotions within a quantized space featuring
five levels of affect intensity to capture complex nuances and subtle
differences in the same emotion. The quantized emotional embeddings are
implicitly derived from spoken speech samples, eliminating the need for one-hot
vectors or explicit strength embeddings. Experimental results demonstrate the
effectiveness of our approach in controlling the emotions of generated speech
while preserving identity, style, and emotional cadence unique to each speaker.
We showcase the language-independent emotion modeling capability of the
quantized emotional embeddings learned from a bilingual (English and Chinese)
speech corpus with an emotion transfer task from a reference speech to a target
speech. We achieve state-of-the-art results on both qualitative and quantitative
metrics.
Data Innovation for International Development: An overview of natural language processing for qualitative data analysis
Availability, collection and access to quantitative data, as well as its
limitations, often make qualitative data the resource upon which development
programs heavily rely. Both traditional interview data and social media
analysis can provide rich contextual information and are essential for
research, appraisal, monitoring and evaluation. These data may be difficult to
process and analyze both systematically and at scale. This, in turn, limits
timely, data-driven decision-making, which is essential in fast-evolving
complex social systems. In this paper, we discuss the potential of
using natural language processing to systematize analysis of qualitative data,
and to inform quick decision-making in the development context. We illustrate
this with interview data generated in the form of micro-narratives for the UNDP
Fragments of Impact project.
Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques
specifically developed for analyzing and understanding the inner workings and
representations acquired by neural models of language. Approaches included:
systematic manipulation of input to neural networks and investigating the
impact on their performance, testing whether interpretable knowledge can be
decoded from intermediate representations acquired by neural networks,
proposing modifications to neural network architectures to make their knowledge
state or generated output more explainable, and examining the performance of
networks on simplified or formal languages. Here we review a number of
representative studies in each category.