Deep factorization for speech signal
Various informative factors are mixed in speech signals, leading to great
difficulty when decoding any of them. An intuitive idea is to factorize
each speech frame into individual informative factors, though this turns out to
be highly difficult. Recently, we found that speaker traits, which were assumed
to be long-term distributional properties, are actually short-time patterns,
and can be learned by a carefully designed deep neural network (DNN). This
discovery motivated a cascade deep factorization (CDF) framework that will be
presented in this paper. The proposed framework infers speech factors in a
sequential way, where factors previously inferred are used as conditional
variables when inferring other factors. We will show that this approach can
effectively factorize speech signals, and that, using these factors, the
original speech spectrum can be recovered with high accuracy. This
factorization and reconstruction approach offers potential value for many
speech processing tasks, e.g., speaker recognition and emotion recognition, as
will be demonstrated in the paper.
Comment: Accepted by ICASSP 2018. arXiv admin note: substantial text overlap
with arXiv:1706.0177
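The sequential, conditioned inference that the abstract describes could be sketched as follows. This is a minimal illustration only: the factor extractors are stand-in random linear maps, and the particular factors (speaker, phone, emotion), their dimensions, and the cascade order are assumptions for illustration, not the paper's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_factor_net(in_dim, out_dim):
    """Hypothetical stand-in for a trained DNN factor extractor:
    a fixed random linear map followed by tanh."""
    W = rng.standard_normal((in_dim, out_dim)) * 0.1
    return lambda x: np.tanh(x @ W)

FRAME_DIM, SPK_DIM, PHN_DIM, EMO_DIM = 40, 8, 16, 4

# Cascade: each stage conditions on the factors inferred before it.
speaker_net = make_factor_net(FRAME_DIM, SPK_DIM)
phone_net   = make_factor_net(FRAME_DIM + SPK_DIM, PHN_DIM)
emotion_net = make_factor_net(FRAME_DIM + SPK_DIM + PHN_DIM, EMO_DIM)

def cascade_factorize(frame):
    # 1) infer the speaker factor from the raw frame
    s = speaker_net(frame)
    # 2) infer the phone factor, conditioned on the speaker factor
    p = phone_net(np.concatenate([frame, s]))
    # 3) infer the emotion factor, conditioned on both earlier factors
    e = emotion_net(np.concatenate([frame, s, p]))
    return s, p, e

frame = rng.standard_normal(FRAME_DIM)
s, p, e = cascade_factorize(frame)
```

The point of the cascade is that later, harder-to-infer factors need not disentangle themselves from factors already explained away by earlier stages.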
Breathing Signature as Vitality Score Index Created by Exercises of Qigong: Implications of Artificial Intelligence Tools Used in Traditional Chinese Medicine.
Rising concerns about the short- and long-term detrimental consequences of administration of conventional pharmacopeia are fueling the search for alternative, complementary, personalized, and comprehensive approaches to human healthcare. Qigong, a form of Traditional Chinese Medicine, represents a viable alternative approach. Here, we start with the practical, philosophical, and psychological background of Ki (in Japanese) or Qi (in Chinese) and their relationship to Qigong theory and clinical application. Noting the drawbacks of the current state of Qigong clinical practice, we propose that to manage the unique aspects of the Eastern "non-linearity" and "holistic" approach, it needs to be integrated with the Western "linearity," "one-direction" approach. This is done through developing the concept of "Qigong breathing signatures," which can define our life breathing patterns associated with diseases using machine learning technology. We predict that this can be achieved by establishing an artificial intelligence (AI)-Medicine training camp of databases, which will integrate Qigong-like breathing patterns with the different pathologies unique to individuals. Such an integrated connection will allow the AI-Medicine algorithm to identify breathing patterns and guide medical intervention. This unique view of potentially connecting Eastern Medicine and Western Technology can further add novel insight to our current understanding of both Western and Eastern medicine, thereby establishing a vitality score index (VSI) that can predict the outcomes of lifestyle behaviors and medical conditions.
Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology
Every culture and language is unique. Our work expressly focuses on the
uniqueness of culture and language in relation to human affect, specifically
sentiment and emotion semantics, and how they manifest in social multimedia. We
develop sets of sentiment- and emotion-polarized visual concepts by adapting
semantic structures called adjective-noun pairs, originally introduced by Borth
et al. (2013), but in a multilingual context. We propose a new
language-dependent method for automatic discovery of these adjective-noun
constructs. We show how this pipeline can be applied on a social multimedia
platform for the creation of a large-scale multilingual visual sentiment
concept ontology (MVSO). Unlike the flat structure in Borth et al. (2013), our
unified ontology is organized hierarchically by multilingual clusters of
visually detectable nouns and subclusters of emotionally biased versions of
these nouns. In addition, we present an image-based prediction task to show how
generalizable language-specific models are in a multilingual context. A new,
publicly available dataset of >15.6K sentiment-biased visual concepts across 12
languages with language-specific detector banks, >7.36M images, and their
metadata is also released.
Comment: 11 pages, to appear at ACM MM'1
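An adjective-noun pair (ANP) discovery step of the kind described might look like the sketch below. The mini-lexicons, tag corpus, and sentiment threshold are hypothetical placeholders; the actual MVSO pipeline mines these per language from social-multimedia tags rather than from hand-written dictionaries.

```python
# Hypothetical mini-lexicons; the real pipeline mines these from social-media tags.
ADJ_SENTIMENT = {"beautiful": 0.9, "sad": -0.7, "broken": -0.6, "sunny": 0.8}
NOUNS = ["sky", "face", "heart"]
CORPUS_TAGS = {("beautiful", "sky"), ("sad", "face"), ("broken", "heart"),
               ("sunny", "sky"), ("beautiful", "face")}

def discover_anps(adjectives, nouns, corpus, min_abs_sentiment=0.5):
    """Keep adjective-noun pairs that (a) actually co-occur in the corpus
    and (b) carry non-neutral sentiment, inherited from the adjective."""
    anps = []
    for adj, score in adjectives.items():
        if abs(score) < min_abs_sentiment:
            continue  # drop sentiment-neutral adjectives
        for noun in nouns:
            if (adj, noun) in corpus:
                anps.append((f"{adj} {noun}", score))
    # strongest sentiment first
    return sorted(anps, key=lambda t: -abs(t[1]))

anps = discover_anps(ADJ_SENTIMENT, NOUNS, CORPUS_TAGS)
```

Grouping the surviving ANPs by their noun, and then sub-clustering by sentiment, yields the kind of hierarchical noun/emotion ontology the abstract contrasts with Borth et al.'s flat structure.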