Deep factorization for speech signal

Author: Chen Yixiang
Li Lantian
Shi Ying
Tang Zhiyuan
Wang Dong
Zheng Thomas Fang
Publication date: 27/02/2018

Various informative factors are mixed in speech signals, which makes decoding any single factor very difficult. An intuitive idea is to factorize each speech frame into individual informative factors, though this turns out to be highly difficult. Recently, we found that speaker traits, which were assumed to be long-term distributional properties, are actually short-time patterns and can be learned by a carefully designed deep neural network (DNN). This discovery motivated the cascade deep factorization (CDF) framework presented in this paper. The proposed framework infers speech factors sequentially, using previously inferred factors as conditional variables when inferring other factors. We will show that this approach can effectively factorize speech signals and that, using these factors, the original speech spectrum can be recovered with high accuracy. This factorization and reconstruction approach offers potential value for many speech processing tasks, e.g., speaker recognition and emotion recognition, as will be demonstrated in the paper.

Comment: Accepted by ICASSP 2018. arXiv admin note: substantial text overlap with arXiv:1706.0177
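The sequential inference pattern described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses DNNs for each stage, whereas the stages here are toy arithmetic functions, and all names (`cascade_factorize`, `stage_level`, `stage_residual`, `reconstruct`) are hypothetical. The sketch only shows that each stage receives the frame together with every factor inferred so far.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A stage maps (frame, factors inferred so far) to one new factor.
Stage = Callable[[Vector, Sequence[Vector]], Vector]


def cascade_factorize(frame: Vector, stages: Sequence[Stage]) -> List[Vector]:
    """Infer factors one stage at a time, conditioning on earlier factors."""
    factors: List[Vector] = []
    for stage in stages:
        # Earlier factors are passed in as conditional variables.
        factors.append(stage(frame, tuple(factors)))
    return factors


def stage_level(frame: Vector, previous: Sequence[Vector]) -> Vector:
    """Toy first stage (assumption): the frame's mean level."""
    return [sum(frame) / len(frame)]


def stage_residual(frame: Vector, previous: Sequence[Vector]) -> Vector:
    """Toy second stage (assumption): what remains once the level is known."""
    level = previous[0][0]
    return [x - level for x in frame]


def reconstruct(factors: Sequence[Vector]) -> Vector:
    """Toy reconstruction (assumption): add the level back to the residual."""
    level = factors[0][0]
    return [level + r for r in factors[1]]


frame = [1.0, 2.0, 3.0]
factors = cascade_factorize(frame, [stage_level, stage_residual])
recovered = reconstruct(factors)
```

In this toy setting the reconstruction is exact by construction; in the paper, the stages are learned models and reconstruction quality is an empirical result.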

arXiv.org e-Print Archive

Crossref

Recommended from our members

Breathing Signature as Vitality Score Index Created by Exercises of Qigong: Implications of Artificial Intelligence Tools Used in Traditional Chinese Medicine.

Author: Dethlefs Brent A
Lee Katherine L
Li Shengwen Calvin
Loudon William G
Luo Jane
Su Qingning
Zhang Junjie
Publication venue: eScholarship, University of California
Publication date: 01/12/2019

Rising concerns about the short- and long-term detrimental consequences of administering conventional pharmacopeia are fueling the search for alternative, complementary, personalized, and comprehensive approaches to human healthcare. Qigong, a form of Traditional Chinese Medicine, represents a viable alternative approach. Here, we start with the practical, philosophical, and psychological background of Ki (in Japanese) or Qi (in Chinese) and their relationship to Qigong theory and clinical application. Noting the drawbacks of the current state of the Qigong clinic, we propose that, to manage the unique aspects of the Eastern "non-linearity" and "holistic" approach, it needs to be integrated with the Western "linearity" and "one-direction" approach. This is done by developing the concept of "Qigong breathing signatures," which can define our life breathing patterns associated with diseases using machine learning technology. We predict that this can be achieved by establishing an artificial intelligence (AI)-Medicine training camp of databases, which will integrate Qigong-like breathing patterns with different pathologies unique to individuals. Such an integrated connection will allow the AI-Medicine algorithm to identify breathing patterns and guide medical intervention. This unique view of potentially connecting Eastern Medicine and Western Technology can add novel insight to our current understanding of both Western and Eastern medicine, thereby establishing a vitality score index (VSI) that can predict the outcomes of lifestyle behaviors and medical conditions.

eScholarship - University of California

Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology

Author: Balahur A.
Bautin M.
Dryer M. S.
Esuli A.
Güngördü Z.
Krizhevsky A.
Lang P.
Lee J. H.
McCarthy E. D.
Mesquita B.
Mihalcea R.
Mikolov T.
Plutchik R.
Schmid H.
Vessel E. A.
You Q.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 19/08/2015

Every culture and language is unique. Our work expressly focuses on the uniqueness of culture and language in relation to human affect, specifically sentiment and emotion semantics, and how they manifest in social multimedia. We develop sets of sentiment- and emotion-polarized visual concepts by adapting semantic structures called adjective-noun pairs, originally introduced by Borth et al. (2013), to a multilingual context. We propose a new language-dependent method for automatic discovery of these adjective-noun constructs. We show how this pipeline can be applied to a social multimedia platform to create a large-scale multilingual visual sentiment concept ontology (MVSO). Unlike the flat structure in Borth et al. (2013), our unified ontology is organized hierarchically by multilingual clusters of visually detectable nouns and subclusters of emotionally biased versions of these nouns. In addition, we present an image-based prediction task to show how generalizable language-specific models are in a multilingual context. We also release a new, publicly available dataset of >15.6K sentiment-biased visual concepts across 12 languages, with language-specific detector banks and >7.36M images and their metadata.

Comment: 11 pages, to appear at ACM MM'1

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref