Search CORE

46 research outputs found

A deep adaptation network for speech enhancement: combining a relativistic discriminator with multi-kernel maximum mean discrepancy

Author: Cheng Jiaming
Huang Chengwei
Liang Ruiyu
Liang Zhenlin
Schuller Björn
Zhao Li
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation

Author: Li Guanbin
Li Ruiyu
Lin Liang
Qi Xiaojuan
Shen Xiaoyong
Xu Ruijia
Yang Jihan
Publication venue
Publication date: 18/12/2019
Field of study

We focus on Unsupervised Domain Adaptation (UDA) for the task of semantic segmentation. Recently, adversarial alignment has been widely adopted to match the marginal distribution of feature representations across two domains globally. However, this strategy fails in adapting the representations of the tail classes or small objects for semantic segmentation since the alignment objective is dominated by head categories or large objects. In contrast to adversarial alignment, we propose to explicitly train a domain-invariant classifier by generating and defensing against pointwise feature space adversarial perturbations. Specifically, we firstly perturb the intermediate feature maps with several attack objectives (i.e., discriminator and classifier) on each individual position for both domains, and then the classifier is trained to be invariant to the perturbations. By perturbing each position individually, our model treats each location evenly regardless of the category or object size and thus circumvents the aforementioned issue. Moreover, the domain gap in feature space is reduced by extrapolating source and target perturbed features towards each other with attack on the domain discriminator. Our approach achieves the state-of-the-art performance on two challenging domain adaptation tasks for semantic segmentation: GTA5 -> Cityscapes and SYNTHIA -> Cityscapes.Comment: To Appear in AAAI202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Cross-layer similarity knowledge distillation for speech enhancement

Author: Cheng Jiaming
Jia Jie
Liang Ruiyu
Peng Yiyuan
Schuller Björn
Xie Yue
Zhao Li
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2022
Field of study

Speech enhancement (SE) algorithms based on deep neural networks (DNNs) often encounter challenges of limited hardware resources or strict latency requirements when deployed in real-world scenarios. However, a strong enhancement effect typically requires a large DNN. In this paper, a knowledge distillation framework for SE is proposed to compress the DNN model. We study the strategy of cross-layer connection paths, which fuses multi-level information from the teacher and transfers it to the student. To adapt to the SE task, we propose a frame-level similarity distillation loss. We apply this method to the deep complex convolution recurrent network (DCCRN) and make targeted adjustments. Experimental results show that the proposed method considerably improves the enhancement effect of the compressed DNN and outperforms other distillation methods

OPUS Augsburg

Practical Speech Emotion Recognition Based on Online Learning: From Acted Data to Elicited Data

Author: Cheng Zha
Chengwei Huang
Ji Xi
Li Zhao
Qingyun Wang
Ruiyu Liang
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2013
Field of study

We study the cross-database speech emotion recognition based on online learning. How to apply a classifier trained on acted data to naturalistic data, such as elicited data, remains a major challenge in today’s speech emotion recognition system. We introduce three types of different data sources: first, a basic speech emotion dataset which is collected from acted speech by professional actors and actresses; second, a speaker-independent data set which contains a large number of speakers; third, an elicited speech data set collected from a cognitive task. Acoustic features are extracted from emotional utterances and evaluated by using maximal information coefficient (MIC). A baseline valence and arousal classifier is designed based on Gaussian mixture models. Online training module is implemented by using AdaBoost. While the offline recognizer is trained on the acted data, the online testing data includes the speaker-independent data and the elicited data. Experimental results show that by introducing the online learning module our speech emotion recognition system can be better adapted to new data, which is an important character in real world applications

Crossref

Directory of Open Access Journals

Detecting Depression from Speech through an Attentive LSTM Network

Author: Liang Ruiyu
Liu Chengyu
Xie Yue
Zhang Li
Zhao Li
Zhao Yan
Publication venue: 'Institute of Electronics, Information and Communications Engineers (IEICE)'
Publication date: 01/11/2021
Field of study

Depression endangers people's health conditions and affects the social order as a mental disorder. As an efficient diagnosis of depression, automatic depression detection has attracted lots of researcher's interest. This study presents an attention-based Long Short-Term Memory (LSTM) model for depression detection to make full use of the difference between depression and non-depression between timeframes. The proposed model uses frame-level features, which capture the temporal information of depressive speech, to replace traditional statistical features as an input of the LSTM layers. To achieve more multi-dimensional deep feature representations, the LSTM output is then passed on attention layers on both time and feature dimensions. Then, we concat the output of the attention layers and put the fused feature representation into the fully connected layer. At last, the fully connected layer's output is passed on to softmax layer. Experiments conducted on the DAIC-WOZ database demonstrate that the proposed attentive LSTM model achieves an average accuracy rate of 90.2% and outperforms the traditional LSTM network and LSTM with local attention by 0.7% and 2.3%, respectively, which indicates its feasibility

Northumbria University Research Portal

Partial trisomy 2q33.3-q37.3 in a patient with an inverted duplicated neocentric marker chromosome

Author: Baoheng Gui
Chen Chen
Desheng Liang
Guizhi Tang
Jiazhen Chang
Lingqian Wu
Ruiyu Ma
Ruolan Guo
Yan Xia
Yanghui Zhang
Yanru Huang
Ying Peng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

Crossref

Speech emotion classification using atention-based LSTM

Author: Huang Chengwei
Liang Ruiyu
Liang Zhenlin
Schuller Björn
Xie Yue
Zou Cairong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

OPUS Augsburg

Speech Emotion Recognition Based on Sparse Transfer Learning Method

Author: Peng SONG
Ruiyu LIANG
Wenming ZHENG
Publication venue: 'Institute of Electronics, Information and Communications Engineers (IEICE)'
Publication date: 01/01/2015
Field of study

Crossref

Attention-Based Dense LSTM for Speech Emotion Recognition

Author: Li ZHAO
Ruiyu LIANG
Yue XIE
Zhenlin LIANG
Publication venue: 'Institute of Electronics, Information and Communications Engineers (IEICE)'
Publication date
Field of study

Crossref

A Novel Bimodal Emotion Database from Physiological Signals and Facial Expression

Author: Bei WANG
Jingjie YAN
Ruiyu LIANG
Publication venue: 'Institute of Electronics, Information and Communications Engineers (IEICE)'
Publication date
Field of study

Crossref