160 research outputs found
Graph Few-shot Learning via Knowledge Transfer
Semi-supervised node classification is a challenging problem that has been
studied extensively. At the frontier, Graph Neural Networks (GNNs), which
update the representation of each node by aggregating information from its
neighbors, have recently attracted great interest. However, most GNNs are
shallow, have a limited receptive field, and may not achieve satisfactory
performance when the number of labeled nodes is small. To address this
challenge, we propose a graph few-shot learning (GFL)
algorithm that incorporates prior knowledge learned from auxiliary graphs to
improve classification accuracy on the target graph. Specifically, a
transferable metric space characterized by a node embedding and a
graph-specific prototype embedding function is shared between auxiliary graphs
and the target, facilitating the transfer of structural knowledge. Extensive
experiments and ablation studies on four real-world graph datasets demonstrate
the effectiveness of our proposed model.
Comment: Full paper (with Appendix) of AAAI 2020.
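To make the shared metric space concrete, here is a minimal sketch of prototype-based few-shot node classification, assuming a GNN has already produced node embeddings. All names, shapes, and the Euclidean metric are illustrative assumptions, not the authors' implementation.

```python
# Prototype-based few-shot classification over node embeddings (toy sketch).
import numpy as np

rng = np.random.default_rng(0)

def prototypes(support_emb, support_labels, num_classes):
    """Class prototype = mean of the labeled support-node embeddings per class."""
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(num_classes)])

def classify(query_emb, protos):
    """Assign each query node to its nearest prototype in the metric space."""
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# Toy data: 16-dim node embeddings (as a GNN might output), 3 classes,
# 5 labeled support nodes per class, classes separated by a mean shift.
num_classes, dim = 3, 16
support_labels = np.repeat(np.arange(num_classes), 5)
support_emb = rng.normal(size=(15, dim)) + support_labels[:, None]
query_emb = rng.normal(size=(6, dim)) + np.array([0, 1, 2, 0, 1, 2])[:, None]

protos = prototypes(support_emb, support_labels, num_classes)
print(classify(query_emb, protos))  # expected [0 1 2 0 1 2] for these well-separated classes
```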
Few-Shot Knowledge Graph Completion
Knowledge graphs (KGs) serve as useful resources for various natural language
processing applications. Previous KG completion approaches require a large
number of training instances (i.e., head-tail entity pairs) for every relation,
yet in reality very few entity pairs are available for most relations.
Existing one-shot learning work does not generalize well to few-shot scenarios
and does not fully exploit the supervisory information, and few-shot KG
completion has not yet been well studied. In this work, we
propose a novel few-shot relation learning model (FSRL) that aims at
discovering facts of new relations with few-shot references. FSRL can
effectively capture knowledge from heterogeneous graph structure, aggregate
representations of few-shot references, and match query entity pairs against
the reference set for every relation. Extensive experiments on two public datasets
demonstrate that FSRL outperforms the state-of-the-art.
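As a rough illustration of the matching idea, the sketch below embeds (head, tail) reference pairs for a relation, aggregates them, and ranks candidate pairs by similarity to the aggregate. The random embeddings and mean aggregator are stand-ins for FSRL's learned heterogeneous-graph encoder and matching network.

```python
# Few-shot KG completion by reference-set matching (toy sketch).
import numpy as np

rng = np.random.default_rng(1)
entity_emb = {e: rng.normal(size=8) for e in
              ["paris", "france", "tokyo", "japan", "berlin", "germany"]}

def pair_emb(h, t):
    # Pair representation: concatenation of head and tail entity embeddings.
    return np.concatenate([entity_emb[h], entity_emb[t]])

def score(candidate, reference_pairs):
    """Cosine similarity between a candidate pair and the aggregated references."""
    ref = np.mean([pair_emb(h, t) for h, t in reference_pairs], axis=0)
    cand = pair_emb(*candidate)
    return ref @ cand / (np.linalg.norm(ref) * np.linalg.norm(cand))

# Few-shot references for a relation such as "capital_of", then rank candidates.
refs = [("paris", "france"), ("tokyo", "japan")]
for cand in [("berlin", "germany"), ("berlin", "japan")]:
    print(cand, round(score(cand, refs), 3))
```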
Performance analysis of high-speed railway communication systems subjected to co-channel interference and channel estimation errors
The performance of high-speed railway wireless communication systems is studied in the presence of co-channel interference and imperfect channel estimation in the uplink. We derive exact closed-form expressions for the outage probability and investigate the impact of fading severity. New explicit expressions are derived for both the level crossing rate and the average outage duration, illustrating the impact of mobile speed and channel estimation errors on the achievable system performance. Our results are general and hence subsume a range of previously reported results.
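The paper's contribution is the exact closed-form analysis itself; as a toy cross-check of what such an outage expression looks like, the snippet below compares a Monte Carlo estimate against the known closed form for a Rayleigh-faded link with a single co-channel interferer. The parameters are made-up examples, not the paper's setup.

```python
# Outage probability P(SIR < threshold) for exponential (Rayleigh-power)
# signal and interference: P_out = g * Omega_i / (Omega_s + g * Omega_i).
import numpy as np

rng = np.random.default_rng(2)
omega_s, omega_i, gamma_th = 10.0, 1.0, 2.0  # mean powers and SIR threshold

s = rng.exponential(omega_s, 1_000_000)   # desired-signal power samples
i = rng.exponential(omega_i, 1_000_000)   # interference power samples
mc = np.mean(s / i < gamma_th)            # Monte Carlo outage estimate
exact = gamma_th * omega_i / (omega_s + gamma_th * omega_i)
print(f"Monte Carlo: {mc:.4f}, closed form: {exact:.4f}")  # both ~0.1667
```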
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models
Stutter removal is an essential scenario in the field of speech editing.
However, when the speech recording contains stutters, the existing text-based
speech editing approaches still suffer from: 1) over-smoothing in the edited
speech; 2) a lack of robustness to the noise introduced by stutters; and 3) the
need for users to manually determine the region to edit in order to remove
stutters. To tackle these challenges, we propose
FluentSpeech, a stutter-oriented automatic speech editing model. Specifically,
1) we propose a context-aware diffusion model that iteratively refines the
modified mel-spectrogram with the guidance of context features; 2) we introduce
a stutter predictor module to inject the stutter information into the hidden
sequence; 3) we also propose a stutter-oriented automatic speech editing (SASE)
dataset that contains spontaneous speech recordings with time-aligned stutter
labels to train the automatic stutter localization model. Experimental results
on VCTK and LibriTTS datasets demonstrate that our model achieves
state-of-the-art performance on speech editing. Further experiments on our SASE
dataset show that FluentSpeech can effectively improve the fluency of
stuttering speech in terms of objective and subjective metrics. Code and audio
samples can be found at https://github.com/Zain-Jiang/Speech-Editing-Toolkit.
Comment: Accepted by ACL 2023 (Findings).
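A conceptual sketch of the inpainting-style loop this describes: only the masked (stutter) region of the mel-spectrogram is refined step by step, while the surrounding context frames are clamped and guide each update. The stub denoiser below is an assumption standing in for the paper's learned context-aware diffusion model.

```python
# Iterative refinement of a masked mel-spectrogram region (conceptual sketch).
import numpy as np

rng = np.random.default_rng(3)
T, M = 100, 80                        # frames x mel bins
mel = rng.normal(size=(T, M))         # toy stand-in for the recording
mask = np.zeros(T, dtype=bool)
mask[40:60] = True                    # region flagged for editing (e.g., a stutter)

def denoise_step(x, mask, t):
    """Stub denoiser: pull masked frames toward the context statistics.
    A real model would predict this update from learned context features."""
    context_mean = x[~mask].mean(axis=0)
    x = x.copy()
    x[mask] += 0.2 * (context_mean - x[mask])
    return x

x = mel.copy()
x[mask] = rng.normal(size=(mask.sum(), M))   # start the edited region from noise
for t in reversed(range(50)):                # reverse-diffusion-style loop
    x = denoise_step(x, mask, t)
    x[~mask] = mel[~mask]                    # clamp context frames every step
print(np.abs(x[mask] - mel[~mask].mean(axis=0)).mean())  # masked region -> context stats
```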
Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation
Large diffusion models have been successful in text-to-audio (T2A) synthesis
tasks, but they often suffer from common issues such as semantic misalignment
and poor temporal consistency due to limited natural language understanding and
data scarcity. Additionally, 2D spatial structures widely used in T2A works
lead to unsatisfactory audio quality when generating variable-length audio
samples since they do not adequately prioritize temporal information. To
address these challenges, we propose Make-an-Audio 2, a latent diffusion-based
T2A method that builds on the success of Make-an-Audio. Our approach includes
several techniques to improve semantic alignment and temporal consistency:
Firstly, we use pre-trained large language models (LLMs) to parse the text into
structured <event & order> pairs for better temporal information capture. We
also introduce another structured-text encoder to aid in learning semantic
alignment during the diffusion denoising process. To improve the performance of
variable length generation and enhance the temporal information extraction, we
design a feed-forward Transformer-based diffusion denoiser. Finally, we use
LLMs to augment and transform a large amount of audio-label data into
audio-text datasets to alleviate the problem of scarcity of temporal data.
Extensive experiments show that our method outperforms baseline models in both
objective and subjective metrics, and achieves significant gains in temporal
information understanding, semantic consistency, and sound quality.
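To show what such a structured parse might look like, here is a minimal sketch with a hard-coded example of (event, order) pairs and their serialization into conditioning text. The exact pair format and the LLM prompt are assumptions; the LLM call itself is out of scope.

```python
# From a free-form caption to structured (event, order) conditioning text.
caption = "A dog barks twice, then a car drives by while rain falls"

# What an LLM-produced structured parse might look like (assumed format):
structured = [
    ("dog barking", "start"),
    ("car driving by", "after"),
    ("rain falling", "all"),
]

# Serialize the pairs into the text fed to the T2A model's structured encoder.
cond_text = " & ".join(f"<{event}, {order}>" for event, order in structured)
print(cond_text)
# <dog barking, start> & <car driving by, after> & <rain falling, all>
```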
Amplifying the Music Listening Experience through Song Comments on Music Streaming Platforms
Music streaming services are increasingly popular among younger generations
who seek social experiences through personal expression and sharing of
subjective feelings in comments. However, such emotional aspects are often
ignored by current platforms, which affects the listeners' ability to find
music that triggers specific personal feelings. To address this gap, this study
proposes a novel approach that leverages deep learning methods to capture
contextual keywords, sentiments, and induced mechanisms from song comments. The
study augments a current music app with two features: the presentation of tags
that best represent song comments, and a novel map metaphor that reorganizes
song comments by chronological order, content, and
sentiment. The effectiveness of the proposed approach is validated through a
usage scenario and a user study that demonstrate its capability to improve the
user experience of exploring songs and browsing comments of interest. This
study contributes to the advancement of music streaming services by providing a
more personalized and emotionally rich music experience for younger
generations.
Comment: In the Proceedings of ChinaVis 2023.
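The pipeline shape (comments in, tags and sentiment out) can be illustrated with simple stand-ins. The lexicon scorer and frequency-based tags below are assumptions replacing the study's deep learning models; the comments and lexicon are made up.

```python
# Comments -> candidate tags + per-comment sentiment (illustrative stand-ins).
from collections import Counter

comments = [
    "this melody brings back my childhood summers",
    "the chorus is so sad it hurts",
    "sad slow melody love the strings",
]
positive, negative = {"beautiful", "love"}, {"sad"}
stopwords = {"this", "the", "is", "so", "it", "my"}

def sentiment(text):
    """Lexicon-based polarity: positive hits minus negative hits."""
    words = set(text.split())
    return len(words & positive) - len(words & negative)

# Tag candidates: the most frequent non-stopword tokens across all comments.
tokens = [w for c in comments for w in c.split() if w not in stopwords]
tags = [w for w, _ in Counter(tokens).most_common(3)]
print("tags:", tags)
print("sentiment per comment:", [sentiment(c) for c in comments])
```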
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Various voice synthesis applications have been developed independently,
despite the fact that they all generate "voice" as output. In addition,
the majority of voice synthesis models currently rely on annotated audio data,
but it is crucial to scale them to self-supervised datasets in order to
effectively capture the wide range of acoustic variations present in human
voice, including speaker identity, emotion, and prosody. In this work, we
propose Make-A-Voice, a unified framework for synthesizing and manipulating
voice signals from discrete representations. Make-A-Voice leverages a
"coarse-to-fine" approach to model the human voice, which involves three
stages: 1) semantic stage: model high-level transformation between linguistic
content and self-supervised semantic tokens, 2) acoustic stage: introduce
varying control signals as acoustic conditions for semantic-to-acoustic
modeling, and 3) generation stage: synthesize high-fidelity waveforms from
acoustic tokens. Make-A-Voice offers notable benefits as a unified voice
synthesis framework: 1) Data scalability: the major backbone (i.e., the acoustic
and generation stages) does not require any annotations, and thus the training
data could be scaled up. 2) Controllability and conditioning flexibility: we
investigate different conditioning mechanisms and effectively handle three
voice synthesis applications, including text-to-speech (TTS), voice conversion
(VC), and singing voice synthesis (SVS) by re-synthesizing the discrete voice
representations with prompt guidance. Experimental results demonstrate that
Make-A-Voice exhibits superior audio quality and style similarity compared with
competitive baseline models. Audio samples are available at
https://Make-A-Voice.github.io
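The three-stage interface can be sketched with stubs to show how control signals enter only at the acoustic stage, which is what lets one backbone serve TTS, VC, and SVS. Every signature below is an assumption; the real stages are learned token models and a neural codec/vocoder.

```python
# Coarse-to-fine voice synthesis pipeline (structural sketch with stub stages).
import numpy as np

rng = np.random.default_rng(4)

def semantic_stage(text):
    """Stage 1: linguistic content -> self-supervised semantic tokens (stub)."""
    return rng.integers(0, 512, size=len(text))

def acoustic_stage(semantic_tokens, speaker_id=0, style="neutral"):
    """Stage 2: semantic tokens + control signals -> acoustic tokens (stub).
    Varying the control signals here is what switches between TTS/VC/SVS."""
    return (semantic_tokens + speaker_id) % 1024

def generation_stage(acoustic_tokens, sr=16_000):
    """Stage 3: acoustic tokens -> high-fidelity waveform (stub vocoder)."""
    return rng.normal(size=len(acoustic_tokens) * 320).astype(np.float32)

wav = generation_stage(acoustic_stage(semantic_stage("hello world"), speaker_id=7))
print(wav.shape)
```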
CBLab: Supporting the Training of Large-scale Traffic Control Policies with Scalable Traffic Simulation
Traffic simulation provides interactive data for the optimization of traffic
control policies. However, existing traffic simulators are limited by their
lack of scalability and a shortage of input data, which prevents them from
generating interactive data for the road networks of real large-scale cities.
In this paper, we present City Brain Lab (CBLab), a toolkit for scalable
traffic simulation. CBLab consists of three components:
CBEngine, CBData, and CBScenario. CBEngine is a highly efficient simulator
supporting large-scale traffic simulation. CBData includes a traffic dataset
with road network data for 100 cities around the world. We also develop a
pipeline to conduct a one-click transformation from raw road networks to input
data of our traffic simulation. Combining CBEngine and CBData allows
researchers to run scalable traffic simulations in the road network of real
large-scale cities. On top of these, CBScenario implements an interactive
environment and a benchmark for two traffic control scenarios, in which
traffic control policies adaptable to large-scale urban traffic can be trained
and tuned. To the best of our knowledge, CBLab is
the first infrastructure supporting traffic control policy optimization in
large-scale urban scenarios. CBLab has supported the City Brain Challenge @ KDD
CUP 2021. The project is available on
GitHub: https://github.com/CityBrainLab/CityBrainLab.git
Comment: Accepted by KDD 2023 (Applied Data Science Track).
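A hypothetical sketch of the gym-style interaction pattern that CBScenario enables: a traffic-signal policy acts on simulator observations in a closed loop. The environment stub and method names below are assumptions, not CBLab's actual API (see the linked repository for that).

```python
# Closed-loop training interface for traffic control policies (stub sketch).
import random

class StubTrafficEnv:
    def __init__(self, num_intersections=4):
        self.n = num_intersections
    def reset(self):
        return [0.0] * self.n                    # e.g., queue length per intersection
    def step(self, actions):
        obs = [random.random() * 10 for _ in range(self.n)]
        reward = -sum(obs)                       # fewer queued vehicles is better
        return obs, reward, False, {}

env = StubTrafficEnv()
obs = env.reset()
for _ in range(3):
    actions = [random.randrange(4) for _ in obs]  # pick a signal phase per intersection
    obs, reward, done, info = env.step(actions)
    print(round(reward, 2))
```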
- …