160 research outputs found
Graph Few-shot Learning via Knowledge Transfer
Semi-supervised node classification is a challenging problem that has been
studied extensively. At the frontier, Graph Neural Networks (GNNs), which
update the representation of each node by aggregating information from its
neighbors, have recently attracted great interest. However, most GNNs are
shallow, have a limited receptive field, and may not achieve satisfactory
performance when the number of labeled nodes is small. To address this
challenge, we propose a graph few-shot learning (GFL)
algorithm that incorporates prior knowledge learned from auxiliary graphs to
improve classification accuracy on the target graph. Specifically, a
transferable metric space characterized by a node embedding and a
graph-specific prototype embedding function is shared between auxiliary graphs
and the target, facilitating the transfer of structural knowledge. Extensive
experiments and ablation studies on four real-world graph datasets demonstrate
the effectiveness of our proposed model.
Comment: Full paper (with Appendix) of AAAI 2020.
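To make the shared metric space concrete, here is a minimal sketch of prototype-based few-shot node classification, assuming a GNN has already produced node embeddings. All names, shapes, and the Euclidean metric are illustrative assumptions, not the authors' implementation.

```python
# Prototype-based few-shot classification over node embeddings (toy sketch).
import numpy as np

rng = np.random.default_rng(0)

def prototypes(support_emb, support_labels, num_classes):
    """Class prototype = mean of the labeled support-node embeddings per class."""
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(num_classes)])

def classify(query_emb, protos):
    """Assign each query node to its nearest prototype in the metric space."""
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# Toy data: 16-dim node embeddings (as a GNN might output), 3 classes,
# 5 labeled support nodes per class, classes separated by a mean shift.
num_classes, dim = 3, 16
support_labels = np.repeat(np.arange(num_classes), 5)
support_emb = rng.normal(size=(15, dim)) + support_labels[:, None]
query_emb = rng.normal(size=(6, dim)) + np.array([0, 1, 2, 0, 1, 2])[:, None]

protos = prototypes(support_emb, support_labels, num_classes)
print(classify(query_emb, protos))  # expected [0 1 2 0 1 2] for these well-separated classes
```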
Few-Shot Knowledge Graph Completion
Knowledge graphs (KGs) serve as useful resources for various natural language
processing applications. Previous KG completion approaches require a large
number of training instances (i.e., head-tail entity pairs) for every relation,
yet in reality very few entity pairs are available for most relations.
Existing one-shot learning work does not generalize well to few-shot scenarios
and does not fully exploit the supervisory information, and few-shot KG
completion has not yet been well studied. In this work, we
propose a novel few-shot relation learning model (FSRL) that aims at
discovering facts of new relations with few-shot references. FSRL can
effectively capture knowledge from heterogeneous graph structure, aggregate
representations of few-shot references, and match query entity pairs against
the reference set for every relation. Extensive experiments on two public datasets
demonstrate that FSRL outperforms the state-of-the-art.
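As a rough illustration of the matching idea, the sketch below embeds (head, tail) reference pairs for a relation, aggregates them, and ranks candidate pairs by similarity to the aggregate. The random embeddings and mean aggregator are stand-ins for FSRL's learned heterogeneous-graph encoder and matching network.

```python
# Few-shot KG completion by reference-set matching (toy sketch).
import numpy as np

rng = np.random.default_rng(1)
entity_emb = {e: rng.normal(size=8) for e in
              ["paris", "france", "tokyo", "japan", "berlin", "germany"]}

def pair_emb(h, t):
    # Pair representation: concatenation of head and tail entity embeddings.
    return np.concatenate([entity_emb[h], entity_emb[t]])

def score(candidate, reference_pairs):
    """Cosine similarity between a candidate pair and the aggregated references."""
    ref = np.mean([pair_emb(h, t) for h, t in reference_pairs], axis=0)
    cand = pair_emb(*candidate)
    return ref @ cand / (np.linalg.norm(ref) * np.linalg.norm(cand))

# Few-shot references for a relation such as "capital_of", then rank candidates.
refs = [("paris", "france"), ("tokyo", "japan")]
for cand in [("berlin", "germany"), ("berlin", "japan")]:
    print(cand, round(score(cand, refs), 3))
```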
Performance analysis of high-speed railway communication systems subjected to co-channel interference and channel estimation errors
The performance of high-speed railway wireless communication systems is studied in the presence of co-channel interference and imperfect channel estimation in the uplink. We derive exact closed-form expressions for the outage probability and investigate the impact of fading severity. New explicit expressions are derived for both the level crossing rate and the average outage duration, illustrating the impact of mobile speed and channel estimation errors on the achievable system performance. Our results are general and hence subsume a range of previously reported results.
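The paper's contribution is the exact closed-form analysis itself; as a toy cross-check of what such an outage expression looks like, the snippet below compares a Monte Carlo estimate against the known closed form for a Rayleigh-faded link with a single co-channel interferer. The parameters are made-up examples, not the paper's setup.

```python
# Outage probability P(SIR < threshold) for exponential (Rayleigh-power)
# signal and interference: P_out = g * Omega_i / (Omega_s + g * Omega_i).
import numpy as np

rng = np.random.default_rng(2)
omega_s, omega_i, gamma_th = 10.0, 1.0, 2.0  # mean powers and SIR threshold

s = rng.exponential(omega_s, 1_000_000)   # desired-signal power samples
i = rng.exponential(omega_i, 1_000_000)   # interference power samples
mc = np.mean(s / i < gamma_th)            # Monte Carlo outage estimate
exact = gamma_th * omega_i / (omega_s + gamma_th * omega_i)
print(f"Monte Carlo: {mc:.4f}, closed form: {exact:.4f}")  # both ~0.1667
```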
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models
Stutter removal is an essential scenario in the field of speech editing.
However, when the speech recording contains stutters, the existing text-based
speech editing approaches still suffer from: 1) over-smoothing in the edited
speech; 2) a lack of robustness to the noise introduced by stutters; and 3) the
need for users to manually determine the region to edit in order to remove
stutters. To tackle these challenges, we propose
FluentSpeech, a stutter-oriented automatic speech editing model. Specifically,
1) we propose a context-aware diffusion model that iteratively refines the
modified mel-spectrogram with the guidance of context features; 2) we introduce
a stutter predictor module to inject the stutter information into the hidden
sequence; 3) we also propose a stutter-oriented automatic speech editing (SASE)
dataset that contains spontaneous speech recordings with time-aligned stutter
labels to train the automatic stutter localization model. Experimental results
on VCTK and LibriTTS datasets demonstrate that our model achieves
state-of-the-art performance on speech editing. Further experiments on our SASE
dataset show that FluentSpeech can effectively improve the fluency of
stuttering speech in terms of objective and subjective metrics. Code and audio
samples can be found at https://github.com/Zain-Jiang/Speech-Editing-Toolkit.
Comment: Accepted by ACL 2023 (Findings).
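A conceptual sketch of the inpainting-style loop this describes: only the masked (stutter) region of the mel-spectrogram is refined step by step, while the surrounding context frames are clamped and guide each update. The stub denoiser below is an assumption standing in for the paper's learned context-aware diffusion model.

```python
# Iterative refinement of a masked mel-spectrogram region (conceptual sketch).
import numpy as np

rng = np.random.default_rng(3)
T, M = 100, 80                        # frames x mel bins
mel = rng.normal(size=(T, M))         # toy stand-in for the recording
mask = np.zeros(T, dtype=bool)
mask[40:60] = True                    # region flagged for editing (e.g., a stutter)

def denoise_step(x, mask, t):
    """Stub denoiser: pull masked frames toward the context statistics.
    A real model would predict this update from learned context features."""
    context_mean = x[~mask].mean(axis=0)
    x = x.copy()
    x[mask] += 0.2 * (context_mean - x[mask])
    return x

x = mel.copy()
x[mask] = rng.normal(size=(mask.sum(), M))   # start the edited region from noise
for t in reversed(range(50)):                # reverse-diffusion-style loop
    x = denoise_step(x, mask, t)
    x[~mask] = mel[~mask]                    # clamp context frames every step
print(np.abs(x[mask] - mel[~mask].mean(axis=0)).mean())  # masked region -> context stats
```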
Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation
Large diffusion models have been successful in text-to-audio (T2A) synthesis
tasks, but they often suffer from common issues such as semantic misalignment
and poor temporal consistency due to limited natural language understanding and
data scarcity. Additionally, 2D spatial structures widely used in T2A works
lead to unsatisfactory audio quality when generating variable-length audio
samples since they do not adequately prioritize temporal information. To
address these challenges, we propose Make-an-Audio 2, a latent diffusion-based
T2A method that builds on the success of Make-an-Audio. Our approach includes
several techniques to improve semantic alignment and temporal consistency:
Firstly, we use pre-trained large language models (LLMs) to parse the text into
structured <event & order> pairs for better temporal information capture. We
also introduce another structured-text encoder to aid in learning semantic
alignment during the diffusion denoising process. To improve the performance of
variable length generation and enhance the temporal information extraction, we
design a feed-forward Transformer-based diffusion denoiser. Finally, we use
LLMs to augment and transform a large amount of audio-label data into
audio-text datasets to alleviate the problem of scarcity of temporal data.
Extensive experiments show that our method outperforms baseline models in both
objective and subjective metrics, and achieves significant gains in temporal
information understanding, semantic consistency, and sound quality.
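To show what such a structured parse might look like, here is a minimal sketch with a hard-coded example of (event, order) pairs and their serialization into conditioning text. The exact pair format and the LLM prompt are assumptions; the LLM call itself is out of scope.

```python
# From a free-form caption to structured (event, order) conditioning text.
caption = "A dog barks twice, then a car drives by while rain falls"

# What an LLM-produced structured parse might look like (assumed format):
structured = [
    ("dog barking", "start"),
    ("car driving by", "after"),
    ("rain falling", "all"),
]

# Serialize the pairs into the text fed to the T2A model's structured encoder.
cond_text = " & ".join(f"<{event}, {order}>" for event, order in structured)
print(cond_text)
# <dog barking, start> & <car driving by, after> & <rain falling, all>
```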
Amplifying the Music Listening Experience through Song Comments on Music Streaming Platforms
Music streaming services are increasingly popular among younger generations
who seek social experiences through personal expression and sharing of
subjective feelings in comments. However, such emotional aspects are often
ignored by current platforms, which affects the listeners' ability to find
music that triggers specific personal feelings. To address this gap, this study
proposes a novel approach that leverages deep learning methods to capture
contextual keywords, sentiments, and induced mechanisms from song comments. The
study augments a current music app with two features: the presentation of tags
that best represent song comments, and a novel map metaphor that reorganizes
song comments by chronological order, content, and
sentiment. The effectiveness of the proposed approach is validated through a
usage scenario and a user study that demonstrate its capability to improve the
user experience of exploring songs and browsing comments of interest. This
study contributes to the advancement of music streaming services by providing a
more personalized and emotionally rich music experience for younger
generations.
Comment: In the Proceedings of ChinaVis 2023.
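The pipeline shape (comments in, tags and sentiment out) can be illustrated with simple stand-ins. The lexicon scorer and frequency-based tags below are assumptions replacing the study's deep learning models; the comments and lexicon are made up.

```python
# Comments -> candidate tags + per-comment sentiment (illustrative stand-ins).
from collections import Counter

comments = [
    "this melody brings back my childhood summers",
    "the chorus is so sad it hurts",
    "sad slow melody love the strings",
]
positive, negative = {"beautiful", "love"}, {"sad"}
stopwords = {"this", "the", "is", "so", "it", "my"}

def sentiment(text):
    """Lexicon-based polarity: positive hits minus negative hits."""
    words = set(text.split())
    return len(words & positive) - len(words & negative)

# Tag candidates: the most frequent non-stopword tokens across all comments.
tokens = [w for c in comments for w in c.split() if w not in stopwords]
tags = [w for w, _ in Counter(tokens).most_common(3)]
print("tags:", tags)
print("sentiment per comment:", [sentiment(c) for c in comments])
```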
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Various voice synthesis applications have been developed independently,
despite the fact that they all generate "voice" as output. In addition,
the majority of voice synthesis models currently rely on annotated audio data,
but it is crucial to scale them to self-supervised datasets in order to
effectively capture the wide range of acoustic variations present in human
voice, including speaker identity, emotion, and prosody. In this work, we
propose Make-A-Voice, a unified framework for synthesizing and manipulating
voice signals from discrete representations. Make-A-Voice leverages a
"coarse-to-fine" approach to model the human voice, which involves three
stages: 1) semantic stage: model high-level transformation between linguistic
content and self-supervised semantic tokens, 2) acoustic stage: introduce
varying control signals as acoustic conditions for semantic-to-acoustic
modeling, and 3) generation stage: synthesize high-fidelity waveforms from
acoustic tokens. Make-A-Voice offers notable benefits as a unified voice
synthesis framework: 1) Data scalability: the major backbone (i.e., the acoustic
and generation stages) does not require any annotations, and thus the training
data could be scaled up. 2) Controllability and conditioning flexibility: we
investigate different conditioning mechanisms and effectively handle three
voice synthesis applications, including text-to-speech (TTS), voice conversion
(VC), and singing voice synthesis (SVS) by re-synthesizing the discrete voice
representations with prompt guidance. Experimental results demonstrate that
Make-A-Voice exhibits superior audio quality and style similarity compared with
competitive baseline models. Audio samples are available at
https://Make-A-Voice.github.io
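The three-stage interface can be sketched with stubs to show how control signals enter only at the acoustic stage, which is what lets one backbone serve TTS, VC, and SVS. Every signature below is an assumption; the real stages are learned token models and a neural codec/vocoder.

```python
# Coarse-to-fine voice synthesis pipeline (structural sketch with stub stages).
import numpy as np

rng = np.random.default_rng(4)

def semantic_stage(text):
    """Stage 1: linguistic content -> self-supervised semantic tokens (stub)."""
    return rng.integers(0, 512, size=len(text))

def acoustic_stage(semantic_tokens, speaker_id=0, style="neutral"):
    """Stage 2: semantic tokens + control signals -> acoustic tokens (stub).
    Varying the control signals here is what switches between TTS/VC/SVS."""
    return (semantic_tokens + speaker_id) % 1024

def generation_stage(acoustic_tokens, sr=16_000):
    """Stage 3: acoustic tokens -> high-fidelity waveform (stub vocoder)."""
    return rng.normal(size=len(acoustic_tokens) * 320).astype(np.float32)

wav = generation_stage(acoustic_stage(semantic_stage("hello world"), speaker_id=7))
print(wav.shape)
```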
CBLab: Supporting the Training of Large-scale Traffic Control Policies with Scalable Traffic Simulation
Traffic simulation provides interactive data for the optimization of traffic
control policies. However, existing traffic simulators are limited by their
lack of scalability and a shortage of input data, which prevents them from
generating interactive data for the road networks of real large-scale cities.
In this paper, we present City Brain Lab (CBLab), a toolkit for scalable
traffic simulation. CBLab consists of three components:
CBEngine, CBData, and CBScenario. CBEngine is a highly efficient simulator
supporting large-scale traffic simulation. CBData includes a traffic dataset
with road network data for 100 cities around the world. We also develop a
pipeline to conduct a one-click transformation from raw road networks to input
data of our traffic simulation. Combining CBEngine and CBData allows
researchers to run scalable traffic simulations in the road network of real
large-scale cities. On top of these, CBScenario implements an interactive
environment and a benchmark for two traffic control scenarios, in which
traffic control policies adaptable to large-scale urban traffic can be trained
and tuned. To the best of our knowledge, CBLab is
the first infrastructure supporting traffic control policy optimization in
large-scale urban scenarios. CBLab has supported the City Brain Challenge @ KDD
CUP 2021. The project is available on
GitHub: https://github.com/CityBrainLab/CityBrainLab.git
Comment: Accepted by KDD 2023 (Applied Data Science Track).
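A hypothetical sketch of the gym-style interaction pattern that CBScenario enables: a traffic-signal policy acts on simulator observations in a closed loop. The environment stub and method names below are assumptions, not CBLab's actual API (see the linked repository for that).

```python
# Closed-loop training interface for traffic control policies (stub sketch).
import random

class StubTrafficEnv:
    def __init__(self, num_intersections=4):
        self.n = num_intersections
    def reset(self):
        return [0.0] * self.n                    # e.g., queue length per intersection
    def step(self, actions):
        obs = [random.random() * 10 for _ in range(self.n)]
        reward = -sum(obs)                       # fewer queued vehicles is better
        return obs, reward, False, {}

env = StubTrafficEnv()
obs = env.reset()
for _ in range(3):
    actions = [random.randrange(4) for _ in obs]  # pick a signal phase per intersection
    obs, reward, done, info = env.step(actions)
    print(round(reward, 2))
```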
- …