Connector 0.5: A unified framework for graph representation learning
Graph representation learning models aim to represent graph structure and
features as low-dimensional vectors in a latent space, which can benefit
various downstream tasks, such as node classification and link prediction.
Because of their powerful graph data modelling capabilities, various graph
embedding models and libraries have been proposed to learn embeddings and to
make experiments easier for researchers to conduct. In this paper, we
introduce Connector, a novel graph representation framework covering various
graph embedding models, ranging from shallow to state-of-the-art models.
First, we consider
graph generation by constructing various types of graphs with different
structural relations, including homogeneous, signed, heterogeneous, and
knowledge graphs. Second, we introduce various graph representation learning
models, ranging from shallow to deep graph embedding models. Finally, we plan
to build an efficient open-source framework that can provide deep graph
embedding models to represent structural relations in graphs. The framework is
available at https://github.com/NSLab-CUK/Connector.
Comment: A unified framework for graph representation learning
Into-TTS : Intonation Template based Prosody Control System
Intonation plays an important role in conveying the intention of the
speaker. However, current end-to-end TTS systems often fail to model proper
intonations. To alleviate this problem, we propose a novel, intuitive method to
synthesize speech in different intonations using predefined intonation
templates. Prior to the acoustic model training, speech data are automatically
grouped into intonation templates by k-means clustering, according to their
sentence-final F0 contour. Two proposed modules are added to the end-to-end TTS
framework: intonation classifier and intonation encoder. The intonation
classifier recommends a suitable intonation template to the given text. The
intonation encoder, attached to the text encoder output, synthesizes speech
that abides by the requested intonation template. The main contributions of our
paper are: (a) an easy-to-use intonation control system covering a wide range
of users; (b) better performance in wrapping speech in a requested intonation,
with improved pitch distance and MOS; and (c) the feasibility of future
integration between TTS and NLP, with TTS able to utilize contextual
information. Audio samples are available at https://srtts.github.io/IntoTTS.
Comment: Submitted to INTERSPEECH 202
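The template-grouping step in the abstract above (k-means clustering of sentence-final F0 contours) can be illustrated with a minimal sketch. It assumes each contour has already been resampled to a fixed-length vector; the function name `kmeans`, the toy F0 values, and the choice of two clusters are hypothetical and are not taken from the paper.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on fixed-length vectors (lists of floats)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy sentence-final F0 contours in Hz (hypothetical values, 3 frames each).
contours = [
    [180.0, 200.0, 230.0],  # rising
    [175.0, 205.0, 240.0],  # rising
    [210.0, 190.0, 160.0],  # falling
    [220.0, 185.0, 150.0],  # falling
]
labels, templates = kmeans(contours, k=2)
print(labels)     # cluster index per utterance
print(templates)  # one centroid contour per intonation template
```

Each centroid plays the role of an intonation template here; a real system would likely cluster many more utterances and pick the number of templates empirically.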
Efficient Parallel Audio Generation using Group Masked Language Modeling
We present a fast and high-quality codec language model for parallel audio
generation. While SoundStorm, a state-of-the-art parallel audio generation
model, accelerates inference speed compared to autoregressive models, it still
suffers from slow inference due to iterative sampling. To resolve this problem,
we propose Group-Masked Language Modeling (G-MLM) and Group Iterative Parallel
Decoding (G-IPD) for efficient parallel audio generation. Both the training and
sampling schemes enable the model to synthesize high-quality audio with a small
number of iterations by effectively modeling the group-wise conditional
dependencies. In addition, our model employs a cross-attention-based
architecture to capture the speaker style of the prompt voice and improves
computational efficiency. Experimental results demonstrate that our proposed
model outperforms the baselines in prompt-based audio generation.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech
Zero-shot multi-speaker text-to-speech (ZSM-TTS) models aim to generate a
speech sample with the voice characteristic of an unseen speaker. The main
challenge of ZSM-TTS is to increase the overall speaker similarity for unseen
speakers. One of the most successful speaker conditioning methods for
flow-based multi-speaker text-to-speech (TTS) models is to utilize the
functions which predict the scale and bias parameters of the affine coupling
layers according to the given speaker embedding vector. In this letter, we
improve on the previous speaker conditioning method by introducing a
speaker-normalized affine coupling (SNAC) layer which allows for unseen speaker
speech synthesis in a zero-shot manner leveraging a normalization-based
conditioning technique. The newly designed coupling layer explicitly normalizes
the input by the parameters predicted from a speaker embedding vector while
training, enabling an inverse process of denormalizing for a new speaker
embedding at inference. The proposed conditioning scheme yields
state-of-the-art performance in terms of speech quality and speaker
similarity in a ZSM-TTS setting.
Comment: Accepted to IEEE Signal Processing Letters
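The normalize-then-denormalize idea in the abstract above can be sketched as follows. This is only an illustration of an invertible, speaker-conditioned shift and scale, not the SNAC layer itself: `predict_params` is a hypothetical stand-in for the learned networks, and the embeddings and feature values are made up.

```python
import math

def predict_params(speaker_emb):
    """Toy stand-in for the learned networks that map a speaker embedding
    to a per-channel mean and log-scale (hypothetical fixed linear maps)."""
    mean = [0.5 * e for e in speaker_emb]
    log_scale = [0.1 * e for e in speaker_emb]
    return mean, log_scale

def normalize(x, speaker_emb):
    """Training direction: remove the speaker-dependent shift and scale."""
    mean, log_scale = predict_params(speaker_emb)
    return [(xi - m) * math.exp(-s) for xi, m, s in zip(x, mean, log_scale)]

def denormalize(z, speaker_emb):
    """Inference direction: exact inverse of normalize for a given speaker."""
    mean, log_scale = predict_params(speaker_emb)
    return [zi * math.exp(s) + m for zi, m, s in zip(z, mean, log_scale)]

seen_speaker = [1.0, -2.0, 0.5]     # hypothetical training-speaker embedding
unseen_speaker = [0.2, 1.5, -1.0]   # hypothetical new-speaker embedding
x = [0.3, -1.2, 2.0]                # hypothetical feature vector

z = normalize(x, seen_speaker)            # speaker-normalized representation
x_back = denormalize(z, seen_speaker)     # round trip recovers x
x_new = denormalize(z, unseen_speaker)    # same z, new speaker statistics
```

Because the two functions are exact inverses for any embedding, the normalized representation `z` can be denormalized with an embedding that was never seen during training, which is the property the abstract relies on.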
CloudNine: Analyzing Meteorological Observation Impact on Weather Prediction Using Explainable Graph Neural Networks
The impact of meteorological observations on weather forecasting varies with
sensor type, location, time, and other environmental factors. Thus,
quantitative analysis of observation impacts is crucial for effective and
efficient development of weather forecasting systems. However, existing
impact analysis methods are difficult to apply widely because they depend
heavily on specific forecasting systems. Also, they cannot provide
observation impacts at multiple spatio-temporal scales, only global impacts of
observation types. To address these issues, we present a novel system called
"CloudNine," which allows analysis of individual observations' impacts on
specific predictions based on explainable graph neural networks (XGNNs).
Combining an XGNN-based atmospheric state estimation model with a numerical
weather prediction model, we provide a web application to search for
observations in the 3D space of the Earth system and to visualize the impact of
individual observations on predictions in specific spatial regions and time
periods.
Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis
Previous works in zero-shot text-to-speech (ZS-TTS) have attempted to enhance
such systems by enlarging the training data through crowd-sourcing or augmenting
existing speech data. However, the use of low-quality data has led to a decline
in the overall system performance. To avoid such degradation, instead of
directly augmenting the input data, we propose a latent filling (LF) method
that adopts simple but effective latent space data augmentation in the speaker
embedding space of the ZS-TTS system. By incorporating a consistency loss, LF
can be seamlessly integrated into existing ZS-TTS systems without the need for
additional training stages. Experimental results show that LF significantly
improves speaker similarity while preserving speech quality.
Comment: Accepted to ICASSP 202
An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space
With the recent developments in cross-lingual Text-to-Speech (TTS) systems,
L2 (second-language, or foreign) accent problems arise. Moreover, running a
subjective evaluation for such cross-lingual TTS systems is troublesome. The
vowel space analysis, which is often utilized to explore various aspects of
language including L2 accents, is a great alternative analysis tool. In this
study, we apply the vowel space analysis method to explore L2 accents of
cross-lingual TTS systems. Through the vowel space analysis, we make the
following three observations: a) a parallel architecture (Glow-TTS) is less L2-accented
than an auto-regressive one (Tacotron); b) L2 accents are more dominant in
non-shared vowels in a language pair; and c) L2 accents of cross-lingual TTS
systems share some phenomena with those of human L2 learners. Our findings
imply that it is necessary for TTS systems to handle each language pair
differently, depending on their linguistic characteristics such as non-shared
vowels. They also hint that we can further incorporate linguistic knowledge in
developing cross-lingual TTS systems.
Comment: Submitted to ICASSP 202
Spatio-temporal contextualization of queries for microtexts in social media: mathematical modeling
In this paper, we present our ongoing project on query contextualization by integrating all possible IoT-based data sources. Most importantly, mobile users are regarded as IoT sensors that can serve as textual data sources with spatio-temporal contexts. Given the large volume of text streams, it has been difficult for traditional information retrieval systems to carry out search tasks. The goal of this work is (i) to understand and process microtexts in social media (e.g., Twitter and Facebook), and (ii) to reformulate queries for searching for relevant microtexts in these social media.
Low-Loading of Pt Nanoparticles on 3D Carbon Foam Support for Highly Active and Stable Hydrogen Production
Minimizing Pt loading is essential for designing cost-effective water electrolyzers and fuel cell systems. Recently, three-dimensional (3D) macroporous open-pore electroactive supports have been widely regarded as promising architectures for lowering Pt loading because of their large surface area, easy electrolyte access to Pt sites, and superior gas diffusion properties, which accelerate the diffusion of H2 bubbles from the Pt surface. However, studies to date have mainly focused on Pt loading on Ni-based 3D open-pore supports, which are prone to corrosion in highly acidic and alkaline conditions. Here, we investigate the electrodeposition of Pt nanoparticles in low loading amounts on a commercially available, inexpensive 3D carbon foam (CF) support and benchmark their activity and stability for electrolytic hydrogen production. We first elucidate the effect of deposition potential on the Pt nanoparticle size and density, and subsequently on the Pt coverage of the 3D CF. Analysis of the Pt deposit using scanning electron microscopy images reveals that, for a given deposition charge density, the particle density increases (with cubic power) and the particle size decreases (linearly) with deposition overpotential. A deposition potential of −0.4 V vs. the saturated calomel electrode (SCE) provided the highest Pt nanoparticle coverage on the 3D CF surface. Different loading amounts of Pt (0.0075–0.1 mgPt/cm2) were then deposited on CF at −0.4 V vs. SCE and subsequently studied for their hydrogen evolution reaction (HER) activity in acidic 1 M H2SO4 electrolyte. The Pt/CF catalyst with a loading as low as 0.06 mgPt/cm2 (10-fold lower than state-of-the-art commercial electrodes) demonstrated a mass activity of 2.6 amperes per milligram of Pt at 200 mV overpotential, nearly 6-fold greater than the commercial Pt/C catalyst tested under similar conditions. The 3D architectured electrode also demonstrated excellent stability, showing <7% loss in activity after 60 h of constant-current water electrolysis at 100 mA/cm2.
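The figures quoted in the abstract above can be related by simple arithmetic. The sketch below assumes that mass activity is the geometric current density divided by the Pt loading; that definition, the variable names, and the derived values are illustrative and are not stated in the abstract. The two commercial figures are rough, since they rest on the approximate "10-fold" and "nearly 6-fold" ratios.

```python
# Values quoted in the abstract.
mass_activity_a_per_mg = 2.6   # A per mg Pt at 200 mV overpotential
loading_mg_per_cm2 = 0.06      # mg Pt per cm^2 of electrode

# Assumed definition: mass activity = current density / Pt loading.
current_density_ma_per_cm2 = mass_activity_a_per_mg * loading_mg_per_cm2 * 1000.0

# Rough implied figures for the commercial reference electrode.
commercial_loading_mg_per_cm2 = loading_mg_per_cm2 * 10.0
commercial_mass_activity_a_per_mg = mass_activity_a_per_mg / 6.0

print(f"Pt/CF current density at 200 mV: {current_density_ma_per_cm2:.0f} mA/cm2")
print(f"Implied commercial loading: {commercial_loading_mg_per_cm2:.1f} mgPt/cm2")
print(f"Implied commercial mass activity: {commercial_mass_activity_a_per_mg:.2f} A/mgPt")
```

Under that assumption, the Pt/CF electrode would carry about 156 mA/cm2 at 200 mV overpotential.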
Biological Assembly and Synthesis of Inorganic Nanostructures
Science and technology have long pursued smaller, faster, and more efficient devices, and the enormous efforts of countless scientists have given us electronics with reduced volume and improved performance. Miniaturization of electronic circuits down to the micrometer scale is well developed as an industrial process, and electronic products containing integrated circuits consisting of microstructures are easy to find in everyday life. However, miniaturization of circuit components down to the nanometer scale has revealed new challenges, arising not only from the difficulty of handling such diminutive structures but also from the unusual physical properties of nanomaterials. Countless conventional chemical and physical studies have sought to exploit the unique properties of nanostructures by developing efficient techniques for their controlled synthesis and assembly. However, environmental concerns about toxic solvent systems and energy-intensive processes, together with the pursuit of highly selective molecular interactions for highly precise assembly, have turned scientists' attention to biological materials. The biorecognition properties of biological materials are attractive for achieving programmed self-assembly of nanostructures, and biomolecules with metal-reducing ability are very promising for the development of environmentally acceptable synthesis processes. In light of the above discussion, this thesis takes advantage of biological approaches to assemble and synthesize inorganic nanostructures in a controlled manner. DNA was used for the assembly processes because of its ease of sequence programming and chemical modification. Spatially controlled assembly of multi-segmented Au/Pd/Au nanowires across gold electrodes has been demonstrated using thiolated DNA strands functionalized on the gold surfaces of the nanowires and electrodes.
The electron transport properties of nanowires assembled with DNA assistance were demonstrated, showing a negligible blocking effect from the DNA layers hybridized between nanowires and electrodes. The assembled Au/Pd/Au nanowire was used for hydrogen sensing, demonstrating the applicability of DNA-assisted assembly to building functional nanodevices. Amino acids are essential as building blocks for proteins and for metabolism. Recently, amino acids have been given another important role as reducing and capping agents for the synthesis of gold nanostructures. Amino acid-mediated synthesis of gold nanostructures has been demonstrated, showing the capability of biological approaches to synthesize single-crystalline gold nanostructures in 0-D, 1-D, and 2-D forms by manipulating the reaction environment. Structural changes in gold nanostructures due to the speciation of gold complexes were systematically demonstrated by altering solvent conditions. The effect of amino acid side chains on the structural features of gold nanostructures was also systematically demonstrated. Distinctive electron transport properties were observed for single-crystalline nanoribbons, which showed resistivity an order of magnitude lower than that of their polycrystalline counterparts. Rapid reversible room temperature H2S gas sensor was fabricated using AC aligned gold nanoparticle a
- …