127 research outputs found

    Connector 0.5: A unified framework for graph representation learning

    Graph representation learning models aim to encode a graph's structure and features as low-dimensional vectors in a latent space, which can benefit various downstream tasks such as node classification and link prediction. Because graph representation learning models graph data so effectively, various graph embedding models and libraries have been proposed to learn embeddings and make experiments easier for researchers. In this paper, we introduce Connector, a novel graph representation framework covering various graph embedding models, ranging from shallow to state-of-the-art models. First, we consider graph generation by constructing various types of graphs with different structural relations, including homogeneous, signed, heterogeneous, and knowledge graphs. Second, we introduce various graph representation learning models, ranging from shallow to deep graph embedding models. Finally, we plan to build an efficient open-source framework that can provide deep graph embedding models to represent structural relations in graphs. The framework is available at https://github.com/NSLab-CUK/Connector. Comment: A unified framework for graph representation learnin

    Into-TTS : Intonation Template based Prosody Control System

    Intonation plays an important role in conveying the speaker's intention. However, current end-to-end TTS systems often fail to model proper intonation. To alleviate this problem, we propose a novel, intuitive method to synthesize speech in different intonations using predefined intonation templates. Prior to acoustic model training, speech data are automatically grouped into intonation templates by k-means clustering, according to their sentence-final F0 contour. Two proposed modules are added to the end-to-end TTS framework: an intonation classifier and an intonation encoder. The intonation classifier recommends a suitable intonation template for the given text. The intonation encoder, attached to the text encoder output, synthesizes speech that abides by the requested intonation template. The main contributions of our paper are: (a) an easy-to-use intonation control system covering a wide range of users; (b) better performance in wrapping speech in a requested intonation, with improved pitch distance and MOS; and (c) the feasibility of future integration between TTS and NLP, with TTS being able to utilize contextual information. Audio samples are available at https://srtts.github.io/IntoTTS. Comment: Submitted to INTERSPEECH 202
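
    The template-grouping step described in this abstract can be illustrated with a minimal sketch. It assumes each utterance's sentence-final F0 contour has already been resampled to a fixed-length list of values in Hz; the toy contours, the function names, and the deterministic farthest-point initialization are illustrative assumptions, not details taken from the paper.

```python
# Minimal k-means sketch: group fixed-length sentence-final F0 contours
# into "intonation templates". All data and names here are hypothetical.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length contours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic init (an assumption made for reproducibility):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, max_iters=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid for every contour.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Toy contours (Hz): three rising (question-like) and three falling ones.
rising = [[100 + d, 120 + d, 150 + d, 190 + d] for d in (0, 5, -5)]
falling = [[190 + d, 150 + d, 120 + d, 100 + d] for d in (0, 5, -5)]
contours = rising + falling

labels, templates = kmeans(contours, k=2)
print(labels)     # cluster index per utterance
print(templates)  # one mean contour per intonation template
```

    Each resulting centroid plays the role of one intonation template, and the per-utterance labels are what a classifier could later be trained to predict from text.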

    Efficient Parallel Audio Generation using Group Masked Language Modeling

    We present a fast and high-quality codec language model for parallel audio generation. While SoundStorm, a state-of-the-art parallel audio generation model, is faster at inference than autoregressive models, it still suffers from slow inference due to iterative sampling. To resolve this problem, we propose Group-Masked Language Modeling (G-MLM) and Group Iterative Parallel Decoding (G-IPD) for efficient parallel audio generation. Both the training and sampling schemes enable the model to synthesize high-quality audio with a small number of iterations by effectively modeling the group-wise conditional dependencies. In addition, our model employs a cross-attention-based architecture to capture the speaker style of the prompt voice and to improve computational efficiency. Experimental results demonstrate that our proposed model outperforms the baselines in prompt-based audio generation. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibl

    SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech

    Zero-shot multi-speaker text-to-speech (ZSM-TTS) models aim to generate a speech sample with the voice characteristics of an unseen speaker. The main challenge of ZSM-TTS is to increase the overall speaker similarity for unseen speakers. One of the most successful speaker conditioning methods for flow-based multi-speaker text-to-speech (TTS) models uses functions that predict the scale and bias parameters of the affine coupling layers from the given speaker embedding vector. In this letter, we improve on the previous speaker conditioning method by introducing a speaker-normalized affine coupling (SNAC) layer, which allows speech synthesis for unseen speakers in a zero-shot manner by leveraging a normalization-based conditioning technique. The newly designed coupling layer explicitly normalizes the input by the parameters predicted from a speaker embedding vector during training, enabling an inverse process of denormalizing for a new speaker embedding at inference. The proposed conditioning scheme yields state-of-the-art performance in terms of speech quality and speaker similarity in a ZSM-TTS setting. Comment: Accepted to IEEE Signal Processing Letter
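
    The normalize-then-denormalize idea in this abstract can be sketched as follows. This is only a toy illustration of the conditioning principle, not the SNAC layer itself: the coupling transform is omitted, the mean and scale are scalars, and the linear predictor with its weights (`W_MU`, `W_LOG_SIGMA`) is a hypothetical stand-in for whatever network the authors use.

```python
import math

def predict_stats(speaker_emb, w_mu, w_log_sigma):
    """Hypothetical linear predictor: maps a speaker embedding to a
    scalar mean and a strictly positive scalar scale."""
    mu = sum(w * e for w, e in zip(w_mu, speaker_emb))
    sigma = math.exp(sum(w * e for w, e in zip(w_log_sigma, speaker_emb)))
    return mu, sigma

def normalize(frame, speaker_emb, w_mu, w_log_sigma):
    """Training direction: remove speaker-dependent shift and scale."""
    mu, sigma = predict_stats(speaker_emb, w_mu, w_log_sigma)
    return [(v - mu) / sigma for v in frame]

def denormalize(latent, speaker_emb, w_mu, w_log_sigma):
    """Inference direction: re-apply shift and scale for a (possibly
    unseen) speaker embedding."""
    mu, sigma = predict_stats(speaker_emb, w_mu, w_log_sigma)
    return [v * sigma + mu for v in latent]

# Illustrative values only.
W_MU = [0.5, -0.2]
W_LOG_SIGMA = [0.1, 0.3]
train_speaker = [1.0, 2.0]
new_speaker = [-1.0, 0.5]
frame = [0.3, -1.2, 2.0]

latent = normalize(frame, train_speaker, W_MU, W_LOG_SIGMA)
same = denormalize(latent, train_speaker, W_MU, W_LOG_SIGMA)
other = denormalize(latent, new_speaker, W_MU, W_LOG_SIGMA)
print(same)   # recovers the original frame up to rounding
print(other)  # same latent rendered with the new speaker's statistics
```

    Because normalization is an invertible affine map, denormalizing with the same embedding recovers the input, while swapping in a different embedding re-renders the same latent with another speaker's statistics.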

    CloudNine: Analyzing Meteorological Observation Impact on Weather Prediction Using Explainable Graph Neural Networks

    The impact of meteorological observations on weather forecasting varies with sensor type, location, time, and other environmental factors. Thus, quantitative analysis of observation impacts is crucial for the effective and efficient development of weather forecasting systems. However, existing impact analysis methods are difficult to apply widely because they depend heavily on specific forecasting systems. They also cannot provide observation impacts at multiple spatio-temporal scales, only the global impacts of observation types. To address these issues, we present a novel system called "CloudNine," which allows analysis of individual observations' impacts on specific predictions based on explainable graph neural networks (XGNNs). Combining an XGNN-based atmospheric state estimation model with a numerical weather prediction model, we provide a web application to search for observations in the 3D space of the Earth system and to visualize the impact of individual observations on predictions in specific spatial regions and time periods.

    Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis

    Previous works in zero-shot text-to-speech (ZS-TTS) have attempted to enhance their systems by enlarging the training data through crowd-sourcing or by augmenting existing speech data. However, the use of low-quality data has led to a decline in overall system performance. To avoid such degradation, instead of directly augmenting the input data, we propose a latent filling (LF) method that adopts simple but effective latent space data augmentation in the speaker embedding space of the ZS-TTS system. By incorporating a consistency loss, LF can be seamlessly integrated into existing ZS-TTS systems without the need for additional training stages. Experimental results show that LF significantly improves speaker similarity while preserving speech quality. Comment: Accepted to ICASSP 202

    An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space

    With the recent developments in cross-lingual Text-to-Speech (TTS) systems, L2 (second-language, or foreign) accent problems arise. Moreover, running a subjective evaluation for such cross-lingual TTS systems is troublesome. Vowel space analysis, which is often used to explore various aspects of language including L2 accents, is a useful alternative analysis tool. In this study, we apply the vowel space analysis method to explore L2 accents of cross-lingual TTS systems. Through the vowel space analysis, we make the following three observations: a) a parallel architecture (Glow-TTS) is less L2-accented than an auto-regressive one (Tacotron); b) L2 accents are more dominant in the non-shared vowels of a language pair; and c) L2 accents of cross-lingual TTS systems share some phenomena with those of human L2 learners. Our findings imply that TTS systems need to handle each language pair differently, depending on linguistic characteristics such as non-shared vowels. They also hint that we can further incorporate linguistic knowledge in developing cross-lingual TTS systems. Comment: Submitted to ICASSP 202

    Spatio-temporal contextualization of queries for microtexts in social media: mathematical modeling

    In this paper, we present our ongoing project on query contextualization by integrating all possible IoT-based data sources. Most importantly, mobile users are regarded as IoT sensors that can serve as textual data sources with spatio-temporal contexts. Given the large volume of text streams, it has been difficult for traditional information retrieval systems to carry out search tasks. The goal of this work is (i) to understand and process microtexts in social media (e.g., Twitter and Facebook), and (ii) to reformulate queries for searching for relevant microtexts in these social media.

    Low-Loading of Pt Nanoparticles on 3D Carbon Foam Support for Highly Active and Stable Hydrogen Production

    Minimizing Pt loading is essential for designing cost-effective water electrolyzers and fuel cell systems. Recently, three-dimensional macroporous open-pore electroactive supports have been widely regarded as promising architectures for lowering Pt loading because of their large surface area, easy electrolyte access to Pt sites, and superior gas diffusion properties, which accelerate the diffusion of H2 bubbles from the Pt surface. However, studies to date have mainly focused on Pt loading on Ni-based 3D open-pore supports, which are prone to corrosion in highly acidic and alkaline conditions. Here, we investigate electrodeposition of Pt nanoparticles in low-loading amounts on a commercially available, inexpensive 3D carbon foam (CF) support and benchmark their activity and stability for electrolytic hydrogen production. We first elucidate the effect of deposition potential on the Pt nanoparticle size and density, and subsequently on the Pt coverage of the 3D CF. Analysis of the Pt deposit using scanning electron microscopy images reveals that, for a given deposition charge density, the particle density increases (with cubic power) and the particle size decreases (linearly) with deposition overpotential. A deposition potential of −0.4 V vs. saturated calomel electrode (SCE) provided the highest Pt nanoparticle coverage on the 3D CF surface. Different loading amounts of Pt (0.0075–0.1 mgPt/cm2) were then deposited on CF at −0.4 V vs. SCE and subsequently studied for their hydrogen evolution reaction (HER) activity in acidic 1 M H2SO4 electrolyte. The Pt/CF catalyst with a loading as low as 0.06 mgPt/cm2 (10-fold lower than state-of-the-art commercial electrodes) demonstrated a mass activity of 2.6 A per milligram of Pt at 200 mV overpotential, nearly 6-fold greater than the commercial Pt/C catalyst tested under similar conditions. The 3D architectured electrode also demonstrated excellent stability, showing <7% loss in activity after 60 h of constant-current water electrolysis at 100 mA/cm2.

    Biological Assembly and Synthesis of Inorganic Nanostructures

    Science and technology have long pursued smaller, faster and more efficient devices, and the enormous efforts of countless scientists have given us electronics with reduced volume and improved performance. Miniaturization of electronic circuits down to the micrometer scale is well developed as an industrial process, and electronic products containing integrated circuits consisting of microstructures are easy to find in everyday life. However, miniaturization of circuit components down to the nanometer scale has revealed new challenges, arising not only from the difficulty of handling such diminutive structures but also from the unusual physical properties of nanomaterials. Countless conventional chemical and physical studies have sought to exploit the unique properties of nanostructures by developing efficient techniques for their controlled synthesis and assembly. However, environmental concerns about toxic solvent systems and energy-intensive processes, together with the pursuit of highly selective molecular interactions for highly precise assemblies, have turned the attention of scientists to biological materials. The biorecognition properties of biological materials are attractive for achieving programmed self-assembly of nanostructures, and biomolecules with metal-reducing ability are very inviting for developing environmentally acceptable synthesis processes. In light of the above discussion, this thesis takes advantage of biological approaches to assemble and synthesize inorganic nanostructures in a controlled manner. DNA was used for the assembly processes because of its ease of sequence programming and chemical modification. Spatially controlled assembly of multi-segmented Au/Pd/Au nanowires across gold electrodes has been demonstrated using thiolated DNA strands functionalized on the gold surfaces of the nanowires and electrodes. Electron transport measurements on the DNA-assembled nanowires showed a negligible blocking effect from the DNA layers hybridized between nanowires and electrodes. The assembled Au/Pd/Au nanowire was used for hydrogen sensing, demonstrating the applicability of DNA-assisted assembly to building functional nanodevices. Amino acids are essential as building blocks for proteins and for metabolism. Recently, amino acids have been given another important role as reducing and capping agents for the synthesis of gold nanostructures. Amino acid-mediated synthesis of gold nanostructures has been demonstrated, showing the capability of biological approaches to synthesize single crystalline gold nanostructures in 0-D, 1-D and 2-D forms by manipulating the reaction environment. Structural changes of gold nanostructures due to the speciation of gold complexes were systematically demonstrated by altering solvent conditions. The effect of the side chains of amino acids on the structural features of gold nanostructures was also systematically demonstrated. Distinctive electron transport properties were observed for single crystalline nanoribbons, which showed resistivity an order of magnitude lower than their polycrystalline counterparts. Rapid reversible room temperature H2S gas sensor was fabricated using AC aligned gold nanoparticle a