56 research outputs found
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech
Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems. In this paper, a cross-utterance conditional VAE (CUC-VAE) is proposed to estimate a posterior probability distribution of the latent prosody features for each phoneme by conditioning on acoustic features, speaker information, and text features obtained from both past and future sentences. At inference time, instead of the standard Gaussian distribution used by VAE, CUC-VAE allows sampling from an utterance-specific prior distribution conditioned on cross-utterance information, which allows the prosody features generated by the TTS system to be related to the context and is more similar to how humans naturally produce prosody. The performance of CUC-VAE is evaluated via a qualitative listening test for naturalness, intelligibility and quantitative measurements, including word error rates and the standard deviation of prosody attributes. Experimental results on LJ-Speech and LibriTTS data show that the proposed CUC-VAE TTS system improves naturalness and prosody diversity with clear margins
Language and Sketching: An LLM-driven Interactive Multimodal Multitask Robot Navigation Framework
The socially-aware navigation system has evolved to adeptly avoid various
obstacles while performing multiple tasks, such as point-to-point navigation,
human-following, and -guiding. However, a prominent gap persists: in
Human-Robot Interaction (HRI), the procedure of communicating commands to
robots demands intricate mathematical formulations. Furthermore, the transition
between tasks does not quite possess the intuitive control and user-centric
interactivity that one would desire. In this work, we propose an LLM-driven
interactive multimodal multitask robot navigation framework, termed LIM2N, to
solve the above new challenge in the navigation field. We achieve this by first
introducing a multimodal interaction framework where language and hand-drawn
inputs can serve as navigation constraints and control objectives. Next, a
reinforcement learning agent is built to handle multiple tasks with the
received information. Crucially, LIM2N creates smooth cooperation among the
reasoning of multimodal input, multitask planning, and adaptation and
processing of the intelligent sensing modules in the complicated system.
Extensive experiments are conducted in both simulation and the real world
demonstrating that LIM2N has superior user needs understanding, alongside an
enhanced interactive experience
Cross-Utterance Conditioned VAE for Speech Generation
Speech synthesis systems powered by neural networks hold promise for
multimedia production, but frequently face issues with producing expressive
speech and seamless editing. In response, we present the Cross-Utterance
Conditioned Variational Autoencoder speech synthesis (CUC-VAE S2) framework to
enhance prosody and ensure natural speech generation. This framework leverages
the powerful representational capabilities of pre-trained language models and
the re-expression abilities of variational autoencoders (VAEs). The core
component of the CUC-VAE S2 framework is the cross-utterance CVAE, which
extracts acoustic, speaker, and textual features from surrounding sentences to
generate context-sensitive prosodic features, more accurately emulating human
prosody generation. We further propose two practical algorithms tailored for
distinct speech synthesis applications: CUC-VAE TTS for text-to-speech and
CUC-VAE SE for speech editing. The CUC-VAE TTS is a direct application of the
framework, designed to generate audio with contextual prosody derived from
surrounding texts. On the other hand, the CUC-VAE SE algorithm leverages real
mel spectrogram sampling conditioned on contextual information, producing audio
that closely mirrors real sound and thereby facilitating flexible speech
editing based on text such as deletion, insertion, and replacement.
Experimental results on the LibriTTS datasets demonstrate that our proposed
models significantly enhance speech synthesis and editing, producing more
natural and expressive speech.
Comment: 13 pages
Structural bias in T4 RNA ligase-mediated 3′-adapter ligation
T4 RNA ligases are commonly used to attach adapters to RNAs, but large differences in ligation efficiency make detection and quantitation problematic. We developed a ligation selection strategy using random RNAs in combination with high-throughput sequencing to gain insight into the differences in efficiency of ligating pre-adenylated DNA adapters to RNA 3′-ends. After analyzing biases in RNA sequence, secondary structure and RNA-adapter cofold structure, we conclude that T4 RNA ligases do not show significant primary sequence preference in RNA substrates, but are biased against structural features within RNAs and adapters. Specifically, RNAs with less than three unstructured nucleotides at the 3′-end and RNAs that are predicted to cofold with an adapter in unfavorable structures are likely to be poorly ligated. The effect of RNA-adapter cofold structures on ligation is supported by experiments where the ligation efficiency of specific miRNAs was changed by designing adapters to alter cofold structure. In addition, we show that using adapters with randomized regions results in higher ligation efficiency and reduced ligation bias. We propose that using randomized adapters may improve RNA representation in experiments that include a 3′-adapter ligation step
NtGNL1 Plays an Essential Role in Pollen Tube Tip Growth and Orientation Likely via Regulation of Post-Golgi Trafficking
Background: Tobacco GNOM LIKE 1 (NtGNL1), a new member of the Big/GBF family, is characterized by a sec 7 domain. Thus, we proposed that NtGNL1 may function in regulating pollen tube growth for vesicle trafficking. Methodology/Principal Findings: To test this hypothesis, we used an RNAi technique to down-regulate NtGNL1 expression and found that pollen tube growth and orientation were clearly inhibited. Cytological observations revealed that both the timing and behavior of endocytosis were disrupted, and endosome trafficking to prevacuolar compartments (PVC) or multivesicular bodies (MVB) was altered in pollen tube tips. Moreover, NtGNL1 seemed to partially overlap with Golgi bodies, but clearly colocalized with putative late endosome compartments. We also observed that in such pollen tubes, the Golgi apparatus disassembled and fused with the endoplasmic reticulum, indicating abnormal post-Golgi trafficking. During this process, actin organization was also remodeled. Conclusions/Significance: Thus, we revealed that NtGNL1 is essential for pollen tube growth and orientation and it likel
6G Network AI Architecture for Everyone-Centric Customized Services
Mobile communication standards were developed for enhancing transmission and
network performance by using more radio resources and improving spectrum and
energy efficiency. How to effectively address diverse user requirements and
guarantee everyone's Quality of Experience (QoE) remains an open problem. The
Sixth Generation (6G) mobile systems will solve this problem by utilizing
heterogeneous network resources and pervasive intelligence to support
everyone-centric customized services anywhere and anytime. In this article, we
first coin the concept of Service Requirement Zone (SRZ) on the user side to
characterize and visualize the integrated service requirements and preferences
of specific tasks of individual users. On the system side, we further introduce
the concept of User Satisfaction Ratio (USR) to evaluate the system's overall
service ability of satisfying a variety of tasks with different SRZs. Then, we
propose a network Artificial Intelligence (AI) architecture with integrated
network resources and pervasive AI capabilities for supporting customized
services with guaranteed QoEs. Finally, extensive simulations show that the
proposed network AI architecture can consistently offer a higher USR
performance than the cloud AI and edge AI architectures with respect to
different task scheduling algorithms, random service requirements, and dynamic
network conditions
Whole-genome sequencing of the snub-nosed monkey provides insights into folivory and evolutionary history
Colobines are a unique group of Old World monkeys that principally eat leaves and seeds rather than fruits and insects. We report the sequencing at 146× coverage, de novo assembly and analyses of the genome of a male golden snub-nosed monkey (Rhinopithecus roxellana) and resequencing at 30× coverage of three related species (Rhinopithecus bieti, Rhinopithecus brelichi and Rhinopithecus strykeri). Comparative analyses showed that Asian colobines have an enhanced ability to derive energy from fatty acids and to degrade xenobiotics. We found evidence for functional evolution in the colobine RNASE1 gene, encoding a key secretory RNase that digests the high concentrations of bacterial RNA derived from symbiotic microflora. Demographic reconstructions indicated that the profile of ancient effective population sizes for R. roxellana more closely resembles that of giant panda rather than its congeners. These findings offer new insights into the dietary adaptations and evolutionary history of colobine primates
Resource allocation in broadband wireless networks
published_or_final_version; Electrical and Electronic Engineering; Doctoral; Doctor of Philosoph