    Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations

    Expressive text-to-speech (TTS) aims to synthesize speech with human-like tones, moods, or even artistic attributes. Recent advancements in expressive TTS let users control synthesis style directly through natural language prompts. However, these methods often require extensive training on a large amount of style-annotated data, which can be challenging to acquire. Moreover, they may have limited adaptability due to fixed style annotations. In this work, we present FreeStyleTTS (FS-TTS), a controllable expressive TTS model with minimal human annotations. Our approach utilizes a large language model (LLM) to transform expressive TTS into a style retrieval task. The LLM selects the best-matching style references from annotated utterances based on external style prompts, which can be raw input text or natural language style descriptions. The selected reference guides the TTS pipeline to synthesize speech with the intended style. This approach provides flexible, versatile, and precise style control with minimal human workload. Experiments on a Mandarin storytelling corpus demonstrate FS-TTS's proficiency in leveraging the LLM's semantic inference ability to retrieve desired styles from either input text or user-defined descriptions. This results in synthetic speech that is closely aligned with the specified styles. Comment: 5 pages, 3 figures, submitted to ICASSP 202

    Discriminative Elastic-Net Regularized Linear Regression

    In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most existing linear regression methods use the conventional zero-one matrix as the regression targets, which greatly limits the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space because of its weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins between different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminative representations to make the final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available datasets demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB codes of our methods are available at http://www.yongxu.org/lunwen.html
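One way to read "elastic-net regularization of singular values" is as an elastic-net penalty applied to the singular values of the projection matrix. The identity below is a sketch under that reading only: the symbols $W$ (projection matrix), $\sigma_i(W)$ (its singular values), and the weights $\lambda_1, \lambda_2$ are assumed names, not notation taken from the abstract. It shows that such a penalty equals a weighted sum of the nuclear norm and the squared Frobenius norm.

```latex
\[
\sum_{i} \left( \lambda_1\, \sigma_i(W) + \lambda_2\, \sigma_i(W)^2 \right)
  = \lambda_1 \|W\|_{*} + \lambda_2 \|W\|_F^{2},
\qquad
\|W\|_{*} = \sum_i \sigma_i(W), \quad
\|W\|_F^{2} = \sum_i \sigma_i(W)^2 .
\]
```

Under this reading, the nuclear-norm term favors a low-rank (compact) projection, while the Frobenius term keeps the penalty strongly convex.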

    Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation

    Optical flow is an easily conceived and valuable cue for advancing unsupervised video object segmentation (UVOS). Most previous methods directly extract and fuse the motion and appearance features for segmenting target objects in the UVOS setting. However, optical flow is intrinsically the instantaneous velocity of all pixels between consecutive frames, so the motion features are not well aligned with the primary objects in the corresponding frames. To address this challenge, we propose a concise, practical, and efficient architecture for appearance and motion feature alignment, dubbed hierarchical feature alignment network (HFAN). Specifically, the key components of HFAN are the sequential Feature AlignMent (FAM) module and the Feature AdaptaTion (FAT) module, which are leveraged for processing the appearance and motion features hierarchically. FAM is capable of aligning both appearance and motion features with the primary object semantic representations, respectively. Further, FAT is explicitly designed for the adaptive fusion of appearance and motion features to achieve a desirable trade-off between cross-modal features. Extensive experiments demonstrate the effectiveness of the proposed HFAN, which reaches a new state-of-the-art performance on DAVIS-16, achieving 88.7 $\mathcal{J}\&\mathcal{F}$ Mean, i.e., a relative improvement of 3.5% over the best published result. Comment: Accepted by ECCV-202
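The two reported figures can be related with a short calculation. This is only a back-of-the-envelope sketch: it assumes "relative improvement" means (new − old) / old, and the baseline symbol $b$ is introduced here for illustration; the abstract does not state the baseline score.

```latex
\[
\frac{88.7 - b}{b} = 0.035
\quad\Longrightarrow\quad
b = \frac{88.7}{1.035} \approx 85.7 .
\]
```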

    Attack is Good Augmentation: Towards Skeleton-Contrastive Representation Learning

    Contrastive learning, relying on effective positive and negative sample pairs, is beneficial for learning informative skeleton representations in unsupervised skeleton-based action recognition. To obtain these positive and negative pairs, existing weak/strong data augmentation methods have to randomly change the appearance of skeletons to indirectly pursue semantic perturbations. However, such approaches have two limitations: 1) solely perturbing appearance cannot capture the intrinsic semantic information of skeletons well, and 2) random perturbation may change the original positive/negative pairs to soft positive/negative ones. To address the above dilemma, we make the first attempt to explore an attack-based augmentation scheme that additionally brings in direct semantic perturbation, for constructing hard positive pairs and further assisting in constructing hard negative pairs. In particular, we propose a novel Attack-Augmentation Mixing-Contrastive learning (A$^2$MC) to contrast hard positive features and hard negative features for learning more robust skeleton representations. In A$^2$MC, Attack-Augmentation (Att-Aug) is designed to collaboratively perform targeted and untargeted perturbations of skeletons via attack and augmentation respectively, for generating high-quality hard positive features. Meanwhile, Positive-Negative Mixer (PNM) is presented to mix hard positive features and negative features for generating hard negative features, which are adopted for updating the mixed memory banks. Extensive experiments on three public datasets demonstrate that A$^2$MC is competitive with the state-of-the-art methods.

    An experimental investigation on the mechanical properties of the interface between large-sized graphene and a flexible substrate

    In this paper, the interfacial mechanical properties of large-sized monolayer graphene attached to a flexible polyethylene terephthalate (PET) substrate are investigated. Using a micro-tensile test and Raman spectroscopy, in situ measurements are taken to obtain the full-field deformation of graphene subjected to a uniaxial tensile loading and unloading cycle. The results of the full-field deformation are subsequently used to identify the status of the interface between the graphene and the substrate as one of perfect adhesion, one showing sliding or partial debonding, or one that is fully debonded. The interfacial stress/strain transfer and the evolution of the interface from one status to another during the loading and unloading processes are discussed, and the mechanical parameters, such as interfacial strength and interfacial shear strength, are obtained quantitatively, demonstrating a relatively weak interface between large-sized graphene and PET.

    [μ-1,1′-(Butane-1,4-diyl)di-1H-benzimidazole-κ²N³:N³′]bis{[N,N′-bis(carboxymethyl)ethylenediamine-N,N′-diacetato-κ⁵O,O′,O′′,N,N′]mercury(II)} methanol disolvate

    The binuclear title complex, [Hg2(C10H14N2O8)2(C18H18N4)]·2CH3OH, lies on an inversion center with the unique Hg(II) ion coordinated in a distorted octahedral environment, with one Hg—N bond significantly shorter than the other two. In the crystal structure, intermolecular O—H⋯O hydrogen bonds link complex and solvent molecules into a three-dimensional network.

    Improving Species Identification of Ancient Mammals Based on Next-Generation Sequencing Data

    Taxonomic identification based on morphology alone is often difficult for ancient remains. Therefore, universal or specific PCR amplification followed by sequencing and BLAST (basic local alignment search tool) search has become the most frequently used genetic method for the species identification of biological samples, including ancient remains. However, it is challenging for these methods to process extremely ancient samples with severe DNA fragmentation and contamination. Here, we used whole-genome sequencing data from 12 ancient samples with ages ranging from 2.7 to 700 kya to compare different mapping algorithms, and tested different reference databases, mapping similarities and query coverage to explore the best method and mapping parameters for improving the accuracy of ancient mammal species identification. The selected method and parameters were tested using 152 ancient samples, and 150 of the samples were successfully identified. We further screened the BLAST-based mapping results according to the deamination characteristics of ancient DNA to improve the ability of ancient species identification. Our findings demonstrate a marked improvement over the normal procedures used for ancient species identification, which was achieved by defining mapping and filtering guidelines to identify true ancient DNA sequences. The guidelines summarized in this study could be valuable in archaeology, paleontology, evolution, and forensic science. For the convenience of the scientific community, we wrote a software script in Perl, called AncSid, which is made available on GitHub.
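The filtering idea described above (mapping similarity, query coverage, then a deamination screen) can be sketched in a few lines. This is an illustrative Python sketch, not the AncSid script: the `Hit` record, the function names, the 0.90 thresholds, and the per-read damage rule (a C→T mismatch near the 5′ end or a G→A mismatch near the 3′ end, the typical ancient-DNA deamination pattern) are all assumptions made for the example, and alignments are assumed to be gap-free and non-empty.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Hit:
    """One read-to-reference alignment (hypothetical record for this sketch)."""
    species: str      # species of the matched reference sequence
    read_aln: str     # aligned read bases, 5'->3', gap-free and non-empty
    ref_aln: str      # reference bases at the same aligned positions
    read_length: int  # full read length, including any unaligned bases


def identity(hit: Hit) -> float:
    """Fraction of aligned positions where read and reference agree."""
    matches = sum(r == q for r, q in zip(hit.ref_aln, hit.read_aln))
    return matches / len(hit.read_aln)


def query_coverage(hit: Hit) -> float:
    """Fraction of the read that takes part in the alignment."""
    return len(hit.read_aln) / hit.read_length


def shows_deamination(hit: Hit, window: int = 3) -> bool:
    """True if a C->T mismatch occurs within `window` bases of the 5' end
    or a G->A mismatch within `window` bases of the 3' end."""
    head = zip(hit.ref_aln[:window], hit.read_aln[:window])
    tail = zip(hit.ref_aln[-window:], hit.read_aln[-window:])
    c_to_t = any(ref == "C" and read == "T" for ref, read in head)
    g_to_a = any(ref == "G" and read == "A" for ref, read in tail)
    return c_to_t or g_to_a


def assign_species(
    hits: Iterable[Hit],
    min_identity: float = 0.90,   # illustrative threshold, not from the paper
    min_coverage: float = 0.90,   # illustrative threshold, not from the paper
    require_damage: bool = False,
) -> Optional[str]:
    """Return the species with the most hits passing all filters, or None."""
    votes: Counter = Counter()
    for hit in hits:
        if identity(hit) < min_identity or query_coverage(hit) < min_coverage:
            continue
        if require_damage and not shows_deamination(hit):
            continue
        votes[hit.species] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Setting `require_damage=True` keeps only reads that carry a damage signature, which trades sensitivity for resistance to modern contamination; in practice damage is usually assessed over many reads rather than per read, so the per-read rule here is a simplification.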

    (E)-N′-(5-Chloro-2-hydroxybenzylidene)-4-(8-quinolyloxy)butanohydrazide monohydrate

    The crystal of the title Schiff base compound, C20H18ClN3O3·H2O, was twinned by a twofold rotation about (100). The asymmetric unit contains two crystallographically independent molecules with similar conformations, and two water molecules. The C=N—N angles of 115.7 (6) and 116.2 (6)° are significantly smaller than the ideal value of 120° expected for sp²-hybridized N atoms, and the dihedral angles between the benzene ring and quinoline ring system in the two molecules are 52.5 (7) and 53.9 (7)°. The molecules aggregate via C—Cl⋯π and π–π interactions [centroid–centroid distances = 3.696 (5)–3.892 (5) Å] and weak C—H⋯O interactions as parallel sheets, which are further linked by water molecules through N—H⋯O and O—H⋯O hydrogen bonds into a supramolecular two-dimensional network.