26 research outputs found

    Sematic understanding of large-scale outdoor web images: From emotion recognition to scene classification

    Facial expression recognition and scene-based image clustering are popular topics in human-computer interaction and computer vision. Their relationship has rarely been investigated, yet it has many potential applications, such as landscape design, guidance for vacation choices, or plant layout design in public spaces. In this research, we use existing deep learning algorithms to study two problems, facial expression recognition and scene-based image clustering, for large-scale outdoor web images. This research paves the way for future work that explores their relationship in real-world images. First, we concentrate on emotion recognition and investigate the performance of well-known algorithms, including the Visual Geometry Group Network (VGG network) and the Residual Net (ResNet), on emotions in images captured in a public park. We then introduce approaches to address the challenges posed by occluded faces and children's faces. Our proposed pre-processing schemes allow the algorithm both to detect more faces and to recognize emotions more accurately in complex environments. We also investigate the visual analysis of landscapes by introducing a set of scene labels for a large set of natural scene images collected from an online source. The weakly supervised method Curriculum Net is then applied to scene labeling of our dataset. In Curriculum Net, the training dataset is split into two parts, a clean (easy) subset and a noisy (hard) subset, using a Density Peak Clustering algorithm, and the network is then trained from easy to hard data. In particular, we adopt a more effective density clustering method, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), to improve the clean-noisy separation of training images, which leads to improved scene labeling performance. By summarizing the work in emotion recognition and scene-based image clustering, we prepare future research to reveal the relationship between the two aspects in real-world scenarios.
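
The clean/noisy split described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation and uses neither Density Peak Clustering nor HDBSCAN in full: it only ranks samples by a simple local density (the number of neighbours within a cutoff radius) and treats the densest samples as "clean". The function names, the `cutoff` radius, and the `clean_fraction` parameter are all assumptions made for illustration.

```python
import math


def local_density(points, cutoff):
    """Count, for each point, how many other points lie within `cutoff`."""
    densities = []
    for i, p in enumerate(points):
        neighbours = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) < cutoff
        )
        densities.append(neighbours)
    return densities


def split_clean_noisy(points, cutoff, clean_fraction=0.5):
    """Return (clean_indices, noisy_indices), densest samples first.

    Hypothetical simplification: samples in dense regions of feature
    space are assumed to carry reliable labels ("easy"), sparse ones
    are assumed to be noisy ("hard").
    """
    densities = local_density(points, cutoff)
    order = sorted(range(len(points)), key=lambda i: densities[i], reverse=True)
    n_clean = max(1, round(len(points) * clean_fraction))
    return order[:n_clean], order[n_clean:]


def curriculum_stages(clean, noisy):
    """Easy-to-hard schedule: train on clean data, then on clean + noisy."""
    return [list(clean), list(clean) + list(noisy)]


# Four tightly grouped feature vectors and two outliers.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0), (9.0, -3.0)]
clean, noisy = split_clean_noisy(features, cutoff=0.5, clean_fraction=4 / 6)
stages = curriculum_stages(clean, noisy)
```

In practice the features would be embeddings from a pretrained network and the split would be computed per class; a full density clustering method such as HDBSCAN replaces the fixed-radius count used here.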

    SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias

    Generative adversarial network (GAN)-based neural vocoders have been widely used in audio synthesis tasks due to their high generation quality, efficient inference, and small computational footprint. However, it is still challenging to train a universal vocoder that generalizes well to out-of-domain (OOD) scenarios, such as unseen speaking styles, non-speech vocalization, singing, and musical pieces. In this work, we propose SnakeGAN, a GAN-based universal vocoder that can synthesize high-fidelity audio in various OOD scenarios. SnakeGAN takes a coarse-grained signal generated by a differentiable digital signal processing (DDSP) model as prior knowledge, aiming to recover a high-fidelity waveform from a Mel-spectrogram. We introduce periodic nonlinearities through the Snake activation function and an anti-aliased representation into the generator, which brings the desired inductive bias for audio synthesis and significantly improves the extrapolation capacity for universal vocoding in unseen scenarios. To validate the effectiveness of our proposed method, we train SnakeGAN with only speech data and evaluate its performance on various OOD distributions with both subjective and objective metrics. Experimental results show that SnakeGAN significantly outperforms the compared approaches and can generate high-fidelity audio samples including unseen speakers with unseen styles, singing voices, instrumental pieces, and nonverbal vocalization. Comment: Accepted by ICME 202
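
For readers unfamiliar with the Snake activation mentioned in this abstract, the sketch below shows its commonly cited form, snake(x) = x + sin²(αx)/α. How SnakeGAN parameterizes α (for example, whether it is learned per channel) is not stated in the abstract, so the fixed scalar `alpha` here is an assumption, as are the function names.

```python
import math


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + sin^2(alpha * x) / alpha.

    The linear term keeps the function monotonic, while the squared sine
    adds a periodic component with frequency controlled by `alpha`.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return x + math.sin(alpha * x) ** 2 / alpha


def snake_layer(values, alpha: float = 1.0):
    """Apply the activation element-wise to a sequence of floats."""
    return [snake(v, alpha) for v in values]


activated = snake_layer([0.0, math.pi / 2, math.pi], alpha=1.0)
```

Because the derivative is 1 + sin(2αx), which never drops below zero, the activation stays non-decreasing while still encoding periodicity, which is the inductive bias the abstract refers to.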

    AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data

    Recently, the use of extensive open-source text data has significantly advanced the performance of text-based large language models (LLMs). However, the use of in-the-wild large-scale speech data in the speech technology community remains constrained. One reason for this limitation is that a considerable amount of the publicly available speech data is compromised by background noise, overlapping speech, lack of speech segmentation information, missing speaker labels, and incomplete transcriptions, which can greatly reduce its usefulness. On the other hand, human annotation of speech data is both time-consuming and costly. To address this issue, we introduce an automatic in-the-wild speech data preprocessing framework (AutoPrep) in this paper, which is designed to enhance speech quality, generate speaker labels, and produce transcriptions automatically. The proposed AutoPrep framework comprises six components: speech enhancement, speech segmentation, speaker clustering, target speech extraction, quality filtering, and automatic speech recognition. Experiments conducted on the open-source WenetSpeech corpus and our self-collected AutoPrepWild corpus demonstrate that the proposed AutoPrep framework can generate preprocessed data with DNSMOS and PDNSMOS scores similar to those of several open-source TTS datasets. The corresponding TTS system can achieve up to 0.68 in-domain speaker similarity.

    Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis

    Zero-shot speaker adaptation aims to clone an unseen speaker's voice without any adaptation time or parameters. Previous studies usually use a speaker encoder to extract a global, fixed speaker embedding from reference speech, and several attempts have tried variable-length speaker embeddings. However, they neglect to transfer the personal pronunciation characteristics related to phoneme content, leading to poor speaker similarity in terms of detailed speaking styles and pronunciation habits. To improve the ability of the speaker encoder to model personal pronunciation characteristics, we propose content-dependent fine-grained speaker embedding for zero-shot speaker adaptation. The corresponding local content embeddings and speaker embeddings are extracted from the reference speech. Instead of modeling the temporal relations, a reference attention module is introduced to model the content relevance between the reference speech and the input text, and to generate the fine-grained speaker embedding for each phoneme encoder output. The experimental results show that our proposed method can improve the speaker similarity of synthesized speech, especially for unseen speakers. Comment: Submitted to Interspeech 202
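
The reference attention idea in this abstract can be sketched with plain scaled dot-product attention. The mapping used here is an assumption rather than the paper's confirmed design: phoneme encoder outputs act as queries, local content embeddings of the reference speech as keys, and local speaker embeddings as values. All names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reference_attention(queries, keys, values):
    """Return one attended speaker embedding per query (phoneme).

    queries: phoneme encoder outputs, shape [n_phonemes][d]
    keys:    local content embeddings of the reference, shape [n_ref][d]
    values:  local speaker embeddings of the reference, shape [n_ref][d_v]
    """
    dim = len(keys[0])
    value_dim = len(values[0])
    outputs = []
    for q in queries:
        # Content relevance between this phoneme and every reference segment.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of speaker embeddings, one vector per phoneme.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(value_dim)
        ])
    return outputs


phonemes = [[10.0, 0.0], [-10.0, 0.0]]
content = [[10.0, 0.0], [-10.0, 0.0]]
speaker = [[1.0, 0.0], [0.0, 1.0]]
fine_grained = reference_attention(phonemes, content, speaker)
```

Each phoneme receives its own speaker embedding, weighted toward the reference segments whose content best matches it, instead of a single global embedding shared by the whole utterance.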

    A Review of Past, Present, and Future Technologies for Permanent Plugging and Abandonment of Wellbores and Restoration of Subsurface Geologic Barriers

    Leakage of oil and gas during fossil fuel exploration, production, and transportation poses a major environmental challenge that impacts the quality of air, water, and soil, and ultimately life on Earth. Uncontrolled spills and leakages may contaminate groundwater and release methane into the atmosphere, increasing global warming, while spills in open waters from offshore wellbores impact fragile marine ecosystems. This review discusses the subsurface conditions where plugging and abandonment (P&A) materials need to be placed, the challenges encountered during barrier placement in wellbores under current P&A technologies, the contamination of barrier materials by drilling fluid and its possible mitigation, and the future requirements for P&A materials and technology involved in restoring subsurface sealing barriers interrupted by drilling.

    Relationship between Adiponectin Gene Polymorphisms and Late-Onset Alzheimer's Disease.

    In recent years, researchers have found that adiponectin (ANP) plays an important role in the pathogenesis of Alzheimer's disease (AD), and low serum concentrations of ANP are associated with AD. Higher plasma ANP levels have a protective effect against the development of cognitive decline, suggesting that ANP may affect AD onset. To study the relationship between ANP gene polymorphisms (rs266729, -11377C>G and rs1501299, G276T) and late-onset AD (LOAD), we carried out a case-control study that included 201 LOAD patients and 257 healthy control subjects. Statistically significant differences were detected in the genotype and allelotype frequency distributions of rs266729 and rs1501299 between the LOAD group and the control group, with a noticeable increase in the G and T allelotype frequency distributions in the LOAD group (P 0.05) between the LOAD group and control group, whereas the CG and GT haplotypes were significantly different (P < 0.05), suggesting a negative correlation between the CG haplotype and LOAD onset (OR = 0.74, 95% CI = 0.57-0.96, P = 0.022), and a positive correlation between the GT haplotype and LOAD onset (OR = 2.29, 95% CI = 1.42-3.68, P = 0.005). Therefore, we speculate that the rs266729 and rs1501299 polymorphisms of the ANP gene and the GT and CG haplotypes are associated with LOAD.

    An Efficient Modular Gateway Recombinase-Based Gene Stacking System for Generating Multi-Trait Transgenic Plants

    Transgenic technology can transfer favorable traits regardless of reproductive isolation and is an important method in plant synthetic biology and genetic improvement. Complex metabolic pathway modification and pyramiding breeding strategies often require the introduction of multiple genes at once, but the current vector assembly systems for constructing multigene expression cassettes are not completely satisfactory. In this study, a new in vitro gene stacking system, GuanNan Stacking (GNS), was developed. Through the introduction of Type IIS restriction enzyme-mediated Golden Gate cloning, GNS allows the modular, standardized assembly of target gene expression cassettes. Because of the introduction of Gateway recombination, GNS facilitates the cloning of superlarge transgene expression cassettes, allows multiple expression cassettes to be efficiently assembled in a binary vector simultaneously, and is compatible with the Cre enzyme-mediated marker deletion mechanism. The linked dual positive-negative marker selection strategy ensures the efficient acquisition of target recombinant plasmids without prokaryotic selection markers in the T-DNA region. The host-independent negative selection marker combined with the TAC backbone ensures the cloning and transfer of large T-DNAs (>100 kb). Using the GNS system, we constructed a binary vector containing five foreign gene expression cassettes and obtained transgenic rice carrying the target traits, showing that the method developed in this research is a powerful tool for plant metabolic engineering and compound trait transgenic breeding.