26 research outputs found
Semantic understanding of large-scale outdoor web images: From emotion recognition to scene classification
Facial expression recognition and scene-based image clustering are popular topics in human-computer interaction and computer vision. Their relationship has rarely been investigated, yet it has many potential applications, such as landscape design, guidance for vacation choices, or plant layout design in public spaces. In this research, we use existing deep learning algorithms to study two issues: facial expression recognition and scene-based image clustering for large-scale outdoor web images. This research paves the way for future work that explores their relationship in real-world images. First, we concentrate on emotion recognition and investigate the performance of well-known algorithms, including the Visual Geometry Group Network (VGG network) and the Residual Net (ResNet), on the emotions in images captured in a public park. We then introduce approaches to address the challenges posed by occluded faces and children's faces. Our proposed pre-processing schemes not only allow the algorithm to detect more faces but also increase recognition accuracy in complex environments. We also investigate the visual analysis of landscapes by introducing a set of scene labels for a large set of natural scene images collected from an online source. A weakly supervised method, Curriculum Net, is then applied to scene labeling of our dataset. In Curriculum Net, the training dataset is split into two parts, a clean (easy) set and a noisy (hard) set, by using a Density Peak Clustering algorithm, and the network is then trained from easy to hard data. In particular, we adopt a more effective density clustering method, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), to improve the clean-noisy separation of training images, which leads to improved scene labeling performance.
By summarizing the work in emotion recognition and scene-based image clustering, we lay the groundwork for future research into the relationship between the two aspects in real-world scenarios.
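The clean/noisy split described in this abstract can be illustrated with a minimal sketch. It uses the cutoff-kernel local density from Density Peak Clustering (the number of neighbours within a distance `d_c`) and calls samples in dense regions "easy" and the rest "hard". The function names, the cutoff `d_c`, the threshold `min_density`, and the toy 2-D features are all assumptions made for illustration; the abstract does not give the authors' settings, and their HDBSCAN-based variant is not reproduced here.

```python
import math

def local_density(points, d_c):
    """Cutoff-kernel density: for each point, count the other points closer than d_c."""
    densities = []
    for i, p in enumerate(points):
        count = sum(
            1 for j, q in enumerate(points)
            if i != j and math.dist(p, q) < d_c
        )
        densities.append(count)
    return densities

def split_easy_hard(points, d_c, min_density):
    """Return (easy, hard) index lists; dense samples are treated as clean/easy."""
    densities = local_density(points, d_c)
    easy = [i for i, rho in enumerate(densities) if rho >= min_density]
    hard = [i for i, rho in enumerate(densities) if rho < min_density]
    return easy, hard

# Hypothetical 2-D feature vectors for one scene label: four tightly
# grouped samples and one outlier that likely carries a wrong label.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
easy, hard = split_easy_hard(features, d_c=0.5, min_density=2)
print(easy, hard)
```

A curriculum would then train on the `easy` indices first and add the `hard` ones in later stages.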
SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias
Generative adversarial network (GAN)-based neural vocoders have been widely
used in audio synthesis tasks due to their high generation quality, efficient
inference, and small computation footprint. However, it is still challenging to
train a universal vocoder which can generalize well to out-of-domain (OOD)
scenarios, such as unseen speaking styles, non-speech vocalization, singing,
and musical pieces. In this work, we propose SnakeGAN, a GAN-based universal
vocoder, which can synthesize high-fidelity audio in various OOD scenarios.
SnakeGAN takes a coarse-grained signal generated by a differentiable digital
signal processing (DDSP) model as prior knowledge, aiming at recovering
high-fidelity waveform from a Mel-spectrogram. We introduce periodic
nonlinearities through the Snake activation function and anti-aliased
representation into the generator, which further brings the desired inductive
bias for audio synthesis and significantly improves the extrapolation capacity
for universal vocoding in unseen scenarios. To validate the effectiveness of
our proposed method, we train SnakeGAN with only speech data and evaluate its
performance for various OOD distributions with both subjective and objective
metrics. Experimental results show that SnakeGAN significantly outperforms the
compared approaches and can generate high-fidelity audio samples including
unseen speakers with unseen styles, singing voices, instrumental pieces, and
nonverbal vocalization. Comment: Accepted by ICME 202
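For readers unfamiliar with the Snake activation named in this abstract, here is a minimal sketch of its commonly cited scalar form, snake(x) = x + (1/alpha) * sin^2(alpha * x). The abstract does not state the formula or how alpha is handled inside SnakeGAN, so the fixed scalar `alpha` and the helper names are assumptions for illustration only.

```python
import math

def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + sin^2(alpha * x) / alpha.

    The linear term keeps the function monotonic, while the sin^2 term
    adds a periodic component with period pi / alpha.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return x + math.sin(alpha * x) ** 2 / alpha

def snake_layer(xs, alpha=1.0):
    """Apply the activation element-wise to a list of floats."""
    return [snake(x, alpha) for x in xs]

print(snake_layer([-1.0, 0.0, 1.0], alpha=2.0))
```

Shifting the input by pi / alpha changes the output by exactly pi / alpha, which is the periodic inductive bias the abstract refers to.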
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Recently, the utilization of extensive open-sourced text data has
significantly advanced the performance of text-based large language models
(LLMs). However, the use of in-the-wild large-scale speech data in the speech
technology community remains constrained. One reason for this limitation is
that a considerable amount of the publicly available speech data is compromised
by background noise, speech overlapping, lack of speech segmentation
information, missing speaker labels, and incomplete transcriptions, which can
largely hinder their usefulness. On the other hand, human annotation of speech
data is both time-consuming and costly. To address this issue, we introduce an
automatic in-the-wild speech data preprocessing framework (AutoPrep) in this
paper, which is designed to enhance speech quality, generate speaker labels,
and produce transcriptions automatically. The proposed AutoPrep framework
comprises six components: speech enhancement, speech segmentation, speaker
clustering, target speech extraction, quality filtering and automatic speech
recognition. Experiments conducted on the open-sourced WenetSpeech and our
self-collected AutoPrepWild corpora demonstrate that the proposed AutoPrep
framework can generate preprocessed data with similar DNSMOS and PDNSMOS scores
compared to several open-sourced TTS datasets. The corresponding TTS system can
achieve up to 0.68 in-domain speaker similarity
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Zero-shot speaker adaptation aims to clone an unseen speaker's voice without
any adaptation time and parameters. Previous research usually uses a speaker
encoder to extract a global fixed speaker embedding from reference speech, and
several attempts have tried variable-length speaker embedding. However, they
neglect to transfer the personal pronunciation characteristics related to
phoneme content, leading to poor speaker similarity in terms of detailed
speaking styles and pronunciation habits. To improve the ability of the speaker
encoder to model personal pronunciation characteristics, we propose
content-dependent fine-grained speaker embedding for zero-shot speaker
adaptation. The corresponding local content embeddings and speaker embeddings
are extracted from a reference speech, respectively. Instead of modeling the
temporal relations, a reference attention module is introduced to model the
content relevance between the reference speech and the input text, and to
generate the fine-grained speaker embedding for each phoneme encoder output.
The experimental results show that our proposed method can improve speaker
similarity of synthesized speech, especially for unseen speakers. Comment: Submitted to Interspeech 202
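The reference attention idea in this abstract can be sketched as follows: each phoneme encoder output acts as a query, the local content embeddings of the reference speech act as keys, and the local speaker embeddings act as values. The scaled dot-product form, the function names, and the toy vectors are assumptions; the abstract does not specify the attention variant or any dimensions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reference_attention(queries, keys, values):
    """Return one fine-grained speaker embedding per query.

    queries: phoneme encoder outputs, one vector per phoneme
    keys:    local content embeddings from the reference speech
    values:  local speaker embeddings aligned with the keys
    """
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs

# Hypothetical toy inputs: 2 phonemes, 3 reference segments.
phonemes = [[1.0, 0.0], [0.0, 1.0]]
content = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
speaker = [[0.2, 0.8, 0.1], [0.9, 0.1, 0.4], [0.5, 0.5, 0.5]]
print(reference_attention(phonemes, content, speaker))
```

Because the weights depend on content similarity and not on position, the output length follows the input text, not the reference speech.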
A Review of Past, Present, and Future Technologies for Permanent Plugging and Abandonment of Wellbores and Restoration of Subsurface Geologic Barriers
Leakage of oil and gas during fossil fuel exploration, production, and transportation poses a major environmental challenge that affects the quality of air, water, and soil, and ultimately life on Earth. Uncontrolled spills and leaks may contaminate groundwater and release methane into the atmosphere, contributing to global warming, while spills in open waters from offshore wellbores damage fragile marine ecosystems. This review discusses the subsurface conditions where plugging and abandonment (P&A) materials need to be placed, the challenges encountered during barrier placement in wellbores under current P&A technologies, contamination of barrier materials by drilling fluid and possible mitigation, and the future requirements for P&A materials and technology to restore subsurface sealing barriers interrupted by drilling.
Relationship between Adiponectin Gene Polymorphisms and Late-Onset Alzheimer's Disease.
In recent years, researchers have found that adiponectin (ANP) plays an important role in the pathogenesis of Alzheimer's disease (AD), and low serum concentrations of ANP are associated with AD. Higher plasma ANP levels have a protective effect against the development of cognitive decline, suggesting that ANP may affect AD onset. Meanwhile, accumulating evidence supports the crucial role of ANP in the pathogenesis of AD. To study the relationship between ANP gene polymorphisms (rs266729, -11377C>G and rs1501299, G276T) and late-onset AD (LOAD), we carried out a case-control study that included 201 LOAD patients and 257 healthy control subjects. Statistically significant differences were detected in the genotype and allelotype frequency distributions of rs266729 and rs1501299 between the LOAD group and the control group, with a noticeable increase in the G and T allelotype frequency distributions in the LOAD group (P 0.05) between the LOAD group and control group, whereas the CG and GT haplotypes were significantly different (P < 0.05), suggesting a negative correlation between the CG haplotype and LOAD onset (OR = 0.74, 95% CI = 0.57-0.96, P = 0.022), and a positive correlation between the GT haplotype and LOAD onset (OR = 2.29, 95% CI = 1.42-3.68, P = 0.005). Therefore, we speculate that the ANP gene polymorphisms rs266729 and rs1501299 and the GT and CG haplotypes are associated with LOAD.
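As a worked illustration of how an odds ratio (OR) and its 95% confidence interval are typically obtained in a case-control study, the sketch below uses the standard log-based (Woolf) interval on a 2x2 table. The abstract does not report haplotype counts or name the method used, so the function name and the counts are hypothetical and do not reproduce the study's data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a: cases carrying the haplotype      b: cases without it
    c: controls carrying the haplotype   d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * std_err)
    upper = math.exp(math.log(odds_ratio) + z * std_err)
    return odds_ratio, lower, upper

# Hypothetical counts, chosen only to exercise the function.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=10, c=10, d=20)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An interval built this way is symmetric around the OR on the log scale, so the OR is the geometric mean of the two bounds. An interval that lies entirely below 1 (or entirely above 1) corresponds to a negative (or positive) association at the chosen level.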
An Efficient Modular Gateway Recombinase-Based Gene Stacking System for Generating Multi-Trait Transgenic Plants
Transgenic technology can transfer favorable traits regardless of reproductive isolation and is an important method in plant synthetic biology and genetic improvement. Complex metabolic pathway modification and pyramiding breeding strategies often require the introduction of multiple genes at once, but the current vector assembly systems for constructing multigene expression cassettes are not completely satisfactory. In this study, a new in vitro gene stacking system, GuanNan Stacking (GNS), was developed. Through the introduction of Type IIS restriction enzyme-mediated Golden Gate cloning, GNS allows the modular, standardized assembly of target gene expression cassettes. Because of the introduction of Gateway recombination, GNS facilitates the cloning of superlarge transgene expression cassettes, allows multiple expression cassettes to be efficiently assembled in a binary vector simultaneously, and is compatible with the Cre enzyme-mediated marker deletion mechanism. The linked dual positive-negative marker selection strategy ensures the efficient acquisition of target recombinant plasmids without prokaryotic selection markers in the T-DNA region. The host-independent negative selection marker combined with the TAC backbone ensures the cloning and transfer of large T-DNAs (>100 kb). Using the GNS system, we constructed a binary vector containing five foreign gene expression cassettes and obtained transgenic rice carrying the target traits, showing that the method developed in this research is a powerful tool for plant metabolic engineering and compound trait transgenic breeding.