
    Deep Cross-Modal Audio-Visual Generation

    Cross-modal audio-visual perception has been a long-standing topic in psychology and neurology, and various studies have discovered strong correlations in human perception of auditory and visual stimuli. Despite work in computational multimodal modeling, the problem of cross-modal audio-visual generation has not been systematically studied in the literature. In this paper, we make the first attempt to solve this cross-modal generation problem by leveraging the power of deep generative adversarial training. Specifically, we use conditional generative adversarial networks to achieve cross-modal audio-visual generation of musical performances. We explore different encoding methods for audio and visual signals, and work on two scenarios: instrument-oriented generation and pose-oriented generation. Being the first to explore this new problem, we compose two new datasets with pairs of images and sounds of musical performances of different instruments. Our experiments using both classification and human evaluations demonstrate that our model has the ability to generate one modality, i.e., audio/visual, from the other modality, i.e., visual/audio, to a good extent. Our experiments on various design choices along with the datasets will facilitate future research in this new problem space.

    Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

    We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead of learning a direct mapping from audio to video frames, we propose first to transfer audio to high-level structure, i.e., the facial landmarks, and then to generate video frames conditioned on the landmarks. Compared to a direct audio-to-image approach, our cascade approach avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content. Humans are sensitive to temporal discontinuities and subtle artifacts in video. To avoid those pixel jittering problems and to force the network to focus on audiovisual-correlated regions, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism. Furthermore, to generate a sharper image with well-synchronized facial movements, we propose a novel regression-based discriminator structure, which considers sequence-level information along with frame-level information. Thoughtful experiments on several datasets and real-world samples demonstrate significantly better results obtained by our method than the state-of-the-art methods in both quantitative and qualitative comparisons.

    ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed

    Recent developments in neural speech synthesis and vocoding have sparked a renewed interest in voice conversion (VC). Beyond timbre transfer, achieving controllability on para-linguistic parameters such as pitch and speed is critical in deploying VC systems in many application scenarios. Existing studies, however, either only provide utterance-level global control or lack interpretability on the controls. In this paper, we propose ControlVC, the first neural voice conversion system that achieves time-varying controls on pitch and speed. ControlVC uses pre-trained encoders to compute pitch and linguistic embeddings from the source utterance and speaker embeddings from the target utterance. These embeddings are then concatenated and converted to speech using a vocoder. It achieves speed control through TD-PSOLA pre-processing on the source utterance, and achieves pitch control by manipulating the pitch contour before feeding it to the pitch encoder. Systematic subjective and objective evaluations are conducted to assess the speech quality and controllability. Results show that, on non-parallel and zero-shot conversion tasks, ControlVC significantly outperforms two other self-constructed baselines on speech quality, and it can successfully achieve time-varying pitch and speed control.
    Comment: Audio samples: https://bit.ly/3PsrKLJ; Code: https://github.com/MelissaChen15/control-v
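    The pitch-control step described in this abstract (manipulating the pitch contour before it reaches the pitch encoder) can be illustrated with a minimal sketch. This is not the ControlVC code: the function name `shift_pitch_contour`, the semitone-based control curve, and the convention that 0.0 marks an unvoiced frame are all assumptions made for illustration.

```python
def shift_pitch_contour(f0_hz, semitone_curve):
    """Apply a time-varying pitch shift to an F0 contour.

    f0_hz          -- per-frame fundamental frequency in Hz; 0.0 is assumed
                      to mark an unvoiced frame (hypothetical convention)
    semitone_curve -- per-frame shift in semitones, same length as f0_hz
    """
    if len(f0_hz) != len(semitone_curve):
        raise ValueError("contour and control curve must have equal length")

    shifted = []
    for f0, semitones in zip(f0_hz, semitone_curve):
        if f0 > 0.0:
            # One semitone corresponds to a frequency ratio of 2^(1/12).
            shifted.append(f0 * 2.0 ** (semitones / 12.0))
        else:
            # Leave unvoiced frames untouched.
            shifted.append(0.0)
    return shifted


if __name__ == "__main__":
    contour = [220.0, 0.0, 220.0]
    control = [12.0, 12.0, 0.0]  # raise the first frame by one octave
    print(shift_pitch_contour(contour, control))
```

    Because the control curve has one value per frame, the shift can change over the course of the utterance, which is the sense of "time-varying" illustrated here.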

    Rapamycin Attenuated Zinc-Induced Tau Phosphorylation and Oxidative Stress in Rats: Involvement of Dual mTOR/p70S6K and Nrf2/HO-1 Pathways

    Alzheimer's disease is pathologically characterized by abnormal accumulation of amyloid-beta plaques, neurofibrillary tangles, oxidative stress, neuroinflammation, and neurodegeneration. Metal dysregulation, including excessive zinc released by presynaptic neurons, plays an important role in tau pathology and oxidase activation. The activities of mammalian target of rapamycin (mTOR)/ribosomal S6 protein kinase (p70S6K) are elevated in the brains of patients with Alzheimer's disease. Zinc induces tau hyperphosphorylation via mTOR/P70S6K activation in vitro. However, the involvement of the mTOR/P70S6K pathway in zinc-induced oxidative stress, tau degeneration, and synaptic and cognitive impairment has not been fully elucidated in vivo. Here, we assessed the effect of pathological zinc concentrations in SH-SY5Y cells by using biochemical assays and immunofluorescence staining. Rats (n = 18, male) were laterally ventricularly injected with zinc, treated with rapamycin (intraperitoneal injection) for 1 week, and assessed using the Morris water maze. Evaluation of oxidative stress, tau phosphorylation, and synaptic impairment was performed using the hippocampal tissue of the rats by biochemical assays and immunofluorescence staining. The results from the Morris water maze showed that the capacity of spatial memory was impaired in zinc-treated rats. Zinc sulfate significantly increased the levels of P-mTOR Ser2448, P-p70S6K Thr389, and P-tau Ser356 and decreased the levels of nuclear factor erythroid 2-related factor-2 (Nrf2) and heme oxygenase-1 (HO-1) in SH-SY5Y cells and in zinc-treated rats compared with the control groups. Increased expression of reactive oxygen species was observed in zinc sulfate-induced SH-SY5Y cells and in the hippocampus of zinc-injected rats. 
    Rapamycin, an inhibitor of mTOR, rescued the zinc-induced increases in mTOR/p70S6K activation, tau phosphorylation, and oxidative stress, as well as the Nrf2/HO-1 inactivation, cognitive impairment, and synaptic impairment (reduced expression of synapse-related proteins) in zinc-injected rats. In conclusion, our findings imply that rapamycin prevents zinc-induced cognitive impairment and protects neurons from tau pathology, oxidative stress, and synaptic impairment by decreasing mTOR/p70S6K hyperactivity and increasing Nrf2/HO-1 activity.

    Towards Collaborative Intelligence: Routability Estimation based on Decentralized Private Data

    Applying machine learning (ML) in design flow is a popular trend in EDA with various applications from design quality predictions to optimizations. Despite its promise, which has been demonstrated in both academic research and industrial tools, its effectiveness largely hinges on the availability of a large amount of high-quality training data. In reality, EDA developers have very limited access to the latest design data, which is owned by design companies and mostly confidential. Although one can commission ML model training to a design company, the data of a single company might still be inadequate or biased, especially for small companies. This data availability problem is becoming the limiting constraint on future growth of ML for chip design. In this work, we propose a federated-learning-based approach for well-studied ML applications in EDA. Our approach allows an ML model to be collaboratively trained with data from multiple clients but without explicit access to the data, respecting their data privacy. To further strengthen the results, we co-design a customized ML model, FLNet, and its personalization under the decentralized training scenario. Experiments on a comprehensive dataset show that collaborative training improves accuracy by 11% compared with individual local models, and our customized model FLNet significantly outperforms the best of previous routability estimators in this collaborative training flow.
    Comment: 6 pages, 2 figures, 5 tables, accepted by DAC'2
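    The idea of training one model from several clients without sharing their raw data can be sketched as a single aggregation round. The abstract does not say which aggregation rule is used; the sketch below assumes plain federated averaging (FedAvg), where a server combines client model weights in proportion to each client's sample count. The function name `federated_average` and the flat-list weight format are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Combine client model weights by a sample-weighted mean (FedAvg).

    client_weights -- one flat list of floats per client (hypothetical format)
    client_sizes   -- number of local training samples per client
    Only the weights are shared with the server, never the local data.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share one model shape")
        for i, value in enumerate(weights):
            # Each client contributes in proportion to its data size.
            averaged[i] += value * size / total
    return averaged


if __name__ == "__main__":
    # Two clients; the second holds three times as much data.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

    In a full training loop, the averaged weights would be sent back to the clients for another round of local training; that loop, and any personalization step, is omitted here.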

    Genetic polymorphisms in plasminogen activator inhibitor-1 predict susceptibility to steroid-induced osteonecrosis of the femoral head in Chinese population

    BACKGROUND: Steroid use is considered a leading cause of non-traumatic osteonecrosis of the femoral head (ONFH), which involves hypo-fibrinolysis and blood supply interruption. Genetic polymorphisms in plasminogen activator inhibitor-1 (PAI-1) have been demonstrated to be associated with ONFH risk in several populations. However, this relationship has not been established in the Chinese population. The aim of this study was to investigate the association of PAI-1 gene polymorphisms with steroid-induced ONFH in a large Chinese cohort. METHODS: A case–control study was conducted, which included 94 and 106 unrelated patients after steroid administration recruited from 14 provinces in China, respectively. Two SNPs (rs11178 and rs2227631) within PAI-1 were genotyped using the Sequenom MassARRAY system. RESULTS: The rs2227631 SNP was significantly associated with steroid-induced ONFH in codominant (P = 0.04) and recessive (P = 0.02) models. However, no differences were found in the genotype frequencies of the rs11178 SNP between controls and patients with steroid-induced ONFH (all P > 0.05). CONCLUSIONS: Our data offer, for the first time, convincing evidence that the rs2227631 SNP of PAI-1 may be associated with the risk of steroid-induced ONFH, suggesting that genetic variations of this gene may play an important role in disease development. VIRTUAL SLIDES: The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/1569909986109783