
    Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning

    Audio-visual speech recognition (AVSR) has achieved remarkable success in improving the noise robustness of speech recognition. Mainstream methods focus on fusing audio and visual inputs to obtain modality-invariant representations. However, such representations are prone to over-reliance on the audio modality, as it is much easier to recognize than the video modality in clean conditions. As a result, the AVSR model underestimates the importance of the visual stream in the face of noise corruption. To this end, we leverage visual modality-specific representations to provide stable complementary information for the AVSR task. Specifically, we propose a reinforcement learning (RL) based framework called MSRL, where the agent dynamically harmonizes modality-invariant and modality-specific representations in the auto-regressive decoding process. We customize a reward function directly related to task-specific metrics (i.e., word error rate), which encourages MSRL to effectively explore the optimal integration strategy. Experimental results on the LRS3 dataset show that the proposed method achieves state-of-the-art performance in both clean and various noisy conditions. Furthermore, we demonstrate that the MSRL system generalizes better than other baselines when the test set contains unseen noises. Comment: Accepted by AAAI202
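
    The abstract above ties the reward to word error rate (WER) without giving its exact form. As a minimal sketch, assuming the reward is simply the negative WER of a decoded hypothesis (an assumption, not the authors' stated formula; `word_error_rate` and `wer_reward` are hypothetical names), the idea could look like this:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def wer_reward(reference: str, hypothesis: str) -> float:
    """Hypothetical reward: lower WER gives a higher (less negative) reward."""
    return -word_error_rate(reference, hypothesis)
```

    Under this assumption a perfect transcript earns a reward of 0 and every word error lowers it, so an agent maximizing the reward is pushed toward lower WER.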

    Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition

    12 pages, 5 figures, Accepted by IJCAI 2023. Preprin

    Paeonol Ameliorates Glucose and Lipid Metabolism in Experimental Diabetes by Activating Akt

    Our previous study showed that paeonol (Pae) could lower blood glucose levels in diabetic mice. There are also a few reports of its potential use for diabetes treatment. However, the role of Pae in regulating glucose and lipid metabolism in diabetes remains largely unknown. Considering the critical role of the serine/threonine kinase Akt (protein kinase B) in glucose and lipid metabolism, we explored whether Pae could improve glucose and lipid metabolism disorders via Akt. Here, we found that Pae lowered fasting blood glucose, glycosylated serum protein, serum cholesterol and triglyceride (TG), hepatic glycogen, cholesterol and TG in diabetic mice. Moreover, Pae enhanced glucokinase (GCK) and low-density lipoprotein receptor (LDLR) protein expression, and increased the phosphorylation of Akt. In insulin-resistant HepG2 cells, Pae increased glucose uptake and decreased lipid accumulation. In addition, Pae elevated LDLR and GCK expression as well as Akt phosphorylation, which was consistent with the in vivo results. Knockdown and inhibition experiments on Akt revealed that Pae regulated LDLR and GCK expression through activation of Akt. Finally, a molecular docking assay indicated that a stable hydrogen bond formed between Pae and Akt2. These experiments suggest that Pae ameliorated glucose and lipid metabolism disorders and that the underlying mechanism was closely related to the activation of Akt.

    Lipids, lipid-lowering agents, and inflammatory bowel disease: a Mendelian randomization study

    Background: To assess the causal role of lipid traits and lipid-lowering agents in inflammatory bowel disease (IBD). Methods: Univariable Mendelian randomization (MR) and multivariable MR (MVMR) analyses were conducted to evaluate the causal association between low-density lipoprotein cholesterol (LDL-C), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C) and IBD. Drug-targeted MR analyzed the effects of lipid-lowering drugs on IBD, and network MR was used to analyze potential mediation effects. Results: The levels of HDL-C had an inverse relationship with the risk of Crohn’s disease (CD, OR: 0.85, 95% CI: 0.73-0.98, P = 0.024). In MVMR, inverse relationships were found for all three outcomes. Drug-targeted MR analyses showed that, with a one-SD LDL-C decrease predicted by variants at or near proprotein convertase subtilisin/kexin type 9 (PCSK9), the ORs for IBD, ulcerative colitis (UC) and CD were 1.75 (95% CI: 1.13-2.69, P = 0.011), 2.1 (95% CI: 1.28-3.42, P = 0.003) and 2.24 (95% CI: 1.11-4.5, P = 0.024), respectively. With a one-SD LDL-C decrease predicted by variants at or near cholesteryl ester transfer protein (CETP), the OR for CD was 0.12 (95% CI: 0.03-0.51, P = 0.004). Network MR showed that HDL-C mediated the causal pathway from variants at or near CETP to CD. Conclusion: Our study suggested a causal association between HDL-C and IBD, UC and CD. Genetically proxied inhibition of PCSK9 increased the risk of IBD, UC and CD, while inhibition of CETP decreased the risk of CD. Further studies are needed to clarify the long-term effects of lipid-lowering drugs on gastrointestinal disorders.

    A Novel Unsupervised Video Anomaly Detection Framework Based on Optical Flow Reconstruction and Erased Frame Prediction

    Reconstruction-based and prediction-based approaches are widely used for video anomaly detection (VAD) in smart city surveillance applications. However, neither of these approaches can effectively utilize the rich contextual information that exists in videos, which makes it difficult to accurately perceive anomalous activities. In this paper, we adopt the idea of training a model with the “Cloze Test” strategy from natural language processing (NLP) and introduce a novel unsupervised learning framework to encode both motion and appearance information at an object level. Specifically, to store the normal modes of video activity reconstructions, we first design an optical stream memory network with skip connections. Secondly, we build a space–time cube (STC) for use as the basic processing unit of the model and erase a patch in the STC to form the frame to be reconstructed. This enables a so-called “incomplete event (IE)” to be completed. On this basis, a conditional autoencoder is utilized to capture the high correspondence between optical flow and STC. The model predicts erased patches in IEs based on the context of the preceding and following frames. Finally, we employ a generative adversarial network (GAN)-based training method to improve the performance of VAD. By distinguishing the predicted erased optical flow and erased video frame, the anomaly detection results are shown to be more reliable with our proposed method, which can help reconstruct the original video in an IE. Comparative experiments conducted on the benchmark UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets demonstrate AUROC scores reaching 97.7%, 89.7%, and 75.8%, respectively.
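
    The AUROC scores quoted above measure how well per-frame anomaly scores rank anomalous frames above normal ones. The sketch below is a generic illustration of that metric, not the authors' evaluation code; the function name `auroc` and the toy labels and scores in the usage note are made up for the example.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen anomalous sample
    (label 1) scores higher than a randomly chosen normal sample
    (label 0), counting ties as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

    For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, since three of the four anomalous/normal pairs are ranked correctly. The pairwise loop is quadratic in the number of samples; a rank-based implementation would be preferable for full datasets.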