46 research outputs found

    Audio Visual Speaker Localization from EgoCentric Views

    The use of audio and visual modality for speaker localization has been well studied in the literature by exploiting their complementary characteristics. However, most previous works employ the setting of static sensors mounted at fixed positions. Unlike them, in this work, we explore the ego-centric setting, where the heterogeneous sensors are embodied and could be moving with a human to facilitate speaker localization. Compared to the static scenario, the ego-centric setting is more realistic for smart-home applications e.g., a service robot. However, this also brings new challenges such as blurred images, frequent speaker disappearance from the field of view of the wearer, and occlusions. In this paper, we study egocentric audio-visual speaker DOA estimation and deal with the challenges mentioned above. Specifically, we propose a transformer-based audio-visual fusion method to estimate the relative DOA of the speaker to the wearer, and design a training strategy to mitigate the problem of the speaker disappearing from the camera's view. We also develop a new dataset for simulating the out-of-view scenarios, by creating a scene with a camera wearer walking around while a speaker is moving at the same time. The experimental results show that our proposed method offers promising performance in this new dataset in terms of tracking accuracy. Finally, we adapt the proposed method for the multi-speaker scenario. Experiments on EasyCom show the effectiveness of the proposed model for multiple speakers in real scenarios, which achieves state-of-the-art results in the sphere active speaker detection task and the wearer activity prediction task. The simulated dataset and related code are available at https://github.com/KawhiZhao/Egocentric-Audio-Visual-Speaker-Localization

    Enzymatic Synthesis of Functional Structured Lipids from Glycerol and Naturally Phenolic Antioxidants

    Glycerol is a valuable by-product of biodiesel production by transesterification, of hydrolysis reactions, and of soap manufacturing by saponification. The conversion of glycerol into value-added products has attracted growing interest due to the dramatic growth of the biodiesel industry in recent years. In particular, phenolic structured lipids have been widely studied because their antioxidant properties help preserve lipid-containing foods and thereby influence food quality. They are triacylglycerols that have been modified with phenolic acids by enzymatically catalyzed reactions to change the positional distribution on the glycerol backbone. Owing to lipases’ fatty acid selectivity and regiospecificity, lipase-catalyzed reactions have been promoted because they offer greater control over the positional distribution of fatty acids on the glycerol backbone. Moreover, microreactors have been applied in a wide range of enzymatic applications. Phenolic structured lipids have also attracted attention for their applications in the cosmetic, pharmaceutical, and food industries, where they provide attributes that consumers find valuable. Further research is therefore needed to allow better understanding of, and more control over, the various esterification/transesterification processes, and to reduce the costs associated with large-scale bioconversion of glycerol. The investigated approach is a promising and environmentally safe route to value-added products from glycerol.

    Research on method of vibration analysis of rubber tracked vehicle based on dynamic model

    To understand the vibration characteristics of a rubber track system during travel, this research studied a small harvester fitted with a rubber track system and constructed a dynamic model reflecting the vibration characteristics of the rubber track system on the ground. Comparison of the analysis results with experimental data measured in vehicle tests shows that the dynamic model established by theoretical analysis can correctly and effectively predict the actual movement condition and vibration characteristics of the rubber track system, especially at low test vehicle speeds. The relative difference between the vibration acceleration measured in real vehicle tests and the theoretical value was in the range of −1.2 % to +18.2 %. The study discusses a vibration prediction and analysis method for rubber tracked vehicles and provides important basic data for research on comfort evaluation of working posture and lightweight design of rubber tracked mechanisms.
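The relative difference quoted in this abstract can be illustrated with a minimal sketch. The abstract does not say which value serves as the reference, so dividing by the theoretical value is an assumption here, and the function name and sample accelerations are hypothetical rather than taken from the paper.

```python
def relative_difference_percent(measured: float, theoretical: float) -> float:
    """Signed relative difference of a measurement from a model value, in percent.

    Assumption: the theoretical (model) value is the reference in the denominator.
    """
    if theoretical == 0:
        raise ValueError("theoretical value must be non-zero")
    return (measured - theoretical) / theoretical * 100.0


# Hypothetical vibration accelerations (measured, theoretical) in m/s^2.
pairs = [(0.988, 1.000), (1.182, 1.000)]
diffs = [relative_difference_percent(m, t) for m, t in pairs]

# Print each difference with an explicit sign and one decimal place.
print([f"{d:+.1f} %" for d in diffs])
```

A signed difference is used so that under-prediction and over-prediction by the model remain distinguishable, which matches the asymmetric range reported in the abstract.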

    Inhibitory effect and underlying mechanism of cinnamon and clove essential oils on Botryosphaeria dothidea and Colletotrichum gloeosporioides causing rots in postharvest bagging-free apple fruits

    Bagging-free apples are more vulnerable to postharvest disease, which severely limits the cultivation pattern transformation of the apple industry in China. This study aimed to ascertain the dominant pathogens in postharvest bagging-free apples, to evaluate the efficacy of essential oil (EO) in inhibiting fungal growth, and to further clarify the molecular mechanism of this action. By morphological characteristics and rDNA sequence analyses, Botryosphaeria dothidea (B. dothidea) and Colletotrichum gloeosporioides (C. gloeosporioides) were identified as the main pathogens isolated from decayed bagging-free apples. Cinnamon and clove EO exhibited high inhibitory activities against mycelial growth in both vapor and contact phases under in vitro conditions. EO vapor at a concentration of 60 μL L−1 significantly reduced the incidence and lesion diameter of inoculated decay in vivo. Observations using a scanning electron microscope (SEM) and transmission electron microscope (TEM) revealed that EO changed the mycelial morphology and cellular ultrastructure and destroyed the integrity and structure of cell membranes and major organelles. Using RNA sequencing and bioinformatics, it was demonstrated that clove EO treatment impaired cell membrane integrity and biological function by downregulating the genes involved in the membrane component and transmembrane transport. In addition, in silico analysis indicated a stronger binding affinity of trans-cinnamaldehyde and eugenol with CYP51, attenuating the activity of this ergosterol synthesis enzyme. Moreover, pronounced alterations in the oxidation/reduction reaction and critical materials metabolism of clove EO-treated C. gloeosporioides were also observed in the transcriptomic data. Altogether, these findings contribute novel cellular and molecular antimicrobial mechanisms of EO, suggesting its potential use as a natural and useful preservative for controlling postharvest spoilage in bagging-free apples.

    Clarifying the mechanisms of the light-induced color formation of apple peel under dark conditions through metabolomics and transcriptomic analyses

    Many studies have demonstrated that anthocyanin synthesis in apple peel is induced by light, but how the color of bagged apple peel continues to change under dark conditions after light induction has not been characterized. Here, transcriptional and metabolic changes associated with changes in apple peel coloration in the dark after different light induction treatments were studied. Apple pericarp can achieve a normal color under complete darkness followed by light induction. Metabolomics analysis indicated that the levels of cyanidin-3-O-galactoside and cyanidin-3-O-glucoside were high, which might be associated with the red color development of apple peel. Transcriptome analysis revealed high expression levels of MdUFGTs, MdMYBs, and MdNACs, which might play a key role in light-induced anthocyanin accumulation under dark conditions. Thirteen key genes related to dark coloring after light induction were screened. The results of this study provide new insights into the mechanism of anthocyanin synthesis under dark conditions.

    Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts

    Zero-shot text-to-speech aims at synthesizing voices with unseen speech prompts. Previous large-scale multispeaker TTS models have successfully achieved this goal with an enrolled recording within 10 seconds. However, most of them are designed to utilize only short speech prompts. The limited information in short speech prompts significantly hinders the performance of fine-grained identity imitation. In this paper, we introduce Mega-TTS 2, a generic zero-shot multispeaker TTS model that is capable of synthesizing speech for unseen speakers with arbitrary-length prompts. Specifically, we 1) design a multi-reference timbre encoder to extract timbre information from multiple reference speeches, and 2) train a prosody language model with arbitrary-length speech prompts. With these designs, our model is suitable for prompts of different lengths, which extends the upper bound of speech quality for zero-shot text-to-speech. Besides arbitrary-length prompts, we introduce arbitrary-source prompts, which leverage the probabilities derived from multiple P-LLM outputs to produce expressive and controlled prosody. Furthermore, we propose a phoneme-level auto-regressive duration model to introduce in-context learning capabilities to duration modeling. Experiments demonstrate that our method can not only synthesize identity-preserving speech with a short prompt of an unseen speaker but also achieve improved performance with longer speech prompts. Audio samples can be found at https://mega-tts.github.io/mega2_demo/

    Deep Lesion Graphs in the Wild: Relationship Learning and Organization of Significant Radiology Image Findings in a Diverse Large-scale Lesion Database

    Radiologists in their daily work routinely find and annotate significant abnormalities on a large number of radiology images. Such abnormalities, or lesions, have been collected over years and stored in hospitals' picture archiving and communication systems. However, they are basically unsorted and lack semantic annotations like type and location. In this paper, we aim to organize and explore them by learning a deep feature representation for each lesion. A large-scale and comprehensive dataset, DeepLesion, is introduced for this task. DeepLesion contains bounding boxes and size measurements of over 32K lesions. To model their similarity relationship, we leverage multiple sources of supervision, including types, self-supervised location coordinates, and sizes. They require little manual annotation effort but describe useful attributes of the lesions. Then, a triplet network is utilized to learn lesion embeddings with a sequential sampling strategy to depict their hierarchical similarity structure. Experiments show promising qualitative and quantitative results on lesion retrieval, clustering, and classification. The learned embeddings can be further employed to build a lesion graph for various clinically useful applications. We propose algorithms for intra-patient lesion matching and missing annotation mining. Experimental results validate their effectiveness. Comment: Accepted by CVPR2018. DeepLesion url adde
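The triplet network mentioned in this abstract is trained with a loss over (anchor, positive, negative) embeddings. The sketch below shows only the generic triplet margin loss with Euclidean distance. It is not the paper's sequential sampling strategy, and the function name, margin value, and toy 2-D embeddings are hypothetical.

```python
import math
from typing import Sequence


def triplet_margin_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,  # assumed value, not from the paper
) -> float:
    """Generic triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = math.dist(anchor, positive)  # distance to the similar sample
    d_neg = math.dist(anchor, negative)  # distance to the dissimilar sample
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings (hypothetical).
anchor = [0.0, 0.0]
near = [0.0, 1.0]  # distance 1 from the anchor
far = [3.0, 4.0]   # distance 5 from the anchor

# Well-ordered triplet: the similar sample is already closer, so the loss is zero.
loss_ok = triplet_margin_loss(anchor, near, far)

# Mis-ordered triplet: the "positive" is farther than the "negative", so the loss is positive.
loss_bad = triplet_margin_loss(anchor, far, near)
print(loss_ok, loss_bad)
```

The loss is zero once the positive sits closer to the anchor than the negative by at least the margin, so only violating triplets contribute to training.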