48 research outputs found

Audio Visual Speaker Localization from EgoCentric Views

Author: Qian Xinyuan
Wang Wenwu
Xu Yong
Zhao Jinzheng
Publication date: 28/09/2023

The use of audio and visual modalities for speaker localization has been well studied in the literature by exploiting their complementary characteristics. However, most previous works assume static sensors mounted at fixed positions. In contrast, in this work we explore the egocentric setting, where the heterogeneous sensors are embodied and may move with a human to facilitate speaker localization. Compared to the static scenario, the egocentric setting is more realistic for smart-home applications, e.g., a service robot. However, it also brings new challenges such as blurred images, frequent speaker disappearance from the wearer's field of view, and occlusions. In this paper, we study egocentric audio-visual speaker direction-of-arrival (DOA) estimation and address the challenges mentioned above. Specifically, we propose a transformer-based audio-visual fusion method to estimate the DOA of the speaker relative to the wearer, and design a training strategy to mitigate the problem of the speaker disappearing from the camera's view. We also develop a new dataset for simulating out-of-view scenarios by creating a scene in which a camera wearer walks around while a speaker moves at the same time. The experimental results show that our proposed method offers promising tracking accuracy on this new dataset. Finally, we adapt the proposed method to the multi-speaker scenario. Experiments on EasyCom show the effectiveness of the proposed model for multiple speakers in real scenarios, where it achieves state-of-the-art results in the sphere active speaker detection task and the wearer activity prediction task. The simulated dataset and related code are available at https://github.com/KawhiZhao/Egocentric-Audio-Visual-Speaker-Localization

arXiv.org e-Print Archive

Enzymatic Synthesis of Functional Structured Lipids from Glycerol and Naturally Phenolic Antioxidants

Author: Chen Shulin
Hu Yan
Wang Jinzheng
Wang Jun
Zhu Linlin
Publication venue: IntechOpen
Publication date: 19/04/2019

Glycerol is a valuable by-product of biodiesel production by transesterification, of hydrolysis reactions, and of soap manufacturing by saponification. The conversion of glycerol into value-added products has attracted growing interest due to the dramatic growth of the biodiesel industry in recent years. In particular, phenolic structured lipids have been widely studied for their influence on food quality, since their antioxidant properties help preserve lipid-containing foods. They are triacylglycerols that have been modified with phenolic acids, through enzymatically catalyzed reactions, to change the positional distribution on the glycerol backbone. Owing to the fatty acid selectivity and regiospecificity of lipases, lipase-catalyzed reactions have been promoted because they offer greater control over the positional distribution of fatty acids on the glycerol backbone. Moreover, microreactors have been applied in a wide range of enzymatic applications. Phenolic structured lipids have now attracted attention for their applications in the cosmetic, pharmaceutical, and food industries, where they provide attributes that consumers will find valuable. It is therefore important that further research be conducted to allow better understanding of, and more control over, the various esterification/transesterification processes, and to reduce the costs associated with large-scale bioconversion of glycerol. The investigated approach is a promising and environmentally safe route to value-added products from glycerol.

IntechOpen

Crossref

Research on method of vibration analysis of rubber tracked vehicle based on dynamic model

Author: Jinzheng Zhang
Qi Wang
Qichun Jin
Publication venue: JVE International Ltd.
Publication date: 01/03/2018

To understand the vibration characteristics of a rubber track system during travel, this research studied a small harvester fitted with a rubber track system and constructed a dynamic model reflecting the vibration characteristics of the rubber track system on the ground. Comparison of the analysis results with experimental data measured in vehicle tests showed that the dynamic model established by theoretical analysis can correctly and effectively predict the actual movement and vibration characteristics of the rubber track system, especially at low test vehicle speeds. The relative difference between the vibration acceleration measured in real vehicle tests and the theoretical value was in the range of −1.2 % to +18.2 %. The study discusses a vibration prediction and analysis method for rubber tracked vehicles and provides important basic data for research on comfort evaluation of working posture and on lightweight design of rubber tracked mechanisms.

Crossref

Maintenance, Reliability and Condition Monitoring

Directory of Open Access Journals

JVE International

Journal of Mechatronics and Artificial Intelligence in Engineering

Inhibitory effect and underlying mechanism of cinnamon and clove essential oils on Botryosphaeria dothidea and Colletotrichum gloeosporioides causing rots in postharvest bagging-free apple fruits

Author: Dan Wang
Guiping Wang
Hao Zhai
Jinzheng Wang
Xiaomin Xue
Publication venue: Frontiers Media SA
Publication date: 01/02/2023

Bagging-free apples are more vulnerable to postharvest disease, which severely limits the transformation of cultivation patterns in the apple industry in China. This study aimed to ascertain the dominant pathogens in postharvest bagging-free apples, to evaluate the efficacy of essential oils (EO) in inhibiting fungal growth, and to further clarify the molecular mechanism of this action. Based on morphological characteristics and rDNA sequence analyses, Botryosphaeria dothidea (B. dothidea) and Colletotrichum gloeosporioides (C. gloeosporioides) were identified as the main pathogens isolated from decayed bagging-free apples. Cinnamon and clove EO exhibited high inhibitory activities against mycelial growth in both vapor and contact phases under in vitro conditions. EO vapor at a concentration of 60 μL L−1 significantly reduced the incidence and lesion diameter of inoculated decay in vivo. Observations using a scanning electron microscope (SEM) and transmission electron microscope (TEM) revealed that EO changed the mycelial morphology and cellular ultrastructure and destroyed the integrity and structure of cell membranes and major organelles. RNA sequencing and bioinformatics demonstrated that clove EO treatment impaired cell membrane integrity and biological function by downregulating the genes involved in membrane components and transmembrane transport. In addition, in silico analysis indicated a stronger binding affinity of trans-cinnamaldehyde and eugenol with CYP51, attenuating the activity of this ergosterol synthesis enzyme. Moreover, pronounced alterations in the oxidation/reduction reactions and critical materials metabolism of clove EO-treated C. gloeosporioides were also observed in the transcriptomic data. Altogether, these findings contribute novel cellular and molecular antimicrobial mechanisms of EO, suggesting its potential use as a natural and useful preservative for controlling postharvest spoilage in bagging-free apples.

Directory of Open Access Journals

Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

Author: Berghi Davide
Cui Meng
Jackson Philip J. B.
Qian Xinyuan
Sun Jianyuan
Wang Wenwu
Wu Peipei
Xu Yong
Zhao Jinzheng
Publication date: 17/12/2023

Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic value and wide range of applications. Audio and visual modalities can provide complementary information for localization and tracking. With audio and visual information, Bayesian filters can address the problems of data association, audio-visual fusion, and track management. In this paper, we present a comprehensive overview of audio-visual speaker tracking. To our knowledge, this is the first extensive survey over the past five years. We introduce the family of Bayesian filters and summarize the methods for obtaining audio-visual measurements. In addition, the existing trackers and their performance on the AV16.3 dataset are summarized. In the past few years, deep learning techniques have thrived, which has also boosted the development of audio-visual speaker tracking. The influence of deep learning techniques on measurement extraction and state estimation is also discussed. Finally, we discuss the connections between audio-visual speaker tracking and other areas such as speech separation and distributed speaker tracking.

arXiv.org e-Print Archive

Clarifying the mechanisms of the light-induced color formation of apple peel under dark conditions through metabolomics and transcriptomic analyses

Author: Jinzheng Wang
Ru Chen
Shoule Tian
Xianyan Zhao
Xiaomin Xue
Xueping Han
Publication venue: Frontiers Media SA
Publication date: 01/07/2022

Many studies have demonstrated that anthocyanin synthesis in apple peel is induced by light, but how the color of bagged apple peel continues to change under dark conditions after light induction has not been characterized. Here, transcriptional and metabolic changes associated with changes in apple peel coloration in the dark after different light induction treatments were studied. Apple pericarp can achieve a normal color in complete darkness following light induction. Metabolomics analysis indicated that the levels of cyanidin-3-O-galactoside and cyanidin-3-O-glucoside were high, which might be associated with the red color development of apple peel. Transcriptome analysis revealed high expression levels of MdUFGTs, MdMYBs, and MdNACs, which might play a key role in light-induced anthocyanin accumulation under dark conditions. Thirteen key genes related to dark coloring after light induction were screened. The results of this study provide new insights into the mechanism of anthocyanin synthesis under dark conditions.

Directory of Open Access Journals

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts

Author: He Jinzheng
Jiang Ziyue
Liu Jinglin
Ma Zejun
Ren Yi
Wang Chunfeng
Wei Pengfei
Ye Zhenhui
Yin Xiang
Zhang Chen
Zhao Zhou
Publication date: 14/07/2023

Zero-shot text-to-speech aims at synthesizing voices with unseen speech prompts. Previous large-scale multispeaker TTS models have successfully achieved this goal with an enrolled recording within 10 seconds. However, most of them are designed to utilize only short speech prompts. The limited information in short speech prompts significantly hinders the performance of fine-grained identity imitation. In this paper, we introduce Mega-TTS 2, a generic zero-shot multispeaker TTS model that is capable of synthesizing speech for unseen speakers with arbitrary-length prompts. Specifically, we 1) design a multi-reference timbre encoder to extract timbre information from multiple reference speeches, and 2) train a prosody language model with arbitrary-length speech prompts. With these designs, our model is suitable for prompts of different lengths, which extends the upper bound of speech quality for zero-shot text-to-speech. Besides arbitrary-length prompts, we introduce arbitrary-source prompts, which leverage the probabilities derived from multiple P-LLM outputs to produce expressive and controlled prosody. Furthermore, we propose a phoneme-level auto-regressive duration model to introduce in-context learning capabilities to duration modeling. Experiments demonstrate that our method can not only synthesize identity-preserving speech with a short prompt of an unseen speaker but also achieve improved performance with longer speech prompts. Audio samples can be found at https://mega-tts.github.io/mega2_demo/

arXiv.org e-Print Archive