48 research outputs found
Audio Visual Speaker Localization from EgoCentric Views
The use of audio and visual modality for speaker localization has been well
studied in the literature by exploiting their complementary characteristics.
However, most previous works employ the setting of static sensors mounted at
fixed positions. Unlike them, in this work, we explore the ego-centric setting,
where the heterogeneous sensors are embodied and could be moving with a human
to facilitate speaker localization. Compared to the static scenario, the
ego-centric setting is more realistic for smart-home applications, e.g., a
service robot. However, this also brings new challenges such as blurred images,
frequent speaker disappearance from the field of view of the wearer, and
occlusions. In this paper, we study egocentric audio-visual speaker DOA
estimation and deal with the challenges mentioned above. Specifically, we
propose a transformer-based audio-visual fusion method to estimate the relative
DOA of the speaker to the wearer, and design a training strategy to mitigate
the problem of the speaker disappearing from the camera's view. We also develop
a new dataset for simulating the out-of-view scenarios, by creating a scene
with a camera wearer walking around while a speaker is moving at the same time.
The experimental results show that our proposed method offers promising
performance on this new dataset in terms of tracking accuracy. Finally, we
adapt the proposed method for the multi-speaker scenario. Experiments on
EasyCom show the effectiveness of the proposed model for multiple speakers in
real scenarios, which achieves state-of-the-art results in the sphere active
speaker detection task and the wearer activity prediction task. The simulated
dataset and related code are available at
https://github.com/KawhiZhao/Egocentric-Audio-Visual-Speaker-Localization
Enzymatic Synthesis of Functional Structured Lipids from Glycerol and Naturally Phenolic Antioxidants
Glycerol is a valuable by-product of biodiesel production by transesterification, of hydrolysis reactions, and of soap manufacturing by saponification. The conversion of glycerol into value-added products has attracted growing interest due to the dramatic growth of the biodiesel industry in recent years. In particular, phenolic structured lipids have been widely studied for their influence on food quality, since their antioxidant properties help preserve lipid-based foods. They are triacylglycerols that have been modified with phenolic acids, through enzymatically catalyzed reactions, to change the positional distribution on the glycerol backbone. Owing to lipases' fatty acid selectivity and regiospecificity, lipase-catalyzed reactions have been promoted because they offer greater control over the positional distribution of fatty acids on the glycerol backbone. Moreover, microreactors have been applied in a wide range of enzymatic applications. Phenolic structured lipids have also attracted attention for their applications in the cosmetic, pharmaceutical, and food industries, where they provide attributes that consumers find valuable. Therefore, further research is needed to allow better understanding of, and more control over, the various esterification/transesterification processes, and to reduce the costs associated with large-scale bioconversion of glycerol. The investigated approach is a promising and environmentally safe route to value-added products from glycerol.
Research on method of vibration analysis of rubber tracked vehicle based on dynamic model
To understand the vibration characteristics of a rubber track system during travel, this research studied a small harvester fitted with a rubber track system and constructed a dynamic model reflecting the vibration characteristics of the rubber track system on the ground. Comparison of the analysis results with measured data from vehicle tests showed that the dynamic model established by theoretical analysis can correctly and effectively predict the actual movement and vibration characteristics of the rubber track system, especially at low vehicle speeds. The relative difference between the vibration acceleration measured in real vehicle tests and the theoretical value was in the range of −1.2% to +18.2%. The study discusses a vibration prediction and analysis method for rubber tracked vehicles and provides important basic data for research on comfort evaluation of working posture and lightweight design of rubber tracked mechanisms.
Inhibitory effect and underlying mechanism of cinnamon and clove essential oils on Botryosphaeria dothidea and Colletotrichum gloeosporioides causing rots in postharvest bagging-free apple fruits
Bagging-free apples are more vulnerable to postharvest disease, which severely limits the cultivation pattern transformation of the apple industry in China. This study aimed to ascertain the dominant pathogens in postharvest bagging-free apples, to evaluate the efficacy of essential oil (EO) in inhibiting fungal growth, and to further clarify the molecular mechanism of this action. Based on morphological characteristics and rDNA sequence analyses, Botryosphaeria dothidea (B. dothidea) and Colletotrichum gloeosporioides (C. gloeosporioides) were identified as the main pathogens isolated from decayed bagging-free apples. Cinnamon and clove EO exhibited high inhibitory activities against mycelial growth in both vapor and contact phases under in vitro conditions. EO vapor at a concentration of 60 μL L−1 significantly reduced the incidence and lesion diameter of inoculated decay in vivo. Observations using a scanning electron microscope (SEM) and transmission electron microscope (TEM) revealed that EO changed the mycelial morphology and cellular ultrastructure and destroyed the integrity and structure of cell membranes and major organelles. RNA sequencing and bioinformatics demonstrated that clove EO treatment impaired cell membrane integrity and biological function by downregulating the genes involved in membrane components and transmembrane transport. In addition, in silico analysis indicated a stronger binding affinity of trans-cinnamaldehyde and eugenol with CYP51, attenuating the activity of this ergosterol synthesis enzyme. Moreover, pronounced alterations in the oxidation/reduction reactions and critical materials metabolism of clove EO-treated C. gloeosporioides were also observed in the transcriptomic data. Altogether, these findings reveal novel cellular and molecular antimicrobial mechanisms of EO, suggesting its potential use as a natural and useful preservative for controlling postharvest spoilage in bagging-free apples.
Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions
Audio-visual speaker tracking has drawn increasing attention over the past
few years due to its academic value and wide range of applications. Audio and visual
modalities can provide complementary information for localization and tracking.
With audio and visual information, Bayesian filters can address the
problems of data association, audio-visual fusion, and track management. In this
paper, we conduct a comprehensive overview of audio-visual speaker tracking. To
our knowledge, this is the first extensive survey over the past five years. We
introduce the family of Bayesian filters and summarize the methods for
obtaining audio-visual measurements. In addition, the existing trackers and
their performance on the AV16.3 dataset are summarized. In the past few years, deep
learning techniques have thrived, which has also boosted the development of
audio-visual speaker tracking. The influence of deep learning techniques in terms of
measurement extraction and state estimation is also discussed. Finally, we
discuss the connections between audio-visual speaker tracking and other areas
such as speech separation and distributed speaker tracking.
Clarifying the mechanisms of the light-induced color formation of apple peel under dark conditions through metabolomics and transcriptomic analyses
Many studies have demonstrated that anthocyanin synthesis in apple peel is induced by light, but how the color of bagged apple peel continues to change under dark conditions after light induction has not been characterized. Here, transcriptional and metabolic changes associated with changes in apple peel coloration in the dark after different light induction treatments were studied. Apple pericarp can achieve a normal color in complete darkness after light induction. Metabolomics analysis indicated that the levels of cyanidin-3-O-galactoside and cyanidin-3-O-glucoside were high, which might be associated with the red color development of apple peel. Transcriptome analysis revealed high expression levels of MdUFGTs, MdMYBs, and MdNACs, which might play a key role in light-induced anthocyanin accumulation under dark conditions. Thirteen key genes related to dark coloring after light induction were screened. The results of this study provide new insights into the mechanism of anthocyanin synthesis under dark conditions.
Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts
Zero-shot text-to-speech aims at synthesizing voices with unseen speech
prompts. Previous large-scale multispeaker TTS models have successfully
achieved this goal with an enrolled recording within 10 seconds. However, most
of them are designed to utilize only short speech prompts. The limited
information in short speech prompts significantly hinders the performance of
fine-grained identity imitation. In this paper, we introduce Mega-TTS 2, a
generic zero-shot multispeaker TTS model that is capable of synthesizing speech
for unseen speakers with arbitrary-length prompts. Specifically, we 1) design a
multi-reference timbre encoder to extract timbre information from multiple
reference speeches; and 2) train a prosody language model with arbitrary-length
speech prompts. With these designs, our model is suitable for prompts of
different lengths, which extends the upper bound of speech quality for
zero-shot text-to-speech. Besides arbitrary-length prompts, we introduce
arbitrary-source prompts, which leverage the probabilities derived from
multiple P-LLM outputs to produce expressive and controlled prosody.
Furthermore, we propose a phoneme-level auto-regressive duration model to
introduce in-context learning capabilities to duration modeling. Experiments
demonstrate that our method could not only synthesize identity-preserving
speech with a short prompt of an unseen speaker but also achieve improved
performance with longer speech prompts. Audio samples can be found at
https://mega-tts.github.io/mega2_demo/