
    Promoted Electronic Coupling of Acoustic Phonon Modes in Doped Semimetallic MoTe2

    As a prototype of the Weyl superconductor, layered molybdenum telluride (MoTe2) encompasses two semimetallic phases (1T′ and Td) which differ from each other by a slight tilting of the out-of-plane lattice. Both phases suffer from severe phase mixing, which complicates analysis of the origin of superconductivity. Herein, we explore the electron-phonon coupling (EPC) of monolayer semimetallic MoTe2, which is free of phase ambiguity at this thickness limit. Apart from the hardening or softening of phonon modes, the strength of the EPC can be strongly modulated by doping. Specifically, the longitudinal and out-of-plane acoustic modes are significantly activated in electron-doped MoTe2. This is ascribed to the presence of rich valley states and equispaced nesting bands, which are dynamically populated under charge doping. Comparing monolayer and bilayer MoTe2, the EPC strength is found to be nearly independent of thickness for neutral samples but clearly enhanced in thinner samples under electron doping; under hole doping, the strength varies more strongly with thickness than with doping. Our work explains the puzzling doping sensitivity of superconductivity in semimetallic MoTe2 and establishes the critical role of activating acoustic phonons in such low-dimensional materials.
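
    For reference, a minimal sketch of the standard quantities behind the reported EPC strength; these are the textbook Eliashberg definitions, not expressions quoted from the paper:

        % Eliashberg spectral function: squared e-ph matrix elements g, summed
        % over phonon branches (q, nu) and electronic states at the Fermi level.
        \[
        \alpha^2 F(\omega) = \frac{1}{N(\varepsilon_F)} \sum_{\mathbf{q}\nu}
          \delta(\omega - \omega_{\mathbf{q}\nu}) \sum_{\mathbf{k},nm}
          \bigl| g^{\mathbf{q}\nu}_{mn}(\mathbf{k}) \bigr|^2
          \delta(\varepsilon_{n\mathbf{k}} - \varepsilon_F)\,
          \delta(\varepsilon_{m\mathbf{k}+\mathbf{q}} - \varepsilon_F)
        \]
        % Total coupling strength: acoustic (low-omega) modes enter with a
        % 1/omega weight, which is why activating them strongly boosts lambda.
        \[
        \lambda = 2 \int_0^{\infty} \frac{\alpha^2 F(\omega)}{\omega}\, \mathrm{d}\omega
        \]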

    Phononic transport in 1T′-MoTe2: anisotropic structure with an isotropic lattice thermal conductivity

    Molybdenum ditelluride (MoTe2) is a unique transition metal dichalcogenide owing to its energetically comparable 1H and 1T′ phases. This implies a high chance of coexisting 1H-1T′ heterostructures, which poses great complexity for measurement of the intrinsic lattice thermal conductivity (κ_L). In this work, via first-principles calculations, we examine lattice-wave propagation and thermal conduction in the highly structurally anisotropic 1T′ MoTe2. Our calculations show that the 1T′ phase has a sound velocity of 2.13 km/s (longitudinal acoustic wave), much lower than that of the 1H phase (4.05 km/s), indicating hindered transmission of lattice waves across the boundary from the 1H to the 1T′ phase. Interestingly, the highly anisotropic 1T′ MoTe2 shows a nearly isotropic and limited κ_L of 13.02 W/(m K), owing to the large Grüneisen parameter of the acoustic flexural mode, the heavy masses of the Mo and Te elements, and a low phonon group velocity. The accumulative κ_L as a function of mean free path (MFP) indicates that phonons with MFP below ~300 nm contribute 80% of κ_L, with an inflection point at ~600 nm. Our results are critical for understanding the size-dependent κ_L of nanostructured 1T′ MoTe2.
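
    As context for the accumulative κ_L analysis, a minimal sketch of the single-mode relaxation-time expressions commonly used in such first-principles studies; these are standard phonon-BTE results, not formulas quoted from the paper:

        % Lattice thermal conductivity: sum over phonon modes lambda of heat
        % capacity C, group velocity v, and relaxation time tau.
        \[
        \kappa_L^{\alpha\beta} = \frac{1}{V} \sum_{\lambda}
          C_\lambda\, v_\lambda^{\alpha} v_\lambda^{\beta}\, \tau_\lambda
        \]
        % Accumulative kappa_L up to an MFP cutoff l: only modes with mean free
        % path Lambda below the cutoff contribute, which is how the ~300 nm /
        % 80% statement is read off.
        \[
        \kappa_L(\ell) = \frac{1}{V} \sum_{\lambda:\, \Lambda_\lambda \le \ell}
          C_\lambda\, v_\lambda^{\alpha} v_\lambda^{\beta}\, \tau_\lambda,
        \qquad \Lambda_\lambda = |\mathbf{v}_\lambda|\, \tau_\lambda
        \]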

    An empirical study of weakly supervised audio tagging embeddings for general audio representations

    We study the usability of pre-trained weakly supervised audio tagging (AT) models as feature extractors for general audio representations. We mainly analyze the feasibility of transferring those embeddings to other tasks within the speech and sound domains. Specifically, we benchmark weakly supervised pre-trained models (MobileNetV2 and EfficientNet-B0) against modern self-supervised learning methods (BYOL-A) as feature extractors. Fourteen downstream tasks are used for evaluation, ranging from music instrument classification to language classification. Our results indicate that AT pre-trained models are an excellent transfer learning choice for music, event, and emotion recognition tasks. Further, fine-tuning AT models can also benefit speech-related tasks such as keyword spotting and intent classification.
    Comment: Odyssey 202
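
    As an illustration of the feature-extractor setup described above, a minimal PyTorch sketch; the generic torchvision MobileNetV2 with its classifier head removed stands in for the paper's AT-pretrained backbone, and the audio front end is reduced to a placeholder tensor:

        import torch
        import torchvision

        # Hypothetical sketch: turn a MobileNetV2 trunk into an embedding
        # extractor by dropping the classification head. The paper's actual
        # AT-pretrained weights and log-mel preprocessing are not reproduced.
        backbone = torchvision.models.mobilenet_v2(weights=None)
        backbone.classifier = torch.nn.Identity()  # keep pooled features only
        backbone.eval()

        with torch.no_grad():
            # Stand-in batch: 8 spectrograms treated as 3-channel images (assumption).
            mel = torch.randn(8, 3, 224, 224)
            embeddings = backbone(mel)             # -> (8, 1280) feature vectors
        print(embeddings.shape)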

    Streaming Audio Transformers for Online Audio Tagging

    Transformers have emerged as a prominent model framework for audio tagging (AT), boasting state-of-the-art (SOTA) performance on the widely used Audioset dataset. However, their impressive performance often comes at the cost of high memory usage, slow inference speed, and considerable model delay, rendering them impractical for real-world AT applications. In this study, we introduce streaming audio transformers (SAT), which combine the vision transformer (ViT) architecture with Transformer-XL-like chunk processing, enabling efficient processing of long-range audio signals. Our proposed SAT is benchmarked against other transformer-based SOTA methods, achieving significant improvements in mean average precision (mAP) at delays of 2 s and 1 s, while also exhibiting significantly lower memory usage and computational overhead. Checkpoints are publicly available at https://github.com/RicherMans/SAT
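
    As a sketch of the Transformer-XL-like chunk processing named above, a minimal hypothetical streaming block in PyTorch: each incoming chunk attends to a cached memory of previous activations, so long signals are processed piecewise. Dimensions, memory length, and the single-layer design are illustrative, not the SAT architecture:

        import torch
        import torch.nn as nn

        class StreamingBlock(nn.Module):
            def __init__(self, dim=256, heads=4, mem_len=50):
                super().__init__()
                self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.mem_len = mem_len
                self.register_buffer("memory", torch.zeros(1, 0, dim),
                                     persistent=False)

            def forward(self, chunk):                   # chunk: (B, T, dim)
                mem = self.memory.expand(chunk.size(0), -1, -1)
                context = torch.cat([mem, chunk], dim=1)  # keys/values span memory+chunk
                out, _ = self.attn(chunk, context, context)
                # Cache the most recent hidden states (detached) for the next chunk.
                self.memory = out[:1, -self.mem_len:, :].detach()
                return out

        block = StreamingBlock()
        for _ in range(3):                              # three successive chunks
            y = block(torch.randn(1, 20, 256))
        print(y.shape)                                  # torch.Size([1, 20, 256])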

    UniKW-AT: Unified Keyword Spotting and Audio Tagging

    Within the audio research community and the industry, keyword spotting (KWS) and audio tagging (AT) are seen as two distinct tasks and research fields. However, from a technical point of view, the two tasks are identical: both predict a label (a keyword in KWS, a sound event in AT) for a fixed-sized input audio segment. This work proposes UniKW-AT, an initial approach for jointly training KWS and AT. UniKW-AT enhances noise robustness for KWS, while also being able to predict specific sound events and enabling conditional wake-ups on sound events. Our approach extends the AT pipeline with additional labels describing the presence of a keyword. Experiments are conducted on the Google Speech Commands V1 (GSCV1) and the balanced Audioset (AS) datasets. The proposed MobileNetV2 model achieves an accuracy of 97.53% on the GSCV1 dataset and an mAP of 33.4 on the AS evaluation set. Further, we show that significant noise-robustness gains can be observed on a real-world KWS dataset, greatly outperforming standard KWS approaches. Our study shows that KWS and AT can be merged into a single framework without significant performance degradation.
    Comment: Accepted in Interspeech202
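
    A minimal sketch of the label extension described above: the multi-hot sound-event target vector gains extra slots for keywords, so one classifier covers both tasks. The class lists and indices below are illustrative, not the paper's label map:

        import torch

        SOUND_EVENTS = ["Speech", "Music", "Dog"]       # e.g., Audioset labels
        KEYWORDS = ["yes", "no", "stop"]                # e.g., Speech Commands words
        ALL_LABELS = SOUND_EVENTS + KEYWORDS

        def make_target(events, keyword=None):
            """Multi-hot target over sound events plus an optional keyword."""
            target = torch.zeros(len(ALL_LABELS))
            for e in events:
                target[ALL_LABELS.index(e)] = 1.0
            if keyword is not None:
                target[ALL_LABELS.index(keyword)] = 1.0
            return target

        # A clip containing speech with the spoken keyword "stop":
        print(make_target(["Speech"], keyword="stop"))  # tensor([1., 0., 0., 0., 0., 1.])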

    Mice Exposed to Chronic Intermittent Hypoxia Simulate Clinical Features of Deficiency of both Qi and Yin Syndrome in Traditional Chinese Medicine

    Deficiency of both Qi and Yin Syndrome (DQYS) is one of the common syndromes in traditional Chinese medicine (TCM), mainly characterized by tiredness, emaciation, anorexia, fidgeting, palpitation, rapid pulse, and so forth. Currently, no animal model is available that reflects the clinical features of this syndrome. In the present paper, guided by the clinical features of DQYS and one key pathological factor, we observed the time-course changes in overall behavior, body weight, food intake, locomotor activity, and electrocardiogram in mice exposed to chronic intermittent hypoxia for 6 weeks, and finally measured bleeding time. The results showed that mice exposed to intermittent hypoxia for a certain time presented lackluster, dull-looking hair, resistance, attacking behavior, body weight loss, declined food intake, decreased locomotor activity, quickened heart rate, and elevated T waves, which are similar to the major clinical features of DQYS. Meanwhile, a shortened bleeding time was also found, consistent with the clinical observation that DQYS is often accompanied by blood stasis. A possible explanation is also outlined based on the available literature. These findings suggest that chronic intermittent hypoxia can induce symptoms and signs in mice that accord with the clinical features of DQYS, providing a suitable animal model for evaluating drugs for the treatment of this syndrome and for further exploring the pathological process of the syndrome and its correlation with related diseases.

    CED: Consistent ensemble distillation for audio tagging

    Augmentation and knowledge distillation (KD) are well-established techniques employed in audio classification tasks, aimed at enhancing performance and reducing model sizes on the widely recognized Audioset (AS) benchmark. Although both techniques are effective individually, their combined use, called consistent teaching, has not been explored before. This paper proposes CED, a simple training framework that distills student models from large teacher ensembles with consistent teaching. To achieve this, CED efficiently stores logits as well as the augmentation methods on disk, making it scalable to large-scale datasets. Central to CED's efficacy is its label-free nature: only the stored logits are used to optimize a student model, requiring just 0.3% additional disk space for AS. The study trains various transformer-based models, including a 10M-parameter model achieving a 49.0 mean average precision (mAP) on AS. Pretrained models and code are available at https://github.com/RicherMans/CED
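
    A minimal sketch of the consistent-teaching idea as described: teacher-ensemble logits computed on the augmented input are stored on disk and replayed, so the student optimizes against exactly the same view without needing labels. All names, shapes, and the softmax-KL loss are illustrative, not the CED implementation:

        import torch
        import torch.nn.functional as F

        num_classes = 527                               # Audioset label count

        # Offline pass (done once): store augmentation state and teacher logits.
        # A dict stands in here for the on-disk storage.
        stored = {"teacher_logits": torch.randn(4, num_classes)}

        def distill_step(student, batch, teacher_logits, temperature=1.0):
            """One label-free KD step: match the student to stored teacher logits."""
            s = student(batch) / temperature
            t = teacher_logits / temperature
            # A sigmoid-based soft-target loss would also fit multi-label AT;
            # a softmax KL divergence is shown for brevity (assumption).
            return F.kl_div(F.log_softmax(s, dim=-1), F.softmax(t, dim=-1),
                            reduction="batchmean")

        student = torch.nn.Linear(64, num_classes)      # toy stand-in model
        loss = distill_step(student, torch.randn(4, 64), stored["teacher_logits"])
        loss.backward()
        print(float(loss))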

    Understanding temporally weakly supervised training: A case study for keyword spotting

    The currently most prominent algorithm for training keyword spotting (KWS) models with deep neural networks (DNNs) requires strong supervision, i.e., precise knowledge of the spoken keyword's location in time. Thus, most KWS approaches treat the presence of redundant data, such as noise, within their training set as an obstacle. A common training paradigm for dealing with such data redundancies is temporally weakly supervised learning, which only requires labels on a coarse time scale. This study explores the limits of DNN training using temporally weak labeling, with applications in KWS. We train a simple end-to-end classifier on the common Google Speech Commands dataset with increased difficulty, by randomly appending and adding noise to the training data. Our results indicate that temporally weak labeling can achieve results comparable to strongly supervised baselines while imposing a less stringent labeling requirement. In the presence of noise, weakly supervised models are capable of localizing and extracting target keywords without explicit supervision, leading to a performance increase over strongly supervised approaches.
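
    A minimal hypothetical PyTorch sketch of temporally weak labeling: the model emits per-frame keyword probabilities, which are pooled (linear-softmax pooling, one common choice) into a single clip-level prediction, so training needs only clip-level labels and localization emerges as a by-product. Dimensions and the tiny frame network are illustrative, not the paper's model:

        import torch
        import torch.nn as nn

        class WeakKWS(nn.Module):
            def __init__(self, feat_dim=40, num_keywords=10):
                super().__init__()
                self.frame_net = nn.Sequential(
                    nn.Linear(feat_dim, 64), nn.ReLU(),
                    nn.Linear(64, num_keywords), nn.Sigmoid())

            def forward(self, feats):                   # feats: (B, T, feat_dim)
                p = self.frame_net(feats)               # per-frame probs (B, T, K)
                # Linear-softmax pooling: frames with high probability dominate.
                clip = (p * p).sum(1) / p.sum(1).clamp(min=1e-7)
                return clip, p                          # clip-level + frame-level

        model = WeakKWS()
        clip_prob, frame_prob = model(torch.randn(2, 100, 40))
        # Only a clip-level (weak) label is needed for the loss:
        loss = nn.functional.binary_cross_entropy(clip_prob, torch.ones(2, 10))
        print(clip_prob.shape, frame_prob.shape)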