877 research outputs found

    Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition

    In recent years, end-to-end speech recognition has emerged as a technology that integrates the acoustic model, pronunciation dictionary, and language model components of the traditional automatic speech recognition (ASR) system. It can achieve human-like recognition without the need to build a pronunciation dictionary in advance. However, because training data for code-switching is relatively scarce, the performance of ASR models tends to degrade drastically when they encounter this phenomenon. Most past studies have reduced the learning complexity of the model by splitting the code-switching task into multiple single-language tasks and then learning the domain-specific knowledge of each language separately. In this paper, we instead introduce language identification information into the middle layer of the ASR model's encoder. We aim to generate acoustic features that capture language distinctions in a more implicit way, reducing the model's confusion when dealing with language switching. Comment: Accepted to The 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI), in Chinese language

    DISTRIBUTION OF GRIP PRESSURE THROUGHOUT THE PHASES OF PUTTING IN ELITE GOLF COLLEGE PLAYERS

    The purpose of this study is to investigate the distribution of grip pressure, grip force, and peak pressure across the phases of the putting stroke. Five elite college players with handicaps of 2-8 participated in the study. The Novel Pliance-x System and a 150 Hz, 8-camera Motion Analysis Corporation System were used to collect grip pressure and identify each phase of the putting stroke. At each phase, average grip pressure, peak pressure, and grip force were investigated. Results indicated that the lowest grip pressure occurred from address up to the top of the backswing (2.41±1.36 kPa). Grip pressure started to increase during the downswing and reached its peak (4.70±1.97 kPa) 0.02±0.05 s before impact. The pressure decreased again after impact (4.36±2.06 kPa). The results indicate that grip pressure does not remain the same throughout the stroke.

    Mechanical ball shear, electromigration, and thermal cycling reliability testing on novel solder interconnects of highly integrated chips for advanced applications

    Highly integrated ultra-large-scale integrated circuits (ULSI) have drawn considerable attention because of their potential near-future applications in VR, AI, IoT, and automotive fields. Thermal budget and reliability are two major issues that urgently need to be solved for these technologies. Since increasing IC integration may lower yield, a low fabrication temperature is expected to reduce the thermal impact on IC properties. In addition, better reliability is required for electronic devices to work in harsh outdoor environments. This study focuses on novel solder bonds for advanced ICs, including low-temperature solder and Cu-core solder balls, and their response under various reliability tests. Three main reliability tests, (1) the ball shear test, (2) the electromigration (EM) test, and (3) the thermal cycling test (TCT), are conducted to evaluate the reliability of the solder bonds. In this work, a novel Bi-40In solder alloy with improved mechanical properties and an EM-resistant Cu-core solder ball are demonstrated. The redesigned low-temperature solder joint shows higher ball shear strength than the conventional eutectic Bi-33In joint. Additionally, interconnects using Cu-core solder balls show high resistance to EM under current stressing. Regarding TCT, assembled joints with various grain structures are tested to determine the effects of Sn grain size on joint degradation and possible ways to relieve the thermomechanical stress caused by TCT. The microstructure, elemental characteristics, and grain structure are analyzed by FE-SEM, FE-EPMA, and EBSD, respectively. The failure mechanisms for all reliability tests are also discussed in detail.

    AVATAR: Robust Voice Search Engine Leveraging Autoregressive Document Retrieval and Contrastive Learning

    Voice input has become increasingly popular on mobile devices and seems to be almost entirely supplanting text input. Through voice, a voice search (VS) system can provide a more natural way to meet users' information needs. However, errors from the automatic speech recognition (ASR) system can be catastrophic to the VS system. Building on recent advanced lightweight autoregressive retrieval models, which have the potential to be deployed on mobile devices and thus lead to a more secure and personal VS assistant, this paper presents a novel study of VS leveraging autoregressive retrieval. It tackles a crucial problem facing VS, viz. the performance drop caused by ASR noise, via data augmentation and contrastive learning, showing how explicitly and implicitly modeling the noise patterns can alleviate the problem. A series of experiments conducted on the Open-Domain Spoken Question Answering (ODSQA) dataset confirm our approach's effectiveness and robustness in relation to some strong baseline systems.

    An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement

    With the massive development of end-to-end (E2E) neural networks, recent years have witnessed unprecedented breakthroughs in automatic speech recognition (ASR). However, the code-switching phenomenon remains a major obstacle for ASR, as the lack of labeled data and the variations between languages often degrade ASR performance. In this paper, we focus exclusively on improving the acoustic encoder of E2E ASR to tackle the challenge caused by code-switching. Our main contributions are threefold. First, we introduce a novel disentanglement loss that enables the lower layer of the encoder to capture inter-lingual acoustic information while mitigating linguistic confusion at the higher layer of the encoder. Second, through comprehensive experiments, we verify that our proposed method outperforms prior-art methods that use pretrained dual encoders, while having access only to the code-switching corpus and using half the parameters. Third, the apparent differentiation of the encoders' output features also corroborates the complementarity between the disentanglement loss and the mixture-of-experts (MoE) architecture. Comment: ICASSP 202

    A Hierarchical Context-aware Modeling Approach for Multi-aspect and Multi-granular Pronunciation Assessment

    Automatic Pronunciation Assessment (APA) plays a vital role in Computer-assisted Pronunciation Training (CAPT) when evaluating a second language (L2) learner's speaking proficiency. However, an apparent downside of most de facto methods is that they model the different speech granularities in parallel without accounting for the hierarchical and local contextual relationships among them. In light of this, a novel hierarchical approach is proposed in this paper for multi-aspect and multi-granular APA. Specifically, we first introduce the notion of sup-phonemes to explore more subtle semantic traits of L2 speakers. Second, a depth-wise separable convolution layer is exploited to better encapsulate the local context cues at the sub-word level. Finally, we use a score-restraint attention pooling mechanism to predict the sentence-level scores and optimize the component models with a multitask learning (MTL) framework. Extensive experiments carried out on a publicly available benchmark dataset, viz. speechocean762, demonstrate the efficacy of our approach in relation to some cutting-edge baselines. Comment: Accepted to Interspeech 202

    Probing the DNA kink structure induced by the hyperthermophilic chromosomal protein Sac7d

    Sac7d, a small, abundant, sequence-general DNA-binding protein from the hyperthermophilic archaeon Sulfolobus acidocaldarius, causes a single-step sharp kink in DNA (∼60°) via the intercalation of both Val26 and Met29. These two amino acids were systematically changed in size to probe their effects on DNA kinking. Eight crystal structures of five Sac7d mutant–DNA complexes have been analyzed. The DNA-binding pattern of the V26A and M29A single mutants is similar to that of the wild type, whereas the V26A/M29A protein binds DNA without side chain intercalation, resulting in a smaller overall bend (∼50°). The M29F mutant inserts the Phe29 side chain orthogonally to the C2pG3 step without stacking with base pairs, inducing a sharp kink (∼80°). In the V26F/M29F-GCGATCGC complex, Phe26 intercalates deeply into DNA bases by stacking with the G3 base, whereas Phe29 is stacked on the G15 deoxyribose, in a way similar to that used by the TATA box-binding proteins. All mutants have reduced DNA-stabilizing ability, as indicated by their lower T(m) values. The DNA kink patterns caused by different combinations of hydrophobic side chains may be relevant to understanding how other minor groove-binding proteins interact with DNA.