
    Improving the Performance of Online Neural Transducer Models

    Having a sequence-to-sequence model which can operate in an online fashion is important for streaming applications such as Voice Search. The neural transducer (NT) is a streaming sequence-to-sequence model, but it has shown a significant degradation in performance compared to non-streaming models such as Listen, Attend and Spell (LAS). In this paper, we present various improvements to NT. Specifically, we look at increasing the window over which NT computes attention, mainly by looking backwards in time so that the model still remains online. In addition, we explore initializing an NT model from a LAS-trained model so that it is guided by a better alignment. Finally, we explore incorporating stronger language models, both by using wordpiece models and by applying an external LM during the beam search. On a Voice Search task, we find that with these improvements we can get NT to match the performance of LAS.
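    The backward-looking attention window is easy to picture in code. Below is a minimal sketch assuming simple dot-product attention over NumPy arrays; the function name, window size, and tensor shapes are illustrative and not taken from the paper:

```python
import numpy as np

def lookback_attention(query, encoder_states, t, lookback=10):
    """Attend only over encoder frames [t - lookback, t].

    Because the window extends backwards in time from the current frame t,
    no future frames are required and the model stays online/streamable.
    query:          (d,)   current decoder state
    encoder_states: (T, d) encoder outputs produced so far
    """
    start = max(0, t - lookback)
    window = encoder_states[start:t + 1]       # backward-looking window only
    scores = window @ query                    # dot-product attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over the window
    return weights @ window                    # context vector, shape (d,)
```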

    Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model

    Sequence-to-sequence models provide a simple and elegant solution for building speech recognition systems by folding the separate components of a typical system, namely the acoustic (AM), pronunciation (PM) and language (LM) models, into a single neural network. In this work, we look at one such sequence-to-sequence model, namely Listen, Attend and Spell (LAS), and explore the possibility of training a single model to serve different English dialects, which simplifies the process of training multi-dialect systems without the need for separate AMs, PMs and LMs for each dialect. We show that simply pooling the data from all dialects into one LAS model falls behind the performance of a model fine-tuned on each dialect. We then look at incorporating dialect-specific information into the model, both by modifying the training targets, inserting the dialect symbol at the end of the original grapheme sequence, and by feeding a 1-hot representation of the dialect information into all layers of the model. Experimental results on seven English dialects show that our proposed system is effective in modeling dialect variations within a single LAS model, outperforming a LAS model trained individually on each of the seven dialects by 3.1-16.5% relative. Comment: submitted to ICASSP 201
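    The two conditioning mechanisms described above are simple to state concretely. A minimal sketch follows, using hypothetical dialect codes and a character-level target sequence; the names below are placeholders, not the paper's actual symbols:

```python
import numpy as np

# Hypothetical inventory of seven English dialect codes (placeholders only).
DIALECTS = ["us", "gb", "au", "ca", "in", "ie", "za"]

def add_dialect_symbol(graphemes, dialect):
    """Insert the dialect symbol at the end of the grapheme target sequence,
    e.g. ['h', 'i'] -> ['h', 'i', '<gb>']."""
    return graphemes + [f"<{dialect}>"]

def dialect_one_hot(dialect):
    """1-hot dialect vector that can be fed (e.g. concatenated) into each layer."""
    vec = np.zeros(len(DIALECTS), dtype=np.float32)
    vec[DIALECTS.index(dialect)] = 1.0
    return vec

print(add_dialect_symbol(list("hello"), "gb"))  # ['h','e','l','l','o','<gb>']
print(dialect_one_hot("gb"))                    # [0. 1. 0. 0. 0. 0. 0.]
```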

    A W-Band SPDT Switch with 15 dBm P1dB in 55-nm Bulk CMOS

    © 2022 IEEE. This is the accepted manuscript version of an article which has been published in final form at https://doi.org/10.1109/LMWC.2022.3159529. The power-handling capability of a bulk CMOS single-pole double-throw (SPDT) switch operating in the millimetre-wave and sub-THz region is significantly limited by the reduced threshold voltage of deeply scaled transistors. A unique design technique based on an impedance transformation network is presented in this work, which improves the 1-dB compression point (P1dB) without deteriorating other performance metrics. To prove that the presented solution is valid, a 70-100 GHz switch is designed and implemented in a 55-nm bulk CMOS technology. At 90 GHz, it achieves a measured P1dB of 15 dBm, an insertion loss of 3.5 dB and an isolation of 18 dB. The total chip area is only 0.14 mm². Peer reviewed.
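    For reference, the 1-dB compression point (P1dB) quoted above is the standard large-signal linearity figure: the input power at which the gain has dropped by 1 dB from its small-signal value. A compact statement of the definition:

```latex
P_{\mathrm{1dB}} = P_{\mathrm{in}} \quad \text{such that} \quad
G(P_{\mathrm{in}}) = G_{\mathrm{ss}} - 1\,\mathrm{dB},
```

    where G_ss is the small-signal gain in dB (for a passive switch, the negative of its insertion loss).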

    A 90-GHz Asymmetrical Single-Pole Double-Throw Switch with >19.5-dBm 1-dB Compression Point in Transmission Mode Using 55-nm Bulk CMOS Technology

    © 2021 IEEE. This is the accepted manuscript version of an article which has been published in final form at https://doi.org/10.1109/TCSI.2021.3106231. A millimeter-wave (mm-wave) single-pole double-throw (SPDT) switch designed in bulk CMOS technology has inherently limited power-handling capability in terms of its 1-dB compression point (P1dB). This is mainly due to the low threshold voltage of the switching transistors used in the shunt-connected configuration. To solve this issue, an innovative approach is presented in this work, which utilizes a unique passive ring structure. It allows a relatively strong RF signal to pass through the TX branch while the switching transistors are turned on. Thus, the fundamental limitation on P1dB due to the reduced threshold voltage is overcome. To prove that the presented approach is feasible in practice, a 90-GHz asymmetrical SPDT switch is designed in a standard 55-nm bulk CMOS technology. The design achieves an insertion loss of 3.2 dB and 3.6 dB in TX and RX mode, respectively. Moreover, more than 20 dB of isolation is obtained in both modes. Thanks to the proposed passive ring structure, a remarkable P1dB is achieved: no gain compression is observed at all while a 19.5-dBm input power is injected into the TX branch of the designed SPDT switch. The die area of this design is only 0.26 mm². Peer reviewed.
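    The insertion-loss and isolation figures quoted for both switch designs are conventional S-parameter ratios expressed in dB. As a reminder, with S21 denoting the forward transmission coefficient of the on-state path and of the off-state path respectively:

```latex
\mathrm{IL}  = -20\log_{10}\lvert S_{21}^{\mathrm{on}} \rvert \ \mathrm{dB}, \qquad
\mathrm{ISO} = -20\log_{10}\lvert S_{21}^{\mathrm{off}} \rvert \ \mathrm{dB}.
```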

    Scalable production of iPSC-derived human neurons to identify tau-lowering compounds by high-content screening

    Lowering total tau levels is an attractive therapeutic strategy for Alzheimer's disease and other tauopathies. High-throughput screening in neurons derived from human induced pluripotent stem cells (iPSCs) is a powerful tool to identify tau-targeted therapeutics. However, such screens have been hampered by heterogeneous neuronal production, high cost and low yield, and multi-step differentiation procedures. We engineered an isogenic iPSC line that harbors an inducible neurogenin 2 transgene, a transcription factor that rapidly converts iPSCs to neurons, integrated at the AAVS1 locus. Using a simplified two-step protocol, we differentiated these iPSCs into cortical glutamatergic neurons with minimal well-to-well variability. We developed a robust high-content screening assay to identify tau-lowering compounds in the LOPAC library and identified adrenergic receptor agonists as a class of compounds that reduce endogenous human tau. These techniques enable the use of human neurons for high-throughput screening of drugs to treat neurodegenerative diseases.

    No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models

    For decades, context-dependent phonemes have been the dominant sub-word unit for conventional acoustic modeling systems. This status quo has begun to be challenged recently by end-to-end models which seek to combine acoustic, pronunciation, and language model components into a single neural network. Such systems, which typically predict graphemes or words, simplify the recognition process since they remove the need for a separate expert-curated pronunciation lexicon to map from phoneme-based units to words. However, there has been little previous work comparing phoneme-based and grapheme-based sub-word units in the end-to-end modeling framework, to determine whether the gains from such approaches are primarily due to the new probabilistic model or to the joint learning of the various components with grapheme-based units. In this work, we conduct detailed experiments aimed at quantifying the value of phoneme-based pronunciation lexica in the context of end-to-end models. We examine phoneme-based end-to-end models, which are contrasted against grapheme-based ones on a large-vocabulary English Voice Search task, where we find that graphemes do indeed outperform phonemes. We also compare grapheme- and phoneme-based approaches on a multi-dialect English task, which once again confirms the superiority of graphemes, greatly simplifying the system for recognizing multiple dialects.
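    The practical difference between the two target representations is easy to see. A minimal sketch with a hypothetical toy lexicon (real systems rely on a large, expert-curated pronunciation lexicon); the entries and phone symbols are illustrative only:

```python
# Hypothetical toy pronunciation lexicon (illustrative only).
LEXICON = {"voice": ["V", "OY", "S"], "search": ["S", "ER", "CH"]}

def grapheme_targets(words):
    """Grapheme targets: just the characters of each word, no lexicon needed."""
    return [ch for word in words for ch in word]

def phoneme_targets(words, lexicon=LEXICON):
    """Phoneme targets: every word must be covered by the pronunciation lexicon."""
    return [ph for word in words for ph in lexicon[word]]

print(grapheme_targets(["voice", "search"]))  # ['v','o','i','c','e','s','e','a','r','c','h']
print(phoneme_targets(["voice", "search"]))   # ['V','OY','S','S','ER','CH']
```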