63 research outputs found

    Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition

    This paper presents an extension to the training of end-to-end Context-Aware Transformer Transducer (CATT) models that uses a simple yet efficient method of mining hard negative phrases from the latent space of the context encoder. During training, given a reference query, we mine a number of similar phrases using approximate nearest neighbour search. These sampled phrases are then used as negative examples in the context list alongside random and ground truth contextual information. By including approximate nearest neighbour phrases (ANN-P) in the context list, we encourage the learned representation to disambiguate between similar, but not identical, biasing phrases. This improves biasing accuracy when there are several similar phrases in the biasing inventory. We carry out experiments in a large-scale data regime, obtaining up to 7% relative word error rate reductions for the contextual portion of test data. We also extend and evaluate the CATT approach in streaming applications. Comment: 5 pages, 2 figures, 2 tables
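The mining step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses exact cosine-similarity search in NumPy as a stand-in for a real approximate nearest neighbour index, and the names `mine_hard_negatives` and `build_context_list`, as well as the toy embeddings, are hypothetical.

```python
import numpy as np


def mine_hard_negatives(embeddings, ref_index, k):
    """Return indices of the k phrases most similar to the reference phrase.

    Assumption: exact cosine search replaces the approximate nearest
    neighbour search used in the paper; the reference itself is excluded.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[ref_index]
    sims[ref_index] = -np.inf  # never pick the ground-truth phrase as a negative
    return np.argsort(-sims)[:k].tolist()


def build_context_list(num_phrases, ref_index, hard_negatives, num_random, rng):
    """Combine ground truth, mined hard negatives, and random negatives."""
    excluded = {ref_index, *hard_negatives}
    pool = [i for i in range(num_phrases) if i not in excluded]
    random_negatives = rng.choice(pool, size=num_random, replace=False).tolist()
    return [ref_index] + hard_negatives + random_negatives


# Toy context-encoder embeddings for five phrases (hypothetical values).
toy_embeddings = np.array(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.8, 0.3]]
)
hard = mine_hard_negatives(toy_embeddings, ref_index=0, k=2)
context = build_context_list(
    len(toy_embeddings), 0, hard, num_random=1, rng=np.random.default_rng(0)
)
```

In this toy setup the mined negatives are the phrases whose embeddings lie closest to the reference, which is the property the abstract relies on to make the model separate similar biasing phrases.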

    Blending-target Domain Adaptation by Adversarial Meta-Adaptation Networks

    (Unsupervised) Domain Adaptation (DA) seeks to classify target instances when only labeled source examples and unlabeled target examples are available for training. Learning domain-invariant features helps to achieve this goal, but it assumes that the unlabeled samples are drawn from a single target domain or from multiple explicit target domains (Multi-target DA). In this paper, we consider a more realistic transfer scenario: our target domain is composed of multiple sub-targets implicitly blended with each other, so that learners cannot identify which sub-target each unlabeled sample belongs to. This Blending-target Domain Adaptation (BTDA) scenario commonly appears in practice and threatens the validity of most existing DA algorithms, due to the presence of domain gaps and categorical misalignments among these hidden sub-targets. To reap the transfer performance gains in this new scenario, we propose the Adversarial Meta-Adaptation Network (AMEAN). AMEAN entails two adversarial transfer learning processes. The first is a conventional adversarial transfer to bridge our source and mixed target domains. To circumvent the intra-target category misalignment, the second process takes the form of "learning to adapt": it deploys an unsupervised meta-learner that receives target data and their ongoing feature-learning feedback to discover target clusters as our "meta-sub-target" domains. These meta-sub-targets auto-design our meta-sub-target DA loss, which empirically eliminates the implicit category mismatching in our mixed target. We evaluate AMEAN and a variety of DA algorithms on three benchmarks under the BTDA setup. Empirical results show that BTDA is a challenging transfer setup for most existing DA algorithms, yet AMEAN significantly outperforms these state-of-the-art baselines and effectively restrains the negative transfer effects in BTDA. Comment: CVPR-19 (oral). Code is available at http://github.com/zjy526223908/BTD

    Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

    This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, where the same attention mask is applied at every frame, with chunked masking, where the attention mask for each frame is determined by chunk boundaries, in terms of recognition accuracy and latency. We then explore the use of variable masking, where the attention masks are sampled from a target distribution at training time, to build models that can work in different configurations. Finally, we investigate how a single configurable model can be used to perform both first pass streaming recognition and second pass acoustic rescoring. Experiments show that chunked masking achieves a better accuracy vs latency trade-off than fixed masking, both with and without FastEmit. We also show that variable masking improves accuracy by up to 8% relative in the acoustic rescoring scenario. Comment: 5 pages, 4 figures, 2 tables
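The difference between the masking schemes named in this abstract can be illustrated with a small NumPy sketch. It is an assumption-laden illustration rather than the paper's code: the function names, the `left`/`right` context parameters, the `left_chunks` parameter, and the uniform sampling in `sample_mask` are all hypothetical choices. In each mask, entry `[q, k]` is `True` when query frame `q` may attend to key frame `k`.

```python
import numpy as np


def fixed_mask(num_frames, left, right):
    """Same relative window for every frame: `left` past and `right` future frames."""
    idx = np.arange(num_frames)
    offset = idx[None, :] - idx[:, None]  # key position minus query position
    return (offset >= -left) & (offset <= right)


def chunked_mask(num_frames, chunk_size, left_chunks):
    """Window set by chunk boundaries: own chunk plus `left_chunks` previous chunks."""
    chunk = np.arange(num_frames) // chunk_size
    diff = chunk[:, None] - chunk[None, :]  # query chunk minus key chunk
    return (diff >= 0) & (diff <= left_chunks)


def sample_mask(num_frames, configs, rng):
    """Variable masking sketch: draw one (chunk_size, left_chunks) pair per call.

    Assumption: uniform sampling stands in for the paper's target distribution.
    """
    chunk_size, left_chunks = configs[rng.integers(len(configs))]
    return chunked_mask(num_frames, chunk_size, left_chunks)


fixed = fixed_mask(4, left=1, right=0)
chunked = chunked_mask(4, chunk_size=2, left_chunks=0)
sampled = sample_mask(4, [(1, 0), (2, 1), (4, 0)], np.random.default_rng(0))
```

With a chunked mask, the first frame of a chunk can see the later frames of the same chunk, whereas a fixed mask gives every frame an identical look-ahead; that difference is what the accuracy vs latency comparison in the abstract concerns.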

    A Study on the Comparison and Enhancement of Health Literacy of College Students in Guangdong Province in 2020 and 2022

    To compare the health literacy level of college students in Guangdong province in 2020 and 2022, and thereby provide a scientific basis for targeted health literacy intervention and policy formulation for these students, surveys were conducted in 2020 and 2022. Data collation and analysis were performed using SPSS 19.0 statistical software. The χ² test was used to compare health literacy levels, and logistic regression was performed to analyse the factors influencing health literacy. The results show that the general health literacy level of college students in Guangdong province in 2022 is 46.5%, 6.3 percentage points higher than the 40.2% recorded in 2020, a statistically significant difference. The three dimensions and six aspects of health literacy have all improved. The results of both years showed that health skills, basic medical literacy and health information literacy were at a low level. According to logistic regression analysis, the health literacy level of senior students is higher than that of junior students, and those who have taken health-related courses have a higher health literacy level. The most desired type of health knowledge is the prevention and treatment of infectious diseases, and new media are becoming an increasingly popular way for students to gain health knowledge. In conclusion, Guangdong college students' health literacy is relatively high but still needs to be improved, especially in health skills, basic medical care and health information literacy. The government, colleges and universities should work together to improve college students' health literacy.

    APT Weighted MRI as an Effective Imaging Protocol to Predict Clinical Outcome After Acute Ischemic Stroke

    This study explores the capability of amide-proton-transfer weighted (APTW) magnetic resonance imaging (MRI) in the evaluation of clinical neurological deficit at the time of hospitalization and the assessment of long-term daily functional outcome for patients with acute ischemic stroke (AIS). We recruited 55 AIS patients with brain MRI acquired within 24–48 h of symptom onset and followed up with their 90-day modified Rankin Scale (mRS) score. APT weighted MRI was performed for all the study subjects to measure the APTW signal quantitatively in the acute ischemic area (APTWipsi) and the contralateral side (APTWcont). The change of the APT signal between the acute ischemic region and the contralateral side (ΔAPTW) was calculated. Maximum APTW signal (APTWmax) and minimal APTW signal (APTWmin) were also acquired to demonstrate APTW signal heterogeneity (APTWmax−min). In addition, all the patients were divided into 2 groups according to their 90-day mRS score (good prognosis group with mRS score <2 and poor prognosis group with mRS score ≥2), and ΔAPTW was compared between these groups. We found that ΔAPTW was in good correlation with the National Institutes of Health Stroke Scale (NIHSS) score (R² = 0.578, p < 0.001) and the 90-day mRS score (R² = 0.55, p < 0.001). There was a significant difference in ΔAPTW between patients with good prognosis and patients with poor prognosis. In addition, APTWmax−min was significantly different between the two groups. These results suggest that APT weighted MRI could be used as an effective tool to assess stroke severity and prognosis for patients with AIS, with APTW signal heterogeneity as a possible biomarker.

    Sound Event Detection with Binary Neural Networks on Tightly Power-Constrained IoT Devices

    Sound event detection (SED) is a hot topic in consumer and smart city applications. Existing approaches based on Deep Neural Networks (DNNs) are very effective, but highly demanding in terms of memory, power, and throughput when targeting ultra-low power always-on devices. Latency, availability, cost, and privacy requirements are pushing recent IoT systems to process the data on the node, close to the sensor, with a very limited energy supply and tight constraints on memory size and processing capabilities that preclude running state-of-the-art DNNs. In this paper, we explore the combination of extreme quantization to a small-footprint binary neural network (BNN) with the highly energy-efficient, RISC-V-based (8+1)-core GAP8 microcontroller. Starting from an existing CNN for SED whose footprint (815 kB) exceeds the 512 kB of memory available on our platform, we retrain the network using binary filters and activations to match these memory constraints. (Fully) binary neural networks come with a natural drop in accuracy of 12-18% on the challenging ImageNet object recognition task compared to their equivalent full-precision baselines. This BNN reaches 77.9% accuracy, just 7% lower than the full-precision version, with 58 kB (7.2 times less) for the weights and 262 kB (2.4 times less) memory in total. With our BNN implementation, we reach a peak throughput of 4.6 GMAC/s and 1.5 GMAC/s over the full network, including preprocessing with Mel bins, which corresponds to an efficiency of 67.1 GMAC/s/W and 31.3 GMAC/s/W, respectively. Compared to the performance of an ARM Cortex-M4 implementation, our system has a 10.3 times faster execution time and a 51.1 times higher energy-efficiency. Comment: 6 pages conference
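Why binary filters and activations are cheap to evaluate can be shown with a generic sketch. It is not the GAP8 implementation from the paper: it assumes the common sign binarization to {-1, +1} and the standard XNOR-and-popcount identity for binary dot products, and the helper names and sample values are hypothetical.

```python
import numpy as np


def binarize(x):
    """Map real values to {-1, +1} by sign (zero is mapped to +1 by assumption)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)


def xnor_popcount_dot(a_bits, b_bits):
    """Dot product of two {-1, +1} vectors stored as booleans (True means +1).

    Matching positions contribute +1 and mismatches -1, so the result is
    2 * (number of matches) - length.
    """
    matches = int(np.count_nonzero(a_bits == b_bits))
    return 2 * matches - a_bits.size


weights = np.array([0.3, -1.2, 0.0, 2.0])
activations = np.array([-0.5, -0.1, 0.7, 1.0])

w_bin = binarize(weights)
x_bin = binarize(activations)

# Reference result with ordinary integer arithmetic.
reference = int(np.dot(w_bin.astype(np.int32), x_bin.astype(np.int32)))
# Same result using only bit comparisons and a count.
fast = xnor_popcount_dot(w_bin > 0, x_bin > 0)
```

Each binarized weight needs a single bit instead of a multi-bit value, and the multiply-accumulate reduces to bit comparisons and a count, which is the general reason such networks suit memory- and power-constrained microcontrollers.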

    Structural Modifications of the Brain in Acclimatization to High-Altitude

    Adaptive changes in respiratory and cardiovascular responses at high altitude (HA) have been well clarified. However, the central mechanisms underlying HA acclimatization remain unclear. Using voxel-based morphometry (VBM) and diffusion tensor imaging (DTI) with fractional anisotropy (FA) calculation, we investigated 28 Han immigrant residents (17–22 yr) who were born and raised at HA of 2616–4200 m on the Qinghai-Tibetan Plateau for at least 17 years and who currently attended college at sea level (SL). Their families migrated from SL to HA 2–3 generations ago and have resided at HA ever since. Control subjects were matched SL residents. HA residents (vs. SL) showed decreased grey matter volume in the bilateral anterior insula, right anterior cingulate cortex, bilateral prefrontal cortex, left precentral cortex, and right lingual cortex. HA residents (vs. SL) had significantly higher FA mainly in the bilateral anterior limb of internal capsule, bilateral superior and inferior longitudinal fasciculus, corpus callosum, bilateral superior corona radiata, bilateral anterior external capsule, right posterior cingulum, and right corticospinal tract. Higher FA values in those regions were associated with decreased or unchanged radial diffusivity, coinciding with no change of longitudinal diffusivity in the HA vs. SL group. Conversely, HA residents had lower FA in the left optic radiation and left superior longitudinal fasciculus. Our data demonstrate that HA acclimatization is associated with brain structural modifications, including the loss of regional cortical grey matter accompanied by changes in the white matter, which may underlie the physiological adaptation of residents at HA.