63 research outputs found

Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition

Author: Bleeker Maurits
Braun Stefan
Swietojanski Pawel
Zhuang Xiaodan
Publication date: 18/04/2023

This paper presents an extension to train end-to-end Context-Aware Transformer Transducer (CATT) models by using a simple yet efficient method of mining hard negative phrases from the latent space of the context encoder. During training, given a reference query, we mine a number of similar phrases using approximate nearest neighbour search. These sampled phrases are then used as negative examples in the context list alongside random and ground truth contextual information. By including approximate nearest neighbour phrases (ANN-P) in the context list, we encourage the learned representation to disambiguate between similar, but not identical, biasing phrases. This improves biasing accuracy when there are several similar phrases in the biasing inventory. We carry out experiments in a large-scale data regime, obtaining up to 7% relative word error rate reductions for the contextual portion of test data. We also extend and evaluate the CATT approach in streaming applications. Comment: 5 pages, 2 figures, 2 table
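The mining step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: exact brute-force cosine search stands in for approximate nearest neighbour search, and the embedding values, the phrase strings, and the function names `mine_hard_negatives` and `build_context_list` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mine_hard_negatives(query, embeddings, k):
    """Return the k phrases closest to `query`, excluding the query itself.

    Assumption: exact search replaces the approximate search of the paper.
    """
    q = embeddings[query]
    scored = [(cosine(q, vec), phrase)
              for phrase, vec in embeddings.items() if phrase != query]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [phrase for _, phrase in scored[:k]]


def build_context_list(query, embeddings, k_hard, random_phrases):
    """Ground-truth phrase, then mined hard negatives, then random negatives."""
    hard = mine_hard_negatives(query, embeddings, k_hard)
    rand = [p for p in random_phrases if p != query and p not in hard]
    return [query] + hard + rand


# Hypothetical context-encoder outputs (toy 3-dimensional embeddings).
embeddings = {
    "call Joan": [0.90, 0.10, 0.00],
    "call John": [0.88, 0.15, 0.00],
    "call Jon": [0.85, 0.20, 0.05],
    "play jazz": [0.00, 0.10, 0.95],
}

print(build_context_list("call Joan", embeddings, 2, ["play jazz"]))
```

In this toy setup the mined negatives are the near-duplicate phrases, which is the kind of confusable context the abstract says the model should learn to disambiguate.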

arXiv.org e-Print Archive

Blending-target Domain Adaptation by Adversarial Meta-Adaptation Networks

Author: Chen Ziliang
Liang Xiaodan
Lin Liang
Zhuang Jingyu
Publication date: 07/07/2019

(Unsupervised) Domain Adaptation (DA) seeks to classify target instances when provided solely with labeled source and unlabeled target examples for training. Learning domain-invariant features helps to achieve this goal, but it presupposes that the unlabeled samples are drawn from a single or multiple explicit target domains (Multi-target DA). In this paper, we consider a more realistic transfer scenario: our target domain is comprised of multiple sub-targets implicitly blended with each other, so that learners cannot identify which sub-target each unlabeled sample belongs to. This Blending-target Domain Adaptation (BTDA) scenario commonly appears in practice and threatens the validity of most existing DA algorithms, due to the presence of domain gaps and categorical misalignments among these hidden sub-targets. To reap the transfer performance gains in this new scenario, we propose the Adversarial Meta-Adaptation Network (AMEAN). AMEAN entails two adversarial transfer learning processes. The first is a conventional adversarial transfer to bridge our source and mixed target domains. To circumvent the intra-target category misalignment, the second process takes the form of "learning to adapt": it deploys an unsupervised meta-learner that receives target data and their ongoing feature-learning feedback to discover target clusters as our "meta-sub-target" domains. These meta-sub-targets auto-design our meta-sub-target DA loss, which empirically eliminates the implicit category mismatching in our mixed target. We evaluate AMEAN and a variety of DA algorithms on three benchmarks under the BTDA setup. Empirical results show that BTDA is quite a challenging transfer setup for most existing DA algorithms, yet AMEAN significantly outperforms these state-of-the-art baselines and effectively restrains the negative transfer effects in BTDA. Comment: CVPR-19 (oral). Code is available at http://github.com/zjy526223908/BTD

arXiv.org e-Print Archive

Crossref

Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

Author: Braun Stefan
Can Dogan
da Silva Thiago Fraga
Ghoshal Arnab
Hori Takaaki
Hsiao Roger
Mason Henry
McDermott Erik
Silovsky Honza
Swietojanski Pawel
Travadi Ruchir
Zhuang Xiaodan
Publication date: 02/11/2022

This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, where the same attention mask is applied at every frame, with chunked masking, where the attention mask for each frame is determined by chunk boundaries, in terms of recognition accuracy and latency. We then explore the use of variable masking, where the attention masks are sampled from a target distribution at training time, to build models that can work in different configurations. Finally, we investigate how a single configurable model can be used to perform both first pass streaming recognition and second pass acoustic rescoring. Experiments show that chunked masking achieves a better accuracy vs latency trade-off compared to fixed masking, both with and without FastEmit. We also show that variable masking improves the accuracy by up to 8% relative in the acoustic rescoring scenario. Comment: 5 pages, 4 figures, 2 Table
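The three masking schemes named in this abstract can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the paper's code: the function names, the parameters `left`, `right`, `chunk_size`, and `left_chunks`, and the uniform sampling over configurations are all assumptions. A mask entry `True` means frame `i` may attend to frame `j`.

```python
import random


def fixed_mask(n, left, right):
    """Same relative window for every frame: i attends to [i-left, i+right]."""
    return [[(i - left) <= j <= (i + right) for j in range(n)]
            for i in range(n)]


def chunked_mask(n, chunk_size, left_chunks):
    """Window set by chunk boundaries: frame i attends to its own chunk
    and to `left_chunks` preceding chunks (assumed parameterisation)."""
    mask = []
    for i in range(n):
        chunk = i // chunk_size
        start = max(0, (chunk - left_chunks) * chunk_size)
        end = min(n, (chunk + 1) * chunk_size)
        mask.append([start <= j < end for j in range(n)])
    return mask


def sample_chunked_mask(n, configs, rng):
    """Variable masking: draw one (chunk_size, left_chunks) pair per call.
    Uniform sampling is an assumption; the abstract only says 'sampled
    from a target distribution'."""
    chunk_size, left_chunks = rng.choice(configs)
    return chunked_mask(n, chunk_size, left_chunks)


rng = random.Random(0)
for row in sample_chunked_mask(4, [(1, 1), (2, 0), (2, 1)], rng):
    print("".join("x" if allowed else "." for allowed in row))
```

With chunked masking, frames early in a chunk see more future frames than frames late in the chunk, whereas fixed masking gives every frame the same look-ahead.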

arXiv.org e-Print Archive

A Study on the Comparison and Enhancement of Health Literacy of College Students in Guangdong Province in 2020 and 2022

Author: Dong Jian
Feng Juan
Guo Yintong
Jiang Dan
Li Zhuangwei
Lin Jieping
Wang Xiaodan
Zhuang Mengli
Publication venue: EDP Sciences
Publication date: 01/01/2023

To compare the health literacy of college students in Guangdong province in 2020 and 2022, and thereby provide a scientific basis for targeted health literacy intervention and policy formulation, surveys were conducted in both years. Data collation and analysis were performed using SPSS 19.0 statistical software. The χ² test was used to compare health literacy levels, and logistic regression was performed to analyse the factors influencing health literacy. The results show that the general health literacy level of college students in Guangdong province was 46.5% in 2022, 6.3 percentage points higher than the 40.2% recorded in 2020, a statistically significant difference. All three dimensions and six aspects of health literacy improved. The results of both years showed that health skills, basic medical literacy and health information literacy were at a low level. According to logistic regression analysis, the health literacy level of senior students is higher than that of junior students, and those who have taken health-related courses have a higher health literacy level. The most desired type of health knowledge is the prevention and treatment of infectious diseases, and new media are becoming a more popular way for students to gain health knowledge. In conclusion, Guangdong college students’ health literacy is relatively high but still needs to be improved, especially in health skills, basic medical care and health information literacy. The government, colleges and universities should work together to improve college students’ health literacy.

Directory of Open Access Journals

APT Weighted MRI as an Effective Imaging Protocol to Predict Clinical Outcome After Acute Ischemic Stroke

Author: Caiyu Zhuang
Gang Xiao
Guisen Lin
Renhua Wu
Xiaodan Zong
Yanzi Chen
Yuanyu Shen
Zhiwei Shen
Publication venue: Frontiers Media SA
Publication date: 01/01/2018

This study explores the capability of amide-proton-transfer weighted (APTW) magnetic resonance imaging (MRI) in the evaluation of clinical neurological deficit at the time of hospitalization and the assessment of long-term daily functional outcome for patients with acute ischemic stroke (AIS). We recruited 55 AIS patients with brain MRI acquired within 24–48 h of symptom onset and followed up with their 90-day modified Rankin Scale (mRS) score. APT weighted MRI was performed for all the study subjects to measure the APTW signal quantitatively in the acute ischemic area (APTWipsi) and the contralateral side (APTWcont). The change of the APT signal between the acute ischemic region and the contralateral side (ΔAPTW) was calculated. The maximum APTW signal (APTWmax) and minimal APTW signal (APTWmin) were also acquired to demonstrate APTW signal heterogeneity (APTWmax−min). In addition, all the patients were divided into 2 groups according to their 90-day mRS score (good prognosis group with mRS score <2 and poor prognosis group with mRS score ≥2), and ΔAPTW was compared between these groups. We found that ΔAPTW correlated well with the National Institutes of Health Stroke Scale (NIHSS) score (R² = 0.578, p < 0.001) and the 90-day mRS score (R² = 0.55, p < 0.001). There was a significant difference in ΔAPTW between patients with good prognosis and patients with poor prognosis. APTWmax−min was also significantly different between the two groups. These results suggest that APT weighted MRI could be used as an effective tool to assess stroke severity and prognosis for patients with AIS, with APTW signal heterogeneity as a possible biomarker.

Directory of Open Access Journals

Frontiers - Publisher Connector

Sound Event Detection with Binary Neural Networks on Tightly Power-Constrained IoT Devices

Author: Aditya
Andrey
Angelo Garofalo
Annamaria
Benoit
Bert
Daniele Palossi
Darryl D.
Elisabetta
Francesco
Gianmarco
Gopika Premsankar
Jason
Luigi
Lukas
Massimo Alioto
Qi Meng
Ronald G
Shuchang Zhou
Xiaodan Zhuang
Yihui
Yueyue
Yundong Zhang
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2020

Sound event detection (SED) is a hot topic in consumer and smart city applications. Existing approaches based on Deep Neural Networks are very effective, but highly demanding in terms of memory, power, and throughput when targeting ultra-low power always-on devices. Latency, availability, cost, and privacy requirements are pushing recent IoT systems to process the data on the node, close to the sensor, with a very limited energy supply and tight constraints on memory size and processing capabilities that preclude running state-of-the-art DNNs. In this paper, we explore the combination of extreme quantization to a small-footprint binary neural network (BNN) with the highly energy-efficient, RISC-V-based (8+1)-core GAP8 microcontroller. Starting from an existing CNN for SED whose footprint (815 kB) exceeds the 512 kB of memory available on our platform, we retrain the network using binary filters and activations to match these memory constraints. (Fully) binary neural networks come with a natural drop in accuracy of 12-18% on the challenging ImageNet object recognition challenge compared to their equivalent full-precision baselines. This BNN reaches a 77.9% accuracy, just 7% lower than the full-precision version, with 58 kB (7.2 times less) for the weights and 262 kB (2.4 times less) memory in total. With our BNN implementation, we reach a peak throughput of 4.6 GMAC/s and 1.5 GMAC/s over the full network, including preprocessing with Mel bins, which corresponds to an efficiency of 67.1 GMAC/s/W and 31.3 GMAC/s/W, respectively. Compared to the performance of an ARM Cortex-M4 implementation, our system has a 10.3 times faster execution time and a 51.1 times higher energy efficiency. Comment: 6 pages conferenc

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Structural Modifications of the Brain in Acclimatization to High-Altitude

Author: A Hyafil
A Kanaan
A Sidaros
A Versace
A von Leupoldt
AJ Verberne
AL Green
AM Hogan
AP Binks
AR Frisancho
BY Huang
C Beaulieu
C Constantinidis
C Davatzikos
C Marconi
CM Beall
D Barazany
D Head
D Penaloza
DS Kimmerly
ED Schwartz
F Hoeft
FL Bookstein
G Thomalla
GF Jansen
H Jiang
HD Critchley
J Ashburner
J Peters
J Talairach
J Zhuang
JA Boero
JA Dani
JC LaManna
Jiaxing Zhang
Jinfu Shi
JW Papez
L Bernardi
L Chang
L Concha
LG Moore
LS Curran
M Jenkinson
M Rivera-Ch
M Wideroe
MJ Morrell
MP Paulus
MS Westerterp-Plantenga
NR Giuliani
P Mukherjee
Pedro Antonio Valdes-Sosa
PM Macey
PW Davenport
PW Hochachka
Qiyong Gong
RL Silton
S Bava
SF Witelson
SM Burns
SM Smith
T Beppu
T Wobrock
T Wu
TC Chua
TD Brutsaert
TD Wager
V Gulani
V Tripathy
VE Claydon
W Gao
Xiaodan Yan
Xuchu Weng
Y Zhang
Yijun Liu
Publication venue: Public Library of Science
Publication date: 01/01/2010

Adaptive changes in respiratory and cardiovascular responses at high altitude (HA) have been well clarified. However, the central mechanisms underlying HA acclimatization remain unclear. Using voxel-based morphometry (VBM) and diffusion tensor imaging (DTI) with fractional anisotropy (FA) calculation, we investigated 28 Han immigrant residents (17–22 yr) who were born and raised at HA of 2616–4200 m on the Qinghai-Tibetan Plateau for at least 17 years and who currently attended college at sea level (SL). Their families migrated from SL to HA 2–3 generations ago and have resided at HA ever since. Control subjects were matched SL residents. HA residents (vs. SL) showed decreased grey matter volume in the bilateral anterior insula, right anterior cingulate cortex, bilateral prefrontal cortex, left precentral cortex, and right lingual cortex. HA residents (vs. SL) had significantly higher FA mainly in the bilateral anterior limb of internal capsule, bilateral superior and inferior longitudinal fasciculus, corpus callosum, bilateral superior corona radiata, bilateral anterior external capsule, right posterior cingulum, and right corticospinal tract. Higher FA values in those regions were associated with decreased or unchanged radial diffusivity coinciding with no change of longitudinal diffusivity in the HA vs. SL group. Conversely, HA residents had lower FA in the left optic radiation and left superior longitudinal fasciculus. Our data demonstrate that HA acclimatization is associated with brain structural modifications, including the loss of regional cortical grey matter accompanied by changes in the white matter, which may underlie the physiological adaptation of residents at HA.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Institute of Psychology, Chinese Academy of Sciences

Institutional Repository of Institute of Psychology, Chinese Academy of Sciences

Xiamen University Institutional Repository