The evaluation of pilots performance and mental workload by eye movement
Pilots often make important decisions using ambiguous information, under stress and with very little time. During flight operations, detecting the warning light for a system failure is a task with real-world application that relates to the measurement of pilots' performance and eye movement. The demands on a pilot's visual and situational awareness across multiple tasks can be detrimental under conditions of mental overload. The purpose of this research is to evaluate the relationship between pilots' mental workload and operational performance by eye tracking. Collecting eye movement data during flight operations in a virtual-reality flight simulator provided useful information for analyzing participants' cognitive processes. Thirty-six pilots participated in this research, with flight experience between 320 and 2,920 hours and ages between 26 and 51 years. The apparatus included an Applied Science Laboratories (ASL) eye tracker, an IDF flight simulator and the NASA-TLX for data collection. The results show that pilots with high situational awareness (SA) who detected the hydraulic malfunction had shorter total fixation duration on the Air Speed Indicator and longer total fixation duration on the Altitude Indicator, Vertical Speed Indicator, Right Multi-display and Left Multi-display than pilots who did not detect the hydraulic malfunction signal. Pilots' total fixation time on the Integration Control Panel, Altitude Indicator, Attitude Indicator and Right Multi-display, and pilots' subjective rating on the NASA-TLX effort dimension for the close pattern mission, have a significant relationship with pilots' performance in terms of the operational time for completing the tactical mission. Experienced pilots, who are familiar with monitoring the Airspeed Indicator and with kinetic maneuvering, consume less fuel. This study could provide guidelines for future training design to reduce pilots' mental workload and improve situational awareness, enhancing flight safety.
Spectral Analysis for Semantic Segmentation with Applications on Feature Truncation and Weak Annotation
We propose spectral analysis to investigate the correlation between the
accuracy and the resolution of segmentation maps for semantic segmentation. The
current networks predict segmentation maps on the down-sampled grid of images
to alleviate the computational cost. Moreover, these networks can be trained by
weak annotations that utilize only the coarse contour of segmentation maps.
Although these approaches succeed by exploiting the low-frequency information
of segmentation maps, the accuracy of the resulting segmentation maps may be
degraded in the regions near object boundaries. A theoretical guideline for
choosing an optimal down-sampled grid that balances the cost and the accuracy
of segmentation is still lacking. We analyze the objective function
(cross-entropy) and network
back-propagation process in frequency domain. We discover that cross-entropy
and key features of CNN are mainly contributed by the low-frequency components
of segmentation maps. This further provides us quantitative results to
determine the efficacy of down-sampled grid of segmentation maps. The analysis
is then validated on the two applications: the feature truncation method and
the block-wise annotation that limit the high-frequency components of the CNN
features and annotation, respectively. The results agree with our analysis.
Thus the success of the existing work utilizing low-frequency information of
segmentation maps now has a theoretical foundation. Comment: 21 pages
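The claim that segmentation maps are dominated by low-frequency content, with errors concentrated near object boundaries, can be illustrated with a small sketch. This is not the authors' analysis: the function names `low_pass` and `disagreement`, the square frequency window, and the synthetic square mask are all assumptions chosen for illustration.

```python
import numpy as np


def low_pass(seg_map: np.ndarray, keep: int) -> np.ndarray:
    """Keep only FFT coefficients with |frequency| <= keep along each axis."""
    spec = np.fft.fftshift(np.fft.fft2(seg_map.astype(float)))
    h, w = seg_map.shape
    cy, cx = h // 2, w // 2  # position of the zero frequency after fftshift
    filt = np.zeros_like(spec)
    window = (slice(max(cy - keep, 0), cy + keep + 1),
              slice(max(cx - keep, 0), cx + keep + 1))
    filt[window] = spec[window]
    return np.fft.ifft2(np.fft.ifftshift(filt)).real


def disagreement(mask: np.ndarray, keep: int):
    """Fraction of pixels whose label changes after low-pass filtering."""
    recon = low_pass(mask, keep) > 0.5  # re-binarize the filtered map
    return float(np.mean(recon != mask.astype(bool))), recon


# Synthetic binary "segmentation map": one square object (an assumption).
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True

err_low, recon_low = disagreement(mask, keep=4)
err_full, _ = disagreement(mask, keep=32)
print(f"error with 9x9 lowest frequencies: {err_low:.4f}")
print(f"error with all frequencies:        {err_full:.4f}")
```

In this toy setting the pixels that change label sit along the edges and corners of the square, which is consistent with the boundary degradation the abstract describes.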
Improved Noisy Student Training for Automatic Speech Recognition
Recently, a semi-supervised learning method known as "noisy student training"
has been shown to improve image classification performance of deep networks
significantly. Noisy student training is an iterative self-training method that
leverages augmentation to improve network performance. In this work, we adapt
and improve noisy student training for automatic speech recognition, employing
(adaptive) SpecAugment as the augmentation method. We find effective methods to
filter, balance and augment the data generated in between self-training
iterations. By doing so, we are able to obtain word error rates (WERs)
4.2%/8.6% on the clean/noisy LibriSpeech test sets by only using the clean 100h
subset of LibriSpeech as the supervised set and the rest (860h) as the
unlabeled set. Furthermore, we are able to achieve WERs 1.7%/3.4% on the
clean/noisy LibriSpeech test sets by using the unlab-60k subset of LibriLight
as the unlabeled set for LibriSpeech 960h. We are thus able to improve upon the
previous state-of-the-art clean/noisy test WERs achieved on LibriSpeech 100h
(4.74%/12.20%) and LibriSpeech (1.9%/4.1%). Comment: 5 pages, 5 figures, 4 tables; v2: minor revisions, reference added
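For readers unfamiliar with the metric quoted above, word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The sketch below is a minimal illustration of that standard definition; the function name `word_error_rate` and the whitespace tokenization are assumptions, and the paper's own scoring pipeline is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.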
No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models
For decades, context-dependent phonemes have been the dominant sub-word unit
for conventional acoustic modeling systems. This status quo has begun to be
challenged recently by end-to-end models which seek to combine acoustic,
pronunciation, and language model components into a single neural network. Such
systems, which typically predict graphemes or words, simplify the recognition
process since they remove the need for a separate expert-curated pronunciation
lexicon to map from phoneme-based units to words. However, there has been
little previous work comparing phoneme-based versus grapheme-based sub-word
units in the end-to-end modeling framework, to determine whether the gains from
such approaches are primarily due to the new probabilistic model, or from the
joint learning of the various components with grapheme-based units.
In this work, we conduct detailed experiments which are aimed at quantifying
the value of phoneme-based pronunciation lexica in the context of end-to-end
models. We examine phoneme-based end-to-end models, which are contrasted
against grapheme-based ones on a large vocabulary English Voice-search task,
where we find that graphemes do indeed outperform phonemes. We also compare
grapheme and phoneme-based approaches on a multi-dialect English task, which
once again confirms the superiority of graphemes, greatly simplifying the system
for recognizing multiple dialects.
Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation
The subjective perception of emotion leads to inconsistent labels from human
annotators. Typically, utterances lacking majority-agreed labels are excluded
when training an emotion classifier, which causes problems when encountering
ambiguous emotional expressions during testing. This paper investigates three
methods to handle ambiguous emotion. First, we show that incorporating
utterances without majority-agreed labels as an additional class in the
classifier reduces the classification performance of the other emotion classes.
Then, we propose detecting utterances with ambiguous emotions as out-of-domain
samples by quantifying the uncertainty in emotion classification using
evidential deep learning. This approach retains the classification accuracy
while effectively detecting ambiguous emotion expressions. Furthermore, to obtain
fine-grained distinctions among ambiguous emotions, we propose representing
emotion as a distribution instead of a single class label. The task is thus
re-framed from classification to distribution estimation where every individual
annotation is taken into account, not just the majority opinion. The evidential
uncertainty measure is extended to quantify the uncertainty in emotion
distribution estimation. Experimental results on the IEMOCAP and CREMA-D
datasets demonstrate the superior capability of the proposed method in terms of
majority class prediction, emotion distribution estimation, and uncertainty
estimation.
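The idea of representing emotion as a distribution over classes, rather than a single majority label, can be sketched as follows. This is only an illustration of building a soft target from individual annotations; the helper names `label_distribution` and `has_majority` and the four-class label set are assumptions, and the evidential deep learning model itself is not reproduced here.

```python
from collections import Counter

# Hypothetical label set for illustration only.
CLASSES = ["angry", "happy", "neutral", "sad"]


def label_distribution(annotations, classes=CLASSES):
    """Relative frequency of each class among one utterance's annotations."""
    unknown = set(annotations) - set(classes)
    if not annotations or unknown:
        raise ValueError("annotations must be non-empty and drawn from classes")
    counts = Counter(annotations)
    return [counts[c] / len(annotations) for c in classes]


def has_majority(annotations):
    """True if one label was chosen by more than half of the annotators."""
    top_count = Counter(annotations).most_common(1)[0][1]
    return top_count > len(annotations) / 2


agreed = ["happy", "happy", "neutral"]
ambiguous = ["happy", "sad", "neutral"]

print(label_distribution(agreed), has_majority(agreed))
print(label_distribution(ambiguous), has_majority(ambiguous))
```

Under majority-label training the second utterance would be discarded, whereas the distribution target keeps every annotation.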
State-of-the-art Speech Recognition With Sequence-to-Sequence Models
Attention-based encoder-decoder architectures such as Listen, Attend, and
Spell (LAS), subsume the acoustic, pronunciation and language model components
of a traditional automatic speech recognition (ASR) system into a single neural
network. In previous work, we have shown that such architectures are comparable
to state-of-the-art ASR systems on dictation tasks, but it was not clear if such
architectures would be practical for more challenging tasks such as voice
search. In this work, we explore a variety of structural and optimization
improvements to our LAS model which significantly improve performance. On the
structural side, we show that word piece models can be used instead of
graphemes. We also introduce a multi-head attention architecture, which offers
improvements over the commonly-used single-head attention. On the optimization
side, we explore synchronous training, scheduled sampling, label smoothing, and
minimum word error rate optimization, which are all shown to improve accuracy.
We present results with a unidirectional LSTM encoder for streaming
recognition. On a 12,500 hour voice search task, we find that the proposed
changes improve the WER from 9.2% to 5.6%, while the best conventional system
achieves 6.7%; on a dictation task our model achieves a WER of 4.1% compared to
5% for the conventional system. Comment: ICASSP camera-ready version
Secondary Metabolites from the Leaves of Aquilaria agallocha
Twelve compounds, including three flavonoids, 5-hydroxy-4′,7-dimethoxyflavone (1) [22], luteolin-7,3′,4′-trimethyl ether (2) and 5,3′-dihydroxy-7,4′-dimethoxyflavone (3), five benzenoids, methylparaben (4), vanillic acid (5), p-hydroxybenzoic acid (6), syringic acid (7) and isovanillic acid (8), and four steroids, β-sitosterol (9), stigmasterol (10), β-sitostenone (11) and stigmasta-4,22-dien-3-one (12), were isolated from the leaves of Aquilaria agallocha (Thymelaeaceae). All of these compounds (1-12) were obtained for the first time from the leaves of this plant.
Single-crystalline δ-Ni2Si nanowires with excellent physical properties
In this article, we report the synthesis of single-crystalline nickel silicide nanowires (NWs) via a chemical vapor deposition method using NiCl2·6H2O as a single-source precursor. Various morphologies of δ-Ni2Si NWs were successfully obtained by controlling the growth conditions. The growth mechanism of the δ-Ni2Si NWs was thoroughly discussed and identified with microscopy studies. Field emission measurements show a low turn-on field (4.12 V/μm), and magnetic property measurements show a classic ferromagnetic characteristic, which demonstrates promising potential applications for field emitters, magnetic storage, and biological cell separation.
Geographical heterogeneity and influenza infection within households
Although it has been suggested that schoolchildren vaccination reduces influenza morbidity and mortality in the community, it is unknown whether geographical heterogeneity would affect vaccine effectiveness