
    The evaluation of pilots' performance and mental workload by eye movement

    Pilots often make important decisions using ambiguous information, under stress, and with very little time. During flight operations, detecting the warning light of a system failure is a task with real-world relevance to the measurement of pilot performance and eye movement. The demand on a pilot's visual and situational awareness across multiple tasks can be detrimental under conditions of mental overload. The purpose of this research is to evaluate the relationship between pilots' mental workload and operational performance by eye tracking. Collecting eye-movement data during flight operations in a virtual-reality flight simulator provided useful information for analyzing participants' cognitive processes. Thirty-six pilots participated in this research, with flight experience ranging from 320 to 2,920 hours and ages ranging from 26 to 51 years. The apparatus included an Applied Science Laboratories (ASL) eye tracker, an IDF flight simulator, and the NASA-TLX questionnaire for data collection. The results show that pilots with high situational awareness who detected the hydraulic malfunction had shorter total fixation duration on the Airspeed Indicator and longer total fixation duration on the Altitude Indicator, Vertical Speed Indicator, right multi-display, and left multi-display, compared with pilots who did not detect the hydraulic malfunction signal. Pilots' total fixation time on the Integration Control Panel, Altitude Indicator, Attitude Indicator, and right multi-display, together with their subjective ratings on the NASA-TLX effort dimension for the close-pattern mission, have a significant relationship with the operational time needed to complete the tactical mission. Experienced pilots are familiar with monitoring the Airspeed Indicator and with kinetic maneuvering, resulting in less fuel consumption. This study could provide guidelines for future training design to reduce pilots' mental workload and improve situational awareness, thereby enhancing flight safety.
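
    As a rough illustration of the eye-movement metric used above, the short Python sketch below aggregates total fixation duration per cockpit area of interest (AOI) from a list of fixation events; the AOI labels, durations, and data layout are assumptions for demonstration, not the study's actual ASL export format.

        from collections import defaultdict

        def total_fixation_duration(fixations):
            # fixations: iterable of (aoi_label, duration_ms) pairs from an eye-tracker export
            totals = defaultdict(float)
            for aoi, duration_ms in fixations:
                totals[aoi] += duration_ms
            return dict(totals)

        # Invented example data for illustration only.
        sample = [("Airspeed Indicator", 420.0),
                  ("Altitude Indicator", 610.0),
                  ("Airspeed Indicator", 180.0),
                  ("Right Multi-Display", 250.0)]
        print(total_fixation_duration(sample))
        # {'Airspeed Indicator': 600.0, 'Altitude Indicator': 610.0, 'Right Multi-Display': 250.0}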

    Spectral Analysis for Semantic Segmentation with Applications on Feature Truncation and Weak Annotation

    We propose spectral analysis to investigate the correlation between the accuracy and the resolution of segmentation maps for semantic segmentation. Current networks predict segmentation maps on a down-sampled grid of the image to reduce computational cost. Moreover, these networks can be trained with weak annotations that use only the coarse contour of the segmentation maps. Although these works successfully exploit the low-frequency information of segmentation maps, the accuracy of the resulting maps may be degraded in regions near object boundaries. There is still no theoretical guideline for determining the optimal down-sampled grid that balances computational cost against segmentation accuracy. We analyze the objective function (cross-entropy) and the network back-propagation process in the frequency domain. We discover that the cross-entropy and the key features of the CNN are mainly contributed by the low-frequency components of the segmentation maps. This provides quantitative results for determining the efficacy of a down-sampled grid of segmentation maps. The analysis is then validated on two applications: a feature truncation method and block-wise annotation, which limit the high-frequency components of the CNN features and of the annotation, respectively. The results agree with our analysis. Thus the success of existing work that uses the low-frequency information of segmentation maps now has a theoretical foundation. Comment: 21 pages
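
    To make the frequency-domain argument concrete, the sketch below (a rough illustration, not the paper's analysis code) takes the 2D FFT of a synthetic binary segmentation map and measures how much of its spectral energy falls inside a low-frequency band; the map, the cutoff radius, and the down-sampling interpretation are illustrative assumptions.

        import numpy as np

        # Synthetic binary segmentation map: a filled square "object" on a 128x128 grid.
        h = w = 128
        seg = np.zeros((h, w))
        seg[32:96, 32:96] = 1.0

        # 2D FFT, shifted so low frequencies sit at the center.
        spectrum = np.fft.fftshift(np.fft.fft2(seg))
        power = np.abs(spectrum) ** 2

        # Fraction of spectral energy inside a low-frequency disk of radius r.
        yy, xx = np.mgrid[0:h, 0:w]
        dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
        r = 16  # assumed cutoff, loosely corresponding to an 8x down-sampled grid
        low_freq_ratio = power[dist <= r].sum() / power.sum()
        print(f"energy within radius {r}: {low_freq_ratio:.3f}")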

    Improved Noisy Student Training for Automatic Speech Recognition

    Recently, a semi-supervised learning method known as "noisy student training" has been shown to significantly improve the image-classification performance of deep networks. Noisy student training is an iterative self-training method that leverages augmentation to improve network performance. In this work, we adapt and improve noisy student training for automatic speech recognition, employing (adaptive) SpecAugment as the augmentation method. We find effective methods to filter, balance, and augment the data generated between self-training iterations. By doing so, we obtain word error rates (WERs) of 4.2%/8.6% on the clean/noisy LibriSpeech test sets using only the clean 100h subset of LibriSpeech as the supervised set and the rest (860h) as the unlabeled set. Furthermore, we achieve WERs of 1.7%/3.4% on the clean/noisy LibriSpeech test sets by using the unlab-60k subset of LibriLight as the unlabeled set for LibriSpeech 960h. We are thus able to improve upon the previous state-of-the-art clean/noisy test WERs achieved on LibriSpeech 100h (4.74%/12.20%) and LibriSpeech (1.9%/4.1%). Comment: 5 pages, 5 figures, 4 tables; v2: minor revisions, reference added
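
    The control flow of the iterative self-training loop can be sketched as follows; the train/transcribe/filter/SpecAugment steps are trivial stand-ins so the loop runs end to end, and none of the names correspond to the paper's actual training stack.

        # Schematic, self-contained sketch of a noisy student loop (stand-in functions only).

        def train(dataset, augment):
            # Stand-in: a "model" is just the augmented (audio, text) pairs it was trained on.
            return [(augment(audio), text) for audio, text in dataset]

        def transcribe(model, unlabeled):
            # Stand-in: pseudo-label every unlabeled utterance with a dummy hypothesis.
            return [(audio, "<pseudo transcript>") for audio in unlabeled]

        def filter_and_balance(pseudo_labeled):
            # Stand-in for confidence filtering and length/class balancing.
            return pseudo_labeled[: max(1, len(pseudo_labeled) // 2)]

        def spec_augment(audio):
            # Stand-in for (adaptive) SpecAugment time/frequency masking.
            return audio

        def noisy_student(supervised, unlabeled, generations=4):
            model = train(supervised, augment=spec_augment)               # initial teacher
            for _ in range(generations):
                pseudo = filter_and_balance(transcribe(model, unlabeled))  # label, then filter
                model = train(supervised + pseudo, augment=spec_augment)   # student -> next teacher
            return model

        print(len(noisy_student([("utt1", "hello world")], ["utt2", "utt3", "utt4"])))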

    No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models

    For decades, context-dependent phonemes have been the dominant sub-word unit in conventional acoustic modeling systems. This status quo has recently begun to be challenged by end-to-end models, which seek to combine the acoustic, pronunciation, and language model components into a single neural network. Such systems, which typically predict graphemes or words, simplify the recognition process since they remove the need for a separate expert-curated pronunciation lexicon to map from phoneme-based units to words. However, there has been little previous work comparing phoneme-based and grapheme-based sub-word units in the end-to-end modeling framework, to determine whether the gains from such approaches are primarily due to the new probabilistic model or to the joint learning of the various components with grapheme-based units. In this work, we conduct detailed experiments aimed at quantifying the value of phoneme-based pronunciation lexica in the context of end-to-end models. We examine phoneme-based end-to-end models, which are contrasted against grapheme-based ones on a large-vocabulary English voice-search task, where we find that graphemes do indeed outperform phonemes. We also compare grapheme- and phoneme-based approaches on a multi-dialect English task, which once again confirms the superiority of graphemes, greatly simplifying the system for recognizing multiple dialects.
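
    The practical difference between the two target representations can be illustrated with a toy example: graphemes come straight from the spelling, while phonemes require an expert-curated lexicon lookup (with a fallback for out-of-vocabulary words). The lexicon entries below are invented for illustration and are not from any real pronunciation dictionary.

        # Toy illustration of grapheme vs. phoneme target units for an end-to-end model.
        LEXICON = {
            "speech": ["S", "P", "IY", "CH"],
            "search": ["S", "ER", "CH"],
        }

        def grapheme_targets(word):
            # Graphemes: just the characters of the written word, no lexicon needed.
            return list(word)

        def phoneme_targets(word, lexicon=LEXICON):
            # Phonemes: require a curated lexicon; unseen words need a fallback (e.g. G2P).
            return lexicon.get(word, ["<unk>"])

        for w in ["speech", "search", "voice"]:
            print(w, grapheme_targets(w), phoneme_targets(w))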

    Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation

    The subjective perception of emotion leads to inconsistent labels from human annotators. Typically, utterances lacking majority-agreed labels are excluded when training an emotion classifier, which causes problems when ambiguous emotional expressions are encountered during testing. This paper investigates three methods for handling ambiguous emotion. First, we show that incorporating utterances without majority-agreed labels as an additional class in the classifier reduces the classification performance on the other emotion classes. Then, we propose detecting utterances with ambiguous emotions as out-of-domain samples by quantifying the uncertainty in emotion classification using evidential deep learning. This approach retains classification accuracy while effectively detecting ambiguous emotional expressions. Furthermore, to obtain fine-grained distinctions among ambiguous emotions, we propose representing emotion as a distribution instead of a single class label. The task is thus re-framed from classification to distribution estimation, where every individual annotation is taken into account, not just the majority opinion. The evidential uncertainty measure is extended to quantify the uncertainty in emotion distribution estimation. Experimental results on the IEMOCAP and CREMA-D datasets demonstrate the superior capability of the proposed method in terms of majority-class prediction, emotion distribution estimation, and uncertainty estimation.
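
    A minimal numeric sketch of the evidential-uncertainty idea referenced above: network outputs are treated as non-negative evidence for a Dirichlet distribution over emotion classes, and the Dirichlet's vacuity serves as the uncertainty score for flagging ambiguous utterances. The evidence values are invented, and the exact formulation in the paper may differ.

        import numpy as np

        # Evidential deep learning, toy version: evidence e_k per class gives Dirichlet
        # parameters alpha_k = e_k + 1; vacuity u = K / sum(alpha) is high when total
        # evidence is low, which is used to flag ambiguous (out-of-domain-like) inputs.
        def dirichlet_uncertainty(evidence):
            evidence = np.asarray(evidence, dtype=float)
            alpha = evidence + 1.0
            expected_probs = alpha / alpha.sum()   # predicted emotion distribution
            vacuity = len(alpha) / alpha.sum()     # uncertainty score
            return expected_probs, vacuity

        # A clearly "happy" utterance vs. an ambiguous one (evidence values are invented).
        for name, ev in [("clear", [20.0, 0.5, 0.3, 0.2]), ("ambiguous", [1.2, 1.0, 0.9, 1.1])]:
            probs, u = dirichlet_uncertainty(ev)
            print(name, np.round(probs, 2), round(u, 2))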

    State-of-the-art Speech Recognition With Sequence-to-Sequence Models

    Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS) subsume the acoustic, pronunciation, and language model components of a traditional automatic speech recognition (ASR) system into a single neural network. In previous work, we have shown that such architectures are comparable to state-of-the-art ASR systems on dictation tasks, but it was not clear whether they would be practical for more challenging tasks such as voice search. In this work, we explore a variety of structural and optimization improvements to our LAS model which significantly improve performance. On the structural side, we show that word-piece models can be used instead of graphemes. We also introduce a multi-head attention architecture, which offers improvements over the commonly used single-head attention. On the optimization side, we explore synchronous training, scheduled sampling, label smoothing, and minimum word error rate optimization, all of which are shown to improve accuracy. We present results with a unidirectional LSTM encoder for streaming recognition. On a 12,500-hour voice-search task, we find that the proposed changes improve the WER from 9.2% to 5.6%, while the best conventional system achieves 6.7%; on a dictation task, our model achieves a WER of 4.1% compared to 5% for the conventional system. Comment: ICASSP camera-ready version
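
    Of the optimization techniques listed, label smoothing is simple enough to show concretely; the sketch below spreads a small probability mass off the true token when building training targets. The smoothing weight of 0.1 and the toy vocabulary size are assumed values, not necessarily those used in the paper.

        import numpy as np

        def smooth_labels(target_ids, vocab_size, epsilon=0.1):
            # Turn hard one-hot targets into smoothed distributions:
            # 1 - epsilon on the true token, epsilon spread evenly over the other tokens.
            targets = np.full((len(target_ids), vocab_size), epsilon / (vocab_size - 1))
            targets[np.arange(len(target_ids)), target_ids] = 1.0 - epsilon
            return targets

        # Two target tokens over a toy 4-symbol vocabulary; each row sums to 1.
        print(smooth_labels([2, 0], vocab_size=4))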

    Secondary Metabolites from the Leaves of Aquilaria agallocha

    Twelve compounds, including three flavonoids, 5-hydroxy-4′,7-dimethoxyflavone (1) [22], luteolin-7,3′,4′-trimethyl ether (2) and 5,3′-dihydroxy-7,4′-dimethoxyflavone (3), five benzenoids, methylparaben (4), vanillic acid (5), p-hydroxybenzoic acid (6), syringic acid (7), and isovanillic acid (8), and four steroids, β-sitosterol (9), stigmasterol (10), β-sitostenone (11) and stigmasta-4,22-dien-3-one (12), were isolated from the leaves of Aquilaria agallocha (Thymelaeaceae). All of these compounds (1-12) were obtained for the first time from the leaves of this plant.

    Single-crystalline δ-Ni2Si nanowires with excellent physical properties

    In this article, we report the synthesis of single-crystalline nickel silicide nanowires (NWs) via a chemical vapor deposition method using NiCl2·6H2O as a single-source precursor. Various morphologies of δ-Ni2Si NWs were successfully acquired by controlling the growth conditions. The growth mechanism of the δ-Ni2Si NWs was thoroughly discussed and identified with microscopy studies. Field-emission measurements show a low turn-on field (4.12 V/μm), and magnetic-property measurements show a classic ferromagnetic characteristic, which demonstrates promising potential applications in field emitters, magnetic storage, and biological cell separation.

    Geographical heterogeneity and influenza infection within households

    Although it has been suggested that vaccinating schoolchildren reduces influenza morbidity and mortality in the community, it is unknown whether geographical heterogeneity would affect vaccine effectiveness.