    Long-term learning behavior in a recurrent neural network for sound recognition

    In this paper, the long-term learning properties of an artificial neural network model, designed for sound recognition and computational auditory scene analysis in general, are investigated. The model is designed to run for long periods of time (weeks to months) on low-cost hardware in a noise monitoring network, and builds upon previous work by the same authors. It consists of three neural layers, connected to each other by feedforward and feedback excitatory connections. It is shown that the different mechanisms that drive auditory attention emerge naturally from the way in which neural activation and intra-layer inhibitory connections are implemented in the model. Training of the artificial neural network follows the Hebb principle, which dictates that "cells that fire together, wire together", with some important modifications compared to standard Hebbian learning. As the model is designed to be online for extended periods of time, the learning mechanisms need to be adapted accordingly: learning needs to be strongly attention- and saliency-driven, so as not to waste available memory space on sounds that are of no interest to the human listener. The model also implements plasticity, in order to deal with new or changing input over time without catastrophically forgetting what it has already learned. In addition, it is shown that the implementation of short-term memory plays an important role in the long-term learning properties of the model. The above properties are investigated and demonstrated by training on real urban sound recordings.
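    For illustration, here is a minimal sketch of a saliency-gated Hebbian weight update of the general kind the abstract describes. The weight matrix shape, the scalar saliency gate, and the decay term are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np

def hebbian_update(W, pre, post, saliency, lr=0.01, decay=1e-4):
    """One saliency-gated Hebbian step: strengthen weights between
    co-active units, scaled by an attention/saliency signal.

    W        -- (n_post, n_pre) weight matrix
    pre      -- (n_pre,) presynaptic activations
    post     -- (n_post,) postsynaptic activations
    saliency -- scalar in [0, 1]; near 0 suppresses learning on
                uninteresting input (assumed gating scheme)
    """
    # "Cells that fire together, wire together": outer product of
    # activities, gated by saliency so unattended sounds do not
    # consume memory capacity.
    W += lr * saliency * np.outer(post, pre)
    # Mild weight decay keeps weights bounded over weeks-long runs,
    # a simple stand-in for the plasticity mechanisms described above.
    W *= (1.0 - decay)
    return W
```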

    Computing driver tiredness and fatigue in automobile via eye tracking and body movements

    The aim of this paper is to classify driver tiredness and fatigue in automobiles via eye tracking and body movements, using a deep learning based Convolutional Neural Network (CNN) algorithm. Vehicle driver face localization serves as one of the most widely used real-world applications in fields like toll control, traffic accident scene analysis, and suspected vehicle tracking. The research proposes a CNN classifier for simultaneously localizing the region of the human face and the eye positions. The classifier gives bounding quadrilaterals rather than bounding rectangles, which provides a more precise indication for vehicle driver face localization. The adjusted regions are preprocessed to remove noise and passed to the CNN classifier for real-time processing. The preprocessing of the face features extracts connected components, filters them by size, and groups them into face expressions. The employed CNN is a well-established technology for human face recognition. Once the facial landmarks are extracted from the frames, classification models and deep learning based convolutional neural networks predict the state of the driver as 'Alert' or 'Drowsy' for each extracted frame. The CNN model can predict the output state labels (Alert/Drowsy) for each frame, but sequential image frames are also taken into account, as temporal context is extremely important when predicting the state of an individual. The process completes when all regions have a sufficiently high score or a fixed number of retries is exhausted. The output consists of the detected human face type and the list of regions, including the extracted mouth and eyes, with recognition reliability through the CNN at an accuracy of 98.57% over 100 epochs of training and testing.
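    A minimal sketch of the two-stage idea described above: a per-frame CNN emits Alert/Drowsy logits, and a sliding-window vote smooths the per-frame predictions into a sequential decision. The layer sizes, input crop size, and window length are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DrowsinessCNN(nn.Module):
    """Toy per-frame Alert/Drowsy classifier (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 16 * 16, 2)  # assumes 64x64 face crops

    def forward(self, x):
        z = self.features(x)
        return self.head(z.flatten(1))

def smooth_states(frame_logits, k=5):
    """Majority vote over a sliding window of frames, since a single
    frame is a noisy indicator of the driver's state."""
    preds = frame_logits.argmax(dim=1)  # 0 = Alert, 1 = Drowsy
    out = []
    for i in range(len(preds)):
        window = preds[max(0, i - k + 1): i + 1]
        out.append(int(window.float().mean().round()))
    return out
```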

    Attention Allocation Aid for Visual Search

    This paper outlines the development and testing of a novel, feedback-enabled attention allocation aid (AAAD), which uses real-time physiological data to improve human performance in a realistic sequential visual search task. By optimizing over search duration, the aid improves efficiency while preserving decision accuracy as the operator identifies and classifies targets within simulated aerial imagery. Specifically, using experimental eye-tracking data and measurements of target detectability across the human visual field, we develop functional models of detection accuracy as a function of search time, number of eye movements, scan path, and image clutter. These models are then used by the AAAD, in conjunction with real-time eye position data, to make probabilistic estimates of attained search accuracy and to recommend that the observer either move on to the next image or continue exploring the present image. An experimental evaluation in a scenario motivated by human supervisory control in surveillance missions confirms the benefits of the AAAD.
    Comment: To be presented at the ACM CHI conference in Denver, Colorado, in May 2017.
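    A sketch of the recommendation logic the abstract describes: model attained accuracy as a saturating function of search effort, then advise moving on once the marginal gain from more search is small. The functional form, constants, and threshold are assumptions, not the paper's fitted models.

```python
import math

def detection_accuracy(t, fixations, clutter, a=0.9, tau=4.0):
    """Illustrative saturating model of attained search accuracy:
    accuracy rises with search time and fixation count, and rises
    more slowly in cluttered images (assumed functional form)."""
    effort = (t + 0.5 * fixations) / (tau * (1.0 + clutter))
    return a * (1.0 - math.exp(-effort))

def recommend(t, fixations, clutter, horizon=1.0, gain_threshold=0.01):
    """Advise 'move on' once the expected accuracy gain from one more
    second of search falls below a threshold, else 'continue'."""
    now = detection_accuracy(t, fixations, clutter)
    soon = detection_accuracy(t + horizon, fixations, clutter)
    return "continue" if (soon - now) > gain_threshold else "move on"
```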

    Looking Beyond a Clever Narrative: Visual Context and Attention are Primary Drivers of Affect in Video Advertisements

    Emotion evoked by an advertisement plays a key role in influencing brand recall and eventual consumer choices. Automatic ad affect recognition has several useful applications. However, the use of content-based feature representations does not give insights into how affect is modulated by aspects such as the ad's scene setting, salient object attributes, and their interactions. Nor do such approaches inform us how humans prioritize visual information for ad understanding. Our work addresses these lacunae by decomposing video content into detected objects, coarse scene structure, object statistics, and actively attended objects identified via eye gaze. We measure the importance of each of these information channels by systematically incorporating the related information into ad affect prediction models. Contrary to the popular notion that ad affect hinges on the narrative and the clever use of linguistic and social cues, we find that actively attended objects and the coarse scene structure encode affective information better than individual scene objects or conspicuous background elements.
    Comment: Accepted for publication in the Proceedings of the 20th ACM International Conference on Multimodal Interaction, Boulder, CO, USA.
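    A minimal sketch of the channel-importance methodology described above: train a separate affect predictor on each information channel and compare their cross-validated scores. The channel names, the linear probe, and the synthetic stand-in features are assumptions; the paper's models and features may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def channel_importance(channels, labels):
    """Rank information channels by how well each alone predicts
    the ad affect label, using a simple linear probe."""
    scores = {}
    for name, X in channels.items():
        clf = LogisticRegression(max_iter=1000)
        # Cross-validated accuracy from this channel's features only.
        scores[name] = cross_val_score(clf, X, labels, cv=5).mean()
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))

# Usage with synthetic stand-in features (hypothetical channel names):
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)  # binary affect label, e.g. valence
channels = {name: rng.normal(size=(200, 16)) for name in
            ["attended_objects", "scene_structure", "object_stats"]}
print(channel_importance(channels, y))
```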

    Machine Analysis of Facial Expressions

    No abstract