5,055 research outputs found

    Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

    Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines can now do the same with images, less work has been done with sounds. This work develops an approach for dense semantic labelling of sound-making objects, based purely on binaural sounds. We propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a 360-degree camera. The co-existence of visual and audio cues is leveraged for supervision transfer. In particular, we employ a cross-modal distillation framework that consists of a vision 'teacher' method and a sound 'student' method -- the student method is trained to generate the same results as the teacher method. This way, the auditory system can be trained without using human annotations. We also propose two auxiliary tasks, namely a) a novel task of Spatial Sound Super-resolution to increase the spatial resolution of sounds, and b) dense depth prediction of the scene. We then formulate the three tasks into one end-to-end trainable multi-tasking network aiming to boost the overall performance. Experimental results on the dataset show that 1) our method achieves promising results for semantic prediction and the two auxiliary tasks; 2) the three tasks are mutually beneficial -- training them together achieves the best performance; and 3) the number and orientations of microphones are both important. The data and code will be released to facilitate research in this new direction. Comment: Project page: https://www.trace.ethz.ch/publications/2020/sound_perception/index.htm
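    The abstract describes a teacher-student (cross-modal distillation) setup with two auxiliary heads trained jointly. Below is a minimal, hypothetical PyTorch-style sketch of that idea; all module names, shapes, and loss weights are illustrative assumptions, not the paper's actual architecture.

```python
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch: a frozen vision "teacher" provides dense semantic
# targets from the 360-degree image, and a sound "student" is trained to
# reproduce them from binaural spectrograms, alongside two auxiliary heads
# (dense depth and spatial sound super-resolution).
class SoundStudent(nn.Module):
    def __init__(self, n_classes=10, n_mics=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.semantic_head = nn.Conv2d(64, n_classes, 1)  # dense semantic labels
        self.depth_head = nn.Conv2d(64, 1, 1)             # dense depth
        self.s3r_head = nn.Conv2d(64, n_mics - 2, 1)      # predict missing channels

    def forward(self, binaural_spec):
        feat = self.encoder(binaural_spec)
        return self.semantic_head(feat), self.depth_head(feat), self.s3r_head(feat)

def multitask_loss(student_out, teacher_semantics, gt_depth, gt_extra_channels):
    sem, depth, s3r = student_out
    # Distillation: the student mimics the teacher's per-pixel class distribution.
    loss_sem = F.kl_div(F.log_softmax(sem, dim=1),
                        F.softmax(teacher_semantics, dim=1), reduction="batchmean")
    loss_depth = F.l1_loss(depth, gt_depth)          # auxiliary task 1
    loss_s3r = F.l1_loss(s3r, gt_extra_channels)     # auxiliary task 2
    return loss_sem + loss_depth + loss_s3r          # equal weights, illustrative
```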

    Considering Human Factors in Risk Maps for Robust and Foresighted Driver Warning

    Driver support systems that include human states in the support process are an active research field. Many recent approaches can, for example, sense the driver's drowsiness or awareness of the driving situation. However, so far, this rich information has not been utilized much for improving the effectiveness of support systems. In this paper, we therefore propose a warning system that uses human states in the form of driver errors and can, in some cases, warn users of upcoming risks several seconds earlier than state-of-the-art systems that do not consider human factors. The system consists of a behavior planner, Risk Maps, which directly changes its prediction of the surrounding driving situation based on the sensed driver errors. By checking whether this driver's behavior plan is objectively safe, a more robust and foresighted driver warning is achieved. In different simulations of dynamic lane change and intersection scenarios, we show how the driver's behavior plan can become unsafe, given the estimate of driver errors, and experimentally validate the advantages of considering human factors.
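    The core idea, as stated, is to plan the driver's likely behavior under the estimated driver errors and warn when that plan is no longer objectively safe. A purely illustrative sketch of such a check is shown below; the functions, objects, and threshold are hypothetical placeholders, not the paper's Risk Maps implementation.

```python
# Hypothetical sketch: warn if the behavior plan predicted for the driver,
# given estimated driver errors (e.g. an unnoticed vehicle), exceeds an
# objective risk threshold at any step of the planning horizon.
def warn_if_plan_unsafe(plan, surrounding_objects, driver_errors, risk_fn,
                        risk_threshold=0.3):
    # Remove objects the driver is estimated not to be aware of, so the plan
    # reflects what the driver would actually do under their (flawed) belief.
    perceived = [o for o in surrounding_objects if o.id not in driver_errors.unnoticed]
    driver_plan = plan(perceived)
    for t, state in enumerate(driver_plan):
        # Evaluate objective risk against *all* objects, not only perceived ones.
        if risk_fn(state, surrounding_objects, t) > risk_threshold:
            return True, t       # warn, and report the first risky time step
    return False, None
```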

    An on-line VAD based on Multi-Normalisation Scoring (MNS) of observation likelihoods

    Preprint of the article published online on 31 May 2018. Voice activity detection (VAD) is an essential task in expert systems that rely on oral interfaces. The VAD module detects the presence of human speech and separates speech segments from silences and non-speech noises. The most popular current on-line VAD systems are based on adaptive parameters which seek to cope with varying channel and noise conditions. The main disadvantages of this approach are the need for some initialisation time to properly adjust the parameters to the incoming signal and uncertain performance in the case of poor estimation of the initial parameters. In this paper we propose a novel on-line VAD based only on previous training which does not introduce any delay. The technique is based on a strategy that we have called Multi-Normalisation Scoring (MNS). It consists of obtaining a vector of multiple observation likelihood scores from normalised mel-cepstral coefficients previously computed from different databases. A classifier is then used to label the incoming observation likelihood vector. Encouraging results have been obtained with a Multi-Layer Perceptron (MLP). This technique can generalise to unseen noise levels and types. A validation experiment with two current standard ITU-T VAD algorithms demonstrates the good performance of the method. Indeed, lower classification error rates are obtained for non-speech frames, while results for speech frames are similar. This work was partially supported by the EU (ERDF) under grant TEC2015-67163-C2-1-R (RESTORE) (MINECO/ERDF, EU) and by the Basque Government under grant KK-2017/00043 (BerbaOla).
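    As described, MNS scores each incoming frame's mel-cepstral features against several models learned offline from differently normalised databases and feeds the resulting likelihood vector to an MLP. The sketch below is a minimal, scikit-learn-flavoured illustration under those assumptions; the model statistics, feature shapes, and function names are hypothetical, not the authors' code.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical sketch of Multi-Normalisation Scoring (MNS): each MFCC frame is
# scored under several diagonal-Gaussian models, each fitted offline on a
# differently normalised training database; the vector of log-likelihoods is
# then classified as speech / non-speech by an MLP.
def log_likelihood(frame, mean, var):
    # Diagonal-covariance Gaussian log-likelihood of one MFCC frame.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (frame - mean) ** 2 / var)

def mns_vector(frame, models):
    # One score per pre-trained normalisation model -> fixed-length vector.
    return np.array([log_likelihood(frame, m["mean"], m["var"]) for m in models])

def train_vad(train_frames, labels, models):
    # Offline training only: `train_frames` are MFCC frames, `labels` are 0/1
    # speech flags, `models` are pre-computed per-database statistics.
    X = np.stack([mns_vector(f, models) for f in train_frames])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
    clf.fit(X, labels)
    return clf

def is_speech(frame, models, clf):
    # On-line decision: no parameter adaptation, hence no initialisation delay.
    return bool(clf.predict(mns_vector(frame, models).reshape(1, -1))[0])
```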

    Memory Biases in Left Versus Right Implied Motion

    People remember moving objects as having moved farther along in their path of motion than is actually the case; this is known as representational momentum (RM). Some authors have argued that RM is an internalization of environmental properties such as physical momentum and gravity. Five experiments demonstrated that a similar memory bias could not have been learned from the environment. For right-handed Ss, objects apparently moving to the right engendered a larger memory bias in the direction of motion than did those moving to the left. This effect, clearly not derived from real-world lateral asymmetries, was relatively insensitive to changes in apparent velocity and the type of object used, and it may be confined to objects in the left half of visual space. The left–right effect may be an intrinsic property of the visual operating system, which may in turn have affected certain cultural conventions of left and right in art and other domains. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

    The Eye: A Light Weight Mobile Application for Visually Challenged People Using Improved YOLOv5l Algorithm

    The eye is an essential sensory organ that allows us to perceive our surroundings at a glance. Losing this sense can result in numerous challenges in daily life. However, society is designed for the majority, which can create even more difficulties for visually impaired individuals. Therefore, empowering them and promoting self-reliance are crucial. To address this need, we propose a new Android application called “The Eye” that utilizes Machine Learning (ML)-based object detection techniques to recognize objects in real time using a smartphone camera or a camera attached to a stick. This article proposes an improved YOLOv5l algorithm to improve object detection in visual applications. YOLOv5l has a larger model size and captures more complex features and details, leading to enhanced object detection accuracy compared to smaller variants like YOLOv5s and YOLOv5m. The primary enhancement in the improved YOLOv5l algorithm is the integration of L1 and L2 regularization techniques. These techniques prevent overfitting and improve generalization by adding a regularization term to the loss function during training. Our approach combines image processing and text-to-speech conversion modules to produce reliable results. The Android text-to-speech module is then used to convert the object recognition results into an audio output. According to the experimental results, the improved YOLOv5l has higher detection accuracy than the original YOLOv5 and can detect small, multiple, and overlapped targets with higher accuracy. This study contributes to the advancement of technology to help visually impaired individuals become more self-sufficient and confident. DOI: 10.28991/ESJ-2023-07-05-011
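    The stated enhancement is adding L1 and L2 regularization terms to the training loss. Below is a minimal PyTorch-style sketch of that general idea; the coefficients and the base detection loss are placeholders, not the paper's actual values or code.

```python
# Hypothetical sketch: augment a detector's training loss with L1 and L2
# penalties over the model weights to reduce overfitting, as described for
# the improved YOLOv5l. `base_loss` stands in for the detection loss
# (box + objectness + class); the lambda values are illustrative only.
def regularized_loss(base_loss, model, l1_lambda=1e-5, l2_lambda=1e-4):
    l1 = sum(p.abs().sum() for p in model.parameters())   # L1 term
    l2 = sum(p.pow(2).sum() for p in model.parameters())  # L2 term
    return base_loss + l1_lambda * l1 + l2_lambda * l2
```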

    When object color is a red herring: extraneous perceptual information hinders word learning via referent selection

    Learning words from ambiguous naming events is difficult. In such situations, children struggle not to attend to task-irrelevant information when learning object names. The current study reduces the problem space of learning names for object categories by holding color constant between the target and other extraneous objects. We examine how this influences two types of word learning (retention and generalization) in both 30-month-old children (Experiment 1) and the iCub humanoid robot (Experiment 2). Overall, all children and iCub performed well on the retention trials, but they were only able to generalize the novel names to new exemplars of the target categories if the objects were originally encountered in sets with objects of the same colors, not if the objects were originally encountered in sets with objects of different colors. These data demonstrate that less information presented during the learning phase narrows the problem space and leads to better word learning success for both children and iCub. Findings are discussed in terms of cognitive load and desirable difficulties.

    Robust Models for Operator Workload Estimation

    When human-machine system operators are overwhelmed, judicious employment of automation can be beneficial. Ideally, a system which can accurately estimate current operator workload can make better choices about when to employ automation. Supervised machine learning models can be trained to estimate workload in real time from operator physiological data. Unfortunately, estimating operator workload using trained models is limited: a model trained in one context can yield poor estimation of workload in another. This research examines the utility of three algorithms (linear regression, regression trees, and Artificial Neural Networks) in terms of cross-application workload prediction. The study is conducted for a remotely piloted aircraft simulation under several context-switch scenarios -- across two tasks, four task conditions, and seven human operators. Regression tree models were able to cross-predict both task conditions of one task type within a reasonable level of error, and could accurately predict workload for one operator when trained on data from the other six. Six physiological input subsets were identified based on method of measurement, and in certain instances were shown to produce superior cross-application models compared to models utilizing all input features. Models utilizing only EEG features show the most potential for decreasing cross-application error.
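    The evaluation described trains on some contexts and tests on another, for example training on six operators and predicting workload for the seventh. The sketch below illustrates that leave-one-operator-out protocol with a scikit-learn regression tree; the arrays, hyperparameters, and error metric are assumptions for illustration, not the study's setup.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical sketch of cross-application (leave-one-operator-out) workload
# estimation: train a regression tree on physiological features from all but
# one operator and test on the held-out operator. `X`, `y`, and `operator_ids`
# are assumed NumPy arrays, not the study's actual data.
def leave_one_operator_out(X, y, operator_ids):
    errors = {}
    for op in np.unique(operator_ids):
        train, test = operator_ids != op, operator_ids == op
        model = DecisionTreeRegressor(max_depth=8).fit(X[train], y[train])
        errors[op] = mean_squared_error(y[test], model.predict(X[test]))
    return errors  # per-operator cross-prediction error
```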

    Advances in Intelligent Vehicle Control

    This book is a printed edition of the Special Issue Advances in Intelligent Vehicle Control that was published in the journal Sensors. It presents a collection of eleven papers that cover a range of topics, such as the development of intelligent control algorithms for active safety systems, smart sensors, and intelligent and efficient driving. The contributions presented in these papers can serve as useful tools for researchers who are interested in new vehicle technology and in the improvement of vehicle control systems.