24 research outputs found

    Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

    Full text link
    Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines can now do the same with images, less work has been done with sounds. This work develops an approach for dense semantic labelling of sound-making objects, based purely on binaural sounds. We propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a 360-degree camera. The co-existence of visual and audio cues is leveraged for supervision transfer. In particular, we employ a cross-modal distillation framework that consists of a vision `teacher' method and a sound `student' method -- the student is trained to produce the same results as the teacher. This way, the auditory system can be trained without human annotations. We also propose two auxiliary tasks, namely a) a novel Spatial Sound Super-resolution task to increase the spatial resolution of sounds, and b) dense depth prediction of the scene. We then formulate the three tasks into one end-to-end trainable multi-tasking network aiming to boost the overall performance. Experimental results on the dataset show that 1) our method achieves promising results for semantic prediction and the two auxiliary tasks; 2) the three tasks are mutually beneficial, with joint training achieving the best performance; and 3) the number and orientations of the microphones both matter. The data and code will be released to facilitate research in this new direction.
    Comment: Project page: https://www.trace.ethz.ch/publications/2020/sound_perception/index.htm
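    The cross-modal distillation described in this abstract lends itself to a short illustration. Below is a minimal sketch, assuming a hypothetical AudioStudent network and precomputed, frozen teacher logits; it is illustrative only and not the authors' released code, whose actual architectures and losses may differ. The idea: a vision `teacher' supplies soft per-pixel labels that supervise the binaural-sound `student', so no human annotations are needed.

```python
# Minimal sketch of cross-modal distillation (teacher-student supervision
# transfer). The network is a hypothetical stand-in, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioStudent(nn.Module):
    """Toy student: maps a binaural spectrogram to per-pixel class logits."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_classes, 1),
        )

    def forward(self, spec):
        return self.net(spec)

def distillation_loss(student_logits, teacher_logits, T=1.0):
    """KL divergence between softened teacher and student distributions."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

student = AudioStudent()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

# One training step on dummy data: the frozen vision teacher's predictions
# on the 360-degree image serve as soft labels for the audio student.
spec = torch.randn(4, 2, 64, 64)              # binaural spectrograms
teacher_logits = torch.randn(4, 10, 64, 64)   # precomputed teacher output
optimizer.zero_grad()
loss = distillation_loss(student(spec), teacher_logits.detach())
loss.backward()
optimizer.step()
```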

    Behavioral and molecular genetics of reading-related AM and FM detection thresholds

    Get PDF
    Auditory detection thresholds for certain frequencies of both amplitude-modulated (AM) and frequency-modulated (FM) dynamic auditory stimuli are associated with reading ability in typically developing and dyslexic readers. We present the first behavioral and molecular genetic characterization of these two auditory traits. Participants from two extant extended-family datasets completed reading tasks and psychoacoustic tasks to determine 2 Hz FM and 20 Hz AM sensitivity thresholds. Univariate heritabilities were significant for both AM (h2 = 0.20) and FM (h2 = 0.29). Bayesian posterior probability of linkage (PPL) analysis found loci for AM (12q, PPL = 81%) and FM (10p, PPL = 32%; 20q, PPL = 65%). Bivariate heritability analyses revealed that FM is genetically correlated with reading, whereas AM is not. Bivariate PPL analysis indicates that the FM loci (10p, 20q) are not also associated with reading.
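    To make the heritability figures above concrete: a narrow-sense heritability of h2 = 0.29 means roughly 29% of the trait variance is attributable to additive genetic effects. The sketch below illustrates this with the textbook midparent-offspring regression estimator on simulated data; it is an illustrative stand-in, not the family-based variance-components or Bayesian PPL machinery the study actually used, and all data are synthetic.

```python
# Sketch: narrow-sense heritability via midparent-offspring regression.
# The slope of offspring trait on midparent trait estimates h^2.
import numpy as np

rng = np.random.default_rng(0)
n_families, h2_true = 2000, 0.3

# Simulate an additive trait: phenotype = genetic value + environment,
# with total phenotypic variance standardized to 1.
g_mother = rng.normal(0, np.sqrt(h2_true), n_families)
g_father = rng.normal(0, np.sqrt(h2_true), n_families)
env = lambda n: rng.normal(0, np.sqrt(1 - h2_true), n)
mother = g_mother + env(n_families)
father = g_father + env(n_families)

# Offspring inherit the average parental genetic value plus segregation
# noise, which restores the genetic variance to h^2 in the child generation.
g_child = 0.5 * (g_mother + g_father) + rng.normal(
    0, np.sqrt(h2_true / 2), n_families)
child = g_child + env(n_families)

midparent = 0.5 * (mother + father)
slope = np.cov(midparent, child)[0, 1] / np.var(midparent, ddof=1)
print(f"estimated h^2 ~ {slope:.2f} (true {h2_true})")
```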

    Sensory theories of developmental dyslexia: three challenges for research.

    Get PDF
    Recent years have seen the publication of a range of new theories suggesting that the basis of dyslexia might be sensory dysfunction. In this Opinion article, the evidence for and against several prominent sensory theories of dyslexia is closely scrutinized. Contrary to the causal claims being made, my analysis suggests that many proposed sensory deficits might result from the effects of reduced reading experience on the dyslexic brain. I therefore suggest that longitudinal studies of sensory processing, beginning in infancy, are required to successfully identify the neural basis of developmental dyslexia. Such studies could have a powerful impact on remediation.
    This is the accepted manuscript. The final version is available from NPG at http://www.nature.com/nrn/journal/v16/n1/abs/nrn3836.html

    Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

    No full text
    Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines can now do the same with images, less work has been done with sounds. This work develops an approach for dense semantic labelling of sound-making objects, based purely on binaural sounds. We propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a 360-degree camera. The co-existence of visual and audio cues is leveraged for supervision transfer. In particular, we employ a cross-modal distillation framework that consists of a vision `teacher' method and a sound `student' method -- the student is trained to produce the same results as the teacher. This way, the auditory system can be trained without human annotations. We also propose two auxiliary tasks, namely a) a novel Spatial Sound Super-resolution task to increase the spatial resolution of sounds, and b) dense depth prediction of the scene. We then formulate the three tasks into one end-to-end trainable multi-tasking network aiming to boost the overall performance. Experimental results on the dataset show that 1) our method achieves good results on all three tasks; 2) the three tasks are mutually beneficial, with joint training achieving the best performance; and 3) the number and orientations of the microphones both matter.
    ISSN: 0302-9743; eISSN: 1611-3349