
    Self-Supervised Audio-Visual Co-Segmentation

    Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data. In this paper we develop a neural network model for visual object segmentation and sound source separation that learns from natural videos through self-supervision. The model is an extension of recently proposed work that maps image pixels to sounds. Here, we introduce a learning approach to disentangle concepts in the neural networks, and assign semantic categories to network feature channels to enable independent image segmentation and sound source separation after audio-visual training on videos. Our evaluations show that the disentangled model outperforms several baselines in semantic segmentation and sound source separation.
    Comment: Accepted to ICASSP 201
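    As a rough illustration of the pixel-to-sound idea this work extends, the sketch below (assuming PyTorch; the class name PixelToSound, channel count, and layer choices are illustrative, not the authors' architecture) shows how pooled visual channel activations can gate the channels of an audio network to predict a spectrogram separation mask, with each channel intended to align with one semantic category after disentanglement:

```python
# Illustrative sketch only: a toy pixel-to-sound model in PyTorch.
import torch
import torch.nn as nn

class PixelToSound(nn.Module):
    def __init__(self, n_channels=32):
        super().__init__()
        # Visual branch: RGB image -> per-pixel feature map, one channel per concept.
        self.visual = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_channels, 3, padding=1),
        )
        # Audio branch: spectrogram (1 x F x T) -> feature map with the same channels.
        self.audio = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_channels, 3, padding=1),
        )

    def forward(self, image, spectrogram):
        vis = self.visual(image)        # (B, C, H, W) pixel-wise channel activations
        aud = self.audio(spectrogram)   # (B, C, F, T) spectrogram channel activations
        # Pool the visual map to one activation per channel; after disentanglement,
        # each channel should correspond to one semantic category / sound source.
        vis_channel = vis.mean(dim=(2, 3))                        # (B, C)
        # Weight audio channels by the visual activations and predict a soft mask
        # over the input spectrogram for the sources visible in the image.
        mask = torch.sigmoid((aud * vis_channel[:, :, None, None]).sum(dim=1))
        return mask                                               # (B, F, T)
```

    In this reading, the category-to-channel assignment is what would let the visual branch alone produce segmentation maps and the audio branch alone produce per-source masks after joint audio-visual training.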

    Voltage-sensitive dye imaging reveals tonotopic organization of auditory cortex spontaneous activity

    Imaging neural activity across a large (several mm) cortical area with high temporal and spatial resolution is desirable, for example in the auditory system, to measure cortical processing across a broad frequency spectrum. Voltage-sensitive dye imaging (VSDI) has a unique combination of properties that makes this possible, but so far studies have been limited to simple, sparsely presented sensory stimuli. We demonstrate the feasibility of long-acquisition VSDI (using the dye RH-1691) in auditory cortex while presenting complex time-varying acoustic stimuli or silence. Using a dense array of partially overlapping 50 ms tone pips (8 frequencies per octave spanning six octaves), we obtained high-resolution spectrotemporal receptive fields (STRFs) simultaneously across the majority of the guinea pig primary auditory cortical fields (A1 and DC). Long epochs of spontaneous activity were also measured, permitting a comparison of spontaneous activity patterns with the functional architecture. By grouping all pixels in areas A1 and DC according to sound frequency preference (obtained from the STRFs), we show that spontaneous activity (such as cortical spindles) exhibits complex spatial patterns, which are organized according to sound frequency preference within and across cortical areas. More specifically, spontaneous-activity correlation decreases as frequency preference diverges within A1 or DC; additionally, pixels in A1 are highly correlated with (even far-away) pixels in DC sharing a similar frequency preference. These properties of patterned cortical spontaneous activity constrain mechanistic hypotheses regarding the genesis of these patterns. Beyond these observations, the feasibility of VSDI with continuous stimulation or silence permits measuring population activity during long-lasting sound patterns, which is necessary for examining cortical dynamics and sensory-context-dependent processing.
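    A minimal sketch of the frequency-preference analysis described above, assuming NumPy and an already-extracted data layout (strf of shape (n_pixels, n_freqs, n_lags), spontaneous traces spont of shape (n_pixels, n_timepoints), and a freqs_hz axis); the function names and binning are illustrative, not the authors' pipeline:

```python
# Illustrative analysis sketch: best frequency per pixel from STRFs, then pairwise
# spontaneous-activity correlation as a function of best-frequency distance.
import numpy as np

def best_frequency(strf, freqs_hz):
    """Frequency preference of each pixel: frequency with the largest STRF energy."""
    energy = (strf ** 2).sum(axis=2)             # (n_pixels, n_freqs)
    return freqs_hz[np.argmax(energy, axis=1)]   # (n_pixels,)

def correlation_vs_bf_distance(spont, bf_hz, n_bins=10):
    """Mean pairwise correlation of spontaneous activity, binned by the
    best-frequency difference (in octaves) between pixel pairs."""
    corr = np.corrcoef(spont)                                   # (n_pixels, n_pixels)
    d_oct = np.abs(np.log2(bf_hz[:, None] / bf_hz[None, :]))    # octave distances
    iu = np.triu_indices(len(bf_hz), k=1)                       # unique pixel pairs
    edges = np.linspace(0.0, d_oct[iu].max(), n_bins + 1)
    idx = np.clip(np.digitize(d_oct[iu], edges) - 1, 0, n_bins - 1)
    return np.array([corr[iu][idx == b].mean() for b in range(n_bins)])
```

    The tonotopic organization reported above would appear here as a fall-off of the binned correlation with octave distance, both within a field and for A1-DC pixel pairs.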

    Full Waveform Inversion for Time-Distance Helioseismology

    Inferring interior properties of the Sun from photospheric measurements of the seismic wavefield constitutes the helioseismic inverse problem. Deviations in seismic measurements (such as wave travel times) from their fiducial values estimated for a given model of the solar interior imply that the model is inaccurate. Contemporary inversions in local helioseismology assume that properties of the solar interior are linearly related to the measured travel-time deviations. It is widely known, however, that this assumption is invalid for sunspots and active regions, and likely for supergranular flows as well. Here, we introduce nonlinear optimization, executed iteratively, as a means of inverting for the sub-surface structure of large-amplitude perturbations. Defining the penalty functional as the L2 norm of the wave travel-time deviations, we compute the total misfit gradient of this functional with respect to the relevant model parameters (only sound speed in this case) at each iteration around the corresponding model. The model is successively improved using either steepest descent, conjugate gradient, or the quasi-Newton limited-memory BFGS method. Performing nonlinear iterations requires privileging certain pixels (such as those in the near-field of the scatterer), a practice not compliant with the standard assumption of translational invariance. Measurements for these inversions, although similar in principle to those used in time-distance helioseismology, require some retooling. For the sake of simplicity in illustrating the method, we consider a 2-D inverse problem with only a sound-speed perturbation.
    Comment: 24 pages, 10 figures, to appear in Ap
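    Schematically (and not in the paper's exact notation), the iteration described above minimizes an L2 travel-time misfit by stepping the model along a chosen search direction; the symbols below are illustrative:

```latex
% Schematic penalty functional and model update for the approach described above.
% \tau_i(m): travel time predicted by model m; \tau_i^{obs}: measured travel time.
\chi(m) = \frac{1}{2} \sum_i \left[ \tau_i(m) - \tau_i^{\mathrm{obs}} \right]^2,
\qquad
m_{k+1} = m_k + \alpha_k \, d_k .
```

    Here d_k is the search direction built from the misfit gradient \nabla_m \chi(m_k) (steepest descent, conjugate gradient, or L-BFGS), and \alpha_k is a step length chosen at each iteration.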

    Local Visual Microphones: Improved Sound Extraction from Silent Video

    Sound waves cause small vibrations in nearby objects. A few techniques exist in the literature that can extract sound from video. In this paper we study local vibration patterns at different image locations. We show that different locations in the image vibrate differently. We carefully aggregate the local vibrations to produce sound whose quality improves on the state of the art. We show that local vibrations can exhibit a time delay, because sound waves take time to travel through the air. We use this phenomenon to estimate the sound direction. We also present a novel algorithm that speeds up sound extraction by two to three orders of magnitude and reaches real-time performance on 20 kHz video.
    Comment: Accepted to BMVC 201
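    A hedged sketch of the delay-based direction estimate mentioned above, assuming NumPy/SciPy, a 20 kHz sample rate, and a far-field plane-wave model; the function names and geometry are illustrative, not the paper's algorithm:

```python
# Illustrative sketch: time delay between two locally recovered vibration signals
# via cross-correlation, then a plane-wave angle-of-arrival estimate.
import numpy as np
from scipy.signal import correlate, correlation_lags

def time_delay(sig_a, sig_b, fs=20_000):
    """Delay (seconds) of sig_b relative to sig_a, from the cross-correlation peak."""
    xcorr = correlate(sig_b, sig_a, mode="full")
    lags = correlation_lags(len(sig_b), len(sig_a), mode="full")
    return lags[np.argmax(xcorr)] / fs

def arrival_angle(delay_s, separation_m, c=343.0):
    """Angle of arrival (radians) for two points separation_m apart, assuming a
    far-field plane wave at speed c: delay = separation * cos(angle) / c."""
    return np.arccos(np.clip(c * delay_s / separation_m, -1.0, 1.0))
```

    Sub-sample delays would in practice require interpolation around the correlation peak; the sketch keeps the integer-lag estimate for brevity.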

    Benchmarking Image Processing Algorithms for Unmanned Aerial System-Assisted Crack Detection in Concrete Structures

    This paper summarizes the results of traditional image processing algorithms for the detection of defects in concrete using images taken by Unmanned Aerial Systems (UASs). Such algorithms are useful for improving the accuracy of crack detection during autonomous inspection of bridges and other structures, and they have yet to be compared and evaluated on a dataset of concrete images taken by UASs. The authors created a generic image processing algorithm for crack detection, which included the major steps of filter design, edge detection, image enhancement, and segmentation, designed to compare different edge detectors uniformly. Edge detection was carried out by six filters in the spatial (Roberts, Prewitt, Sobel, and Laplacian of Gaussian) and frequency (Butterworth and Gaussian) domains. These algorithms were applied to fifty images each of defective and sound concrete. The performances of the six filters were compared in terms of accuracy, precision, minimum detectable crack width, computational time, and noise-to-signal ratio. In general, frequency-domain techniques were slower than spatial-domain methods because of the computational intensity of the Fourier and inverse Fourier transformations used to move between the spatial and frequency domains. Frequency-domain methods also produced noisier images than spatial-domain methods. Crack detection in the spatial domain using the Laplacian of Gaussian filter proved to be the fastest, most accurate, and most precise method, and it resulted in the finest detectable crack width. The Laplacian of Gaussian filter in the spatial domain is therefore recommended for future applications of real-time crack detection using UASs.
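    A minimal sketch of the spatial-domain Laplacian of Gaussian step that the benchmark found fastest and most accurate, assuming SciPy/NumPy and a grayscale float image in [0, 1]; the sigma and threshold values are illustrative, not tuned to the paper's dataset:

```python
# Illustrative sketch: Laplacian of Gaussian response thresholded into a crack map.
import numpy as np
from scipy import ndimage

def log_crack_map(image, sigma=2.0, threshold=0.01):
    """Binary edge/crack map from the magnitude of a Laplacian-of-Gaussian response."""
    response = ndimage.gaussian_laplace(image, sigma=sigma)
    # Thin, dark cracks on brighter concrete give a strong LoG response; thresholding
    # the magnitude keeps both polarities around the zero-crossing.
    return np.abs(response) > threshold
```

    In a full pipeline the binary map would still pass through the enhancement and segmentation steps listed above (e.g. morphological cleaning) before crack width is measured.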