Search CORE

29,550 research outputs found

Assessing neural network scene classification from degraded images

Author: Cooper EA
Cullen NC
Greene MR
Tadros T
Publication venue: eScholarship, University of California
Publication date: 01/09/2019
Field of study

Scene recognition is an essential component of both machine and biological vision. Recent advances in computer vision using deep convolutional neural networks (CNNs) have demonstrated impressive sophistication in scene recognition, through training on large datasets of labeled scene images (Zhou et al. 2018, 2014). One criticism of CNN-based approaches is that performance may not generalize well beyond the training image set (Torralba and Efros 2011), and may be hampered by minor image modifications, which in some cases are barely perceptible to the human eye (Goodfellow et al. 2015; Szegedy et al. 2013). While these “adversarial examples” may be unlikely in natural contexts, during many real-world visual tasks scene information can be degraded or limited due to defocus blur, camera motion, sensor noise, or occluding objects. Here, we quantify the impact of several image degradations (some common, and some more exotic) on indoor/outdoor scene classification using CNNs. For comparison, we use human observers as a benchmark, and also evaluate performance against classifiers using limited, manually selected descriptors. While the CNNs outperformed the other classifiers and rivaled human accuracy for intact images, our results show that their classification accuracy is more affected by image degradations than human observers. On a practical level, however, accuracy of the CNNs remained well above chance for a wide range of image manipulations that disrupted both local and global image statistics. We also examine the level of image-by-image agreement with human observers, and find that the CNNs' agreement with observers varied as a function of the nature of image manipulation. In many cases, this agreement was not substantially different from the level one would expect to observe for two independent classifiers. Together, these results suggest that CNN-based scene classification techniques are relatively robust to several image degradations. However, the pattern of classifications obtained for ambiguous images does not appear to closely reflect the strategies employed by human observers

Bates College: SCARAB (Scholarly Communication and Research at Bates)

eScholarship - University of California

Recommended from our members

Multi-line Adaptive Perimetry (MAP): A New Procedure for Quantifying Visual Field Integrity for Rapid Assessment of Macular Diseases.

Author: Biles Mandy K
Davey Pinakin G
Maniglia Marcello
Seitz Aaron R
Thurman Steven M
Visscher Kristina M
Publication venue: eScholarship, University of California
Publication date: 01/09/2018
Field of study

PurposeIn order to monitor visual defects associated with macular degeneration (MD), we present a new psychophysical assessment called multiline adaptive perimetry (MAP) that measures visual field integrity by simultaneously estimating regions associated with perceptual distortions (metamorphopsia) and visual sensitivity loss (scotoma).MethodsWe first ran simulations of MAP with a computerized model of a human observer to determine optimal test design characteristics. In experiment 1, predictions of the model were assessed by simulating metamorphopsia with an eye-tracking device with 20 healthy vision participants. In experiment 2, eight patients (16 eyes) with macular disease completed two MAP assessments separated by about 12 weeks, while a subset (10 eyes) also completed repeated Macular Integrity Assessment (MAIA) microperimetry and Amsler grid exams.ResultsResults revealed strong repeatability of MAP and high accuracy, sensitivity, and specificity (0.89, 0.81, and 0.90, respectively) in classifying patient eyes with severe visual impairment. We also found a significant relationship in terms of the spatial patterns of performance across visual field loci derived from MAP and MAIA microperimetry. However, there was a lack of correspondence between MAP and subjective Amsler grid reports in isolating perceptually distorted regions.ConclusionsThese results highlight the validity and efficacy of MAP in producing quantitative maps of visual field disturbances, including simultaneous mapping of metamorphopsia and sensitivity impairment.Translational relevanceFuture work will be needed to assess applicability of this examination for potential early detection of MD symptoms and/or portable assessment on a home device or computer

eScholarship - University of California

A Neural Model of How the Brain Computes Heading from Optic Flow in Realistic Scenes

Author: Browing Andrew N.
Grossberg Stephen
Mingolla Ennio
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/12/2008
Field of study

Animals avoid obstacles and approach goals in novel cluttered environments using visual information, notably optic flow, to compute heading, or direction of travel, with respect to objects in the environment. We present a neural model of how heading is computed that describes interactions among neurons in several visual areas of the primate magnocellular pathway, from retina through V1, MT+, and MSTd. The model produces outputs which are qualitatively and quantitatively similar to human heading estimation data in response to complex natural scenes. The model estimates heading to within 1.5° in random dot or photo-realistically rendered scenes and within 3° in video streams from driving in real-world environments. Simulated rotations of less than 1 degree per second do not affect model performance, but faster simulated rotation rates deteriorate performance, as in humans. The model is part of a larger navigational system that identifies and tracks objects while navigating in cluttered environments.National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National-Geospatial Intelligence Agency (NMA201-01-1-2016

Boston University Institutional Repository (OpenBU)

Recommended from our members

Predicting glaucomatous visual field deterioration through short multivariate time series modelling

Author: Liu X
Swift S
Publication venue: 'Elsevier BV'
Publication date: 01/01/2002
Field of study

In bio-medical domains there are many applications involving the modelling of multivariate time series (MTS) data. One area that has been largely overlooked so far is the particular type of time series where the data set consists of a large number of variables but with a small number of observations. In this paper we describe the development of a novel computational method based on genetic algorithms that bypasses the size restrictions of traditional statistical MTS methods, makes no distribution assumptions, and also locates the order and associated parameters as a whole step. We apply this method to the prediction and modelling of glaucomatous visual field deterioration

Brunel University Research Archive

A neural network approach to audio-assisted movie dialogue detection

Author: Alatan
Birge
Constantine Kotropoulos
Emmanouil Benetos
Freund
Freund
Hosmer
Ioannis Pitas
Jelinek
Kotti
Král
Lehane
Margarita Kotti
Papoulis
Platt
Reiss
Stoica
Trelea
Webb
Zhai
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007
Field of study

A novel framework for audio-assisted dialogue detection based on indicator functions and neural networks is investigated. An indicator function defines that an actor is present at a particular time instant. The cross-correlation function of a pair of indicator functions and the magnitude of the corresponding cross-power spectral density are fed as input to neural networks for dialogue detection. Several types of artificial neural networks, including multilayer perceptrons, voted perceptrons, radial basis function networks, support vector machines, and particle swarm optimization-based multilayer perceptrons are tested. Experiments are carried out to validate the feasibility of the aforementioned approach by using ground-truth indicator functions determined by human observers on 6 different movies. A total of 41 dialogue instances and another 20 non-dialogue instances is employed. The average detection accuracy achieved is high, ranging between 84.78%±5.499% and 91.43%±4.239%

CiteSeerX

City Research Online

Crossref

Spiral - Imperial College Digital Repository