52 research outputs found
Cortical Dynamics of 3-D Vision and Figure-Ground Pop-Out
Air Force Office of Scientific Research (90-0175); Defense Advanced Research Projects Agency (90-0083); Office of Naval Research (N00014-91-J-4100)
How Transferable are Reasoning Patterns in VQA?
Since its inception, Visual Question Answering (VQA) has been notorious as
a task in which models are prone to exploit biases in datasets to find shortcuts
instead of performing high-level reasoning. Classical methods address this by
removing biases from training data, or adding branches to models to detect and
remove biases. In this paper, we argue that uncertainty in vision is a
dominating factor preventing the successful learning of reasoning in vision and
language problems. We train a visual oracle and in a large scale study provide
experimental evidence that it is much less prone to exploiting spurious dataset
biases compared to standard models. We propose to study the attention
mechanisms at work in the visual oracle and compare them with a SOTA
Transformer-based model. We provide an in-depth analysis and visualizations of
reasoning patterns obtained with an online visualization tool which we make
publicly available (https://reasoningpatterns.github.io). We exploit these
insights by transferring reasoning patterns from the oracle to a SOTA
Transformer-based VQA model taking standard noisy visual inputs via
fine-tuning. In experiments we report higher overall accuracy, as well as
accuracy on infrequent answers for each question type, which provides evidence
for improved generalization and a decrease of the dependency on dataset biases.
Pedestrian detection in far infrared images
This paper presents an experimental study on pedestrian classification and detection in far infrared (FIR) images. The study includes an in-depth evaluation of several combinations of features and classifiers, which include features previously used for daylight scenarios, as well as a new descriptor (HOPE - Histograms of Oriented Phase Energy), specifically targeted to infrared images, and a new adaptation of a latent variable SVM approach to FIR images. The presented results are validated on a new classification and detection dataset of FIR images collected in outdoor environments from a moving vehicle. The classification space contains 16152 pedestrians and 65440 background samples evenly selected from several sequences acquired at different temperatures and different illumination conditions. The detection dataset consists of 15224 images with ground truth information. The authors are making this dataset public for benchmarking new detectors in the area of intelligent vehicles and field robotics applications.
This work was supported by the Spanish Government through the Cicyt projects FEDORA (GRANT TRA2010-20225-C03-01) and Driver Distraction Detector System (GRANT TRA2011-29454-C03-02), and the Comunidad de Madrid through the project SEGVAUTO (S2009/DPI-1509).
Shape Representation in Primate Visual Area 4 and Inferotemporal Cortex
The representation of contour shape is an essential component of object recognition, but the cortical mechanisms underlying it are incompletely understood, leaving it a fundamental open question in neuroscience. Such an understanding would be useful theoretically as well as in developing computer vision and Brain-Computer Interface applications. We ask two fundamental questions: “How is contour shape represented in cortex and how can neural models and computer vision algorithms more closely approximate this?” We begin by analyzing the statistics of contour curvature variation and develop a measure of salience based upon the arc length over which it remains within a constrained range. We create a population of V4-like cells – responsive to a particular local contour conformation located at a specific position on an object’s boundary – and demonstrate high recognition accuracies classifying handwritten digits in the MNIST database and objects in the MPEG-7 Shape Silhouette database. We compare the performance of the cells to the “shape-context” representation (Belongie et al., 2002) and achieve roughly comparable recognition accuracies using a small test set. We analyze the relative contributions of various feature sensitivities to recognition accuracy and robustness to noise. Local curvature appears to be the most informative for shape recognition. We create a population of IT-like cells, which integrate specific information about the 2-D boundary shapes of multiple contour fragments, and evaluate its performance on a set of real images as a function of the V4 cell inputs. We determine the sub-population of cells that are most effective at identifying a particular category. We classify based upon cell population response and obtain very good results. We use the Morris-Lecar neuronal model to more realistically illustrate the previously explored shape representation pathway in V4 – IT. 
We demonstrate recognition using spatiotemporal patterns within a winnerless competition network with FitzHugh-Nagumo model neurons. Finally, we use the Izhikevich neuronal model to produce an enhanced response in IT, correlated with recognition, via gamma synchronization in V4. Our results support the hypothesis that the response properties of V4 and IT cells, as well as our computer models of them, function as robust shape descriptors in the object recognition process.
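For readers unfamiliar with the Izhikevich neuronal model named in the abstract above, the following is a minimal single-neuron sketch using the standard published equations and the commonly quoted "regular spiking" parameters. It is not the authors' V4/IT network: the function name, the input current values, the time step, and the plain Euler integration are all assumptions made for illustration.

```python
def simulate_izhikevich(current, duration_ms=1000.0, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron driven by a constant input current.

    Standard model equations:
        dv/dt = 0.04 v^2 + 5 v + 140 - u + I
        du/dt = a (b v - u)
        if v >= 30 mV: v <- c, u <- u + d
    The defaults for a, b, c, d are the usual "regular spiking" values;
    the integration scheme and time step are illustrative assumptions.
    Returns the list of spike times in milliseconds.
    """
    v = c          # membrane potential (mV), started at the reset value
    u = b * v      # recovery variable, started at its resting value
    spike_times = []
    steps = int(duration_ms / dt)
    for step in range(steps):
        # Forward Euler update of v, then u (u uses the updated v).
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
        if v >= 30.0:
            # Spike: record the time, then apply the after-spike reset.
            spike_times.append(step * dt)
            v = c
            u += d
    return spike_times


if __name__ == "__main__":
    print("spikes with input:", len(simulate_izhikevich(10.0)))
    print("spikes without input:", len(simulate_izhikevich(0.0)))
```

With no input the neuron relaxes to rest and stays silent, while a sufficiently strong constant current produces repetitive spiking; synchronization effects such as those mentioned in the abstract would require a coupled population, which this sketch does not attempt.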
Convolutional neural networks for face recognition and finger-vein biometric identification
The Convolutional Neural Network (CNN), a variant of the Multilayer Perceptron (MLP), has shown promise in solving complex recognition problems, particularly in visual pattern recognition. However, the classical LeNet-5 CNN model, which most solutions are based on, is highly compute-intensive. This CNN also suffers from long training times, due to its large number of layers, which ranges from six to eight. In this research, a CNN model with reduced complexity is proposed for application in face recognition and finger-vein biometric identification. A simpler architecture is obtained by fusing convolutional and subsampling layers into one layer, in conjunction with a partial connection scheme applied between the first two layers in the network. As a result, the total number of layers is reduced to four. The number of feature maps at each layer is optimized according to the type of image database being processed. Consequently, the numbers of network parameters (including neurons, trainable parameters and connections) are significantly reduced, essentially increasing the generalization ability of the network. The Stochastic Diagonal Levenberg-Marquardt (SDLM) backpropagation algorithm is modified and applied in the training of the proposed network. With this learning algorithm, the convergence rate is accelerated such that the proposed CNN converges within 15 epochs. For face recognition, the proposed CNN achieves recognition rates of 100.00% and 99.50% on the AT&T and AR Purdue face databases, respectively. Recognition time on the AT&T database is less than 0.003 seconds. These results outperform previously reported work. In addition, when compared with other CNN-based face recognizers, the proposed CNN model has the fewest network parameters and hence better generalization ability. A training scheme is also proposed to recognize new categories without full CNN training.
In this research, a novel CNN solution for the finger-vein biometric identification problem is also proposed. To the best of our knowledge, no previous work reported in the literature applies a CNN to finger-vein recognition. The proposed method is efficient in that simple preprocessing algorithms are deployed. The CNN design is adapted to a finger-vein database, which is developed in-house and contains 81 subjects. A recognition accuracy of 99.38% is achieved, which is similar to the results of state-of-the-art work. In conclusion, the success of the research in solving face recognition and finger-vein biometric identification problems proves the feasibility of the proposed CNN model in any pattern recognition system.
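The abstract above describes fusing a convolutional layer and a subsampling layer into one layer. One common way to do this is a strided convolution, sketched below in plain Python. Whether the thesis fuses its layers in exactly this way is an assumption; the function names, the toy image, and the kernel are hypothetical, and classical LeNet-5 subsampling averages neighbouring values rather than simply skipping them.

```python
def conv2d_valid(image, kernel, stride=1):
    """'Valid' 2-D convolution (cross-correlation form, as is usual in CNNs).

    A stride greater than 1 evaluates the kernel only at every stride-th
    position, so filtering and subsampling happen in a single pass.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r0, c0 = i * stride, j * stride
            acc = sum(image[r0 + a][c0 + b] * kernel[a][b]
                      for a in range(kh) for b in range(kw))
            row.append(acc)
        out.append(row)
    return out


def decimate(feature_map, step):
    """Keep every step-th row and column (subsampling by decimation)."""
    return [row[::step] for row in feature_map[::step]]


# Toy 6x6 image and 3x3 kernel (hypothetical values, integers for exactness).
image = [[r * 6 + c for c in range(6)] for r in range(6)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

# Two separate layers: full convolution, then subsample by 2.
two_step = decimate(conv2d_valid(image, kernel, stride=1), 2)
# One fused layer: convolution evaluated directly at stride 2.
fused = conv2d_valid(image, kernel, stride=2)
```

Here `fused` and `two_step` hold the same 2x2 feature map, but the fused version computes only a quarter of the kernel responses, which illustrates why merging the two layers reduces both layer count and computation.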
The Inhuman Overhang: On Differential Heterogenesis and Multi-Scalar Modeling
As a philosophical paradigm, differential heterogenesis offers us a novel descriptive vantage with which to inscribe Deleuze’s virtuality within the terrain of “differential becoming,” conjugating “pure saliences” so as to parse economies, microhistories, insurgencies, and epistemological evolutionary processes that can be conceived of independently from their representational form. Unlike Gestalt theory’s oppositional constructions, the advantage of this aperture is that it posits a dynamic context to both media and its analysis, rendering them functionally tractable and set in relation to other objects, rather than as sedentary identities. Surveying the genealogy of differential heterogenesis with particular interest in the legacy of Lautman’s dialectic, I make the case for a reading of the Deleuzean virtual that departs from an event-oriented approach, galvanizing Sarti and Citti’s dynamic a priori vis-à-vis Deleuze’s philosophy of difference. Specifically, I posit differential heterogenesis as a frame with which to examine our contemporaneous epistemic shift as it relates to multi-scalar computational modeling while paying particular attention to neuro-inferential modes of inductive learning and homologous cognitive architecture. Carving a bricolage between Mark Wilson’s work on the “greediness of scales” and Deleuze’s “scales of reality”, this project threads between static ecologies and active externalism vis-à-vis endocentric frames of reference and syntactical scaffolding.
Line Based Multi-Range Asymmetric Conditional Random Field For Terrestrial Laser Scanning Data Classification
Terrestrial Laser Scanning (TLS) is a ground-based, active imaging method that rapidly acquires accurate, highly dense three-dimensional point clouds of object surfaces by laser range finding. To fully utilize its benefits, a robust method for classifying many objects of interest from huge amounts of laser point clouds is urgently required. However, classifying massive TLS data faces many challenges, such as complex urban scenes and partial data acquisition due to occlusion. To achieve automatic, accurate and robust TLS data classification, we present a line-based multi-range asymmetric Conditional Random Field algorithm.
The first contribution is to propose a line-based TLS data classification method. In this thesis, we are interested in seven classes: building, roof, pedestrian road (PR), tree, low man-made object (LMO), vehicle road (VR), and low vegetation (LV). The line-based classification is implemented in each scan profile, which follows the line profiling nature of the laser scanning mechanism. Ten conventional local classifiers are tested, including popular generative and discriminative classifiers, and experimental results validate that the line-based method can achieve satisfying classification performance. However, local classifiers label each line independently of its neighborhood, so their inference often suffers from similar local appearance across different object classes. The second contribution is to propose a multi-range asymmetric Conditional Random Field (maCRF) model, which uses object context as post-classification to improve the performance of a local generative classifier. The maCRF incorporates appearance, a local smoothness constraint, and global scene layout regularity together into a probabilistic graphical model. The local smoothness term encourages lines in a local area to have the same class label, while the scene layout term favours an asymmetric regularity in the long-range spatial arrangement of different object classes, considered in both the vertical (above-below) and horizontal (front-behind) directions. The asymmetric regularity captures the directional spatial arrangement between pairs of objects (e.g. it allows the ground to be lower than a building, but not vice versa). The third contribution is to extend the maCRF model by adding across scan profile context, which is called the Across scan profile Multi-range Asymmetric Conditional Random Field (amaCRF) model.
Due to the sweeping nature of laser scanning, the sequentially acquired TLS data has strong spatial dependency, and the across scan profile context can provide more contextual information. The final contribution is to propose a sequential classification strategy. Along the sweeping direction of laser scanning, amaCRF models are sequentially constructed. By dynamically updating the posterior probability of common scan profiles, contextual information propagates through adjacent scan profiles.
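The asymmetric vertical regularity described in the abstract above can be illustrated with a toy pairwise term in which the score for "A above B" differs from the score for "B above A". The sketch below is only an illustration of that idea: the score table, the unary values, and the function names are hypothetical, and the greedy single-line decision stands in for the full CRF inference that the thesis would use.

```python
# Hypothetical log-compatibility of "upper label lies above lower label".
# The table is deliberately asymmetric: swapping the pair changes the score.
VERTICAL = {
    ("roof", "building"): 2.0,
    ("building", "roof"): -2.0,
    ("building", "vehicle road"): 1.5,
    ("vehicle road", "building"): -3.0,
}


def vertical_score(upper, lower):
    """Score for `upper` being located above `lower` (0.0 if unlisted)."""
    return VERTICAL.get((upper, lower), 0.0)


def relabel(unary, label_below):
    """Pick the label maximising local appearance plus vertical context.

    `unary` maps each candidate label to a (hypothetical) log-score from a
    local classifier; `label_below` is the label of a line known to lie
    underneath the line being classified.
    """
    return max(unary,
               key=lambda label: unary[label] + vertical_score(label, label_below))


# A flat line segment whose local appearance is ambiguous between two classes.
unary = {"roof": -1.1, "vehicle road": -1.0}

local_choice = max(unary, key=unary.get)          # appearance only
context_choice = relabel(unary, "building")       # appearance + layout
```

On appearance alone the line is labelled "vehicle road", but once the line is known to sit above a "building" line the asymmetric term flips the decision to "roof", which is the kind of correction the scene layout regularity is meant to provide.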
3D Object Recognition Based On Constrained 2D Views
The aim of the present work was to build a novel 3D object recognition system capable of classifying
man-made and natural objects based on single 2D views. The approach to this problem
has been one motivated by recent theories on biological vision and multiresolution analysis. The
project's objectives were the implementation of a system that is able to deal with simple 3D
scenes and constitutes an engineering solution to the problem of 3D object recognition, allowing
the proposed recognition system to operate in a practically acceptable time frame.
The developed system takes further the work on automatic classification of marine phytoplankton,
carried out at the Centre for Intelligent Systems, University of Plymouth. The thesis discusses
the main theoretical issues that prompted the fundamental system design options. The
principles and the implementation of the coarse data channels used in the system are described.
A new multiresolution representation of 2D views is presented, which provides the classifier
module of the system with coarse-coded descriptions of the scale-space distribution of potentially
interesting features. A multiresolution analysis-based mechanism is proposed, which directs
the system's attention towards potentially salient features. Unsupervised similarity-based
feature grouping is introduced, which is used in coarse data channels to yield feature signatures
that are not spatially coherent and provide the classifier module with salient descriptions of object
views. A simple texture descriptor is described, which is based on properties of a special wavelet
transform.
The system has been tested on computer-generated and natural image data sets, in conditions
where the inter-object similarity was monitored and quantitatively assessed by human subjects,
or the analysed objects were very similar and their discrimination constituted a difficult task even
for human experts. The validity of the above described approaches has been proven. The studies
conducted with various statistical and artificial neural network-based classifiers have shown that
the system is able to perform well in all of the above mentioned situations. These investigations
also made it possible to take further and generalise a number of important conclusions drawn during
previous work carried out in the field of 2D shape (plankton) recognition, regarding the behaviour
of multiple coarse data channels-based pattern recognition systems and various classifier
architectures.
The system possesses the ability of dealing with difficult field-collected images of objects and
the techniques employed by its component modules make possible its extension to the domain
of complex multiple-object 3D scene recognition. The system is expected to find immediate applicability
in the field of marine biota classification.
- …