10 research outputs found

    Eccentricity dependent deep neural networks: Modeling invariance in human vision

    Get PDF
    Humans can recognize objects in a way that is invariant to scale, translation, and clutter. We use invariance theory as a conceptual basis to model this phenomenon computationally. The theory discusses the role of eccentricity in human visual processing, and it is a generalization of feedforward convolutional neural networks (CNNs). Our model explains some key psychophysical observations relating to invariant perception, while maintaining important similarities with biological neural architectures. To our knowledge, this work is the first to unify explanations of all three types of invariance while leveraging the power and neurological grounding of CNNs.
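    As a rough illustration of the idea, the sketch below (our own construction, not the authors' released model; the crop sizes, shared network, and fixation-at-center assumption are all ours) samples the image at several scales around the fixation point, so that effective receptive-field size grows with eccentricity, and processes every scale with one shared CNN.

```python
# Minimal sketch of an eccentricity-dependent front end, assuming central
# fixation; crop sizes and the shared network are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EccentricityFrontEnd(nn.Module):
    def __init__(self, crop_sizes=(32, 64, 128, 256), out_size=32):
        super().__init__()
        self.crop_sizes = crop_sizes          # hypothetical scale sampling
        self.out_size = out_size
        self.shared = nn.Sequential(          # weights shared across scales
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, img):
        # img: (B, 3, H, W); fixation assumed at the image center
        B, _, H, W = img.shape
        feats = []
        for s in self.crop_sizes:
            top, left = (H - s) // 2, (W - s) // 2
            crop = img[:, :, top:top + s, left:left + s]
            crop = F.interpolate(crop, size=(self.out_size, self.out_size),
                                 mode="bilinear", align_corners=False)
            feats.append(self.shared(crop).flatten(1))
        return torch.cat(feats, dim=1)        # multi-scale, fovea-centered code

x = torch.randn(1, 3, 256, 256)
print(EccentricityFrontEnd()(x).shape)        # torch.Size([1, 64])
```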

    Conditional Random Fields for Multi-Camera Object Detection

    Get PDF
    We formulate a model for multi-class object detection in a multi-camera environment. To our knowledge, this is the first time that this problem is addressed taking different object classes into account simultaneously. Given several images of the scene taken from different angles, our system estimates the ground plane locations of the objects from the output of several object detectors applied at each viewpoint. We cast the problem as an energy minimization modeled with a Conditional Random Field (CRF). Instead of predicting the presence of an object at each image location independently, we simultaneously predict the labeling of the entire scene. Our CRF is able to take into account occlusions between objects as well as contextual constraints among them. We propose an effective iterative strategy that renders the underlying optimization problem tractable, and we learn the parameters of the model with the max-margin paradigm. We evaluate the performance of our model on several challenging multi-camera pedestrian detection datasets, namely PETS 2009 and the EPFL terrace sequence. We also introduce a new dataset in which multiple classes of objects appear simultaneously in the scene, and on it we show that our method effectively handles occlusions in the multi-class case.
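    To make the energy-minimization framing concrete, here is a minimal sketch (with made-up potentials and a simple Iterated Conditional Modes loop standing in for the paper's iterative strategy): ground-plane cells take class labels, unaries come from detector scores, and a pairwise term couples neighboring cells.

```python
# Sketch of a pairwise CRF energy over ground-plane cells; all numbers are
# synthetic and ICM is a generic stand-in for the paper's optimization.
import numpy as np

def crf_energy(labels, unary, pairwise, edges):
    # unary: (n_cells, n_classes) costs; pairwise: (n_classes, n_classes)
    e = sum(unary[i, labels[i]] for i in range(len(labels)))
    e += sum(pairwise[labels[i], labels[j]] for i, j in edges)
    return e

def icm(unary, pairwise, edges, n_iters=10):
    # Iterated Conditional Modes: greedily relabel one cell at a time.
    labels = unary.argmin(axis=1)
    for _ in range(n_iters):
        for i in range(len(labels)):
            costs = unary[i].copy()
            for a, b in edges:
                if a == i:
                    costs += pairwise[:, labels[b]]
                elif b == i:
                    costs += pairwise[labels[a], :]
            labels[i] = costs.argmin()
    return labels

rng = np.random.default_rng(0)
unary = rng.random((5, 3))                # 5 cells, 3 classes (incl. "empty")
pairwise = np.full((3, 3), 0.2)
np.fill_diagonal(pairwise, 0.0)           # toy smoothness/occlusion term
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # neighboring ground-plane cells
labels = icm(unary, pairwise, edges)
print(labels, "energy:", crf_energy(labels, unary, pairwise, edges))
```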

    Hierarchical conditional random fields for parts based models matching

    No full text
    A parts-based model is a parametrization of an object class using a collection of landmarks that follows the object structure. The matching of parts-based models is one of the problems where pairwise Conditional Random Fields have been successfully applied; the main reason for their effectiveness is tractable inference and learning due to the simplicity of the involved graphs, usually trees. However, these models do not consider possible patterns of statistics among sets of landmarks, and thus they suffer from using too myopic information. To overcome this limitation, we propose a novel structure based on hierarchical Conditional Random Fields, which we explain in the first part of this thesis. We build a hierarchy of combinations of landmarks, where matching is performed taking the whole hierarchy into account. To preserve tractable inference we effectively sample the label set. We test our method on facial feature selection and human pose estimation on two challenging datasets: Buffy and MultiPIE. In the second part of this thesis, we present a novel approach to multiple kernel combination that relies on stacked classification; this method can be used to evaluate the landmarks of the parts-based model. Our method combines the responses of a set of independent classifiers, one for each individual kernel. Unlike earlier approaches that linearly combine kernel responses, ours uses them as inputs to another set of classifiers. We show that we outperform state-of-the-art methods on most of the standard benchmark datasets.
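    The stacking idea from the second part can be sketched in a few lines (synthetic data; the kernels, classifier choices, and single train/test split are our illustrative assumptions, not the thesis setup): one SVM per kernel, then a second-level classifier trained on the stacked per-kernel responses instead of a linear combination of the kernels.

```python
# Sketch of multiple kernel combination via stacked classification.
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)
Xtr, Xte, ytr, yte = X[:150], X[150:], y[:150], y[150:]

kernels = [lambda A, B: linear_kernel(A, B),
           lambda A, B: rbf_kernel(A, B, gamma=0.1)]

# Level 1: an independent classifier per kernel.
level1 = [SVC(kernel="precomputed").fit(k(Xtr, Xtr), ytr) for k in kernels]

def responses(A):
    # Stack each kernel classifier's decision values as features.
    return np.column_stack([clf.decision_function(k(A, Xtr))
                            for clf, k in zip(level1, kernels)])

# Level 2: combine the responses with another classifier. (A faithful
# implementation would compute level-1 responses on held-out folds.)
stacker = LogisticRegression().fit(responses(Xtr), ytr)
print("stacked accuracy:", stacker.score(responses(Xte), yte))
```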

    Active semantic segmentation on a time budget

    No full text
    Efficient algorithms for object recognition are crucial for the new robotics and computer vision applications that demand real-time and online methods; examples include autonomous systems, navigating robots, and autonomous driving. In this work, we focus on efficient semantic segmentation, the problem of labeling each pixel of an image with a semantic class. Our aim is to speed up every part of the semantic segmentation pipeline, and to deliver a labeling solution on a time budget that can be decided on the fly. For this purpose, we analyze all the components of the pipeline and identify the computational bottleneck of each of them. The components are: over-segmenting the image into local regions, extracting features and classifying the local regions, and the final inference of the image labeling with semantic classes. We address each of these steps. First, we introduce a new superpixel algorithm to over-segment the image; it runs in real time and can deliver a solution at any time budget. Then, for feature extraction, we focus on the framework that computes descriptors and encodes them, followed by a pooling step. We find that the encoding step is the bottleneck, both for computational efficiency and for performance. We present a novel assignment-based encoding formulation that allows for the design of a new, very efficient encoding. Finally, the image labeling is obtained by modeling the dependencies with a Conditional Random Field (CRF). In semantic image segmentation, the computational cost of instantiating the potentials is much higher than that of MAP inference. We introduce Active MAP inference to select on the fly a subset of potentials to be instantiated in the energy function, leaving the rest unknown, and to estimate the MAP labeling from such an incomplete energy function. We perform experiments on all proposed methods for the different parts of the semantic segmentation pipeline. Our superpixel extraction achieves higher accuracy than the state of the art on a standard superpixel benchmark, while running in real time. Our feature encoding achieves results competitive with the state of the art on standard image classification and segmentation benchmarks, while requiring less time and memory. Finally, results on a semantic segmentation benchmark show that Active MAP inference achieves similar levels of accuracy but with major efficiency gains.
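    To illustrate what an assignment-based encoding looks like in its simplest form, here is a sketch (our own baseline construction: hard assignment with average pooling; the thesis's formulation generalizes this, and the codebook size and descriptors below are synthetic): each local descriptor is assigned to its nearest codebook word and the assignments are pooled into one region vector.

```python
# Sketch of a hard-assignment encoding with average pooling.
import numpy as np

def encode(descriptors, codebook):
    # descriptors: (n, d) local features; codebook: (k, d) visual words
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)                    # hard assignment step
    hist = np.bincount(assign, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)            # average pooling

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 64))              # e.g. learned with k-means
descs = rng.normal(size=(500, 64))                # e.g. descriptors of a region
print(encode(descs, codebook).shape)              # (16,)
```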

    Unravelling representations in scene-selective brain regions using scene parsing deep neural networks

    No full text
    Visual scene perception is mediated by a set of cortical regions that respond preferentially to images of scenes, including the occipital place area (OPA) and parahippocampal place area (PPA). However, the differential contribution of OPA and PPA to scene perception remains an open research question. In this study, we take a deep neural network (DNN)-based computational approach to investigate the differences in OPA and PPA function. In a first step, we search for a computational model that accurately predicts fMRI responses to scenes in OPA and PPA. We find that DNNs trained to predict scene components (e.g., wall, ceiling, floor) explain more unique variance in OPA and PPA than a DNN trained to predict scene category (e.g., bathroom, kitchen, office). This result is robust across several DNN architectures. On this basis, we then determine whether particular scene components predicted by DNNs differentially account for unique variance in OPA and PPA. We find that the variance in OPA responses uniquely explained by the navigation-related floor component is higher than the variance explained by the wall and ceiling components. In contrast, PPA responses are better explained by the combination of wall and floor, that is, the scene components that together convey the structure and texture of the scene. This differential sensitivity to scene components suggests differential functions of OPA and PPA in scene processing. Moreover, our results further highlight the potential of the proposed computational approach as a general tool for investigating the neural basis of human scene perception.
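    The unique-variance logic can be sketched as follows (synthetic stand-ins for the per-component DNN features and voxel responses; ridge regression and the single split are our illustrative choices): fit encoding models with and without a feature set, and read its unique variance as the drop in held-out R².

```python
# Sketch of variance partitioning with ridge encoding models.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 300
floor = rng.normal(size=(n, 20))          # stand-ins for per-component
wall = rng.normal(size=(n, 20))           # DNN feature sets
voxels = floor @ rng.normal(size=(20, 5)) + 0.1 * rng.normal(size=(n, 5))

def held_out_r2(X, Y, split=200):
    model = Ridge(alpha=1.0).fit(X[:split], Y[:split])
    return r2_score(Y[split:], model.predict(X[split:]))

full = held_out_r2(np.hstack([floor, wall]), voxels)
without_floor = held_out_r2(wall, voxels)
print("unique variance of floor:", full - without_floor)
```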

    A large and rich EEG dataset for modeling human visual object recognition

    No full text
    The human brain achieves visual object recognition through multiple stages of linear and nonlinear transformations operating at a millisecond scale. To predict and explain these rapid transformations, computational neuroscientists employ machine learning modeling techniques. However, state-of-the-art models require massive amounts of data to train properly, and to the present day there is a lack of vast brain datasets that extensively sample the temporal dynamics of visual object recognition. Here we collected a large and rich dataset of high-temporal-resolution EEG responses to images of objects on a natural background. The dataset includes 10 participants, each with 82,160 trials spanning 16,740 image conditions. Through computational modeling we established the quality of this dataset in five ways. First, we trained linearizing encoding models that successfully synthesized the EEG responses to arbitrary images. Second, we correctly identified the image conditions of the recorded EEG data in a zero-shot fashion, using synthesized EEG responses to hundreds of thousands of candidate image conditions. Third, we show that both the high number of conditions and the trial repetitions of the EEG dataset contribute to the trained models' prediction accuracy. Fourth, we built encoding models whose predictions generalize well to novel participants. Fifth, we demonstrate full end-to-end training of randomly initialized DNNs that output EEG responses for arbitrary input images. We release this dataset as a tool to foster research in visual neuroscience and computer vision.
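    A minimal sketch of the first two steps (random stand-ins for image features and EEG; the feature dimensionality, ridge penalty, and noiseless "recording" are our assumptions): a linearizing encoding model synthesizes EEG from image features, and a recorded response is identified zero-shot as the candidate whose synthesized response correlates with it best.

```python
# Sketch of linearizing encoding plus zero-shot identification.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
feat_train = rng.normal(size=(1000, 100))        # e.g. DNN image features
W = rng.normal(size=(100, 340))                  # channels x time, flattened
eeg_train = feat_train @ W + rng.normal(size=(1000, 340))

enc = Ridge(alpha=10.0).fit(feat_train, eeg_train)

feat_candidates = rng.normal(size=(5000, 100))   # large candidate image set
synth = enc.predict(feat_candidates)             # synthesized EEG responses

def pearson(A, b):
    # Correlation of each row of A with vector b.
    A = A - A.mean(1, keepdims=True)
    b = b - b.mean()
    return (A @ b) / (np.linalg.norm(A, axis=1) * np.linalg.norm(b))

true_idx = 42
recorded = feat_candidates[true_idx] @ W         # idealized noiseless recording
print("identified:", pearson(synth, recorded).argmax() == true_idx)
```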

    Single Image Video Prediction with Auto-Regressive GANs

    No full text
    In this paper, we introduce an approach for predicting future frames from a single input image. Our method generates an entire video sequence based on the information contained in the input frame. We adopt an autoregressive generation process, i.e., the output of each time step is fed as the input to the next step. Unlike other video prediction methods that use “one-shot” generation, our method preserves many more details from the input image, while also capturing the critical pixel-level changes between frames. We overcome the problem of generation quality degradation by introducing a “complementary mask” module in our architecture, and we show that this allows the model to focus only on generating the pixels that need to change, while reusing from the previous frame those that should remain static. We empirically validate our method against various video prediction models on the UT Dallas Dataset, and show that our approach generates high-quality, realistic video sequences from one static input image. In addition, we validate the robustness of our method by testing a pre-trained model on the unseen ADFES facial expression dataset. We also provide qualitative results of our model on a human action dataset, the Weizmann Action database.
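    The complementary-mask update can be sketched as follows (module and parameter names are ours, and the convolutions are placeholders for the paper's generator): the network predicts new content and a change mask, and the next frame keeps static pixels from the previous frame while regenerating only the masked ones.

```python
# Sketch of an autoregressive rollout with a complementary mask.
import torch
import torch.nn as nn

class MaskedFrameGenerator(nn.Module):
    def __init__(self, ch=3):
        super().__init__()
        self.content = nn.Conv2d(ch, ch, 3, padding=1)      # placeholder G
        self.mask = nn.Sequential(nn.Conv2d(ch, 1, 3, padding=1),
                                  nn.Sigmoid())              # m in [0, 1]

    def forward(self, prev):
        c, m = self.content(prev), self.mask(prev)
        return m * c + (1.0 - m) * prev  # regenerate only changing pixels

gen = MaskedFrameGenerator()
frame = torch.randn(1, 3, 64, 64)        # the single input image
video = [frame]
for _ in range(8):                       # autoregressive rollout
    frame = gen(frame)
    video.append(frame)
print(torch.stack(video, dim=1).shape)   # (1, 9, 3, 64, 64)
```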

    BOLD Moments: modeling short visual events through a video fMRI dataset and metadata

    No full text
    Grasping the meaning of everyday visual events is a fundamental feat of human intelligence that hinges on diverse neural processes ranging from vision to higher-level cognition. Deciphering the neural basis of visual event understanding requires rich, extensive, and appropriately designed experimental data, yet such data have hitherto been missing. To fill this gap, we introduce the BOLD Moments Dataset (BMD), a large dataset of whole-brain fMRI responses to over 1,000 short (3 s) naturalistic video clips and accompanying metadata. We show that visual events interface with an array of processes, extending even to memory, and we reveal a match in hierarchical processing between brains and video-computable deep neural networks. Furthermore, we showcase that BMD successfully captures the temporal dynamics of visual events at second resolution. BMD thus establishes critical groundwork for investigations of the neural basis of visual event understanding.

    Optimization of adsorptive removal of α-toluic acid by CaO2 nanoparticles using response surface methodology

    Get PDF
    The present work addresses the optimization of process parameters for the adsorptive removal of α-toluic acid by calcium peroxide (CaO2) nanoparticles using response surface methodology (RSM). CaO2 nanoparticles were synthesized by a chemical precipitation method and characterized by transmission electron microscopy (TEM) and high-resolution TEM (HRTEM), which showed particle sizes in the range of 5–15 nm. A series of batch adsorption experiments was performed using CaO2 nanoparticles to remove α-toluic acid from aqueous solution. Further, a central composite design (CCD) was developed to study the interactive effects of CaO2 adsorbent dosage, initial α-toluic acid concentration, and contact time on the α-toluic acid removal efficiency (the response) and to optimize the process. Analysis of variance (ANOVA) was performed to determine the significance of the individual and interactive effects of the variables on the response. The model-predicted response showed good agreement with the experimental response, with a coefficient of determination (R2) of 0.92. Among the variables, the interactive effect of adsorbent dosage and initial α-toluic acid concentration had more influence on the response than contact time. Numerical optimization of the process by RSM gave an optimal adsorbent dosage, initial α-toluic acid concentration, and contact time of 0.03 g, 7.06 g/L, and 34 min, respectively, with a predicted removal efficiency of 99.50%. Experiments performed under these conditions showed an α-toluic acid removal efficiency of up to 98.05%, confirming the adequacy of the model prediction.
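    The RSM workflow can be sketched generically (simulated design points and responses, not the paper's data; the toy response surface and optimizer settings are our assumptions): fit a full quadratic model over the three coded factors and maximize the predicted removal efficiency.

```python
# Sketch of response surface fitting and numerical optimization.
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
# Coded factors in [-1, 1]: dosage, initial concentration, contact time.
X = rng.uniform(-1, 1, size=(20, 3))
y = (95 - 3 * (X[:, 0] - 0.5) ** 2 - 2 * X[:, 0] * X[:, 1] + X[:, 2]
     + rng.normal(scale=0.3, size=20))       # toy removal-efficiency surface

poly = PolynomialFeatures(degree=2)          # linear + interaction + square terms
model = LinearRegression().fit(poly.fit_transform(X), y)

res = minimize(lambda x: -model.predict(poly.transform(x[None]))[0],
               x0=np.zeros(3), bounds=[(-1, 1)] * 3)
print("optimal coded factors:", res.x, "predicted removal:", -res.fun)
```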

    Suicidal ideation in a European Huntington's disease population.

    No full text