
    The computational magic of the ventral stream

    I argue that the sample complexity of (biological, feedforward) object recognition is mostly due to geometric image transformations and conjecture that a main goal of the ventral stream – V1, V2, V4 and IT – is to learn-and-discount image transformations.

In the first part of the paper I describe a class of simple and biologically plausible memory-based modules that learn transformations from unsupervised visual experience. The main theorems show that these modules provide (for every object) a signature which is invariant to local affine transformations and approximately invariant for other transformations. I also prove that, in a broad class of hierarchical architectures, signatures remain invariant from layer to layer. The identification of these memory-based modules with complex (and simple) cells in visual areas leads to a theory of invariant recognition for the ventral stream.
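The idea of an invariant signature can be illustrated with a toy example. The sketch below is not the paper's construction: it assumes 1-D patterns, cyclic translations as the only transformation, and two hypothetical pooling statistics (maximum and sum of squares). A module stores every shift of a template, projects the image onto each stored shift, and pools the responses in a way that ignores their order, so the result does not change when the image itself is shifted.

```python
def cyclic_shifts(vector):
    """Return every cyclic translation of a 1-D pattern (its orbit under shifts)."""
    return [vector[k:] + vector[:k] for k in range(len(vector))]


def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def signature(image, template):
    """Pool the projections of the image onto all stored shifts of the template.

    Assumption for illustration: pooling uses the maximum and the sum of
    squares, both of which are independent of the order of the responses.
    """
    responses = [dot(image, shifted) for shifted in cyclic_shifts(template)]
    return (max(responses), sum(r * r for r in responses))


image = [1, 2, 0, 5, 3]
template = [1, 0, -1, 0, 2]
translated = image[2:] + image[:2]  # same pattern, cyclically shifted by two

# Shifting the image only permutes the responses, so the pooled values agree.
print(signature(image, template) == signature(translated, template))
```

Shifting the image permutes the set of projections without changing its members, which is why any order-independent pooling yields the same signature.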

In the second part, I outline a theory about hierarchical architectures that can learn invariance to transformations. I show that the memory complexity of learning affine transformations is drastically reduced in a hierarchical architecture that factorizes transformations in terms of the subgroup of translations and the subgroups of rotations and scalings. I then show how translations are automatically selected as the only learnable transformations during development by enforcing small apertures – eg small receptive fields – in the first layer.

In a third part I show that the transformations represented in each area can be optimized in terms of storage and robustness, as a consequence determining the tuning of the neurons in the area, rather independently (under normal conditions) of the statistics of natural images. I describe a model of learning that can be proved to have this property, linking in an elegant way the spectral properties of the signatures with the tuning of receptive fields in different areas. A surprising implication of these theoretical results is that the computational goals and some of the tuning properties of cells in the ventral stream may follow from symmetry properties (in the sense of physics) of the visual world through a process of unsupervised correlational learning, based on Hebbian synapses. In particular, simple and complex cells do not directly care about oriented bars: their tuning is a side effect of their role in translation invariance. Across the whole ventral stream the preferred features reported for neurons in different areas are only a symptom of the invariances computed and represented.

The results of each of the three parts stand on their own independently of each other. Together this theory-in-fieri makes several broad predictions, some of which are:

-invariance to small transformations in early areas (eg translations in V1) may underlie stability of visual perception (suggested by Stu Geman);

-each cell’s tuning properties are shaped by visual experience of image transformations during developmental and adult plasticity;

-simple cells are likely to be the same population as complex cells, arising from different convergence of the Hebbian learning rule. The inputs to complex “complex” cells are dendritic branches with simple cell properties;

-class-specific transformations are learned and represented at the top of the ventral stream hierarchy; thus class-specific modules such as faces, places and possibly body areas should exist in IT;

-the types of transformations that are learned from visual experience depend on the size of the receptive fields and thus on the area (layer in the models) – assuming that the size increases with layers;

-the mix of transformations learned in each area influences the tuning properties of the cells – oriented bars in V1+V2, radial and spiral patterns in V4 up to class specific tuning in AIT (eg face tuned cells);

-features must be discriminative and invariant: invariance to transformations is the primary determinant of the tuning of cortical neurons rather than statistics of natural images.

The theory is broadly consistent with the current version of HMAX. It explains it and extends it in terms of unsupervised learning, a broader class of transformation invariance and higher level modules. The goal of this paper is to sketch a comprehensive theory with little regard for mathematical niceties. If the theory turns out to be useful there will be scope for deep mathematics, ranging from group representation tools to wavelet theory to dynamics of learning.

    The Computational Magic of the Ventral Stream: Towards a Theory

    I conjecture that the sample complexity of object recognition is mostly due to geometric image transformations and that a main goal of the ventral stream – V1, V2, V4 and IT – is to learn-and-discount image transformations. The most surprising implication of the theory emerging from these assumptions is that the computational goals and detailed properties of cells in the ventral stream follow from symmetry properties of the visual world through a process of unsupervised correlational learning.

From the assumption of a hierarchy of areas with receptive fields of increasing size the theory predicts that the size of the receptive fields determines which transformations are learned during development and then factored out during normal processing; that the transformation represented in each area determines the tuning of the neurons in the area, independently of the statistics of natural images; and that class-specific transformations are learned and represented at the top of the ventral stream hierarchy.

Some of the main predictions of this theory-in-fieri are:
1. the types of transformations that are learned from visual experience depend on the size (measured in terms of wavelength) and thus on the area (layer in the models) – assuming that the aperture size increases with layers;
2. the mix of transformations learned determines the properties of the receptive fields – oriented bars in V1+V2, radial and spiral patterns in V4 up to class specific tuning in AIT (eg face tuned cells);
3. invariance to small translations in V1 may underlie stability of visual perception;
4. class-specific modules – such as faces, places and possibly body areas – should exist in IT to process images of object classes.

    Segmentation of Optic Disc in Fundus Images using Convolutional Neural Networks for Detection of Glaucoma

    The condition of the vascular network of the human eye is an important diagnostic factor in ophthalmology. Its segmentation in fundus imaging is a difficult task because of the various anatomical structures present, such as blood vessels, optic cup, optic disc, macula and fovea. Blood vessel segmentation can assist in the detection of pathological changes which are possible indicators for arteriosclerosis, retinopathy, microaneurysms and macular degeneration. The segmentation of the optic disc and optic cup from retinal images is used to calculate an important indicator, the cup-to-disc ratio (CDR), accurately, to help professionals detect glaucoma in fundus images. In this proposed work, an automated segmentation of anatomical structures in fundus images, such as blood vessels and the optic disc, is done using Convolutional Neural Networks (CNN). A Convolutional Neural Network is a composite of multiple elementary processing units, each featuring several weighted inputs and one output, performing convolution of input signals with weights and transforming the outcome with some form of nonlinearity. The units are arranged in rectangular layers (grids), and their locations in a layer correspond to pixels in an input image. The spatial arrangement of units is the primary characteristic that makes CNNs suitable for processing visual information; the other features are local connectivity, parameter sharing and pooling of hidden units. The advantage of a CNN is that it can be trained repeatedly so that more features can be found. An average accuracy of 95.64% is obtained in the blood vessel versus non-vessel classification. The optic cup is also segmented from the optic disc by Fuzzy C Means Clustering (FCM). The proposed algorithm is tested on a sample of hospital images and the CDR value is determined. The obtained CDR values are compared with the given values for the sample images, and the performance of the proposed system, which employs Convolutional Neural Networks for segmentation, is excellent in the automated detection of healthy and glaucoma images.
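Once the cup and disc have been segmented, computing the CDR is simple arithmetic. The sketch below is illustrative only and is not the authors' implementation: it assumes the commonly used vertical definition (vertical cup diameter divided by vertical disc diameter), binary masks given as lists of rows, and hypothetical function names.

```python
def vertical_diameter(mask):
    """Number of rows spanned by the foreground (non-zero) pixels of a binary mask."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def cup_to_disc_ratio(cup_mask, disc_mask):
    """Vertical CDR (assumed definition): cup height divided by disc height."""
    disc = vertical_diameter(disc_mask)
    if disc == 0:
        raise ValueError("disc mask is empty")
    return vertical_diameter(cup_mask) / disc


# Toy masks: the disc spans four rows, the cup spans the middle two.
disc_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
cup_mask = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

print(cup_to_disc_ratio(cup_mask, disc_mask))
```

Other definitions (for example, a ratio of areas) would change only the measurement function, not the overall structure.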

    Cross-Spectral Full and Partial Face Recognition: Preprocessing, Feature Extraction and Matching

    Cross-spectral face recognition remains a challenge in the area of biometrics. The problem arises from some real-world application scenarios such as surveillance at night time or in harsh environments, where traditional face recognition techniques are unsuitable or limited because they rely on imagery obtained in the visible light spectrum. This motivates the study conducted in the dissertation, which focuses on matching infrared facial images against visible light images. The study spans face recognition from preprocessing through feature extraction to matching.

We address the problem of cross-spectral face recognition by proposing several new operators and algorithms based on advanced concepts such as composite operators, multi-level data fusion, image quality parity, and levels of measurement. To be specific, we experiment with and fuse several popular individual operators to construct a higher-performing compound operator named GWLH, which exhibits the complementary advantages of the individual operators involved. We also combine a Gaussian function with LBP, generalized LBP, WLD and/or HOG and modify them into multi-lobe operators with smoothed neighborhood to obtain a new type of operator named Composite Multi-Lobe Descriptors. We further design a novel operator termed Gabor Multi-Levels of Measurement based on the theory of levels of measurement, which benefits from taking into consideration the complementary edge and feature information at different levels of measurement.

The issue of image quality disparity is also studied in the dissertation due to its common occurrence in cross-spectral face recognition tasks. By bringing the quality of heterogeneous imagery closer to each other, we successfully achieve an improvement in the recognition performance. We further study the problem of cross-spectral recognition using partial faces, since it is also a common problem in practical usage. We begin with matching heterogeneous periocular regions and generalize the topic by considering all three facial regions defined in both a characteristic way and a mixture way.

In the experiments we employ datasets which include all the sub-bands within the infrared spectrum: near-infrared, short-wave infrared, mid-wave infrared, and long-wave infrared. Different standoff distances varying from short to intermediate and long are considered too. Our methods are compared with other popular or state-of-the-art methods and are proven to be advantageous.

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and comprehends several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. Authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.

    Blood clearance and tissue distribution of PEGylated and non-PEGylated gold nanorods after intravenous administration in rats

    Aims: To develop and determine the safety of gold nanorods, whose aspect ratios can be tuned to obtain plasmon peaks between 650 and 850 nm, as contrast enhancing agents for diagnostic and therapeutic applications. Materials & methods: In this study we compared the blood clearance and tissue distribution of cetyl trimethyl ammonium bromide (CTAB)-capped and polyethylene glycol (PEG)-coated gold nanorods after intravenous injection in the tail vein of rats. The gold content in blood and various organs was measured quantitatively with inductively coupled plasma mass spectrometry. Results & discussion: The CTAB-capped gold nanorods were almost immediately (<15 min) cleared from the blood circulation, whereas the PEGylation of gold nanorods resulted in a prolonged blood circulation with a half-life of 19 h and a more widespread tissue distribution. While for the CTAB-capped gold nanorods the tissue distribution was limited to liver, spleen and lung, the PEGylated gold nanorods also distributed to kidney, heart, thymus, brain and testes. PEGylation of the gold nanorods resulted in the spleen being the organ with the highest exposure, whereas for the non-PEGylated CTAB-capped gold nanorods the liver was the organ with the highest exposure, per gram of organ. Conclusion: The PEGylation of gold nanorods resulted in a prolongation of the blood clearance and the highest organ exposure in the spleen. In view of the time frame (up to 48 h) of the observed presence in blood circulation, PEGylated gold nanorods can be considered promising candidates for therapeutic and diagnostic imaging purposes.
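To put the reported 19 h half-life and 48 h observation window side by side, the following sketch computes the fraction of PEGylated nanorods that would remain in the blood. It assumes simple first-order (mono-exponential) elimination, which the abstract does not state, and the function name is hypothetical.

```python
def fraction_remaining(hours, half_life_hours=19.0):
    """Fraction of the initial blood level left after `hours`.

    Assumption: first-order elimination, so the level halves every
    `half_life_hours` (19 h is the value reported for PEGylated nanorods).
    """
    return 0.5 ** (hours / half_life_hours)


for t in (19, 38, 48):
    print(f"{t} h: {fraction_remaining(t):.3f}")
```

Under this assumption roughly 17% of the initial blood level would still be present at 48 h, which is in line with the nanorods being observable in circulation over that time frame.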

    Enhancing methanol electro-oxidation by double oxide incorporation with platinum on carbon nanotubes

    Electro-oxidation of methanol was investigated on a platinum catalyst supported on a double oxide nanocomposite support of tin oxide and carbon-doped titanium dioxide. The Pt-SnO2-C-/CNTs electrocatalyst demonstrated about 20% and 44% higher forward peak current density than Pt-SnO2/CNTs and E-TEK, respectively. The kinetics of formation of the oxygenated species (-OH groups) was dramatically enhanced upon incorporation of doped titanium dioxide. This leads to a much lower onset potential of adsorbed CO oxidation. Doping the titanium dioxide with carbon makes the oxide support electrically conductive. A shift in the Pt oxidation state was noted, which reflects the influence of the metal oxide incorporation with platinum on the CNTs. The purpose of the coating is to avoid corrosion at high potentials and enhance the durability of the catalyst in an acidic medium. Therefore, the oxide coating was the key factor for the enhanced stability of the Pt catalyst and for the increase of current density during methanol electro-oxidation.