Search CORE

6,223 research outputs found

View subspaces for indexing and retrieval of 3D models

Author: Dutagaci Helin
Godil Afzal
Sankur Bulent
Yemez Yücel
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 13/05/2011
Field of study

View-based indexing schemes for 3D object retrieval are gaining popularity since they provide good retrieval results. These schemes are coherent with the theory that humans recognize objects based on their 2D appearances. The viewbased techniques also allow users to search with various queries such as binary images, range images and even 2D sketches. The previous view-based techniques use classical 2D shape descriptors such as Fourier invariants, Zernike moments, Scale Invariant Feature Transform-based local features and 2D Digital Fourier Transform coefficients. These methods describe each object independent of others. In this work, we explore data driven subspace models, such as Principal Component Analysis, Independent Component Analysis and Nonnegative Matrix Factorization to describe the shape information of the views. We treat the depth images obtained from various points of the view sphere as 2D intensity images and train a subspace to extract the inherent structure of the views within a database. We also show the benefit of categorizing shapes according to their eigenvalue spread. Both the shape categorization and data-driven feature set conjectures are tested on the PSB database and compared with the competitor view-based 3D shape retrieval algorithmsComment: Three-Dimensional Image Processing (3DIP) and Applications (Proceedings Volume) Proceedings of SPIE Volume: 7526 Editor(s): Atilla M. Baskurt ISBN: 9780819479198 Date: 2 February 201

arXiv.org e-Print Archive

Crossref

Koç University Digital Collections

View-tolerant face recognition and Hebbian learning imply mirror-symmetric neural tuning to head orientation

Author: Abbott
Adelson
Afraz
Anselmi
Bart
Bart
Berkes
Bruce
Cox
DiCarlo
Fabio Anselmi
Farzmahdi
Freiwald
Földiák
Hassoun
Hengen
Hung
Isik
Isik
Joel Z. Leibo
Keck
Ku
Leibo
Li
Meyers
Miyashita
Moeller
Oja
Oja
Poggio
Qianli Liao
Riesenhuber
Rolls
Sanger
Serre
Tan
Thorpe
Tomaso Poggio
Tsao
Tsao
Tsao
Turrigiano
Wallis
Winrich A. Freiwald
Wiskott
Publication venue
Publication date: 03/06/2016
Field of study

The primate brain contains a hierarchy of visual areas, dubbed the ventral stream, which rapidly computes object representations that are both specific for object identity and relatively robust against identity-preserving transformations like depth-rotations. Current computational models of object recognition, including recent deep learning networks, generate these properties through a hierarchy of alternating selectivity-increasing filtering and tolerance-increasing pooling operations, similar to simple-complex cells operations. While simulations of these models recapitulate the ventral stream's progression from early view-specific to late view-tolerant representations, they fail to generate the most salient property of the intermediate representation for faces found in the brain: mirror-symmetric tuning of the neural population to head orientation. Here we prove that a class of hierarchical architectures and a broad set of biologically plausible learning rules can provide approximate invariance at the top level of the network. While most of the learning rules do not yield mirror-symmetry in the mid-level representations, we characterize a specific biologically-plausible Hebb-type learning rule that is guaranteed to generate mirror-symmetric tuning to faces tuning at intermediate levels of the architecture

arXiv.org e-Print Archive

DSpace@MIT

Crossref

PubMed Central

Representation Learning: A Review and New Perspectives

Author: Bengio Yoshua
Courville Aaron
Vincent Pascal
Publication venue
Publication date: 01/01/2014
Field of study

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning

arXiv.org e-Print Archive

CiteSeerX

Convolutional Sparse Kernel Network for Unsupervised Medical Image Analysis

Author: Ahn Euijoon
Feng Dagan
Fulham Michael
Kim Jinman
Kumar Ashnil
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019
Field of study

The availability of large-scale annotated image datasets and recent advances in supervised deep learning methods enable the end-to-end derivation of representative image features that can impact a variety of image analysis problems. Such supervised approaches, however, are difficult to implement in the medical domain where large volumes of labelled data are difficult to obtain due to the complexity of manual annotation and inter- and intra-observer variability in label assignment. We propose a new convolutional sparse kernel network (CSKN), which is a hierarchical unsupervised feature learning framework that addresses the challenge of learning representative visual features in medical image analysis domains where there is a lack of annotated training data. Our framework has three contributions: (i) We extend kernel learning to identify and represent invariant features across image sub-patches in an unsupervised manner. (ii) We initialise our kernel learning with a layer-wise pre-training scheme that leverages the sparsity inherent in medical images to extract initial discriminative features. (iii) We adapt a multi-scale spatial pyramid pooling (SPP) framework to capture subtle geometric differences between learned visual features. We evaluated our framework in medical image retrieval and classification on three public datasets. Our results show that our CSKN had better accuracy when compared to other conventional unsupervised methods and comparable accuracy to methods that used state-of-the-art supervised convolutional neural networks (CNNs). Our findings indicate that our unsupervised CSKN provides an opportunity to leverage unannotated big data in medical imaging repositories.Comment: Accepted by Medical Image Analysis (with a new title 'Convolutional Sparse Kernel Network for Unsupervised Medical Image Analysis'). The manuscript is available from following link (https://doi.org/10.1016/j.media.2019.06.005

arXiv.org e-Print Archive

ResearchOnline at James Cook University

Sparse Modeling for Image and Vision Processing

Author: Ecole Normale Supérieure
Francis Bach
Francis Bach
Hal Id Hal
Jean Ponce
Jean Ponce
Julien Mairal
Julien Mairal
Sparse Modeling Image
Vision Processing
Publication venue
Publication date: 01/01/2014
Field of study

In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection---that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, or computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts.Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Visio

arXiv.org e-Print Archive

CiteSeerX

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Decoding the Encoding of Functional Brain Networks: an fMRI Classification Comparison of Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA), and Sparse Coding Algorithms

Author: Anderson Ariana E.
Brody Arthur L.
Douglas Pamela K.
Wu Ying Nian
Xie Jianwen
Publication venue
Publication date: 01/07/2016
Field of study

Brain networks in fMRI are typically identified using spatial independent component analysis (ICA), yet mathematical constraints such as sparse coding and positivity both provide alternate biologically-plausible frameworks for generating brain networks. Non-negative Matrix Factorization (NMF) would suppress negative BOLD signal by enforcing positivity. Spatial sparse coding algorithms (

L1

Regularized Learning and K-SVD) would impose local specialization and a discouragement of multitasking, where the total observed activity in a single voxel originates from a restricted number of possible brain networks. The assumptions of independence, positivity, and sparsity to encode task-related brain networks are compared; the resulting brain networks for different constraints are used as basis functions to encode the observed functional activity at a given time point. These encodings are decoded using machine learning to compare both the algorithms and their assumptions, using the time series weights to predict whether a subject is viewing a video, listening to an audio cue, or at rest, in 304 fMRI scans from 51 subjects. For classifying cognitive activity, the sparse coding algorithm of

L1

Regularized Learning consistently outperformed 4 variations of ICA across different numbers of networks and noise levels (p

<

0.001). The NMF algorithms, which suppressed negative BOLD signal, had the poorest accuracy. Within each algorithm, encodings using sparser spatial networks (containing more zero-valued voxels) had higher classification accuracy (p

<

0.001). The success of sparse coding algorithms may suggest that algorithms which enforce sparse coding, discourage multitasking, and promote local specialization may capture better the underlying source processes than those which allow inexhaustible local processes such as ICA

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation

Author: Klein Benjamin
Lev Guy
Sadeh Gil
Wolf Lior
Publication venue
Publication date: 24/01/2015
Field of study

In the traditional object recognition pipeline, descriptors are densely sampled over an image, pooled into a high dimensional non-linear representation and then passed to a classifier. In recent years, Fisher Vectors have proven empirically to be the leading representation for a large variety of applications. The Fisher Vector is typically taken as the gradients of the log-likelihood of descriptors, with respect to the parameters of a Gaussian Mixture Model (GMM). Motivated by the assumption that different distributions should be applied for different datasets, we present two other Mixture Models and derive their Expectation-Maximization and Fisher Vector expressions. The first is a Laplacian Mixture Model (LMM), which is based on the Laplacian distribution. The second Mixture Model presented is a Hybrid Gaussian-Laplacian Mixture Model (HGLMM) which is based on a weighted geometric mean of the Gaussian and Laplacian distribution. An interesting property of the Expectation-Maximization algorithm for the latter is that in the maximization step, each dimension in each component is chosen to be either a Gaussian or a Laplacian. Finally, by using the new Fisher Vectors derived from HGLMMs, we achieve state-of-the-art results for both the image annotation and the image search by a sentence tasks.Comment: new version includes text synthesis by an RNN and experiments with the COCO benchmar

arXiv.org e-Print Archive

CiteSeerX