421 research outputs found

    NON-LINEAR AND SPARSE REPRESENTATIONS FOR MULTI-MODAL RECOGNITION

    Get PDF
    In the first part of this dissertation, we address the problem of representing 2D and 3D shapes. In particular, we introduce a novel implicit shape representation based on Support Vector Machine (SVM) theory. Each shape is represented by an analytic decision function obtained by training an SVM, with a Radial Basis Function (RBF) kernel, so that the interior shape points are given higher values. This empowers support vector shape (SVS) with multifold advantages. First, the representation uses a sparse subset of feature points determined by the support vectors, which significantly improves the discriminative power against noise, fragmentation and other artifacts that often come with the data. Second, the use of the RBF kernel provides scale, rotation, and translation invariant features, and allows a shape to be represented accurately regardless of its complexity. Finally, the decision function can be used to select reliable feature points. These features are described using gradients computed from highly consistent decision functions instead of conventional edges. Our experiments on 2D and 3D shapes demonstrate promising results. The availability of inexpensive 3D sensors like Kinect necessitates the design of new representation for this type of data. We present a 3D feature descriptor that represents local topologies within a set of folded concentric rings by distances from local points to a projection plane. This feature, called as Concentric Ring Signature (CORS), possesses similar computational advantages to point signatures yet provides more accurate matches. CORS produces compact and discriminative descriptors, which makes it more robust to noise and occlusions. It is also well-known to computer vision researchers that there is no universal representation that is optimal for all types of data or tasks. Sparsity has proved to be a good criterion for working with natural images. This motivates us to develop efficient sparse and non-linear learning techniques for automatically extracting useful information from visual data. Specifically, we present dictionary learning methods for sparse and redundant representations in a high-dimensional feature space. Using the kernel method, we describe how the well-known dictionary learning approaches such as the method of optimal directions and KSVD can be made non-linear. We analyse their kernel constructions and demonstrate their effectiveness through several experiments on classification problems. It is shown that non-linear dictionary learning approaches can provide significantly better discrimination compared to their linear counterparts and kernel PCA, especially when the data is corrupted by different types of degradations. Visual descriptors are often high dimensional. This results in high computational complexity for sparse learning algorithms. Motivated by this observation, we introduce a novel framework, called sparse embedding (SE), for simultaneous dimensionality reduction and dictionary learning. We formulate an optimization problem for learning a transformation from the original signal domain to a lower-dimensional one in a way that preserves the sparse structure of data. We propose an efficient optimization algorithm and present its non-linear extension based on the kernel methods. One of the key features of our method is that it is computationally efficient as the learning is done in the lower-dimensional space and it discards the irrelevant part of the signal that derails the dictionary learning process. Various experiments show that our method is able to capture the meaningful structure of data and can perform significantly better than many competitive algorithms on signal recovery and object classification tasks. In many practical applications, we are often confronted with the situation where the data that we use to train our models are different from that presented during the testing. In the final part of this dissertation, we present a novel framework for domain adaptation using a sparse and hierarchical network (DASH-N), which makes use of the old data to improve the performance of a system operating on a new domain. Our network jointly learns a hierarchy of features together with transformations that rectify the mismatch between different domains. The building block of DASH-N is the latent sparse representation. It employs a dimensionality reduction step that can prevent the data dimension from increasing too fast as traversing deeper into the hierarchy. Experimental results show that our method consistently outperforms the current state-of-the-art by a significant margin. Moreover, we found that a multi-layer {DASH-N} has an edge over the single-layer DASH-N

    Remembrance of Odors Past Human Olfactory Cortex in Cross-Modal Recognition Memory

    Get PDF
    AbstractEpisodic memory is often imbued with multisensory richness, such that the recall of an event can be endowed with the sights, sounds, and smells of its prior occurrence. While hippocampus and related medial temporal structures are implicated in episodic memory retrieval, the participation of sensory-specific cortex in representing the qualities of an episode is less well established. We combined functional magnetic resonance imaging (fMRI) with a cross-modal paradigm, where objects were presented with odors during memory encoding. We then examined the effect of odor context on neural responses at retrieval when these same objects were presented alone. Primary olfactory (piriform) cortex, as well as anterior hippocampus, was activated during the successful retrieval of old (compared to new) objects. Our findings indicate that sensory features of the original engram are preserved in unimodal olfactory cortex. We suggest that reactivation of memory traces distributed across modality-specific brain areas underpins the sensory qualities of episodic memories

    Cross-Modal Object Recognition Is Viewpoint-Independent

    Get PDF
    BACKGROUND: Previous research suggests that visual and haptic object recognition are viewpoint-dependent both within- and cross-modally. However, this conclusion may not be generally valid as it was reached using objects oriented along their extended y-axis, resulting in differential surface processing in vision and touch. In the present study, we removed this differential by presenting objects along the z-axis, thus making all object surfaces more equally available to vision and touch. METHODOLOGY/PRINCIPAL FINDINGS: Participants studied previously unfamiliar objects, in groups of four, using either vision or touch. Subsequently, they performed a four-alternative forced-choice object identification task with the studied objects presented in both unrotated and rotated (180 degrees about the x-, y-, and z-axes) orientations. Rotation impaired within-modal recognition accuracy in both vision and touch, but not cross-modal recognition accuracy. Within-modally, visual recognition accuracy was reduced by rotation about the x- and y-axes more than the z-axis, whilst haptic recognition was equally affected by rotation about all three axes. Cross-modal (but not within-modal) accuracy correlated with spatial (but not object) imagery scores. CONCLUSIONS/SIGNIFICANCE: The viewpoint-independence of cross-modal object identification points to its mediation by a high-level abstract representation. The correlation between spatial imagery scores and cross-modal performance suggest that construction of this high-level representation is linked to the ability to perform spatial transformations. Within-modal viewpoint-dependence appears to have a different basis in vision than in touch, possibly due to surface occlusion being important in vision but not touch

    Multisensory Motion Perception in 3\u20134 Month-Old Infants

    Get PDF
    Human infants begin very early in life to take advantage of multisensory information by extracting the invariant amodal information that is conveyed redundantly by multiple senses. Here we addressed the question as to whether infants can bind multisensory moving stimuli, and whether this occurs even if the motion produced by the stimuli is only illusory. Three- to 4-month-old infants were presented with two bimodal pairings: visuo-tactile and audio-visual. Visuo-tactile pairings consisted of apparently vertically moving bars (the Barber Pole illusion) moving in either the same or opposite direction with a concurrent tactile stimulus consisting of strokes given on the infant\u2019s back. Audio-visual pairings consisted of the Barber Pole illusion in its visual and auditory version, the latter giving the impression of a continuous rising or ascending pitch. We found that infants were able to discriminate congruently (same direction) vs. incongruently moving (opposite direction) pairs irrespective of modality (Experiment 1). Importantly, we also found that congruently moving visuo-tactile and audio-visual stimuli were preferred over incongruently moving bimodal stimuli (Experiment 2). Our findings suggest that very young infants are able to extract motion as amodal component and use it to match stimuli that only apparently move in the same direction

    Towards Simulating Humans in Augmented Multi-party Interaction

    Get PDF
    Human-computer interaction requires modeling of the user. A user profile typically contains preferences, interests, characteristics, and interaction behavior. However, in its multimodal interaction with a smart environment the user displays characteristics that show how the user, not necessarily consciously, verbally and nonverbally provides the smart environment with useful input and feedback. Especially in ambient intelligence environments we encounter situations where the environment supports interaction between the environment, smart objects (e.g., mobile robots, smart furniture) and human participants in the environment. Therefore it is useful for the profile to contain a physical representation of the user obtained by multi-modal capturing techniques. We discuss the modeling and simulation of interacting participants in the European AMI research project

    Face Recognition using 3D Facial Shape and Color Map Information: Comparison and Combination

    Full text link
    In this paper, we investigate the use of 3D surface geometry for face recognition and compare it to one based on color map information. The 3D surface and color map data are from the CAESAR anthropometric database. We find that the recognition performance is not very different between 3D surface and color map information using a principal component analysis algorithm. We also discuss the different techniques for the combination of the 3D surface and color map information for multi-modal recognition by using different fusion approaches and show that there is significant improvement in results. The effectiveness of various techniques is compared and evaluated on a dataset with 200 subjects in two different positions.Comment: Proceedings of SPIE Vol. 5404 Biometric Technology for Human Identification, Anil K. Jain; Nalini K. Ratha, Editors, pp.351-361, ISBN: 9780819453273 Date: 25 August 200
    • …
    corecore