
    Joint optimization of manifold learning and sparse representations for face and gesture analysis

    Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may dispense directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and is subject to considerable variability from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework based upon dimensionality reduction and sparse representations that is shown to be robust in posed as well as natural conditions. Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower dimensional representations embedded in the higher dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance-based facial attributes is studied, leading to robust facial pose and expression recognition models. The parsimonious nature of sparse representations (SR) has successfully been exploited for the development of highly accurate classifiers for various applications. Despite the successes of SR techniques, large dictionaries and high dimensional data can make these classifiers computationally demanding. Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where, for example, variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both issues of computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone for a new face and gesture framework called Manifold based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities. To demonstrate the applicability of MSR to new domains, MSR is expanded to include temporal dynamics. The joint optimization of dimensionality reduction and SRs for classification purposes is a relatively new field. The combination of both concepts into a single objective function produces a relation that is neither convex nor directly solvable. This thesis studies this problem to introduce a new jointly optimized framework. This framework, termed LGE-KSVD, utilizes variants of Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, sparse representation dictionary, sparse coefficients, and sparsity-based classifier.
By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints K-SVD imposes on dictionary element discovery. Results are shown for face recognition, facial expression recognition, and human activity analysis; with the addition of a concept called active difference signatures, the framework delivers robust gesture recognition from Kinect and similar depth cameras.
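The classification idea that recurs throughout this abstract, sparse representation classification in a reduced feature space, can be sketched compactly. The snippet below is a minimal illustration rather than the thesis's MSR or LGE-KSVD implementation: it substitutes PCA for the learned LGE/manifold projection, uses the training samples themselves as dictionary atoms, and classifies by class-wise reconstruction residual; all function and parameter names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import OrthogonalMatchingPursuit

def src_classify(train_X, train_y, test_x, n_components=50, n_nonzero=10):
    """Sparse-representation classification in a reduced feature space.

    train_X : (n_samples, n_features) training images, one per row
    train_y : (n_samples,) class labels
    test_x  : (n_features,) test image
    """
    # 1. Dimensionality reduction (PCA stands in for the learned LGE/manifold
    #    projection described in the thesis).
    pca = PCA(n_components=n_components)
    D = pca.fit_transform(train_X).T              # (n_components, n_samples)
    y = pca.transform(test_x[None, :])[0]         # reduced test sample

    # 2. Dictionary columns are the (unit-normalised) reduced training samples.
    D = D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-12)

    # 3. Sparse-code the test sample over the dictionary with OMP.
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False)
    omp.fit(D, y)
    alpha = omp.coef_

    # 4. Assign the class whose atoms give the smallest reconstruction residual.
    classes = np.unique(train_y)
    residuals = [np.linalg.norm(y - D[:, train_y == c] @ alpha[train_y == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```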

    Sparse Representation-Based Framework for Preprocessing Brain MRI

    This thesis addresses the use of sparse representations, specifically dictionary learning and sparse coding, for preprocessing brain MRI so that the processed image retains the fine details of the original. The aim is to improve the segmentation of brain structures and, ultimately, to assess whether there is any relationship between alterations in brain structures and the behavior of young offenders. Denoising an MRI while keeping fine details is a difficult task; however, the proposed method, based on sparse representations, NLM, and SVD, can filter noise while preventing blurring, artifacts, and residual noise. Segmenting an MRI is a non-trivial task, because the boundaries between regions in these images are often neither clear nor well defined due to the problems that affect MRI. However, the proposed method, starting from both the label matrix of the segmented MRI and the original image, yields a new, improved label matrix with better-defined boundaries between regions.
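The abstract does not give the algorithmic details of the NLM/SVD pipeline, but the generic patch-based dictionary-learning denoising step it builds on can be sketched as follows. This is a minimal illustration using scikit-learn, not the thesis's method; the patch size, atom count, and sparsity level are assumptions.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import (extract_patches_2d,
                                               reconstruct_from_patches_2d)

def denoise_slice(noisy, patch_size=(8, 8), n_atoms=128, n_nonzero=4):
    """Denoise a 2-D MRI slice with patch-based dictionary learning.

    noisy : 2-D float array (a single slice), values roughly in [0, 1].
    """
    # Extract overlapping patches and remove each patch's mean (DC component).
    patches = extract_patches_2d(noisy, patch_size)
    shape = patches.shape
    patches = patches.reshape(shape[0], -1)
    means = patches.mean(axis=1, keepdims=True)
    patches = patches - means

    # Learn an overcomplete dictionary from the noisy patches themselves, then
    # sparse-code each patch with OMP; keeping few atoms suppresses the noise.
    dico = MiniBatchDictionaryLearning(
        n_components=n_atoms, alpha=1.0, batch_size=256,
        transform_algorithm="omp", transform_n_nonzero_coefs=n_nonzero)
    codes = dico.fit(patches).transform(patches)
    denoised_patches = (codes @ dico.components_ + means).reshape(shape)

    # Average the overlapping reconstructions back into a full image.
    return reconstruct_from_patches_2d(denoised_patches, noisy.shape)
```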

    NON-LINEAR AND SPARSE REPRESENTATIONS FOR MULTI-MODAL RECOGNITION

    In the first part of this dissertation, we address the problem of representing 2D and 3D shapes. In particular, we introduce a novel implicit shape representation based on Support Vector Machine (SVM) theory. Each shape is represented by an analytic decision function obtained by training an SVM, with a Radial Basis Function (RBF) kernel, so that the interior shape points are given higher values. This endows the support vector shape (SVS) representation with several advantages. First, the representation uses a sparse subset of feature points determined by the support vectors, which significantly improves the discriminative power against noise, fragmentation, and other artifacts that often come with the data. Second, the use of the RBF kernel provides scale, rotation, and translation invariant features, and allows a shape to be represented accurately regardless of its complexity. Finally, the decision function can be used to select reliable feature points. These features are described using gradients computed from highly consistent decision functions instead of conventional edges. Our experiments on 2D and 3D shapes demonstrate promising results. The availability of inexpensive 3D sensors like Kinect necessitates the design of new representations for this type of data. We present a 3D feature descriptor that represents local topologies within a set of folded concentric rings by distances from local points to a projection plane. This feature, called the Concentric Ring Signature (CORS), possesses similar computational advantages to point signatures yet provides more accurate matches. CORS produces compact and discriminative descriptors, which makes it more robust to noise and occlusions. It is also well-known to computer vision researchers that there is no universal representation that is optimal for all types of data or tasks. Sparsity has proved to be a good criterion for working with natural images. This motivates us to develop efficient sparse and non-linear learning techniques for automatically extracting useful information from visual data. Specifically, we present dictionary learning methods for sparse and redundant representations in a high-dimensional feature space. Using the kernel method, we describe how well-known dictionary learning approaches such as the method of optimal directions (MOD) and K-SVD can be made non-linear. We analyse their kernel constructions and demonstrate their effectiveness through several experiments on classification problems. It is shown that non-linear dictionary learning approaches can provide significantly better discrimination compared to their linear counterparts and kernel PCA, especially when the data is corrupted by different types of degradations. Visual descriptors are often high dimensional. This results in high computational complexity for sparse learning algorithms. Motivated by this observation, we introduce a novel framework, called sparse embedding (SE), for simultaneous dimensionality reduction and dictionary learning. We formulate an optimization problem for learning a transformation from the original signal domain to a lower-dimensional one in a way that preserves the sparse structure of data. We propose an efficient optimization algorithm and present its non-linear extension based on kernel methods. One of the key features of our method is that it is computationally efficient, as the learning is done in the lower-dimensional space and it discards the irrelevant part of the signal that derails the dictionary learning process.
Various experiments show that our method is able to capture the meaningful structure of data and can perform significantly better than many competitive algorithms on signal recovery and object classification tasks. In many practical applications, we are often confronted with the situation where the data that we use to train our models are different from those presented during testing. In the final part of this dissertation, we present a novel framework for domain adaptation using a sparse and hierarchical network (DASH-N), which makes use of the old data to improve the performance of a system operating on a new domain. Our network jointly learns a hierarchy of features together with transformations that rectify the mismatch between different domains. The building block of DASH-N is the latent sparse representation. It employs a dimensionality reduction step that prevents the data dimension from increasing too fast as one traverses deeper into the hierarchy. Experimental results show that our method consistently outperforms the current state-of-the-art by a significant margin. Moreover, we found that a multi-layer DASH-N has an edge over the single-layer DASH-N.
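As a point of reference for the kernel and sparse-embedding variants described above, the linear Method of Optimal Directions (MOD) baseline they extend can be sketched in a few lines. This is a generic illustration of the standard MOD alternation, not the dissertation's kernelized implementation; names and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def mod_dictionary_learning(X, n_atoms=64, n_nonzero=5, n_iter=20, seed=0):
    """Linear Method of Optimal Directions (MOD).

    X : (n_features, n_samples) data matrix, one signal per column
        (n_samples must be at least n_atoms for the initialisation below).
    Returns the dictionary D (n_features, n_atoms) and sparse codes A.
    """
    rng = np.random.default_rng(seed)
    # Initialise the dictionary with randomly chosen, unit-normalised signals.
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12

    for _ in range(n_iter):
        # Sparse coding: A = argmin ||X - D A||_F  s.t.  ||a_i||_0 <= n_nonzero
        A = orthogonal_mp(D, X, n_nonzero_coefs=n_nonzero)
        # Dictionary update (MOD): closed-form least-squares fit, then renormalise.
        D = X @ A.T @ np.linalg.pinv(A @ A.T)
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12

    # Re-code once more so the returned codes match the final dictionary.
    A = orthogonal_mp(D, X, n_nonzero_coefs=n_nonzero)
    return D, A
```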

    Towards an Architecture for Efficient Distributed Search of Multimodal Information

    The creation of very large-scale multimedia search engines, with more than one billion images and videos, is a pressing need of digital societies where data is generated by multiple connected devices. Distributing search indexes in cloud environments is the inevitable solution to deal with the increasing scale of image and video collections. The distribution of such indexes in this setting raises multiple challenges, such as the even partitioning of the data space, load balancing across index nodes, and the fusion of results computed over multiple nodes. The main question behind this thesis is how to reduce and distribute the computational complexity of multimedia retrieval. This thesis studies the extension of sparse hash inverted indexing to distributed settings. The main goal is to ensure that indexes are uniformly distributed across computing nodes while keeping similar documents on the same nodes. Load balancing is performed at both the node and index level, to guarantee that the retrieval process is not delayed by nodes that have to inspect larger subsets of the index. Multimodal search requires the combination of the search results from individual modalities and document features. This thesis studies rank fusion techniques focused on reducing complexity by automatically selecting only the features that improve retrieval effectiveness. The achievements of this thesis span both distributed indexing and rank fusion research. Experiments across multiple datasets show that sparse hashes can be used to distribute documents and queries over index entries in a balanced and redundant manner across nodes. Rank fusion results show that it is possible to reduce retrieval complexity and improve efficiency by searching only a subset of the feature indexes.
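The distribution scheme can be pictured with a toy sketch: documents are hashed into a small set of active dimensions, each dimension owns a posting list, posting lists are assigned to nodes, and a query only touches the nodes holding its active entries. This is an illustrative simplification, not the architecture proposed in the thesis; the hashing function, the round-robin assignment, and all names are assumptions.

```python
from collections import defaultdict
import numpy as np

def sparse_hash(x, projections, k=8):
    """Toy sparse hash: keep the indices of the k largest random projections.
    (The thesis uses learned sparse hashes; random projections stand in here.)"""
    return set(np.argsort(projections @ x)[-k:])

class DistributedSparseIndex:
    """Sparse-hash inverted index spread over several nodes.

    Each hash dimension owns a posting list (an "index entry"); entries are
    assigned round-robin so postings are evenly spread across nodes."""

    def __init__(self, n_dims, n_nodes):
        self.entry_to_node = {d: d % n_nodes for d in range(n_dims)}
        self.postings = [defaultdict(list) for _ in range(n_nodes)]

    def add(self, doc_id, active_dims):
        for d in active_dims:
            self.postings[self.entry_to_node[d]][d].append(doc_id)

    def query(self, active_dims):
        # Only the nodes owning the query's active entries are contacted;
        # candidates are ranked by how many active dimensions they share.
        votes = defaultdict(int)
        for d in active_dims:
            for doc_id in self.postings[self.entry_to_node[d]][d]:
                votes[doc_id] += 1
        return sorted(votes, key=votes.get, reverse=True)

# Toy usage: index 1000 random documents and run one query.
rng = np.random.default_rng(0)
proj = rng.standard_normal((64, 128))
index = DistributedSparseIndex(n_dims=64, n_nodes=4)
for i in range(1000):
    index.add(i, sparse_hash(rng.standard_normal(128), proj))
top = index.query(sparse_hash(rng.standard_normal(128), proj))[:10]
```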

    Advanced algorithms for detection of obstructive sleep apnea-hypopnea syndrome

    Obstructive Sleep Apnea-Hypopnea Syndrome (SAHOS, for its Spanish acronym) is a sleep disorder that is highly prevalent in the general population and affects multiple organs. This pathology is estimated to affect between 3% and 5% of the adult population worldwide, and its prevalence increases with age. Although SAHOS is more frequent in adults, it also affects children, with a prevalence close to 3%. The respiratory events associated with SAHOS during sleep occur as a consequence of an anatomical and functional alteration of the upper airway that produces its partial narrowing (hypopnea) or total blockage (apnea). To establish the severity of SAHOS, the Apnea-Hypopnea Index is defined; this index represents the number of apnea-hypopnea events per hour of sleep. The reference study for the correct diagnosis of SAHOS is overnight polysomnography. Since this type of study requires not only the simultaneous measurement of a large number of physiological signals but also special infrastructure and qualified personnel, it is very difficult to access and very costly in terms of time and money. This thesis addresses the design, development, implementation, and evaluation of three methods for the automatic recognition of apnea-hypopnea events from the analysis and processing of blood oxygen saturation (SaO2) signals. In particular, two feature selection methods, called MDAS and MDCS, are presented, both based on sparse representations of SaO2 signals for the recognition of apnea-hypopnea events. In addition, this thesis introduces a new binary discriminability measure, denoted DCAF, which is able to detect discriminative atoms in a dictionary. This measure also makes it possible to efficiently quantify their corresponding degrees of discriminability, which is useful for classification. The MDAS and MDCS methods use the DCAF measure to detect the most discriminative atoms of a given dictionary and, from them, perform feature selection. In particular, the MDCS method uses the DCAF measure to select the most discriminative atoms and, from them, build a sub-dictionary. Based on the experiments carried out in this thesis, the performance of the new DCAF measure was compared with that of several other state-of-the-art information measures. The results show that DCAF achieved very good performance. Moreover, the new MDCS method was compared with three other state-of-the-art methods, significantly outperforming all of them. This thesis also introduces an extension of the binary classification problem to a multi-class one. In this context, a generalization of the DCAF measure (which considers only two classes in the data) to more than two classes is proposed. In particular, the new combined discriminability measure takes into account not only the conditional probability of atom activation in a dictionary given the class and the value of the corresponding activation coefficient, but also the effect this has on the total representation error. Furthermore, a new iterative method called DAS-KSVD, which uses this measure, is presented for learning structured dictionaries in the context of multi-class classification problems. The new method makes it possible to detect the most discriminative atoms for each class. Using a handwritten digit database widely used in the literature, the performance of the DAS-KSVD method was analyzed, yielding recognition rates higher than those achieved by similar state-of-the-art techniques. The new DAS-KSVD method was also applied to a multi-class classification problem associated with SAHOS. The results show that this method performs very well in detecting the pathology.
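To make the feature-selection idea concrete, the code below scores dictionary atoms by how unevenly they activate across the two classes and keeps the top-scoring ones as a sub-dictionary. It is a hedged stand-in for the DCAF measure and the MDCS selection step: the actual DCAF definition (which also involves the activation coefficient values and, in the multi-class extension, the representation error) is not reproduced here, and every name and threshold is illustrative.

```python
import numpy as np

def atom_discriminability(codes, labels, eps=1e-12):
    """Rank dictionary atoms by how class-specific their activations are.

    codes  : (n_atoms, n_samples) sparse coefficients of the training signals
    labels : (n_samples,) binary labels (0 = normal, 1 = apnea-hypopnea event)
    """
    active = np.abs(codes) > eps                 # which atoms fire on which signals
    p1 = active[:, labels == 1].mean(axis=1)     # activation rate within class 1
    p0 = active[:, labels == 0].mean(axis=1)     # activation rate within class 0
    # Atoms firing mostly for one class score near 1; shared atoms score near 0.
    return np.abs(p1 - p0) / (p1 + p0 + eps)

def select_subdictionary(D, codes, labels, n_keep=32):
    """Keep the most discriminative atoms, mimicking the sub-dictionary step."""
    scores = atom_discriminability(codes, labels)
    keep = np.argsort(scores)[::-1][:n_keep]
    return D[:, keep], keep
```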

    Investigation of new learning methods for visual recognition

    Visual recognition is one of the most difficult and prevailing problems in computer vision and pattern recognition due to the challenges in understanding the semantics and contents of digital images. Two major components of a visual recognition system are discriminative feature representation and efficient, accurate pattern classification. This dissertation therefore focuses on developing new learning methods for visual recognition. Building on conventional sparse representation, which has proved robust for visual recognition problems, a series of new methods is proposed. Specifically, first, a new locally linear K nearest neighbor method, or LLK method, is presented. The LLK method derives a new representation, which is an approximation to the ideal representation, by optimizing an objective function based on a host of criteria for sparsity, locality, and reconstruction. The novel representation is further processed by two new classifiers, namely, an LLK-based classifier (LLKc) and a locally linear nearest-mean-based classifier (LLNc), for visual recognition. The proposed classifiers are shown to connect to the Bayes decision rule for minimum error. Second, a new generative and discriminative sparse representation (GDSR) method is proposed by taking advantage of both a coarse modeling of the generative information and a modeling of the discriminative information. The proposed GDSR method integrates two new criteria, namely, a discriminative criterion and a generative criterion, into the conventional sparse representation criterion. A new generative and discriminative sparse representation based classification (GDSRc) method is then presented based on the derived new representation. Finally, a new Score space based multiple Metric Learning (SML) method is presented for a challenging visual recognition application, namely, recognizing kinship relations, or kinship verification. The proposed SML method, which goes beyond conventional Mahalanobis distance metric learning, not only learns the distance metric but also models the generative process of features by taking advantage of the score space. The SML method is optimized by solving a constrained, non-negative, and weighted variant of the sparse representation problem. To assess the feasibility of the proposed new learning methods, they are applied to several visual recognition tasks, such as face recognition, scene recognition, object recognition, computational fine art analysis, action recognition, fine-grained recognition, and kinship verification. The experimental results show that the proposed new learning methods achieve better performance than other popular methods.
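The flavor of the LLK representation can be conveyed with a short sketch: the test sample is reconstructed from its K nearest training samples and classified by class-wise reconstruction residual. This is a hedged approximation, assuming a ridge-regularised least-squares reconstruction in place of the dissertation's actual objective (which balances sparsity, locality, and reconstruction criteria); all names and parameters are illustrative.

```python
import numpy as np

def llk_classify(train_X, train_y, test_x, k=20, reg=1e-3):
    """Classify a test sample from a locally linear reconstruction.

    train_X : (n_samples, n_features), train_y : (n_samples,), test_x : (n_features,)
    """
    # Locality: restrict the representation to the k nearest training samples.
    nn = np.argsort(np.linalg.norm(train_X - test_x, axis=1))[:k]
    B = train_X[nn].T                                  # (n_features, k) local basis
    # Reconstruction: ridge-regularised least-squares coefficients.
    w = np.linalg.solve(B.T @ B + reg * np.eye(len(nn)), B.T @ test_x)
    # Classification: smallest residual using only each class's neighbors.
    classes = np.unique(train_y[nn])
    residuals = [np.linalg.norm(test_x - B[:, train_y[nn] == c] @ w[train_y[nn] == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```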