3D Object classification using a volumetric deep neural network: An efficient Octree Guided Auxiliary Learning approach
© 2013 IEEE. We consider the recent challenges of 3D shape analysis with volumetric CNNs, which require substantial computational power. This high cost forces a reduction in volume resolution when a 3D CNN is applied to volumetric data. In this context, we propose a multi-orientation volumetric deep neural network (MV-DNN) for 3D object classification that uses an octree to generate low-cost volumetric features. In contrast to conventional octree representations, we limit the octree partition to a certain depth in order to preserve all leaf octants with sparsity features. This improves the learning of complex 3D features and the prediction of object labels at both low and high resolutions. Our auxiliary learning approach predicts object classes from subvolume parts of a 3D object, which improves classification accuracy compared to other existing 3D volumetric CNN methods. In addition, the influence of the views and depths of the 3D model on classification performance is investigated through extensive experiments on the ModelNet40 database. Our deep learning framework runs significantly faster and consumes less memory than full voxel representations, which demonstrates the effectiveness of our octree-based auxiliary learning approach for exploring high-resolution 3D models. Experimental results show that MV-DNN achieves better classification accuracy than state-of-the-art methods on two public databases.
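The abstract describes an octree whose partition stops at a fixed depth so that every leaf octant is kept. As a minimal sketch, assuming a cubic binary occupancy grid whose side is a power of two, the function below subdivides only non-homogeneous cells and stops at `max_depth`, returning each leaf with its occupancy fraction as a stand-in for the paper's "sparsity features". The name `build_octree`, the stopping rule, and the occupancy feature are illustrative assumptions, not the MV-DNN authors' code.

```python
import numpy as np

def build_octree(grid, max_depth, origin=(0, 0, 0), depth=0):
    """Return depth-limited octree leaves as (origin, size, occupancy) tuples.

    grid: cubic boolean occupancy array with a power-of-two side length.
    A cell becomes a leaf if it is empty, completely full, a single voxel,
    or already at max_depth; its occupancy fraction is kept as a feature.
    """
    size = grid.shape[0]
    occ = float(grid.mean())  # fraction of occupied voxels in this octant
    if depth == max_depth or occ in (0.0, 1.0) or size == 1:
        return [(origin, size, occ)]
    h = size // 2
    leaves = []
    # Recurse into the eight child octants.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                sub = grid[dx * h:(dx + 1) * h, dy * h:(dy + 1) * h, dz * h:(dz + 1) * h]
                child = (origin[0] + dx * h, origin[1] + dy * h, origin[2] + dz * h)
                leaves += build_octree(sub, max_depth, child, depth + 1)
    return leaves

grid = np.zeros((8, 8, 8), dtype=bool)
grid[5, 2, 7] = True  # a single occupied voxel
print(len(build_octree(grid, 3)), len(build_octree(grid, 2)))
```

Lowering `max_depth` trades spatial detail for fewer leaves: at depth 2 the occupied voxel is summarized by one 2×2×2 leaf with occupancy 1/8 instead of being resolved to a single voxel.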
Multiscale wavelet representations for mammographic feature analysis
This paper introduces a novel approach to mammographic feature analysis through multiresolution representations. We show that efficient (nonredundant) representations may be identified from digital mammography and used to enhance specific mammographic features within a continuum of scale space. The multiresolution decomposition of wavelet transforms provides a natural hierarchy in which to embed an interactive paradigm for scale-space feature analysis. Choosing wavelets (or analyzing functions) that are simultaneously localized in both space and frequency results in a powerful methodology for image analysis. Multiresolution and orientation selectivity, known biological mechanisms in primate vision, are ingrained in wavelet representations and inspire the techniques presented in this paper. Our approach includes local analysis of complete multiscale representations. Mammograms are reconstructed from wavelet coefficients enhanced by linear, exponential, and constant weight functions localized in scale space. By improving the visualization of breast pathology, we can improve the chances of early detection of breast cancers (improved quality) while requiring less time to evaluate mammograms for most patients (lower costs).
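The enhancement step (decompose, reweight the detail coefficients per scale, reconstruct) can be sketched with a simple averaging 2D Haar transform. This is a minimal illustration, assuming image sides divisible by `2**levels` and weight functions that depend only on the decomposition level; the paper's actual wavelets and weight functions are not reproduced here, and the names `haar2d`, `enhance`, and `WEIGHTS` are hypothetical.

```python
import math
import numpy as np

def haar2d(x):
    """One level of an (unnormalized, averaging) 2D Haar transform."""
    a = (x[0::2, :] + x[1::2, :]) / 2  # row-pair averages
    d = (x[0::2, :] - x[1::2, :]) / 2  # row-pair differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2  # coarse approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2  # detail bands
    hl = (d[:, 0::2] + d[:, 1::2]) / 2
    hh = (d[:, 0::2] - d[:, 1::2]) / 2
    return ll, (lh, hl, hh)

def ihaar2d(ll, details):
    """Exact inverse of haar2d."""
    lh, hl, hh = details
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.empty((2 * h, 2 * w))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x

def enhance(image, levels, weight):
    """Scale the detail bands of level j (1 = finest) by weight(j), then reconstruct."""
    ll = np.asarray(image, dtype=float)
    details = []
    for _ in range(levels):
        ll, d = haar2d(ll)
        details.append(d)  # details[0] holds the finest scale
    for j in range(levels, 0, -1):  # rebuild from the coarsest scale down
        g = weight(j)
        ll = ihaar2d(ll, tuple(g * band for band in details[j - 1]))
    return ll

# Illustrative weight profiles over scale (assumed forms and parameters).
WEIGHTS = {
    "constant": lambda j: 1.5,
    "linear": lambda j: 1.0 + 0.5 * j,
    "exponential": lambda j: math.exp(0.25 * j),
}
```

With all weights equal to 1 the image is reconstructed exactly; weights above 1 amplify edges and fine structure at the chosen scales, while smooth regions (whose detail coefficients are zero) are left unchanged.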
Multimodal Neuroimaging Feature Learning for Multiclass Diagnosis of Alzheimer's Disease
The accurate diagnosis of Alzheimer's disease (AD) is essential for patient care and will become increasingly important as disease-modifying agents for use early in the course of the disease become available. Although machine learning methods have been applied to the computer-aided diagnosis of AD, previous methods have shown a bottleneck in diagnostic performance due to the lack of efficient strategies for representing neuroimaging biomarkers. In this study, we designed a novel diagnostic framework with a deep learning architecture to aid the diagnosis of AD. This framework uses a zero-masking strategy for data fusion to extract complementary information from multiple data modalities. Compared to previous state-of-the-art workflows, our method can fuse multimodal neuroimaging features in one setting and has the potential to require less labeled data. A performance gain was achieved in both binary and multiclass classification of AD. The advantages and limitations of the proposed framework are discussed.
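The abstract names a zero-masking strategy for fusing modalities but gives no details. One plausible reading, sketched here under that assumption, is modality-level masking: for each training sample, one modality's feature block is randomly set to zero so that a network (for example a denoising autoencoder trained to reconstruct the unmasked input) must infer it from the remaining modalities. The function `zero_mask_modalities`, its `p_mask` parameter, and the MRI/PET layout are hypothetical.

```python
import numpy as np

def zero_mask_modalities(features, modality_slices, rng, p_mask=0.5):
    """Return a masked copy of `features` (samples x concatenated modality features).

    With probability p_mask per sample, exactly one randomly chosen modality
    block is zeroed; at most one block is masked, so some modality always survives.
    """
    masked = np.array(features, dtype=float, copy=True)
    for i in range(masked.shape[0]):
        if rng.random() < p_mask:
            k = rng.integers(len(modality_slices))  # which modality to hide
            masked[i, modality_slices[k]] = 0.0
    return masked

# Hypothetical layout: 3 MRI features followed by 3 PET features per subject.
rng = np.random.default_rng(0)
clean = np.ones((4, 6))
slices = [slice(0, 3), slice(3, 6)]
noisy = zero_mask_modalities(clean, slices, rng)
# A fusion network would then be trained to map `noisy` back to `clean`.
```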
Towards Realistic Facial Expression Recognition
Automatic facial expression recognition has attracted significant attention over the past decades. Although substantial progress has been achieved for certain scenarios (such as frontal faces in strictly controlled laboratory settings), accurate recognition of facial expressions in realistic environments remains largely unsolved. The main objective of this thesis is to investigate facial expression recognition in unconstrained environments. Because one major problem in the literature is the lack of realistic training and testing data, this thesis presents a web-search-based framework to collect a realistic facial expression dataset from the Web. By adopting an active-learning-based method to remove noisy images from text-based image search results, the proposed approach minimizes human effort during dataset construction and maximizes scalability for future research. Various novel facial expression features are then proposed to address the challenges posed by the newly collected dataset. Finally, a spectral-embedding-based feature fusion framework is presented that combines the proposed facial expression features into a more descriptive representation. This thesis also systematically investigates how the number of frames in a facial expression sequence affects the performance of facial expression recognition algorithms, since facial expression sequences may be captured at different frame rates in realistic scenarios. A facial expression keyframe selection method is proposed based on a keypoint-based frame representation. Comprehensive experiments demonstrate the effectiveness of the presented methods.
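The abstract only states that keyframes are selected from a keypoint-based frame representation. As a minimal sketch, assuming each frame is flattened into a vector of facial keypoint coordinates, greedy farthest-point selection keeps frames that differ most from those already chosen, so the selection is less sensitive to frame rate. The name `select_keyframes`, seeding with the first frame, and the Euclidean distance are illustrative assumptions, not the thesis's method.

```python
import math

def select_keyframes(frames, k):
    """Greedy farthest-point keyframe selection over keypoint vectors.

    frames: list of equal-length sequences (flattened keypoint coordinates).
    Returns up to k sorted frame indices; fewer if the remaining frames
    duplicate ones already selected.
    """
    if k <= 0 or not frames:
        return []
    k = min(k, len(frames))
    selected = [0]  # assumption: seed with the first (often near-neutral) frame
    # Distance from each frame to its nearest selected keyframe.
    nearest = [math.dist(f, frames[0]) for f in frames]
    while len(selected) < k:
        idx = max(range(len(frames)), key=lambda i: nearest[i])
        if nearest[idx] == 0:
            break  # every remaining frame duplicates a selected one
        selected.append(idx)
        for i, f in enumerate(frames):
            nearest[i] = min(nearest[i], math.dist(f, frames[idx]))
    return sorted(selected)

# Toy 1-keypoint "frames" drifting along x.
frames = [[0, 0], [1, 0], [5, 0], [10, 0], [10.5, 0]]
print(select_keyframes(frames, 3))  # [0, 2, 4]
```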
Facial Expression Recognition for Ambient Assistance
Over recent decades, the world has undergone major demographic changes, notably a sharp increase in the elderly population. Aging leads directly not only to a progressive loss of cognitive abilities but also to a higher risk of neurodegenerative diseases such as Alzheimer's and Parkinson's. The loss of cognitive abilities reduces autonomy, so these individuals must receive daily assistance to ensure their well-being. The facilities and specialized staff responsible for their care place a heavy burden on the economy. For this reason, less costly and better-optimized solutions are needed. With the advent of new information and communication technologies, it has become increasingly easy to develop solutions that provide adequate assistance to people with cognitive impairments. Smart homes are among the most widespread of these solutions. They use different types of sensors to collect data, machine learning algorithms and methods to extract and process information, and actuators to trigger a response that provides appropriate assistance. Among the data sources exploited, images and videos remain the richest in terms of quantity. The collected data enable not only activity recognition but also error detection during the execution of tasks and activities of daily living. Automatic emotion recognition has many applications in everyday life, such as human-computer interaction, education, security, entertainment, robot vision, and ambient assistance.
However, emotions remain a fairly complex subject to pin down, and many studies in psychology and cognitive science are still being conducted. Their results serve as a basis for developing more effective approaches. Human emotions can be perceived through different modalities such as voice, posture, gestures, and facial expressions. Based on the work of Mehrabian, facial expressions are the most relevant modality for automatic emotion recognition. Accordingly, one objective of this research is to propose methods for identifying the six basic emotions: happiness, fear, anger, surprise, disgust, and sadness. The proposed methods use both static and dynamic input data and rely on different types of descriptors/representations (geometric, appearance-based, and hybrid). Their performance was evaluated on the benchmark databases JAFFE, KDEF, RaFD, CK+, MMI, and MUG. The main objective of this research is to use facial expressions to improve the performance of existing assistance systems. To this end, experiments were conducted in the LIARA smart environment, following a specific experimental protocol, to collect validation data. During the execution of an activity of daily living (preparing coffee), two types of data were collected. The RFID data were used to validate the method for automatic recognition of user actions as well as automatic error detection. The facial data were used to evaluate how much facial expressions contribute to improving the assistance system's error-detection performance.
With a reduction in the false detection rate exceeding 20%, the stated objective was successfully achieved.
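The abstract mentions geometric descriptors computed from both static and dynamic input. A common generic form of a geometric descriptor, sketched here as an assumption rather than the thesis's exact formulation, is the set of pairwise distances between facial landmarks normalized by the inter-ocular distance, which makes it invariant to face scale; a dynamic variant can take the change in that descriptor between a neutral frame and the expression apex. The function names and the choice of normalization are hypothetical.

```python
import math
from itertools import combinations

def geometric_descriptor(landmarks, left_eye, right_eye):
    """Pairwise landmark distances divided by the inter-ocular distance.

    landmarks: list of (x, y) points; left_eye/right_eye: indices into it.
    Returns n * (n - 1) / 2 scale-invariant distances.
    """
    scale = math.dist(landmarks[left_eye], landmarks[right_eye])
    if scale == 0:
        raise ValueError("eye landmarks must be distinct points")
    return [math.dist(p, q) / scale for p, q in combinations(landmarks, 2)]

def dynamic_descriptor(neutral, apex, left_eye, right_eye):
    """Change in the geometric descriptor from the neutral frame to the apex frame."""
    n = geometric_descriptor(neutral, left_eye, right_eye)
    a = geometric_descriptor(apex, left_eye, right_eye)
    return [x - y for x, y in zip(a, n)]
```

A classifier (for example an SVM) would then be trained on these vectors; the normalization means that the same expression photographed at different distances yields the same descriptor.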