SCA-PVNet: Self-and-Cross Attention Based Aggregation of Point Cloud and Multi-View for 3D Object Retrieval
To address 3D object retrieval, substantial efforts have been made to
generate highly discriminative descriptors of 3D objects represented by a
single modality, e.g., voxels, point clouds or multi-view images. It is
promising to leverage the complementary information from multi-modality
representations of 3D objects to further improve retrieval performance.
However, multi-modality 3D object retrieval has rarely been developed and analyzed on
large-scale datasets. In this paper, we propose self-and-cross attention based
aggregation of point cloud and multi-view images (SCA-PVNet) for 3D object
retrieval. With deep features extracted from point clouds and multi-view
images, we design two types of feature aggregation modules, namely the
In-Modality Aggregation Module (IMAM) and the Cross-Modality Aggregation Module
(CMAM), for effective feature fusion. IMAM leverages a self-attention mechanism
to aggregate multi-view features, while CMAM exploits a cross-attention
mechanism to let point cloud features interact with multi-view features. The final
descriptor of a 3D object for retrieval is obtained by concatenating the
aggregated features from both modules. Extensive experiments and analysis on
three datasets, ranging from small to large scale, show the superiority of the
proposed SCA-PVNet over state-of-the-art methods.
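A minimal pure-Python sketch of the two aggregation ideas described above, under loudly stated assumptions: query/key/value projections are taken as the identity (the learned projection matrices, multi-head structure and the point cloud and image feature extractors are all omitted), and the self-attended view features are mean-pooled into one vector. The function names are hypothetical; this is an illustration of self- and cross-attention aggregation, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention with identity Q/K/V projections (assumption)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def aggregate_descriptor(point_feat: Vector, view_feats: List[Vector]) -> Vector:
    # IMAM-style step: self-attention among the view features, then pooling
    # over views (mean pooling is an assumption of this sketch).
    in_modality = mean_pool(attend(view_feats, view_feats, view_feats))
    # CMAM-style step: the point cloud feature queries the view features.
    cross_modality = attend([point_feat], view_feats, view_feats)[0]
    # Final descriptor: concatenation of both aggregated features.
    return in_modality + cross_modality
```

Because self-attention followed by pooling, and attention with a single query, are both unaffected by the order of the views, the resulting descriptor does not depend on how the rendered views are ordered.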
Attire detection and retrieval based on region proposals with convolutional neural network
Region Proposals with Convolutional Neural Network Features (RCNN), an object detection algorithm, performs well on the PASCAL Visual Object Classes (VOC) Challenge 2012 [1]. Two main approaches improve its performance. The first is to apply high-capacity convolutional neural networks (CNNs) to region proposals in order to localize and segment objects. The second is to perform supervised pre-training when labelled data is scarce.
The goal of this project is to build an attire detection system using Region Proposals with Convolutional Neural Network Features. To study RCNN, we first introduce the related concepts and define object detection, neural networks (NN) and convolutional neural networks (CNN) in detail. The description of RCNN has two parts: the region proposal method and the CNN architecture. We then describe the attire detection system and the construction of its dataset in detail. Finally, we summarize and discuss the testing results.
The testing results show that RCNN performs well on attire detection, with a mean average precision (mAP) of 57.26% over all categories. The results also show that the quality and quantity of training data strongly affect the performance of the attire detection system.

Master of Science (Signal Processing)
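The mAP figure above is the mean, over categories, of each category's average precision (AP). A minimal sketch of that computation, assuming detections have already been matched to ground truth (for example by an IoU threshold) and using all-point interpolated AP; the thesis's exact evaluation protocol is not stated, so the interpolation scheme and the category names in the usage example are assumptions.

```python
from typing import Dict, List, Tuple

Detection = Tuple[float, bool]  # (confidence score, is_true_positive)


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """All-point interpolated AP for one category (assumes num_gt > 0)."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = 0
    recalls, precisions = [], []
    for i, (_, is_tp) in enumerate(ranked, start=1):
        tp += is_tp
        recalls.append(tp / num_gt)
        precisions.append(tp / i)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Mean of per-category APs; values are (detections, number of ground-truth objects)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps)
```

For instance, with the hypothetical categories `{"shirt": ([(0.9, True), (0.5, True)], 2), "hat": ([(0.9, True), (0.8, False), (0.7, True)], 2)}`, the per-category APs are 1.0 and 5/6, so the mAP is 11/12.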
Deep residual pooling network for texture recognition
Current deep learning-based texture recognition methods extract spatially orderless features from deep models pre-trained on large-scale image datasets. These methods either produce high-dimensional features or require multiple steps such as dictionary learning, feature encoding and dimension reduction. In this paper, we propose a novel end-to-end learning framework that not only overcomes these limitations but also learns faster. The proposed framework incorporates a residual pooling layer consisting of a residual encoding module and an aggregation module. The residual encoder preserves spatial information for improved feature learning, and the aggregation module generates an orderless feature for classification through simple averaging. The resulting feature has the lowest dimension among previous deep texture recognition approaches, yet it achieves state-of-the-art performance on benchmark texture recognition datasets such as FMD, DTD and 4D Light, as well as on an industrial dataset used for metal surface anomaly detection. Additionally, the proposed method obtains comparable results on the MIT-Indoor scene recognition dataset. Our code is available at https://github.com/maoshangbo/DRP-Texture-Recognition. This work was conducted within the Rolls-Royce@NTU Corporate Lab under the project DACS 2.1: Artificial Intelligence (AI) for Smart Image Understanding with support from the Industry Alignment Fund (IAF) Singapore under the Corp Lab@University Scheme.
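The abstract's key point is that encoding happens per spatial position while aggregation is a simple average, which discards spatial order. A minimal sketch of that structure, assuming a feature map given as a list of per-position channel vectors; the residual encoder here (a sigmoid of the difference to a single hypothetical learned `codeword`) is a stand-in chosen for illustration, not the paper's actual learnable module.

```python
import math
from typing import List

Vector = List[float]


def residual_encode(features: List[Vector], codeword: Vector) -> List[Vector]:
    """Hypothetical residual encoder: sigmoid of each position's residual to a codeword.

    Every spatial position keeps its own encoded vector, so spatial
    information is still present at this stage.
    """
    return [[1.0 / (1.0 + math.exp(-(x - c))) for x, c in zip(f, codeword)]
            for f in features]


def average_pool(encoded: List[Vector]) -> Vector:
    """Orderless feature: simple average over all spatial positions."""
    n = len(encoded)
    return [sum(col) / n for col in zip(*encoded)]


def residual_pooling(features: List[Vector], codeword: Vector) -> Vector:
    # Output dimension equals the number of channels, independent of image size.
    return average_pool(residual_encode(features, codeword))
```

Because the output length equals the channel count and averaging ignores position order, shuffling the spatial positions leaves the pooled feature unchanged (up to floating-point rounding), which is what "orderless" means here.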
Unsupervised feature learning with sparse Bayesian auto-encoding based extreme learning machine
Extreme learning machine (ELM) is a popular machine learning method with very few parameters, fast learning speed and high model efficiency. Unsupervised feature learning based on ELM has attracted growing research interest. The ELM auto-encoder (ELM-AE) was recently proposed for this task; it enables compact ELM-based feature learning without sacrificing ELM's elegant solution. In contrast to ELM-AE and the subsequent ℓ1-regularized ELM-AE, we introduce a sparse Bayesian learning scheme into ELM-AE for better generalization capability. A parallel training strategy is also integrated to improve the time efficiency of multi-output sparse Bayesian learning. Furthermore, hidden nodes are pruned according to the estimated variances of the prior distribution over the output weights, which improves both performance and efficiency. Experiments on several datasets verify the effectiveness and efficiency of the proposed ELM-AE for unsupervised feature learning, compared with PCA, NMF, ELM-AE and ℓ1-regularized ELM-AE.
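For reference, a minimal pure-Python sketch of the baseline ELM-AE that the abstract builds on: random, untrained input weights produce a sigmoid hidden layer H, the output weights β are solved in closed form by ridge regression so that Hβ reconstructs X, and the learned representation is Xβᵀ. This sketch assumes the ridge (ℓ2) solution β = (I/C + HᵀH)⁻¹HᵀX and omits the orthogonalization of the random weights; the paper's contribution replaces this closed-form step with multi-output sparse Bayesian learning and variance-based pruning of hidden nodes, which is not shown. Function names and the `c` and `seed` parameters are illustrative.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def elm_ae_features(x: Matrix, n_hidden: int, c: float = 1.0, seed: int = 0) -> Matrix:
    """Ridge-regularized ELM-AE (baseline); returns the embedding X @ beta^T."""
    rng = random.Random(seed)
    d = len(x[0])
    # Random, untrained input weights and biases (orthogonalization omitted).
    w = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(d)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [[1.0 / (1.0 + math.exp(-(v + bj))) for v, bj in zip(row, b)]
         for row in matmul(x, w)]
    ht = transpose(h)
    # Output weights beta (n_hidden x d) from (I/C + H^T H) beta = H^T X.
    gram = matmul(ht, h)
    for i in range(n_hidden):
        gram[i][i] += 1.0 / c
    beta = solve(gram, matmul(ht, x))
    return matmul(x, transpose(beta))
```

The added 1/C on the diagonal keeps the system positive definite, so the closed-form solve always succeeds; this fixed ℓ2 penalty is exactly the part that a sparse Bayesian prior on the output weights would replace.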
R-ELMNet: regularized extreme learning machine network
Principal component analysis network (PCANet), an unsupervised shallow network, is effective on datasets of various sizes. It consists of a two-layer convolution with PCA as the filter learning method, followed by a block-wise histogram post-processing stage. Following the structure of PCANet, the extreme learning machine network (ELMNet) and hierarchical ELMNet replace PCA with variants of the extreme learning machine auto-encoder (ELM-AE). ELMNet emphasizes the importance of orthogonal projection while overlooking non-linearity, whereas hierarchical ELMNet introduces complex pre-processing to overcome the drawbacks of the non-linear ELM-AE. In this paper, we analyze the intrinsic characteristics of ELM-AE variants and accordingly propose a regularized ELM-AE, which combines non-linear learning capability with approximately orthogonal projection. Experiments on image classification show its effectiveness for unsupervised feature learning compared with supervised convolutional neural networks and related shallow networks.
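The block-wise histogram post-processing stage shared by PCANet-style networks can be sketched as follows: the L2 second-stage filter outputs are binarized with a unit step, combined into one integer map T = Σₗ 2^(l-1) H(Oₗ), and then summarized by histograms over image blocks that are concatenated into the final feature. This is a minimal sketch assuming non-overlapping square blocks (PCANet also allows overlapping blocks); the filter-learning stage, where R-ELMNet's regularized ELM-AE would replace PCA, is not shown.

```python
from typing import List

Image = List[List[float]]


def binary_hash(maps: List[Image]) -> List[List[int]]:
    """Combine L2 filter outputs into one integer map: sum_l 2^(l-1) * H(O_l)."""
    rows, cols = len(maps[0]), len(maps[0][0])
    # H(.) is the unit step: 1 for positive responses, 0 otherwise.
    return [[sum(1 << l for l, m in enumerate(maps) if m[i][j] > 0)
             for j in range(cols)]
            for i in range(rows)]


def block_histograms(code: List[List[int]], n_bins: int, block: int) -> List[int]:
    """Concatenate histograms of non-overlapping block x block regions."""
    feature: List[int] = []
    for r0 in range(0, len(code) - block + 1, block):
        for c0 in range(0, len(code[0]) - block + 1, block):
            hist = [0] * n_bins
            for i in range(r0, r0 + block):
                for j in range(c0, c0 + block):
                    hist[code[i][j]] += 1
            feature.extend(hist)
    return feature
```

With L2 filter outputs, each hashed value lies in [0, 2^L2 - 1], so `n_bins` should be `2 ** L2`. For two 2×2 maps `[[1.0, -1.0], [0.5, -0.2]]` and `[[-1.0, 2.0], [0.3, -0.1]]`, the hashed map is `[[1, 2], [3, 0]]`, and a single 2×2 block gives the histogram `[1, 1, 1, 1]`.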