53,745 research outputs found

    SCA-PVNet: Self-and-Cross Attention Based Aggregation of Point Cloud and Multi-View for 3D Object Retrieval

    To address 3D object retrieval, substantial efforts have been made to generate highly discriminative descriptors of 3D objects represented by a single modality, e.g., voxels, point clouds, or multi-view images. It is promising to leverage the complementary information from multi-modality representations of 3D objects to further improve retrieval performance. However, multi-modality 3D object retrieval has rarely been developed and analyzed on large-scale datasets. In this paper, we propose self-and-cross attention based aggregation of point cloud and multi-view images (SCA-PVNet) for 3D object retrieval. With deep features extracted from point clouds and multi-view images, we design two types of feature aggregation modules, namely the In-Modality Aggregation Module (IMAM) and the Cross-Modality Aggregation Module (CMAM), for effective feature fusion. IMAM leverages a self-attention mechanism to aggregate multi-view features, while CMAM exploits a cross-attention mechanism to interact point cloud features with multi-view features. The final descriptor of a 3D object for retrieval is obtained by concatenating the aggregated features from both modules. Extensive experiments and analysis are conducted on three datasets, ranging from small to large scale, to show the superiority of the proposed SCA-PVNet over state-of-the-art methods.
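    The two aggregation modules lend themselves to a compact sketch. Below is a minimal PyTorch illustration of the idea; the 512-dimensional features, 12 views, mean pooling over views, and the module internals are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two aggregation modules described above, using
# PyTorch's built-in attention. Shapes and pooling are assumptions.
import torch
import torch.nn as nn

class IMAM(nn.Module):
    """In-Modality Aggregation: self-attention over multi-view features."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, view_feats):               # (B, V, dim)
        out, _ = self.attn(view_feats, view_feats, view_feats)
        return out.mean(dim=1)                   # pool views -> (B, dim)

class CMAM(nn.Module):
    """Cross-Modality Aggregation: point-cloud feature attends to views."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, pc_feat, view_feats):      # (B, dim), (B, V, dim)
        q = pc_feat.unsqueeze(1)                 # queries from point cloud
        out, _ = self.attn(q, view_feats, view_feats)
        return out.squeeze(1)                    # (B, dim)

# Final descriptor: concatenate the two aggregated features.
B, V, D = 4, 12, 512
views, pc = torch.randn(B, V, D), torch.randn(B, D)
desc = torch.cat([IMAM(D)(views), CMAM(D)(pc, views)], dim=-1)  # (B, 2*D)
```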

    View subspaces for indexing and retrieval of 3D models

    View-based indexing schemes for 3D object retrieval are gaining popularity since they provide good retrieval results. These schemes are coherent with the theory that humans recognize objects based on their 2D appearances. The view-based techniques also allow users to search with various queries such as binary images, range images, and even 2D sketches. Previous view-based techniques use classical 2D shape descriptors such as Fourier invariants, Zernike moments, Scale Invariant Feature Transform-based local features, and 2D Digital Fourier Transform coefficients. These methods describe each object independently of the others. In this work, we explore data-driven subspace models, such as Principal Component Analysis, Independent Component Analysis, and Non-negative Matrix Factorization, to describe the shape information of the views. We treat the depth images obtained from various points of the view sphere as 2D intensity images and train a subspace to extract the inherent structure of the views within a database. We also show the benefit of categorizing shapes according to their eigenvalue spread. Both the shape categorization and data-driven feature set conjectures are tested on the PSB database and compared with competitor view-based 3D shape retrieval algorithms.
    Comment: Three-Dimensional Image Processing (3DIP) and Applications, Proceedings of SPIE Volume 7526, editor: Atilla M. Baskurt, ISBN 9780819479198, 2 February 2010.
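    The data-driven subspace idea can be prototyped in a few lines. The sketch below uses scikit-learn PCA over flattened depth views and mean-pools the projection coefficients into a per-model descriptor; the stand-in data, image size, view count, and subspace dimension are illustrative assumptions, and ICA or NMF would slot into the same pipeline.

```python
# Sketch: treat depth views as flattened intensity images and learn a
# PCA subspace over the whole view database. All sizes are assumptions.
import numpy as np
from sklearn.decomposition import PCA

n_models, n_views, h, w = 100, 20, 64, 64
depth_views = np.random.rand(n_models * n_views, h * w)  # stand-in data

pca = PCA(n_components=32)              # subspace of the view database
pca.fit(depth_views)

def describe(model_views):
    """Project one model's views into the subspace and pool coefficients."""
    coeffs = pca.transform(model_views.reshape(len(model_views), -1))
    return coeffs.mean(axis=0)          # one descriptor per 3D model

query_desc = describe(depth_views[:n_views])   # views of one model
```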

    Improving 3D Shape Retrieval Methods based on Bag-of-Feature Approach by using Local Codebooks

    Also available online at http://www.sersc.org/journals/IJFGCN/vol5_no4/3.pdf
    Recent investigations show that view-based methods with pose-normalization pre-processing achieve better performance in retrieving rigid models than other approaches, and they remain the most popular and practical methods in the field of 3D shape retrieval. In this paper we present an improvement of 3D shape retrieval methods based on the bag-of-features approach. These methods integrate a set of features, extracted from 2D views of the 3D objects using the SIFT (Scale Invariant Feature Transform) algorithm, into histograms via vector quantization against a global visual codebook. In order to improve retrieval performance, we propose to associate to each 3D object its own local visual codebook instead of a unique global codebook. The experimental results obtained on the Princeton Shape Benchmark database, for the BF-SIFT method proposed by Ohbuchi et al. and CM-BOF proposed by Zhouhui et al., show that the proposed approach performs better than the original approach.
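    The local-codebook modification can be sketched concisely: instead of quantizing SIFT descriptors against one database-wide vocabulary, each object gets its own k-means codebook built from its views. The snippet below is a hedged illustration using OpenCV SIFT and scikit-learn k-means; the codebook size and histogram normalization are assumptions, and comparing histograms built on different local codebooks requires an additional cross-codebook matching step not shown here.

```python
# Sketch of a per-object "local codebook": cluster SIFT descriptors from
# one object's views into its own vocabulary, then histogram against it.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def local_codebook_histogram(view_images, k=64):
    """view_images: list of grayscale uint8 renderings of one 3D object."""
    sift = cv2.SIFT_create()
    descs = []
    for img in view_images:
        _, d = sift.detectAndCompute(img, None)
        if d is not None:
            descs.append(d)
    descs = np.vstack(descs)
    codebook = KMeans(n_clusters=k, n_init=10).fit(descs)  # local, per object
    words = codebook.predict(descs)              # vector quantization
    hist, _ = np.histogram(words, bins=k, range=(0, k))
    return hist / hist.sum(), codebook.cluster_centers_

# Usage: hist, centers = local_codebook_histogram(list_of_view_images)
```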

    3D Model Retrieval with Spherical Harmonics and Moments

    We consider 3D object retrieval in which a polygonal mesh serves as a query and similar objects are retrieved from a collection of 3D objects. Algorithms proceed first by a normalization step in which models are transformed into canonical coordinates. Second, feature vectors are extracted and compared with those derived from normalized models in the search space. In the feature vector space, nearest neighbors are computed and ranked. Retrieved objects are displayed for inspection, selection, and processing. Our feature vectors are based on rays cast from the center of mass of the object. For each ray, the object extent in the ray direction yields a sample of a function on the sphere. We compared two kinds of representations of this function, namely spherical harmonics and moments. Our empirical comparison, using precision-recall diagrams for retrieval results in a database of 3D models, showed that the method using spherical harmonics performed better.
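    The ray-based descriptor is easy to illustrate. The sketch below samples a synthetic extent function on a sphere grid and computes per-band spherical-harmonic energies with SciPy; the grid resolution, band count, and the use of band energies rather than raw coefficients are assumptions for illustration, not the paper's exact feature.

```python
# Sketch: the per-ray object extent defines a function f(theta, phi) on
# the sphere; project it onto spherical harmonics and keep band energies.
import numpy as np
from scipy.special import sph_harm

n_theta, n_phi, bands = 64, 32, 8
theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)  # azimuth
phi = np.linspace(0, np.pi, n_phi)                          # inclination
T, P = np.meshgrid(theta, phi)

f = 1.0 + 0.3 * np.cos(3 * T) * np.sin(P)   # stand-in extent samples

def sh_descriptor(f, T, P, bands):
    # quadrature weight for integration over the sphere
    dA = np.sin(P) * (2 * np.pi / n_theta) * (np.pi / (n_phi - 1))
    feat = []
    for n in range(bands):
        energy = 0.0
        for m in range(-n, n + 1):
            Y = sph_harm(m, n, T, P)         # Y_n^m on the grid
            c = np.sum(f * np.conj(Y) * dA)  # projection coefficient
            energy += abs(c) ** 2
        feat.append(np.sqrt(energy))         # rotation-robust band energy
    return np.array(feat)

print(sh_descriptor(f, T, P, bands))
```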

    SPNet: Deep 3D Object Classification and Retrieval using Stereographic Projection

    Master's thesis (Seoul National University, Department of Electrical and Computer Engineering, August 2019; advisor: Kyoung Mu Lee). We propose an efficient Stereographic Projection Neural Network (SPNet) for learning representations of 3D objects. We first transform a 3D input volume into a 2D planar image using stereographic projection. We then present a shallow 2D convolutional neural network (CNN) to estimate the object category, followed by a view ensemble that combines the responses from multiple views of the object to further enhance the predictions. Specifically, the proposed approach consists of four stages: (1) stereographic projection of a 3D object, (2) view-specific feature learning, (3) view selection, and (4) view ensemble. The proposed approach performs comparably to the state-of-the-art methods while requiring substantially less GPU memory and fewer network parameters. Despite its lightness, experiments on 3D object classification and shape retrieval demonstrate the high performance of the proposed method.
    Contents: 1 Introduction; 2 Related Work (point cloud-based, 3D model-based, and 2D/2.5D image-based methods); 3 Proposed Stereographic Projection Network (stereographic representation, network architecture, view selection, view ensemble); 4 Experimental Evaluation (datasets, training, choice of stereographic projection, test on view selection schemes, 3D object classification, shape retrieval, implementation); 5 Conclusions.
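    The first stage, stereographic projection, is simple to sketch. The snippet below projects unit-sphere points from the north pole onto the plane z = 0 and rasterizes them into a 2D histogram image; the random point sampling, grid size, and plane extent are illustrative assumptions, and the CNN, view-selection, and ensemble stages are omitted.

```python
# Minimal sketch of stereographic projection of sphere points to a plane.
import numpy as np

def stereographic(points):
    """Project unit-sphere points (x, y, z) from the north pole onto z=0."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    denom = 1.0 - z + 1e-8                   # avoid the pole itself
    return np.stack([x / denom, y / denom], axis=1)

# Map surface samples of a 3D object onto the sphere, project to the
# plane, and rasterize into a 2D image a shallow CNN can consume.
pts = np.random.randn(1000, 3)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # normalize to sphere
uv = stereographic(pts)
img, _, _ = np.histogram2d(uv[:, 0], uv[:, 1], bins=64,
                           range=[[-3, 3], [-3, 3]])
```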