1,432 research outputs found

Learning View-Model Joint Relevance for 3D Object Retrieval

Author: Dong Jiyang
He Ning
Lu Ke
Shao Ling
Xue Jian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/01/2015

3D object retrieval has attracted extensive research efforts and become an important task in recent years. Measuring the relevance between 3D objects, however, remains a difficult issue. Most of the existing methods employ just the model-based or view-based approaches, which may lead to incomplete information for 3D object representation. In this paper, we propose to jointly learn the view-model relevance among 3D objects for retrieval, in which the 3D objects are formulated in different graph structures. With the view information, the multiple views of 3D objects are employed to formulate the 3D object relationship in an object hypergraph structure. With the model data, the model-based features are extracted to construct an object graph to describe the relationship among the 3D objects. The learning on the two graphs is conducted to estimate the relevance among the 3D objects, in which the view/model graph weights can also be optimized in the learning process. This is the first work to jointly explore the view-based and model-based relevance among the 3D objects in a graph-based framework. The proposed method has been evaluated on three datasets. The experimental results and comparison with the state-of-the-art methods demonstrate the effectiveness of the proposed 3D object retrieval method in terms of retrieval accuracy.

Northumbria Research Link

University of East Anglia digital repository

Mapping, Localization and Path Planning for Image-based Navigation using Visual Features and Map

Author: Chhatkuli Ajad
Paudel Danda Pani
Probst Thomas
Thoma Janine
Van Gool Luc
Publication date: 01/01/2019

Building on progress in feature representations for image retrieval, image-based localization has seen a surge of research interest. Image-based localization has the advantage of being inexpensive and efficient, often avoiding the use of 3D metric maps altogether. That said, the need to maintain a large number of reference images as an effective support of localization in a scene, nonetheless calls for them to be organized in a map structure of some kind. The problem of localization often arises as part of a navigation process. We are, therefore, interested in summarizing the reference images as a set of landmarks, which meet the requirements for image-based navigation. A contribution of this paper is to formulate such a set of requirements for the two sub-tasks involved: map construction and self-localization. These requirements are then exploited for compact map representation and accurate self-localization, using the framework of a network flow problem. During this process, we formulate the map construction and self-localization problems as convex quadratic and second-order cone programs, respectively. We evaluate our methods on publicly available indoor and outdoor datasets, where they outperform existing methods significantly.

Comment: CVPR 2019, for implementation see https://github.com/janinethom

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization

Author: Cadena Cesar
Debraine Frédéric
Dymczyk Marcin
Sarlin Paul-Edouard
Siegwart Roland
Publication date: 01/01/2018

Many robotics applications require precise pose estimates despite operating in large and changing environments. This can be addressed by visual localization, using a pre-computed 3D model of the surroundings. The pose estimation then amounts to finding correspondences between 2D keypoints in a query image and 3D points in the model using local descriptors. However, computational power is often limited on robotic platforms, making this task challenging in large-scale environments. Binary feature descriptors significantly speed up this 2D-3D matching, and have become popular in the robotics community, but also strongly impair the robustness to perceptual aliasing and changes in viewpoint, illumination and scene structure. In this work, we propose to leverage recent advances in deep learning to perform an efficient hierarchical localization. We first localize at the map level using learned image-wide global descriptors, and subsequently estimate a precise pose from 2D-3D matches computed in the candidate places only. This restricts the local search and thus makes it possible to efficiently exploit powerful non-binary descriptors usually dismissed on resource-constrained devices. Our approach results in state-of-the-art localization performance while running in real-time on a popular mobile platform, enabling new prospects for robotics research.

Comment: CoRL 2018 Camera-ready (fix typos and update citations)
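The coarse-to-fine idea in this abstract (retrieve candidate places with a global descriptor, then match local descriptors only inside those places) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names `retrieve_places` and `match_local`, the use of cosine similarity and Euclidean distance, and the toy dictionary layout are hypothetical. The final pose estimation from the 2D-3D matches is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_places(query_global: Vector,
                    place_globals: Dict[str, Vector],
                    top_k: int) -> List[str]:
    """Coarse step: rank places by global-descriptor similarity, keep the top_k."""
    ranked = sorted(place_globals,
                    key=lambda p: cosine(query_global, place_globals[p]),
                    reverse=True)
    return ranked[:top_k]


def match_local(query_locals: List[Vector],
                place_points: Dict[str, List[Tuple[int, Vector]]],
                candidates: List[str],
                max_dist: float) -> List[Tuple[int, int]]:
    """Fine step: nearest-neighbour 2D-3D matching restricted to candidate places.

    Returns (query keypoint index, 3D point id) pairs whose descriptor
    distance is at most max_dist (a hypothetical acceptance threshold).
    """
    matches = []
    for q_idx, q_desc in enumerate(query_locals):
        best = None  # (point id, distance) of the closest accepted 3D point
        for place in candidates:
            for point_id, p_desc in place_points[place]:
                dist = math.dist(q_desc, p_desc)
                if dist <= max_dist and (best is None or dist < best[1]):
                    best = (point_id, dist)
        if best is not None:
            matches.append((q_idx, best[0]))
    return matches
```

Because the inner loops only visit points belonging to the retrieved places, the cost of the local search scales with the size of the candidate set rather than with the whole map, which is the property the abstract relies on to afford heavier local descriptors.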

arXiv.org e-Print Archive

Repository for Publications and Research Data

Video Registration in Egocentric Vision under Day and Night Illumination Changes

Author: Alletto Stefano
Cucchiara Rita
Serra Giuseppe
Publication date: 28/07/2016

With the spread of wearable devices and head mounted cameras, a wide range of applications requiring precise user localization is now possible. In this paper we propose to treat the problem of obtaining the user position with respect to a known environment as a video registration problem. Video registration, i.e. the task of aligning an input video sequence to a pre-built 3D model, relies on a matching process of local keypoints extracted on the query sequence to a 3D point cloud. The overall registration performance is strictly tied to the actual quality of this 2D-3D matching, and can degrade if environmental conditions such as steep changes in lighting like the ones between day and night occur. To effectively register an egocentric video sequence under these conditions, we propose to tackle the source of the problem: the matching process. To overcome the shortcomings of standard matching techniques, we introduce a novel embedding space that allows us to obtain robust matches by jointly taking into account local descriptors, their spatial arrangement and their temporal robustness. The proposal is evaluated using unconstrained egocentric video sequences both in terms of matching quality and resulting registration performance using different 3D models of historical landmarks. The results show that the proposed method can outperform state of the art registration algorithms, in particular when dealing with the challenges of night and day sequences.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Udine

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

On Rearrangement of Items Stored in Stacks

Author: Szegedy Mario
Yu Jingjin
Publication date: 06/05/2020

There are $n \ge 2$ stacks, each filled with $d$ items, and one empty stack. Every stack has capacity $d > 0$. A robot arm, in one stack operation (step), may pop one item from the top of a non-empty stack and subsequently push it onto a stack not at capacity. In a labeled problem, all $nd$ items are distinguishable and are initially randomly scattered in the $n$ stacks. The items must be rearranged using pop-and-pushes so that in the end, the $k^{\rm th}$ stack holds items $(k-1)d + 1, \ldots, kd$, in that order, from the top to the bottom for all $1 \le k \le n$. In an unlabeled problem, the $nd$ items are of $n$ types of $d$ each. The goal is to rearrange items so that items of type $k$ are located in the $k^{\rm th}$ stack for all $1 \le k \le n$. In carrying out the rearrangement, a natural question is to find the least number of required pop-and-pushes. Our main contributions are: (1) an algorithm for restoring the order of $n^2$ items stored in an $n \times n$ table using only $2n$ column and row permutations, and its generalization, and (2) an algorithm with a guaranteed upper bound of $O(nd)$ steps for solving both versions of the stack rearrangement problem when $d \le \lceil cn \rceil$ for arbitrary fixed positive number $c$. In terms of the required number of steps, the labeled and unlabeled version have lower bounds $\Omega(nd + nd\frac{\log d}{\log n})$ and $\Omega(nd)$, respectively.
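The problem statement above can be made concrete with a small simulation of the single allowed operation and the two goal conditions. This is only a sketch of the problem setup under stated assumptions, not the paper's $O(nd)$ algorithm: the names `move`, `is_labeled_goal` and `is_unlabeled_goal` are hypothetical, each stack is represented as a Python list whose last element is the top, and the requirement that source and destination differ is an added assumption.

```python
from typing import List

Stacks = List[List[int]]  # one list per stack; the LAST element is the top


def move(stacks: Stacks, src: int, dst: int, d: int) -> None:
    """One step: pop the top item of stack src and push it onto stack dst."""
    if src == dst:
        raise ValueError("source and destination must differ (assumption)")
    if not stacks[src]:
        raise ValueError("cannot pop from an empty stack")
    if len(stacks[dst]) >= d:
        raise ValueError("destination stack is at capacity")
    stacks[dst].append(stacks[src].pop())


def is_labeled_goal(stacks: Stacks, n: int, d: int) -> bool:
    """Stack k (1-based) must hold (k-1)d+1, ..., kd from top to bottom."""
    for k in range(1, n + 1):
        # Bottom-to-top order is kd, kd-1, ..., (k-1)d+1.
        if stacks[k - 1] != list(range(k * d, (k - 1) * d, -1)):
            return False
    return all(not s for s in stacks[n:])  # the spare stack ends up empty


def is_unlabeled_goal(type_stacks: Stacks, n: int, d: int) -> bool:
    """Stack k (1-based) must hold exactly d items, all of type k."""
    for k in range(1, n + 1):
        s = type_stacks[k - 1]
        if len(s) != d or any(t != k for t in s):
            return False
    return all(not s for s in type_stacks[n:])


if __name__ == "__main__":
    n, d = 2, 2
    # Stack 1 holds item 2 above item 1 (wrong order); stack 2 is already sorted.
    stacks = [[1, 2], [4, 3], []]
    plan = [(0, 2), (1, 0), (2, 1), (0, 2), (0, 2), (1, 0), (2, 0), (2, 1)]
    for src, dst in plan:
        move(stacks, src, dst, d)
    print(stacks, is_labeled_goal(stacks, n, d))
```

The hand-written plan in the demo is one valid sequence of steps for this tiny instance; it is not claimed to be the shortest one.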

arXiv.org e-Print Archive

Learning 3D Scene Priors with 2D Supervision

Author: Dai Angela
Han Xiaoguang
Nie Yinyu
Nießner Matthias
Publication date: 25/11/2022

Holistic 3D scene understanding entails estimation of both layout configuration and object geometry in a 3D environment. Recent works have shown advances in 3D scene estimation from various input modalities (e.g., images, 3D scans), by leveraging 3D supervision (e.g., 3D bounding boxes or CAD models), for which collection at scale is expensive and often intractable. To address this shortcoming, we propose a new method to learn 3D scene priors of layout and shape without requiring any 3D ground truth. Instead, we rely on 2D supervision from multi-view RGB images. Our method represents a 3D scene as a latent vector, from which we can progressively decode to a sequence of objects characterized by their class categories, 3D bounding boxes, and meshes. With our trained autoregressive decoder representing the scene prior, our method facilitates many downstream applications, including scene synthesis, interpolation, and single-view reconstruction. Experiments on 3D-FRONT and ScanNet show that our method outperforms state of the art in single-view reconstruction, and achieves state-of-the-art results in scene synthesis against baselines which require 3D supervision.

Comment: Video: https://youtu.be/YT7MEdygRoY Project: https://yinyunie.github.io/sceneprior-page

arXiv.org e-Print Archive

k-Partite Graph Reinforcement and its Application in Multimedia Information Retrieval

Author: GAO Yue
Ji Rongrong
SHEN Jialie
WANG Meng
ZHA Zheng-Jun
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012

DOI: 10.1016/j.ins.2012.01.003; Information Sciences, vol. 194, pp. 224-239; ISIJ

Institutional Knowledge at Singapore Management University

ScholarBank@NUS

3D Shape Knowledge Graph for Cross-domain and Cross-modal 3D Shape Retrieval

Author: Chang Rihao
Hao Tong
Liu Anan
Nie Weizhi
Publication date: 26/10/2022

With the development of 3D modeling and fabrication, 3D shape retrieval has become a hot topic. In recent years, several strategies have been put forth to address this retrieval issue. However, it is difficult for them to handle cross-modal 3D shape retrieval because of the natural differences between modalities. In this paper, we propose an innovative concept, namely, geometric words, which is regarded as the basic element to represent any 3D or 2D entity by combination, and assisted by which, we can simultaneously handle cross-domain or cross-modal retrieval problems. First, to construct the knowledge graph, we utilize the geometric word as the node, and then use the category of the 3D shape as well as the attribute of the geometry to bridge the nodes. Second, based on the knowledge graph, we provide a unique way for learning each entity's embedding. Finally, we propose an effective similarity measure to handle the cross-domain and cross-modal 3D shape retrieval. Specifically, every 3D or 2D entity could locate its geometric terms in the 3D knowledge graph, which serve as a link between cross-domain and cross-modal data. Thus, our approach can achieve the cross-domain and cross-modal 3D shape retrieval at the same time. We evaluated our proposed method on the ModelNet40 dataset and ShapeNetCore55 dataset for both the 3D shape retrieval task and cross-domain 3D shape retrieval task. The classic cross-modal dataset (MI3DOR) is utilized to evaluate cross-modal 3D shape retrieval. Experimental results and comparisons with state-of-the-art methods illustrate the superiority of our approach

arXiv.org e-Print Archive