Hierarchical structure-and-motion recovery from uncalibrated images

Authors: Farenzena Michela; Fusiello Andrea; Gherardi Riccardo; Toldo Roberto
Publication venue: Elsevier BV
Publication date: 01/01/2015

This paper addresses the structure-and-motion problem, which requires recovering camera motion and 3D structure from point matches. A new pipeline, dubbed Samantha, is presented that departs from the prevailing sequential paradigm and instead embraces a hierarchical approach. This method has several advantages, such as a provably lower computational complexity, which is necessary to achieve true scalability, and better error containment, leading to more stability and less drift. Moreover, a practical autocalibration procedure allows images to be processed without ancillary information. Experiments with real data assess the accuracy and the computational efficiency of the method. Comment: Accepted for publication in CVI

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Udine

SIFT-based local spectrogram image descriptor: a novel feature for robust music identification

Publication venue: Springer
Publication date: 12/02/2015

Springer - Publisher Connector

Domain-Size Pooling in Local Descriptors: DSP-SIFT

Authors: Dong Jingming; Soatto Stefano
Publication date: 05/05/2015

We introduce a simple modification of local image descriptors, such as SIFT, based on pooling gradient orientations across different domain sizes, in addition to spatial locations. The resulting descriptor, which we call DSP-SIFT, outperforms other methods in wide-baseline matching benchmarks, including those based on convolutional neural networks, despite having the same dimension as SIFT and requiring no training. Comment: Extended version of the CVPR 2015 paper. Technical Report UCLA CSD 14002

arXiv.org e-Print Archive

CiteSeerX

Crossref

A Hyper-network Based End-to-end Visual Servoing with Arbitrary Desired Poses

Authors: Chen Anzhe; Jing Wei; Wang Yue; Xiong Rong; Xu Kechun; Yu Hongxiang; Zhou Zhongxiang
Publication date: 18/04/2023

Recently, several works have achieved end-to-end visual servoing (VS) for robotic manipulation by replacing the traditional controller with differentiable neural networks, but they lose the ability to servo to arbitrary desired poses. This letter proposes a differentiable architecture for arbitrary pose servoing: a hyper-network based neural controller (HPN-NC). To achieve this, HPN-NC consists of a hyper net and a low-level controller, where the hyper net learns to generate the parameters of the low-level controller and the controller uses the 2D keypoint error for control, as in traditional image-based visual servoing (IBVS). HPN-NC can complete 6-degree-of-freedom visual servoing with a large initial offset. Taking advantage of the fully differentiable nature of HPN-NC, we provide a three-stage training procedure to servo real-world objects. With self-supervised end-to-end training, the performance of the integrated model can be further improved in unseen scenes and the amount of manual annotation can be significantly reduced.

arXiv.org e-Print Archive

Pixel-Level Deep Multi-Dimensional Embeddings for Homogeneous Multiple Object Tracking

Author: Mittek Mateusz
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2019

The goal of Multiple Object Tracking (MOT) is to locate multiple objects and keep track of their individual identities and trajectories given a sequence of (video) frames. A popular approach to MOT is tracking by detection, which consists of two processing components: detection (identification of objects of interest in individual frames) and data association (connecting data from multiple frames). This work addresses the detection component by introducing a method based on semantic instance segmentation, i.e., assigning labels to all visible pixels such that they are unique among different instances. Modern tracking methods are often built around Convolutional Neural Networks (CNNs) and additional, explicitly defined post-processing steps. This work introduces two detection methods that incorporate multi-dimensional embeddings. We train deep CNNs to produce easily clusterable embeddings for semantic instance segmentation and to enable object detection through pose estimation. The use of embeddings allows the method to identify per-pixel instance membership for both tasks. Our method specifically targets applications that require long-term tracking of homogeneous targets using a stationary camera. Furthermore, this method was developed and evaluated on a livestock tracking application, which presents exceptional challenges that generalized tracking methods are not equipped to solve. This is largely because contemporary datasets for multiple object tracking lack properties that are specific to livestock environments. These include a high degree of visual similarity between targets, complex physical interactions, long-term inter-object occlusions, and a fixed-cardinality set of targets. For the reasons stated above, our method is developed and tested with the livestock application in mind and, specifically, group-housed pigs are evaluated in this work.
Our method reliably detects pigs in a group-housed environment based on the publicly available dataset with 99% precision and 95% using pose estimation and achieves 80% accuracy when using semantic instance segmentation at 50% IoU threshold. Results demonstrate our method's ability to achieve consistent identification and tracking of group-housed livestock, even in cases where the targets are occluded and despite the fact that they lack uniquely identifying features. The pixel-level embeddings used by the proposed method are thoroughly evaluated in order to demonstrate their properties and behaviors when applied to real data. Adviser: Lance C. Pére
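The abstract reports segmentation accuracy at a 50% IoU threshold without spelling out the criterion. As a minimal sketch of the standard intersection-over-union test, assuming each instance mask is given as a set of (row, column) pixel coordinates (the function names and the toy masks are hypothetical and not taken from the thesis):

```python
def iou(mask_a, mask_b):
    """Intersection over union of two pixel sets; 0.0 if both are empty."""
    a, b = set(mask_a), set(mask_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_match(predicted, truth, threshold=0.5):
    """A prediction counts as correct when its IoU reaches the threshold."""
    return iou(predicted, truth) >= threshold


if __name__ == "__main__":
    predicted = {(0, 0), (0, 1), (1, 0)}
    truth = {(0, 1), (1, 0), (1, 1)}
    # Two shared pixels out of four distinct pixels overall.
    print(iou(predicted, truth), is_match(predicted, truth))
```

Whether a prediction that lands exactly on the threshold counts as a match is a convention; the sketch above treats it as a match.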

DigitalCommons@University of Nebraska

Cable Tension Monitoring using Non-Contact Vision-based Techniques

Author: Chu Chaoyang
Publication venue: University of Windsor Leddy Library
Publication date: 07/07/2020

In cable-stayed bridges, the structural systems of tensioned cables play a critical role in structural and functional integrity. Consequently, tensile forces in the cables are among the essential indicators in structural health monitoring (SHM). In this thesis, a video image processing technology integrated with cable dynamic analysis is proposed as a non-contact vision-based measurement technique, which provides a user-friendly, cost-effective, and computationally efficient solution to displacement extraction, frequency identification, and cable tension monitoring. In contrast to conventional contact sensors, the vision-based system is capable of taking remote measurements of cable dynamic response while having flexible sensing capability. Since cable detection is a substantial step in displacement extraction, a comprehensive study on the feasibility of the adopted feature detector is conducted under various testing scenarios. The performance of the feature detector is quantified by developing evaluation parameters. Enhancement methods for the feature detector in cable detection are also investigated under complex testing environments. Threshold-dependent image matching approaches, which optimize the functionality of the feature-based video image processing technology, are proposed for noise-free and noisy background scenarios. The vision-based system is validated through experimental studies of free vibration tests on a single undamped cable in laboratory settings. The maximum percentage difference of the identified cable fundamental frequency is found to be 0.74% compared with accelerometer readings, while the maximum percentage difference of the estimated cable tensile force is 4.64% compared to direct measurement by a load cell.
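The abstract does not state how tension is obtained from the identified frequency. As a minimal sketch, assuming the classical taut-string model (no bending stiffness or sag), the tension follows from the fundamental frequency as T = 4 m L^2 f1^2, where m is the mass per unit length and L the cable length. The function names and all numbers below are hypothetical and are not taken from the thesis:

```python
def tension_from_frequency(f1_hz, length_m, mass_per_length_kg_m):
    """Taut-string estimate of cable tension in newtons: T = 4 * m * L^2 * f1^2."""
    return 4.0 * mass_per_length_kg_m * length_m ** 2 * f1_hz ** 2


def percent_difference(estimate, reference):
    """Absolute percentage difference of an estimate relative to a reference."""
    return abs(estimate - reference) / abs(reference) * 100.0


if __name__ == "__main__":
    # Hypothetical cable: 10 m long, 5 kg/m, vibrating at 2 Hz.
    tension = tension_from_frequency(f1_hz=2.0, length_m=10.0, mass_per_length_kg_m=5.0)
    print(tension)
```

Because the tension scales with the square of the frequency, a small relative error in the identified frequency roughly doubles in the tension estimate under this model.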

Scholarship at UWindsor

Online Structured Learning for Real-Time Computer Vision Gaming Applications

Author: Hare S
Publication venue: Oxford Brookes University
Publication date: 01/01/2012

In recent years computer vision has played an increasingly important role in the development of computer games, and it now features as one of the core technologies for many gaming platforms. The work in this thesis addresses three problems in real-time computer vision, all of which are motivated by their potential application to computer games. We first present an approach for real-time 2D tracking of arbitrary objects. In common with recent research in this area, we incorporate online learning to provide an appearance model which is able to adapt to the target object and its surrounding background during tracking. However, our approach moves beyond the standard framework of tracking using binary classification and instead integrates tracking and learning in a more principled way through the use of structured learning. As well as providing a more powerful framework for adaptive visual object tracking, our approach also outperforms state-of-the-art tracking algorithms on standard datasets. Next we consider the task of keypoint-based object tracking. We take the traditional pipeline of matching keypoints followed by geometric verification and show how this can be embedded into a structured learning framework in order to provide principled adaptivity to a given environment. We also propose an approximation method allowing us to take advantage of recently developed binary image descriptors, meaning our approach is suitable for real-time application even on low-powered portable devices. Experimentally, we clearly see the benefit that online adaptation using structured learning can bring to this problem. Finally, we present an approach for approximately recovering the dense 3D structure of a scene which has been mapped by a simultaneous localisation and mapping system. Our approach is guided by the constraints of the low-powered portable hardware we are targeting, and we develop a system which coarsely models the scene using a small number of planes.
To achieve this, we frame the task as a structured prediction problem and introduce online learning into our approach to provide adaptivity to a given scene. This allows us to use relatively simple multi-view information coupled with online learning of appearance to efficiently produce coarse reconstructions of a scene.

Oxford Brookes University: RADAR

A new multi-criteria tie point filtering approach to increase the accuracy of UAV photogrammetry models

Authors: Li Weilian; Mousavi Vahid; Rashidi Maria; Varshosaz Masood
Publication venue: Switzerland, MDPI
Publication date: 01/01/2022

Extracting accurate tie points plays an essential role in the accuracy of image orientation in Unmanned Aerial Vehicle (UAV) photogrammetry. In this study, a Multi-Criteria Decision Making (MCDM) automatic filtering method is presented. Based on the quality features of a photogrammetric model, the proposed method works at the level of the sparse point cloud to remove low-quality tie points and refine the orientation results. In the proposed algorithm, different factors that affect the quality of tie points are identified. The quality measures are then aggregated by applying MCDM methods to obtain a competency score for each 3D tie point. These scores are employed in an automatic filtering approach that selects a subset of high-quality points, which are then used to repeat the bundle adjustment. To evaluate the proposed algorithm, various internal and external studies were conducted on different datasets. The findings suggest that the method is both effective and reliable. In addition, in comparison to the existing filtering techniques, the proposed strategy increases the accuracy of bundle adjustment and dense point cloud generation by about 40% and 70%, respectively.
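The abstract does not say which MCDM method or which quality measures are used. As a minimal sketch of the score-then-filter idea, assuming a simple weighted sum over min-max normalised criteria (one of the simplest MCDM aggregations, swapped in here for illustration), with hypothetical function names, weights, and values:

```python
def competency_scores(measures, weights):
    """Weighted-sum score per tie point from min-max normalised criteria.

    measures: one list per tie point, one value per criterion (higher is better).
    weights: one non-negative weight per criterion (not all zero).
    """
    n_criteria = len(weights)
    columns = [[row[j] for row in measures] for j in range(n_criteria)]
    bounds = [(min(col), max(col)) for col in columns]
    total_weight = sum(weights)
    scores = []
    for row in measures:
        score = 0.0
        for j, (low, high) in enumerate(bounds):
            # A criterion with no spread carries no information; score it as 0.
            normalised = 0.0 if high == low else (row[j] - low) / (high - low)
            score += weights[j] * normalised
        scores.append(score / total_weight)
    return scores


def select_high_quality(scores, threshold):
    """Indices of tie points whose score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    # Three hypothetical tie points, two hypothetical quality criteria.
    measures = [[0.0, 10.0], [1.0, 20.0], [0.5, 10.0]]
    scores = competency_scores(measures, weights=[1.0, 1.0])
    print(scores, select_high_quality(scores, threshold=0.25))
```

In a pipeline like the one described, the selected indices would identify the tie points passed to the repeated bundle adjustment; the threshold itself is an assumption of this sketch.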

Directory of Open Access Journals

Western Sydney ResearchDirect