Generic Tubelet Proposals for Action Localization
We develop a novel framework for action localization in videos. We propose
the Tube Proposal Network (TPN), which generates generic, class-independent,
video-level tubelet proposals. These tubelet proposals can be used in various
video analysis tasks, including recognizing and localizing actions. In
particular, we integrate the generic tubelet proposals into a unified temporal
deep network for action classification. Compared with other methods, our
generic tubelet proposal method is accurate, general, and fully differentiable
under a smooth L1 loss function. We demonstrate the performance of our
algorithm on the standard UCF-Sports, J-HMDB21, and UCF-101 datasets. Our
class-independent TPN outperforms other tubelet generation methods, and our
unified temporal deep network achieves state-of-the-art localization results
on all three datasets.
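The abstract states that the tubelet proposal method is differentiable under a smooth L1 loss. As a minimal sketch, assuming the common smooth L1 definition used for box regression in Fast R-CNN (quadratic below a threshold `beta`, linear above it) applied to per-coordinate box offsets, the loss can be written as below. The function names and the `beta` parameter are illustrative assumptions, not TPN's actual code.

```python
def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Smooth L1 (Huber-style) loss for a single residual x.

    Assumed standard form: 0.5 * x^2 / beta when |x| < beta, else |x| - 0.5 * beta.
    """
    ax = abs(x)
    if ax < beta:
        return 0.5 * x * x / beta
    return ax - 0.5 * beta


def box_regression_loss(pred, target, beta: float = 1.0) -> float:
    """Sum of smooth L1 losses over corresponding box coordinates (hypothetical helper)."""
    return sum(smooth_l1(p - t, beta) for p, t in zip(pred, target))


if __name__ == "__main__":
    # Residuals are -0.5, 0, 0, -2 -> 0.125 + 0 + 0 + 1.5
    print(box_regression_loss([0.0, 0.0, 1.0, 1.0], [0.5, 0.0, 1.0, 3.0]))  # 1.625
```

The quadratic region keeps gradients small near zero error, while the linear region limits the influence of large outliers; both pieces meet with equal value and slope at |x| = `beta`, which keeps the loss differentiable everywhere.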
Innovative strategies for 3D visualisation using photogrammetry and 3D scanning for mobile phones
3D model generation through photogrammetry is a modern overlay of digital information representing real-world objects in a virtual world. The immediate scope of this study is generating 3D models from imagery and overcoming the challenge of acquiring accurate 3D meshes. This research aims to find optimised ways to capture raw 3D representations of real-life objects with mobile phones and then convert them into retopologised, textured, usable data. Augmented Reality (AR) is a projected combination of real and virtual objects. Much work has been done to create market-dependent AR applications so that customers can view products before purchasing them. What is needed is a product-independent, freely available photogrammetry-to-AR pipeline for creating independent 3D augmented models. For the purposes of this paper, however, the aim is to compare and analyse different open-source SDKs and libraries for developing optimised 3D meshes using photogrammetry/3D scanning, which will form the main skeleton of the 3D-AR pipeline. Natural disasters, global political crises, terrorist attacks and other catastrophes have led researchers worldwide to capture monuments using photogrammetry and laser scans. Some of these objects of “global importance” are processed by organisations including CyArk (Cyber Archives) and UNESCO’s World Heritage Centre, which work against time to preserve these historical monuments before they are damaged or, in some cases, completely destroyed. This raises the question of the significance of preserving objects and monuments that might be of value locally, to a city or town. What is done to preserve those objects? This research would develop pipelines for collecting and processing 3D data so that local communities could contribute towards restoring endangered sites and objects using their smartphones, and make these objects available for viewing in location-based AR.
Some companies charge relatively large amounts of money for local scanning projects. This research would contribute as a non-profit project that could later be used in school curricula, visitor attractions and historical preservation organisations all over the globe at no cost. The scope is not limited to furniture, museums or marketing; it could be used for personal digital archiving as well. This research will capture and process virtual objects using mobile phones, comparing computer vision methodologies across the pipeline, from data conversion on mobile phones to 3D generation, texturing and retopologising. The outcomes of this research will be used as input for generating application-independent AR that is not tied to any industry or product.
Augmented reality meeting table: a novel multi-user interface for architectural design
Immersive virtual environments have received widespread attention as possible replacements for the media and systems that designers traditionally use and, more generally, as support for collaborative work. To date, however, relatively little attention has been given to the problem of how to merge immersive virtual environments into real-world work settings, and so to add to the media at the disposal of the designer and the design team rather than to replace them. In this paper we report on a research project in which optical see-through augmented reality displays have been developed together with prototype decision support software for architectural and urban design. We suggest that a critical characteristic of multi-user augmented reality is its ability to generate visualisations from a first-person perspective in which the scale of rendition of the design model follows many of the conventions that designers are used to. Different scales of model appear to allow designers to focus on different aspects of the design under consideration. Augmenting the scene with simulations of pedestrian movement appears to assist both in scale recognition and in moving from a first-person to a third-person understanding of the design. This research project is funded by the European Commission IST program (IST-2000-28559).
FoveaBox: Beyond Anchor-based Object Detector
We present FoveaBox, an accurate, flexible, and completely anchor-free
framework for object detection. While almost all state-of-the-art object
detectors use predefined anchors to enumerate possible locations, scales
and aspect ratios in the search for objects, their performance and
generalization ability are also limited by the design of the anchors. Instead,
FoveaBox directly learns the object existence probability and the bounding box
coordinates without anchor reference. This is achieved by: (a) predicting
category-sensitive semantic maps for the object existence probability, and (b)
producing a category-agnostic bounding box for each position that potentially
contains an object. The scales of target boxes are naturally associated with
feature pyramid representations. In FoveaBox, an instance is assigned to
adjacent feature levels to make the model more accurate. We demonstrate its
effectiveness on standard benchmarks and report extensive experimental
analysis. Without bells and whistles, FoveaBox achieves state-of-the-art single
model performance on the standard COCO and Pascal VOC object detection
benchmarks. More importantly, FoveaBox avoids all computation and
hyper-parameters related to anchor boxes, to which final detection performance
is often sensitive. We believe this simple and effective approach will
serve as a solid baseline and help ease future research on object detection.
The code has been made publicly available at
https://github.com/taokong/FoveaBox.
Comment: IEEE Transactions on Image Processing.
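The abstract notes that target box scales are tied to feature pyramid levels and that an instance is assigned to adjacent levels. A minimal sketch of such scale-based assignment follows, assuming each level has a nominal object size and accepts boxes within a factor `eta` of it, so overlapping ranges place one instance on two neighbouring levels. The level sizes, `eta`, and `assign_levels` are hypothetical illustrations, not values or code from the FoveaBox paper or repository.

```python
import math

# Hypothetical nominal object sizes (sqrt of box area, in pixels) for pyramid
# levels P3..P7. Illustrative assumptions only, not FoveaBox's settings.
LEVEL_SIZES = {3: 32, 4: 64, 5: 128, 6: 256, 7: 512}


def assign_levels(box, eta=2.0, level_sizes=LEVEL_SIZES):
    """Return the pyramid levels whose scale range covers the box.

    box is (x1, y1, x2, y2). A level with nominal size s accepts objects whose
    size sqrt(w * h) lies in [s / eta, s * eta]. With eta > 1 the ranges of
    adjacent levels overlap, so one instance can land on neighbouring levels.
    """
    x1, y1, x2, y2 = box
    size = math.sqrt(max(x2 - x1, 0.0) * max(y2 - y1, 0.0))
    return [lvl for lvl, s in sorted(level_sizes.items()) if s / eta <= size <= s * eta]


if __name__ == "__main__":
    print(assign_levels((0, 0, 96, 96)))  # [4, 5]
    print(assign_levels((0, 0, 40, 40)))  # [3, 4]
```

Under these assumed sizes, a 96x96 box falls in both the P4 range (32 to 128) and the P5 range (64 to 256), so it supervises two adjacent levels; a box smaller than every range is assigned to none.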