
    Generic Tubelet Proposals for Action Localization

    We develop a novel framework for action localization in videos. We propose the Tube Proposal Network (TPN), which generates generic, class-independent, video-level tubelet proposals. These tubelet proposals can be utilized in various video analysis tasks, including recognizing and localizing actions in videos. In particular, we integrate the generic tubelet proposals into a unified temporal deep network for action classification. Compared with other methods, our generic tubelet proposal method is accurate, general, and fully differentiable under a smooth L1 loss function. We demonstrate the performance of our algorithm on the standard UCF-Sports, J-HMDB21, and UCF-101 datasets. Our class-independent TPN outperforms other tubelet generation methods, and our unified temporal deep network achieves state-of-the-art localization results on all three datasets.
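    The smooth L1 loss named in the abstract is the standard piecewise loss popularised by Fast R-CNN: quadratic for small residuals, linear for large ones, and differentiable everywhere, which is what allows the proposal network to train end-to-end. A minimal Python sketch; the transition point `beta` and the exact regression targets used by the TPN are assumptions, not stated in the abstract:

    ```python
    def smooth_l1(x, beta=1.0):
        """Smooth L1 loss on a single residual x.

        Quadratic for |x| < beta (stable gradients near zero),
        linear beyond beta (robust to outliers). beta is an assumed
        hyper-parameter; the abstract does not specify its value.
        """
        ax = abs(x)
        if ax < beta:
            return 0.5 * ax * ax / beta
        return ax - 0.5 * beta
    ```

    With beta = 1 this reduces to the familiar form 0.5 * x**2 for |x| < 1 and |x| - 0.5 otherwise; both branches and their derivatives agree at the transition point.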

    Innovative strategies for 3D visualisation using photogrammetry and 3D scanning for mobile phones

    3D model generation through photogrammetry is a modern overlay of digital information representing real-world objects in a virtual world. The immediate scope of this study is to generate 3D models from imagery and to overcome the challenge of acquiring accurate 3D meshes. This research aims to find optimised ways to document raw 3D representations of real-life objects and then convert them into retopologised, textured, usable data on mobile phones. Augmented Reality (AR) is a projected combination of real and virtual objects. Much work has been done on market-dependent AR applications that let customers view products before purchasing them. What is needed is a product-independent photogrammetry-to-AR pipeline, freely available for creating independent 3D augmented models. For the particulars of this paper, the aim is to compare and analyse different open-source SDKs and libraries for developing an optimised 3D mesh using photogrammetry/3D scanning, which will form the main skeleton of the 3D-AR pipeline. Natural disasters, global political crises, terrorist attacks and other catastrophes have led researchers worldwide to capture monuments using photogrammetry and laser scans. Some of these objects of “global importance” are processed by organisations including CyArk (Cyber Archives) and UNESCO’s World Heritage Centre, who work against time to preserve historical monuments before they are damaged or, in some cases, completely destroyed. Equally worth questioning is the significance of preserving objects and monuments of value locally to a city or town: what is done to preserve those? This research develops pipelines for collecting and processing 3D data so that local communities can contribute to restoring endangered sites and objects using their smartphones, and so that these objects can be viewed in location-based AR.
    Some companies charge relatively large sums for local scanning projects. This research is offered as a non-profit project that could later be used in school curricula, visitor attractions and historical-preservation organisations worldwide at no cost. The scope is not limited to furniture, museums or marketing; it extends to personal digital archiving as well. The research captures and processes virtual objects using mobile phones, comparing computer-vision methodologies from data conversion on the phone through 3D generation, texturing and retopologising. The outcomes will serve as input for generating AR that is independent of any industry or product.

    Augmented reality meeting table: a novel multi-user interface for architectural design

    Immersive virtual environments have received widespread attention as possible replacements for the media and systems that designers traditionally use and, more generally, as support for collaborative work. Relatively little attention has been given to date, however, to the problem of how to merge immersive virtual environments into real-world work settings, so as to add to the media at the disposal of the designer and the design team rather than to replace them. In this paper we report on a research project in which optical see-through augmented reality displays have been developed together with prototype decision-support software for architectural and urban design. We suggest that a critical characteristic of multi-user augmented reality is its ability to generate visualisations from a first-person perspective in which the scale of rendition of the design model follows many of the conventions that designers are used to. Different scales of model appear to allow designers to focus on different aspects of the design under consideration. Augmenting the scene with simulations of pedestrian movement appears to assist both in scale recognition and in moving from a first-person to a third-person understanding of the design. This research project is funded by the European Commission IST program (IST-2000-28559).

    FoveaBox: Beyond Anchor-based Object Detector

    We present FoveaBox, an accurate, flexible, and completely anchor-free framework for object detection. While almost all state-of-the-art object detectors utilize predefined anchors to enumerate possible locations, scales and aspect ratios in the search for objects, their performance and generalization ability are also limited by the anchor design. Instead, FoveaBox directly learns the probability that an object exists and the bounding-box coordinates without anchor references. This is achieved by: (a) predicting category-sensitive semantic maps for the object existence probability, and (b) producing a category-agnostic bounding box for each position that potentially contains an object. The scales of target boxes are naturally associated with feature-pyramid representations. In FoveaBox, an instance is assigned to adjacent feature levels to make the model more accurate. We demonstrate its effectiveness on standard benchmarks and report extensive experimental analysis. Without bells and whistles, FoveaBox achieves state-of-the-art single-model performance on the standard COCO and Pascal VOC object detection benchmarks. More importantly, FoveaBox avoids all computation and hyper-parameters related to anchor boxes, which are often sensitive to the final detection performance. We believe this simple and effective approach will serve as a solid baseline and help ease future research on object detection. The code has been made publicly available at https://github.com/taokong/FoveaBox. (Comment: IEEE Transactions on Image Processing; code at https://github.com/taokong/FoveaBox)
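    Step (b) of the abstract, one category-agnostic box per feature-map position, can be illustrated with a small decoding sketch. The (l, t, r, b) distance parameterisation and the stride value below are assumptions for illustration only; FoveaBox's actual box transform is defined in the paper, not the abstract:

    ```python
    def decode_box(cx, cy, offsets, stride=8):
        """Decode one category-agnostic box at feature-map cell (cx, cy).

        offsets = (l, t, r, b): assumed per-position distances (in pixels)
        from the cell centre to the left/top/right/bottom box sides.
        Note that no anchor box appears anywhere in the computation.
        """
        l, t, r, b = offsets
        px = (cx + 0.5) * stride  # cell centre in image coordinates
        py = (cy + 0.5) * stride
        return (px - l, py - t, px + r, py + b)
    ```

    At inference, the class score map from step (a) is thresholded, and each surviving position decodes one box this way, so no anchor enumeration or anchor-matching hyper-parameters are required.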