
    Semantic Mapping of Road Scenes

    The problem of understanding road scenes has been at the forefront of computer vision research for the last couple of years. Scene understanding enables autonomous systems to navigate and understand the surroundings in which they operate. It involves reconstructing the scene and estimating the objects present in it, such as ‘vehicles’, ‘road’, ‘pavements’ and ‘buildings’. This thesis focusses on these aspects and proposes solutions to address them. First, we propose a solution to generate a dense semantic map from multiple street-level images. This map can be imagined as a bird’s eye view of the region, with associated semantic labels, covering tens of kilometres of street-level data. We generate the overhead semantic view from street-level images. This is in contrast to existing approaches that use satellite/overhead imagery to classify urban regions, and it allows us to produce a detailed semantic map for a large-scale urban area. Then we describe a method to perform large-scale dense 3D reconstruction of road scenes with associated semantic labels. Our method fuses the depth maps generated from the stereo pairs across time into a global 3D volume in an online fashion, in order to accommodate arbitrarily long image sequences. The object class labels estimated from the street-level stereo image sequence are used to annotate the reconstructed volume. Next, we exploit the scene structure in object class labelling by performing inference over the meshed representation of the scene. Labelling over the mesh solves two issues. Firstly, images often contain redundant information, with multiple images describing the same scene. Solving these images separately is slow, whereas our method is approximately an order of magnitude faster in the inference stage than standard inference in the image domain. Secondly, multiple images of the same scene often result in inconsistent labelling.
By solving a single mesh, we remove the inconsistency of labelling across the images. Our mesh-based labelling also takes into account the object layout in the scene, which is often ambiguous in the image domain, thereby increasing the accuracy of object labelling. Finally, we perform labelling and structure computation through a hierarchical robust P^N Markov Random Field defined on voxels and super-voxels given by an octree. This allows us to infer the 3D structure and the object class labels in a principled manner, through bounded approximate minimisation of a well-defined and well-studied energy functional. In this thesis, we also introduce two object-labelled datasets created from real-world data. The 15-kilometre Yotta Labelled dataset consists of 8,000 images per camera view of the roadways of the United Kingdom, with a subset of them annotated with object class labels; the second dataset comprises ground-truth object labels for the publicly available KITTI dataset. Both datasets are publicly available, and we hope they will be helpful to the vision research community.
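    The robust P^N potential mentioned above can be illustrated with a small sketch. It assumes the standard form from Kohli, Ladický and Torr, in which a clique c (here, the voxels of one super-voxel) pays min{ min_l (N_l(x_c) θ_l + γ_l), γ_max } with θ_l = (γ_max − γ_l)/Q, where N_l counts the clique members not taking label l and Q is a truncation parameter. The function name and parameter values are hypothetical; the abstract does not give the thesis's actual parameterisation or its hierarchical construction.

```python
from typing import Dict, Hashable, Sequence


def robust_pn_potential(labels: Sequence[Hashable],
                        gamma: Dict[Hashable, float],
                        gamma_max: float,
                        q: float) -> float:
    """Cost of one clique under a robust P^N potential (hypothetical helper).

    labels:    label currently assigned to each variable in the clique
    gamma:     gamma_l, the cost when every variable takes label l (gamma_l <= gamma_max)
    gamma_max: ceiling on the clique cost
    q:         truncation parameter; cost grows linearly with the number of
               disagreeing variables until it saturates at gamma_max
    """
    best = gamma_max
    for label, gamma_l in gamma.items():
        # N_l: how many variables in the clique do NOT take this label.
        n_disagree = sum(1 for x in labels if x != label)
        theta_l = (gamma_max - gamma_l) / q
        best = min(best, n_disagree * theta_l + gamma_l)
    return best


# A super-voxel of 10 voxels: 8 labelled 'road' and 2 labelled 'car'.
clique = ["road"] * 8 + ["car"] * 2
print(robust_pn_potential(clique, {"road": 0.0, "car": 0.0}, gamma_max=1.0, q=5))  # prints 0.4
```

    Because the cost saturates at γ_max, a few mislabelled voxels add only a small, bounded penalty instead of forcing the whole super-voxel to one label, which is what makes the potential "robust".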

    Development of an augmented reality guided computer assisted orthopaedic surgery system

    Previously held under moratorium from 1st December 2016 until 1st December 2021. This body of work documents the development of a proof-of-concept augmented reality guided computer assisted orthopaedic surgery system (ARgCAOS). After initial investigation, a visible-spectrum, single-camera, tool-mounted tracking system based upon fiducial planar markers was implemented. The use of visible-spectrum cameras, as opposed to the infra-red cameras typically used by surgical tracking systems, allowed the captured image to be streamed to a display in an intelligible form. The tracking information defined the location of physical objects relative to the camera, which allowed virtual models to be overlaid onto the camera image. This produced a convincing augmented experience, in which the virtual objects appeared to be within the physical world, moving with both the camera and the markers as physical objects would. Analysis of the first-generation system identified inadequacies in both accuracy and graphics, prompting the development of a second-generation system. This was also based upon a tool-mounted fiducial marker system, and it improved probing accuracy to near-millimetre level. A resection system was incorporated, and controlled resection performed using the tracking information achieved sub-millimetre accuracy. Several complications resulted from the tool-mounted approach, so a third-generation system was developed. This final generation deployed a stereoscopic visible-spectrum camera system affixed to a head-mounted display worn by the user. The system augmented the user's natural view, providing convincing and immersive three-dimensional augmented guidance, with probing and resection accuracies of 0.55±0.04 mm and 0.34±0.04 mm, respectively.
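    The overlay step described above, drawing virtual models on the camera image once a marker's pose is known, amounts to a rigid transformation followed by pinhole projection. The sketch assumes an ideal pinhole camera with intrinsic matrix K and no lens distortion, and a marker-to-camera pose (R, t) already estimated by the tracker; the function name and numbers are hypothetical, not taken from ARgCAOS.

```python
import numpy as np


def project_model_points(points_marker, R, t, K):
    """Project 3D model points given in the marker frame to pixel coordinates.

    points_marker: (N, 3) points of a virtual model, in the marker's frame (metres)
    R, t:          rotation (3x3) and translation (3,) taking marker frame to camera frame
    K:             (3, 3) pinhole intrinsics; lens distortion is ignored
    Points must lie in front of the camera (positive depth).
    """
    pts = np.asarray(points_marker, dtype=float)
    cam = pts @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)  # marker -> camera
    if np.any(cam[:, 2] <= 0):
        raise ValueError("all points must be in front of the camera")
    uvw = cam @ np.asarray(K, dtype=float).T   # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]            # perspective divide


K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                    # marker faces the camera squarely
t = np.array([0.0, 0.0, 0.5])    # marker 0.5 m in front of the lens
model = [[0.0, 0.0, 0.0], [0.05, 0.0, 0.0]]   # marker origin and a point 5 cm along x
print(project_model_points(model, R, t, K))   # approximately [[320, 240], [400, 240]]
```

    Re-running this projection every frame with the latest tracked pose is what makes the virtual model appear to move with the camera and markers.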

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium for presence and remote collaboration. However, capturing visual representations of locations for use in VEs is usually a tedious process that requires either manual modelling of environments or specialised hardware. Capturing environment dynamics is not straightforward either, and it is usually performed with dedicated tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they offer the accessibility of 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can convey spatial and temporal information about a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type has an impact on reasoning about events within videos in panoramic context.
These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first experiment, on telecommunication, compared our videos-in-context interface with fully panoramic video and conventional webcam videoconferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localisation tasks. To support this experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events, exploring three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution for spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic context to video collections makes spatio-temporal tasks easier. Videos in context are therefore a suitable alternative to more difficult, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.

    RigidFusion: RGB-D Scene Reconstruction with Rigidly-moving Objects

    Although surface reconstruction from depth data has made significant advances in recent years, handling changing environments remains a major challenge. This is unsatisfactory, as humans regularly move objects in their environments. Existing solutions focus on a restricted set of objects (e.g., those detected by semantic classifiers), possibly with template meshes, assume a static camera, or mark objects touched by humans as moving. We remove these assumptions by introducing RigidFusion. Our core idea is a novel asynchronous moving-object detection method, combined with a modified volumetric fusion. This is achieved by a model-to-frame TSDF decomposition that leverages free-space carving of the tracked depth values of the current frame with respect to the background model at run time. As output, we produce separate volumetric reconstructions for the background and each moving object in the scene, along with each object's trajectory over time. Our method does not rely on object priors (e.g., semantic labels or pre-scanned meshes) and is insensitive to the motion residuals between objects and the camera. Compared with state-of-the-art methods (e.g., Co-Fusion, MaskFusion), we handle significantly more challenging reconstruction scenarios involving a moving camera, and on the synthetic dataset we improve moving-object detection (26% on the miss-detection ratio), tracking (27% on MOTA) and reconstruction (3% on the reconstruction F1). Please refer to the supplementary material and the project website for the video demonstration (geometry.cs.ucl.ac.uk/projects/2021/rigidfusion).
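    The free-space carving idea can be illustrated along a single camera ray. In this simplified sketch, which is not RigidFusion's actual implementation, a background voxel is flagged when the current depth reading shows the ray passing through it (the voxel lies in front of the observed surface) while the background TSDF stores it as on or behind a surface (signed distance ≤ 0, assuming the common convention of positive values in front of the surface). Such a conflict suggests that something which used to occupy that space has moved. The function name, the margin and the sample values are hypothetical.

```python
from typing import List, Sequence


def free_space_conflicts(voxel_depths: Sequence[float],
                         background_sdf: Sequence[float],
                         observed_depth: float,
                         margin: float = 0.05) -> List[bool]:
    """Flag voxels on one ray that the current frame sees as empty but the
    background model stores as occupied (hypothetical helper).

    voxel_depths:   depth of each voxel centre along the ray (metres)
    background_sdf: background signed distance at those voxels
                    (positive in front of a surface, <= 0 on or behind it)
    observed_depth: depth measured along the ray in the current frame
    margin:         tolerance for depth-sensor noise
    """
    flags = []
    for z, sdf in zip(voxel_depths, background_sdf):
        seen_as_free = z < observed_depth - margin   # the ray passed through this voxel
        model_occupied = sdf <= 0.0                  # background claims surface/interior here
        flags.append(seen_as_free and model_occupied)
    return flags


# The background model holds a box whose front face is at 1.4 m.
depths = [1.0, 1.5, 2.0, 2.5]
sdf = [1.4 - z for z in depths]   # roughly [0.4, -0.1, -0.6, -1.1]
print(free_space_conflicts(depths, sdf, observed_depth=1.4))  # [False, False, False, False]: box still there
print(free_space_conflicts(depths, sdf, observed_depth=3.0))  # [False, True, True, True]: box gone, wall at 3 m
```

    A practical system would also need to respect TSDF truncation (voxels far behind a surface are usually unobserved rather than occupied) and aggregate evidence over many rays and frames before deciding that an object has moved.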

    Real-time synthetic primate vision


    Robotic Crop Interaction in Agriculture for Soft Fruit Harvesting

    Autonomous tree crop harvesting has been a seemingly attainable, but elusive, robotics goal for the past several decades. Limiting growers' reliance on uncertain seasonal labour is an economic driver of this, but the ability of robotic systems to treat each plant individually also has environmental benefits, such as reduced emissions and fertiliser use. Over the same period, effective grasping and manipulation (G&M) solutions for warehouse product handling, and for more general robotic interaction, have been demonstrated. Despite research progress in general robotic interaction and in harvesting some specific crop types, a commercially successful robotic harvester has yet to be demonstrated. Most crop varieties, including soft-skinned fruit, have not yet been addressed. Soft fruit, such as plums, present problems for many of the techniques employed for their more robust relatives and require special focus when developing autonomous harvesters. Adapting existing robotics tools and techniques to new fruit types, including soft-skinned varieties, is not well explored. This thesis aims to bridge that gap by examining the challenges of autonomous crop interaction for harvesting soft fruit. Known challenges include mixed obstacle planning with both hard and soft obstacles present, poor outdoor sensing conditions, and the lack of proven picking motion strategies. Positioning an actuator for harvesting requires solving these problems and others specific to soft-skinned fruit. Doing so effectively means addressing them in the sensing, planning and actuation areas of a robotic system. These areas are also highly interdependent for grasping and manipulation tasks, so solutions need to be developed at the system level. In this thesis, soft robotic actuators, combined with simplifying assumptions about hard obstacle planes, are used to solve mixed obstacle planning.
Persistent target tracking and filtering are used to overcome challenging object detection conditions, while multiple stages of object detection are applied to refine the initial position estimates. Several picking motions are developed and tested for plums, with varying degrees of effectiveness. These techniques are integrated into a prototype system, which is validated in lab testing and extensive field trials on a commercial plum crop. Key contributions of this thesis include:
    I. The examination of grasping & manipulation tools, algorithms, techniques and challenges for harvesting soft-skinned fruit.
    II. Design, development and field-trial evaluation of a harvester prototype to validate these concepts in practice, with specific design studies of the gripper type, object detector architecture and picking motion for this prototype.
    III. Investigation of specific G&M module improvements, including:
        - Application of the autocovariance least squares (ALS) method to noise covariance matrix estimation for visual servoing tasks, where both simulated and real experiments demonstrated a 30% improvement in state estimation error.
        - Theory and experimentation showing that a single range measurement is sufficient to disambiguate scene scale in monocular depth estimation for some datasets.
        - Preliminary investigations of stochastic object completion and sampling for grasping, active perception for visual-servoing-based harvesting, and multi-stage fruit localisation from RGB-Depth data.
Several field trials were carried out with the plum harvesting prototype. Testing on an unmodified commercial plum crop, in all weather conditions, showed promising results, with a harvest success rate of 42%. While a significant gap between prototype performance and commercial viability remains, the use of soft robotics with carefully chosen sensing and planning approaches allows robust grasping & manipulation under challenging conditions, with both hard and soft obstacles.
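    The single-range-measurement result can be made concrete for the simplest case: a monocular depth map that is correct up to one unknown global scale factor, which a single metric range reading at a known pixel then fixes. The sketch assumes exactly this scale-only ambiguity; models that predict depth only up to scale and shift, or predict inverse depth, need more than one measurement, so the sketch does not cover them. The function name, sensor setup and values are hypothetical.

```python
import numpy as np


def rescale_depth(relative_depth, pixel, measured_range):
    """Turn an up-to-scale depth map into metric depth using one range reading.

    relative_depth: (H, W) array, assumed correct up to a single global scale
    pixel:          (row, col) where the range measurement falls in the image
    measured_range: metric distance measured at that pixel, e.g. by a
                    single-point rangefinder (hypothetical sensor setup)
    Returns the metric depth map and the recovered scale factor.
    """
    d = float(relative_depth[pixel])
    if d <= 0.0:
        raise ValueError("relative depth at the measured pixel must be positive")
    scale = measured_range / d
    return relative_depth * scale, scale


rel = np.array([[1.0, 2.0],
                [4.0, 0.5]])
metric, s = rescale_depth(rel, (0, 1), measured_range=3.0)
print(s)       # 1.5
print(metric)  # values: [[1.5, 3.0], [6.0, 0.75]]
```

    In practice a single noisy reading makes the whole map inherit that reading's error, so averaging over a small neighbourhood around the measured pixel is a natural refinement.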

    Augmented reality and scene examination

    The research presented in this thesis explores the impact of Augmented Reality on human performance and compares this technology with Virtual Reality and with a head-mounted video feed for a variety of tasks that relate to scene examination. The motivation for the work was the question of whether Augmented Reality could provide a vehicle for training in crime scene investigation. The Augmented Reality application was developed using fiducial markers in Windows Presentation Foundation, running on a wearable computer platform; the Virtual Reality application was developed using the Crytek game engine to present a photo-realistic 3D environment; and the video feed was provided through a head-mounted webcam. All media were presented through head-mounted displays of similar resolution, which provided the sole source of visual information to participants in the experiments. The experiments were designed to increase the amount of mobility required to conduct the search task, from rotation in the horizontal or vertical plane through to movement around a room. In each experiment, participants were required to find objects and subsequently recall their locations. It is concluded that human performance is affected not merely by the medium through which the world is perceived but also by the constraints governing how movement in the world is controlled.