
    Semantic Mapping of Road Scenes

    The problem of understanding road scenes has been at the forefront of the computer vision community for the last couple of years. It enables autonomous systems to navigate and understand the surroundings in which they operate. It involves reconstructing the scene and estimating the objects present in it, such as ‘vehicles’, ‘road’, ‘pavements’ and ‘buildings’. This thesis focuses on these aspects and proposes solutions to address them. First, we propose a solution to generate a dense semantic map from multiple street-level images. This map can be imagined as the bird’s eye view of the region, with associated semantic labels, for tens of kilometres of street-level data. We generate the overhead semantic view from street-level images. This is in contrast to existing approaches that use satellite/overhead imagery for classification of urban regions, and allows us to produce a detailed semantic map for a large-scale urban area. Then we describe a method to perform large-scale dense 3D reconstruction of road scenes with associated semantic labels. Our method fuses the depth maps generated from stereo pairs across time into a global 3D volume in an online fashion, in order to accommodate arbitrarily long image sequences. The object class labels estimated from the street-level stereo image sequence are used to annotate the reconstructed volume. Then we exploit the scene structure in object class labelling by performing inference over a meshed representation of the scene. By performing labelling over the mesh we address two issues. Firstly, images often contain redundant information, with multiple images describing the same scene; labelling these images separately is slow, whereas our method is approximately an order of magnitude faster in the inference stage than standard inference in the image domain. Secondly, multiple images of the same scene often result in inconsistent labelling; by labelling a single mesh, we remove this inconsistency across images. Our mesh-based labelling also takes into account the object layout in the scene, which is often ambiguous in the image domain, thereby increasing the accuracy of object labelling. Finally, we perform labelling and structure computation through a hierarchical robust P^N Markov Random Field defined on voxels and super-voxels given by an octree. This allows us to infer the 3D structure and the object-class labels in a principled manner, through bounded approximate minimisation of a well-defined and well-studied energy functional. In this thesis, we also introduce two object-labelled datasets created from real-world data. The 15-kilometre Yotta Labelled dataset consists of 8,000 images per camera view of the roadways of the United Kingdom, with a subset of them annotated with object class labels, and the second dataset comprises ground-truth object labels for the publicly available KITTI dataset. Both datasets are publicly available, and we hope they will be helpful to the vision research community.
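    The fusion of per-frame semantic labels into a shared 3D volume, as described above, can be pictured with a minimal sketch along the following lines. This is illustrative only, not the thesis implementation; the function names, the simple vote-counting scheme, and the data layout are assumptions.

```python
import numpy as np

def backproject(depth, K, R, t):
    """Back-project a depth map (H, W) into world-frame 3D points.
    K: 3x3 intrinsics; (R, t): camera-to-world rotation and translation."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N
    rays = np.linalg.inv(K) @ pix              # camera-frame rays per pixel
    pts_cam = rays * depth.reshape(1, -1)      # scale each ray by its depth
    return (R @ pts_cam + t.reshape(3, 1)).T   # N x 3 points in world frame

def fuse_labels(votes, pts, labels, origin, voxel_size):
    """Accumulate per-voxel class votes from labelled 3D points.
    votes: (X, Y, Z, C) integer array; labels: (N,) class index per point."""
    idx = np.floor((pts - origin) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < votes.shape[:3]), axis=1)
    idx, lab = idx[inside], labels[inside]
    np.add.at(votes, (idx[:, 0], idx[:, 1], idx[:, 2], lab), 1)
    return votes

# The semantic map is then the per-voxel argmax: votes.argmax(axis=-1)
```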

    From 3D Point Clouds to Pose-Normalised Depth Maps

    We consider the problem of generating either pairwise-aligned or pose-normalised depth maps from noisy 3D point clouds in relatively unrestricted poses. Our system is deployed in a 3D face alignment application and consists of the following four stages: (i) data filtering, (ii) nose tip identification and sub-vertex localisation, (iii) computation of the (relative) face orientation, and (iv) generation of either a pose-aligned or a pose-normalised depth map. We generate an implicit radial basis function (RBF) model of the facial surface, and this is employed within all four stages of the process. For example, in stage (ii), construction of novel invariant features is based on sampling this RBF over a set of concentric spheres to give a spherically-sampled RBF (SSR) shape histogram. In stage (iii), a second novel descriptor, called an isoradius contour curvature signal, is defined, which allows rotational alignment to be determined using a simple process of 1D correlation. We test our system on both the University of York (UoY) 3D face dataset and the Face Recognition Grand Challenge (FRGC) 3D data. For the more challenging UoY data, our SSR descriptors significantly outperform three variants of spin images, successfully identifying nose vertices at a rate of 99.6%. Nose localisation performance on the higher-quality FRGC data, which has only small pose variations, is 99.9%. Our best system successfully normalises the pose of 3D faces at rates of 99.1% (UoY data) and 99.6% (FRGC data).
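    The SSR shape histogram idea, sampling an implicit surface model over concentric spheres around a candidate landmark, can be sketched roughly as follows. This is a toy illustration with an analytic "surface" standing in for the fitted RBF, not the authors' code; all names are placeholders.

```python
import numpy as np

def sphere_samples(n):
    """Roughly uniform directions on the unit sphere (Fibonacci spiral)."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1 - 2 * i / n)
    theta = np.pi * (1 + 5 ** 0.5) * i
    return np.stack([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)], axis=1)

def ssr_histogram(rbf, centre, radii, n_dirs=256):
    """For each radius, the fraction of sphere samples the implicit model
    classifies as 'inside' the surface (rbf(x) < 0). rbf maps (N, 3) -> (N,)."""
    dirs = sphere_samples(n_dirs)
    return np.array([np.mean(rbf(centre + r * dirs) < 0.0) for r in radii])

# Toy example: an implicit sphere of radius 40 mm plays the role of the face.
# rbf = lambda p: np.linalg.norm(p, axis=1) - 40.0
# descriptor = ssr_histogram(rbf, centre=np.zeros(3), radii=np.arange(5, 60, 5))
```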

    Semantic Validation in Structure from Motion

    The Structure from Motion (SfM) challenge in computer vision is the process of recovering the 3D structure of a scene from a series of projective measurements calculated from a collection of 2D images taken from different perspectives. SfM consists of three main steps: feature detection and matching, camera motion estimation, and recovery of 3D structure from the estimated intrinsic and extrinsic parameters and features. A problem encountered in SfM is that scenes lacking texture or with repetitive features can cause erroneous feature matching between frames. Semantic segmentation offers a route to validate and correct SfM models by labelling pixels in the input images with a deep convolutional neural network. The semantic and geometric properties associated with classes in the scene can be exploited to apply prior constraints to each class of object. The SfM pipeline COLMAP and the semantic segmentation pipeline DeepLab were used, along with planar reconstruction of the dense model, to determine erroneous points that may be occluded from the calculated camera position, given the semantic label, and thus the prior constraint, of the reconstructed plane. Herein, semantic segmentation is integrated into SfM to apply priors on the 3D point cloud, given the object detection in the 2D input images. Additionally, the semantic labels of matched keypoints are compared and inconsistently labelled points are discarded. Furthermore, semantic labels on the input images are used to remove objects associated with motion from the output SfM models. The proposed approach is evaluated on a dataset of 1102 images of a repetitive architectural scene. This project offers a novel method for improved validation of 3D SfM models.
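    The keypoint-level consistency check described above, discarding matches whose semantic labels disagree or which land on classes associated with motion, might look roughly like the sketch below. The data layout, function name, and arguments are assumptions, not the project's actual code.

```python
import numpy as np

def filter_matches_by_label(matches, kps1, kps2, seg1, seg2, dynamic_ids=()):
    """Keep only matches whose endpoints carry the same semantic label and
    do not fall on classes associated with motion (e.g. 'car', 'person').
    matches: (N, 2) indices into kps1/kps2; kps*: (M, 2) pixel coords (x, y);
    seg*: (H, W) label images from a semantic segmentation network."""
    kept = []
    for i, j in matches:
        x1, y1 = np.round(kps1[i]).astype(int)
        x2, y2 = np.round(kps2[j]).astype(int)
        l1, l2 = seg1[y1, x1], seg2[y2, x2]
        if l1 == l2 and l1 not in dynamic_ids:
            kept.append((i, j))
    return np.asarray(kept)
```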

    SonoNet: Real-Time Detection and Localisation of Fetal Standard Scan Planes in Freehand Ultrasound

    Identifying and interpreting fetal standard scan planes during 2D ultrasound mid-pregnancy examinations are highly complex tasks which require years of training. Apart from guiding the probe to the correct location, it can be equally difficult for a non-expert to identify relevant structures within the image. Automatic image processing can provide tools to help experienced as well as inexperienced operators with these tasks. In this paper, we propose a novel method based on convolutional neural networks which can automatically detect 13 fetal standard views in freehand 2D ultrasound data as well as provide a localisation of the fetal structures via a bounding box. An important contribution is that the network learns to localise the target anatomy using weak supervision based on image-level labels only. The network architecture is designed to operate in real-time while providing optimal output for the localisation task. We present results for real-time annotation, retrospective frame retrieval from saved videos, and localisation on a very large and challenging dataset consisting of images and video recordings of full clinical anomaly screenings. We found that the proposed method achieved an average F1-score of 0.798 in a realistic classification experiment modelling real-time detection, and obtained a 90.09% accuracy for retrospective frame retrieval. Moreover, an accuracy of 77.8% was achieved on the localisation task. Comment: 12 pages, 8 figures, published in IEEE Transactions on Medical Imaging.
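    Weakly supervised localisation from image-level labels is often realised with class-activation-style maps; the following sketch illustrates that general idea only, and is not SonoNet's exact mechanism. All names are illustrative.

```python
import numpy as np

def class_activation_map(features, weights, class_idx):
    """Class activation map for weakly supervised localisation.
    features: (C, H, W) last convolutional feature maps;
    weights: (num_classes, C) weights of the classifier that follows
    global average pooling. Returns an (H, W) map normalised to [0, 1]."""
    cam = np.tensordot(weights[class_idx], features, axes=(0, 0))  # (H, W)
    cam -= cam.min()
    return cam / (cam.max() + 1e-8)

def bounding_box(cam, threshold=0.5):
    """Smallest box enclosing all locations above a relative threshold."""
    ys, xs = np.where(cam >= threshold)
    return xs.min(), ys.min(), xs.max(), ys.max()
```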

    Quantifying Membrane Topology at the Nanoscale

    Changes in the shape of cellular membranes are linked with viral replication, Alzheimer's, heart disease and an abundance of other maladies. Some membranous organelles, such as the endoplasmic reticulum and the Golgi, are only 50 nm in diameter. As such, membrane shape changes are conventionally studied with electron microscopy (EM), which preserves cellular ultrastructure and achieves a resolution of 2 nm or better. However, immunolabeling in EM is challenging, and often destroys the cell, making it difficult to study interactions between membranes and other proteins. Additionally, cells must be fixed for EM imaging, making it impossible to study mechanisms of disease. To address these problems, this thesis advances nanoscale imaging and analysis of membrane shape changes and their associated proteins using super-resolution single-molecule localization microscopy. This thesis is divided into three parts. In the first, a novel correlative orientation-independent differential interference contrast (OI-DIC) and single-molecule localization microscopy (SMLM) instrument is designed to address challenges with live-cell imaging of membrane nanostructure. SMLM super-resolution fluorescence techniques image with ~20 nm resolution and are compatible with live-cell imaging. However, due to SMLM's slow imaging speeds, most cell movement is under-sampled. OI-DIC images quickly, is gentle enough to be used with living cells and can image cellular structure without labelling, but is diffraction-limited. Combining SMLM with OI-DIC allows for imaging of cellular context that can supplement sparse super-resolution data in real time. The second part of the thesis describes an open-source software package for visualizing and analyzing SMLM data. SMLM imaging yields localization point clouds, which require non-standard visualization and analysis techniques. Existing techniques are described, and necessary new ones are implemented. These tools are designed to interpret data collected from the OI-DIC/SMLM microscope, as well as from other optical setups. Finally, a tool for extracting membrane structure from SMLM point clouds is described. SMLM data is often noisy, containing multiple localizations per fluorophore and many non-specific localizations. SMLM's resolution reveals labelling discontinuities, which exacerbate the sparsity of localizations. It is non-trivial to reconstruct the continuous shape of a membrane from a discrete set of points, and even more difficult in the presence of the noise profile characteristic of most SMLM point clouds. To address this, a surface reconstruction algorithm for extracting continuous surfaces from SMLM data is implemented. This method employs biophysical curvature constraints to improve the accuracy of the surface.
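    As a rough illustration of the kind of preprocessing such localization point clouds typically need (not the thesis's surface-reconstruction method), non-specific localizations are often suppressed with a simple density filter. The names, units, and thresholds below are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def denoise_localisations(points, radius=50.0, min_neighbours=5):
    """Drop sparse, likely non-specific localisations: keep points that have
    at least `min_neighbours` other localisations within `radius` (nm).
    points: (N, 3) localisation coordinates."""
    tree = cKDTree(points)
    counts = tree.query_ball_point(points, r=radius, return_length=True)
    return points[counts > min_neighbours]  # counts include the point itself
```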

    Semantic Slam: A New Paradigm for Object Recognition and Scene Reconstruction

    Simultaneous localisation and mapping (SLAM) is a technique studied in computer vision and robotics that, given measurements obtained from one or more sensors, allows incremental building of a map of the environment and simultaneous estimation of the position and orientation of the very sensor used to acquire the input data. Visual SLAM systems typically allow the generation of accurate reconstructions of the explored environment but, until very recently, did not provide high-level information on the contents of the reconstructed scenes, useful for fostering high-level reasoning by subsequent algorithms. In this thesis we focus on the topic of Semantic SLAM, proposing techniques to obtain semantically accurate reconstructions of the explored environment by combining efficient SLAM systems with state-of-the-art semantic image segmentation algorithms. We show how, by relying on such semantic reconstructions, the accuracy of the localisation phase of a SLAM pipeline can be improved by accounting for semantic information during the camera pose estimation step. We thus realise a "semantic loop", where the availability of high-level cues improves the mapping process, in turn helping the subsequent localisation phase. A full system, drawing inspiration from the presented research, that allows real-time and automatic semantic mapping of large-scale environments is then presented. An ancillary, but nevertheless important, component of simultaneous localisation and mapping systems is a technique to estimate the sensor position separately from the main SLAM loop, in order to recover from failures in the localisation algorithm. We present a technique that, by exploiting the appearance of image patches, can reliably localise the likely position of the sensor used to acquire such images. Such a relocalisation system can easily be included in a Semantic SLAM system to allow a more robust mapping process in which camera tracking failures can be reliably recovered from.
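    One common way to realise the kind of "semantic loop" described above is to fuse per-frame class probabilities into each map element recursively. The minimal sketch below shows generic Bayesian label fusion; it is an assumption about the general approach, not this thesis's specific pipeline.

```python
import numpy as np

def fuse_label_probs(map_probs, observed_probs, eps=1e-6):
    """Recursive fusion of per-class probabilities for one map element
    (e.g. a surfel or voxel): multiply the running estimate by the new
    observation from the segmentation network and renormalise.
    Both arrays have shape (num_classes,)."""
    fused = map_probs * (observed_probs + eps)
    return fused / fused.sum()

# Example: a 3-class element repeatedly observed as class 1 converges there.
# p = np.full(3, 1 / 3)
# for _ in range(5):
#     p = fuse_label_probs(p, np.array([0.2, 0.7, 0.1]))
```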

    Object-Aware Tracking and Mapping

    Reasoning about the geometric properties of digital cameras and optical physics has enabled researchers to build methods that localise cameras in 3D space from a video stream while, often simultaneously, constructing a model of the environment. Related techniques have evolved substantially since the 1980s, leading to increasingly accurate estimates. Traditionally, however, the quality of results is strongly affected by the presence of moving objects, incomplete data, or difficult surfaces, i.e. surfaces that are not Lambertian or lack texture. One insight of this work is that these problems can be addressed by going beyond geometrical and optical constraints, in favour of object-level and semantic constraints. Incorporating specific types of prior knowledge in the inference process, such as motion or shape priors, leads to approaches with distinct advantages and disadvantages. After introducing relevant concepts in Chapter 1 and Chapter 2, methods for building object-centric maps in dynamic environments using motion priors are investigated in Chapter 5. Chapter 6 addresses the same problem as Chapter 5, but presents an approach which relies on semantic priors rather than motion cues. To fully exploit semantic information, Chapter 7 discusses the conditioning of shape representations on prior knowledge and the practical application to monocular, object-aware reconstruction systems.
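    The use of a motion prior can be caricatured as flagging map points whose observations are inconsistent with the estimated camera motion. The sketch below is a deliberately simplified illustration; the names, the reprojection test, and the threshold are assumptions rather than the thesis's method.

```python
import numpy as np

def flag_moving_points(points_w, pose_R, pose_t, K, observed_px, thresh=3.0):
    """Project assumed-static map points into the current frame and flag those
    whose reprojection error exceeds `thresh` pixels as candidates for
    independently moving objects.
    points_w: (N, 3) world points; observed_px: (N, 2) tracked pixel coords;
    (pose_R, pose_t): world-to-camera rotation and translation; K: intrinsics."""
    pts_cam = (pose_R @ points_w.T + pose_t.reshape(3, 1)).T   # world -> camera
    proj = (K @ pts_cam.T).T
    proj = proj[:, :2] / proj[:, 2:3]                          # perspective divide
    err = np.linalg.norm(proj - observed_px, axis=1)
    return err > thresh
```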

    Advances in Robot Navigation

    Robot navigation includes different interrelated activities such as perception (obtaining and interpreting sensory information); exploration (the strategy that guides the robot in selecting the next direction to go); mapping (the construction of a spatial representation using the sensory information perceived); localization (the strategy for estimating the robot's position within the spatial map); path planning (the strategy for finding a path towards a goal location, whether optimal or not); and path execution, where motor actions are determined and adapted to environmental changes. This book integrates results from the research work of authors all over the world, addressing the above-mentioned activities and analyzing the critical implications of dealing with dynamic environments. Different solutions providing adaptive navigation are inspired by nature, and diverse applications are described in the context of an important field of study: social robotics.
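    As a small, generic illustration of the path-planning activity mentioned above (not taken from the book), a grid-based A* planner can be sketched as follows; the grid encoding and heuristic are assumptions.

```python
import heapq

def astar(grid, start, goal):
    """Grid path planning with A* (4-connected moves, Manhattan heuristic).
    grid: 2D list of 0 (free) / 1 (occupied); start, goal: (row, col).
    Returns the list of cells from start to goal, or None if unreachable."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = node[0] + dr, node[1] + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
                if g + 1 < best_g.get((r, c), float("inf")):
                    best_g[(r, c)] = g + 1
                    heapq.heappush(open_set,
                                   (g + 1 + h((r, c)), g + 1, (r, c), path + [(r, c)]))
    return None
```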