5,244 research outputs found

    Clothing Co-Parsing by Joint Image Segmentation and Labeling

    This paper aims at developing an integrated system of clothing co-parsing, in order to jointly parse a set of clothing images (unsegmented but annotated with tags) into semantic configurations. We propose a data-driven framework consisting of two phases of inference. The first phase, referred to as "image co-segmentation", iterates to extract consistent regions on images and jointly refines the regions over all images by employing the exemplar-SVM (E-SVM) technique [23]. In the second phase (i.e. "region co-labeling"), we construct a multi-image graphical model by taking the segmented regions as vertices, and incorporate several contexts of clothing configuration (e.g., item location and mutual interactions). The joint label assignment can be solved using the efficient Graph Cuts algorithm. In addition to evaluating our framework on the Fashionista dataset [30], we construct a dataset called CCP consisting of 2098 high-resolution street fashion photos to demonstrate the performance of our system. We achieve 90.29% / 88.23% segmentation accuracy and 65.52% / 63.89% recognition rate on the Fashionista and the CCP datasets, respectively, which are superior to state-of-the-art methods. Comment: 8 pages, 5 figures, CVPR 2014
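
    As an illustration only (not code from the paper), the sketch below phrases region co-labeling as energy minimisation over a region graph, using a simple iterated-conditional-modes sweep as a stand-in for the Graph Cuts solver the authors use; the region costs, edges and two-label set are invented.

```python
# Hypothetical sketch of the "region co-labeling" phase: regions are graph
# vertices, clothing tags are labels, and a simple ICM sweep stands in for
# the Graph Cuts optimizer used in the paper. All inputs are made up.
import numpy as np

def co_label(unary, edges, pairwise, n_iters=10):
    """unary:    (n_regions, n_labels) costs from appearance/location cues
       edges:    list of (i, j) region pairs that interact within/across images
       pairwise: (n_labels, n_labels) cost of assigning labels (a, b) to a pair
    """
    labels = unary.argmin(axis=1)              # independent initialisation
    for _ in range(n_iters):
        changed = False
        for i in range(len(labels)):
            # local energy of vertex i for every candidate label
            cost = unary[i].copy()
            for (a, b) in edges:
                if a == i:
                    cost += pairwise[:, labels[b]]
                elif b == i:
                    cost += pairwise[labels[a], :]
            best = cost.argmin()
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break                              # local minimum reached
    return labels

# toy example: 3 regions, 2 labels, two smoothness edges (Potts model)
unary = np.array([[0.1, 0.9], [0.4, 0.6], [0.8, 0.2]])
pairwise = 0.3 * (1 - np.eye(2))
print(co_label(unary, [(0, 1), (1, 2)], pairwise))
```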

    Object-Aware Tracking and Mapping

    Reasoning about geometric properties of digital cameras and optical physics enabled researchers to build methods that localise cameras in 3D space from a video stream, while – often simultaneously – constructing a model of the environment. Related techniques have evolved substantially since the 1980s, leading to increasingly accurate estimates. Traditionally, however, the quality of results is strongly affected by the presence of moving objects, incomplete data, or difficult surfaces – i.e. surfaces that are not Lambertian or lack texture. One insight of this work is that these problems can be addressed by going beyond geometrical and optical constraints, in favour of object-level and semantic constraints. Incorporating specific types of prior knowledge in the inference process, such as motion or shape priors, leads to approaches with distinct advantages and disadvantages. After introducing relevant concepts in Chapter 1 and Chapter 2, methods for building object-centric maps in dynamic environments using motion priors are investigated in Chapter 5. Chapter 6 addresses the same problem as Chapter 5, but presents an approach which relies on semantic priors rather than motion cues. To fully exploit semantic information, Chapter 7 discusses the conditioning of shape representations on prior knowledge and the practical application to monocular, object-aware reconstruction systems.

    Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery

    One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft tissues. This information is a prerequisite for the registration of multi-modal patient-specific data, for enhancing the surgeon's navigation capabilities by observing beyond exposed tissue surfaces, and for providing intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This paper reviews the state-of-the-art methods for optical intra-operative 3D reconstruction in laparoscopic surgery and discusses the technical challenges and future perspectives towards clinical translation. With the recent paradigm shift of surgical practice towards MIS and new developments in 3D optical imaging, this is a timely discussion about technologies that could facilitate complex CAS procedures in dynamic and deformable anatomical regions.

    Dense Vision in Image-guided Surgery

    Image-guided surgery needs an efficient and effective camera tracking system in order to perform augmented reality for overlaying preoperative models or labelling cancerous tissues on the 2D video images of the surgical scene. Tracking in endoscopic/laparoscopic scenes, however, is an extremely difficult task, primarily due to tissue deformation, instrument invasion into the surgical scene and the presence of specular highlights. State-of-the-art feature-based SLAM systems such as PTAM fail to track such scenes because the number of good features to track is very limited, and smoke or instrument motion causes feature-based tracking to fail immediately. The work of this thesis provides a systematic approach to this problem using dense vision. We initially attempted to register a 3D preoperative model with multiple 2D endoscopic/laparoscopic images using a dense method, but this approach did not perform well. We subsequently proposed stereo reconstruction to directly obtain the 3D structure of the scene. By using the dense reconstructed model together with robust estimation, we demonstrate that dense stereo tracking can be remarkably robust even within extremely challenging endoscopic/laparoscopic scenes. Several validation experiments have been conducted in this thesis. The proposed stereo reconstruction algorithm has turned out to be the state-of-the-art method on several publicly available ground-truth datasets. Furthermore, the proposed robust dense stereo tracking algorithm has proved highly accurate in a synthetic environment (< 0.1 mm RMSE) and qualitatively extremely robust when applied to real scenes from robot-assisted laparoscopic prostatectomy (RALP) surgery. This is an important step toward achieving accurate image-guided laparoscopic surgery.
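
    As a rough, generic illustration of the dense pipeline described above (not the thesis's algorithm), the sketch below computes a dense disparity map with OpenCV's semi-global matcher and re-projects it to 3D; the image files and the re-projection matrix Q are placeholders that a real system would obtain from stereo calibration.

```python
# Generic dense-stereo baseline (OpenCV semi-global matching), shown only to
# illustrate the disparity -> 3D structure pipeline; it is not the thesis's
# reconstruction algorithm, and Q below is a stand-in for a calibrated matrix.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # rectified pair (assumed)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,      # must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,            # smoothness penalty for small disparity jumps
    P2=32 * 5 * 5,           # smoothness penalty for large disparity jumps
)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0  # fixed-point scale

# Q re-projects disparity into metric 3D; it comes from cv2.stereoRectify in a
# real system. Identity here is a placeholder so the sketch is self-contained.
Q = np.eye(4, dtype=np.float32)
points_3d = cv2.reprojectImageTo3D(disparity, Q)
valid = disparity > 0                                   # mask unmatched pixels
print(points_3d[valid].shape)
```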

    Quantitative analysis of microscopy

    Particle tracking is an essential tool for the study of the dynamics of biological processes. These dynamics unfold in three-dimensional (3D) space, as the biological structures themselves are 3D. The focus of this thesis is the development of single particle tracking methods for analysing the dynamics of biological processes using image processing techniques. First, a novel particle tracking method that works with two-dimensional (2D) image data is introduced. This method uses the theory of Haar-like features for particle detection, and trajectory linking is achieved using a combination of three Kalman filters within an interacting multiple models framework. The trajectory linking process utilises an extended state space variable that better describes the morphology and intensity profiles of the particles under investigation at their current position. This tracking method is validated using both 2D synthetically generated images and 2D experimentally collected images. It is shown that this method outperforms 14 other state-of-the-art methods. Next, this method is used to analyse the dynamics of fluorescently labelled particles using a live-cell fluorescence microscopy technique, specifically spt-PALM, a variant of the super-resolution (SR) method PALM. From this application, conclusions are drawn about the organisation of the proteins under investigation at the cell membrane. A second particle tracking method is then introduced, which is highly efficient and capable of working with both 2D and 3D image data. This method uses a novel Haar-inspired feature for particle detection, drawing inspiration from the type of particles to be detected, which are typically circular in 2D image space and spherical in 3D image space. Trajectory linking in this method utilises a global nearest neighbour methodology incorporating both motion models describing the motion of the particles under investigation and a further extended state space variable describing many more aspects of the particles to be linked. This method is validated using a variety of both 2D and 3D synthetic image data. Its performance is compared with 14 other state-of-the-art methods, showing it to be one of the best overall performing methods. Finally, analysis tools are investigated for an SR image restoration method developed by our research group, referred to as Translation Microscopy (TRAM) [1]. TRAM can be implemented on any standardised microscope and delivers an improvement in resolution of up to 7-fold. However, the results from TRAM and other SR imaging methods require specialised tools for validation and analysis. Tools have been developed to validate that TRAM performs correctly using a specially designed ground truth. Furthermore, analysis of results on a biological sample corroborates other published results on the size of biological structures, showing again that TRAM performs as expected.
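
    As a minimal illustration of the trajectory-linking machinery (a single constant-velocity Kalman filter, not the thesis's bank of three filters inside an interacting multiple models framework), the sketch below tracks one particle's 2D detections; all noise parameters and detections are invented.

```python
# Minimal constant-velocity Kalman filter linking one particle's 2D detections.
# The thesis combines three such filters in an IMM framework; this single-model
# sketch only shows the predict/update cycle. All values below are invented.
import numpy as np

dt = 1.0                                   # frame interval (assumed)
F = np.array([[1, 0, dt, 0],               # state: [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],                # only position is observed
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)                       # process noise (assumed)
R = 0.5 * np.eye(2)                        # detection noise (assumed)

x = np.zeros(4)                            # initial state
P = np.eye(4)

detections = [np.array([1.0, 0.9]), np.array([2.1, 2.0]), np.array([2.9, 3.1])]
for z in detections:
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the detection assigned to this track
    y = z - H @ x                          # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    print("position estimate:", x[:2])
```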

    Hand tracking using a quadric surface model and Bayesian filtering

    Within this paper, a technique for model-based 3D hand tracking is presented. A hand model is built from a set of truncated quadrics, approximating the anatomy of a real hand with few parameters. Given that the projection of a quadric onto the image plane is a conic, the contours can be generated efficiently. These model contours are used as shape templates to evaluate possible matches in the current frame. The evaluation is done within a hierarchical Bayesian filtering framework, where the posterior distribution is computed efficiently using a tree of templates. We demonstrate the effectiveness of the technique by using it to track 3D articulated and non-rigid hand motion from monocular video sequences in front of a cluttered background.
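
    The quadric-to-conic projection that makes contour generation efficient can be checked numerically: a dual quadric Q* maps to the dual conic C* = P Q* P^T under a 3x4 camera matrix P. The example below (values arbitrary, not from the paper) projects a unit sphere placed at depth 4 in front of a canonical camera.

```python
# Numerical check of the property the method relies on: a dual quadric Q*
# (4x4) projects under a 3x4 camera matrix P to the dual conic C* = P Q* P^T,
# so model contours can be generated in closed form. Values are arbitrary.
import numpy as np

P = np.hstack([np.eye(3), np.zeros((3, 1))])   # canonical camera [I | 0]

# dual quadric of a unit sphere: diag(1, 1, 1, -1); translate it to depth 4
# via Q*' = T Q* T^T, the transformation rule for dual quadrics
Q0 = np.diag([1.0, 1.0, 1.0, -1.0])
T = np.eye(4)
T[2, 3] = 4.0
Q = T @ Q0 @ T.T

C_dual = P @ Q @ P.T                           # 3x3 dual conic of the outline
C = np.linalg.inv(C_dual)                      # point conic, up to scale
print(np.round(C / C[0, 0], 4))
# -> diag(1, 1, -1/15): the silhouette x^2 + y^2 = 1/15, a circle of radius
#    1/sqrt(15), matching the geometry of a unit sphere seen from depth 4
```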

    Super-resolution of 3-dimensional scenes

    Super-resolution is an image enhancement method that increases the resolution of images and video. Previously, this technique could only be applied to 2D scenes. The super-resolution algorithm developed in this thesis creates high-resolution views of 3-dimensional scenes, using low-resolution images captured from varying, unknown positions.
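
    For illustration only, the sketch below implements the classical 2D shift-and-add baseline, which fuses several registered low-resolution frames into one higher-resolution image; the thesis's contribution of handling 3D scenes with unknown camera positions is not reproduced here, and the sub-pixel shifts are assumed known.

```python
# Classical 2-D shift-and-add baseline, shown only to illustrate how several
# low-resolution frames can be fused into one higher-resolution image. The
# thesis extends this idea to 3-D scenes with unknown camera positions, which
# this sketch does not address. Shifts are assumed known here.
import numpy as np

def shift_and_add(frames, shifts, scale=2):
    """frames: list of (h, w) arrays; shifts: known sub-pixel (dy, dx) offsets."""
    h, w = frames[0].shape
    acc = np.zeros((h * scale, w * scale))
    cnt = np.zeros_like(acc)
    for img, (dy, dx) in zip(frames, shifts):
        # place each LR pixel on the HR grid, offset by its sub-pixel shift
        ys = (np.arange(h)[:, None] * scale + round(dy * scale)) % (h * scale)
        xs = (np.arange(w)[None, :] * scale + round(dx * scale)) % (w * scale)
        acc[ys, xs] += img
        cnt[ys, xs] += 1
    cnt[cnt == 0] = 1                      # avoid division by zero in gaps
    return acc / cnt

frames = [np.random.rand(8, 8) for _ in range(4)]
hr = shift_and_add(frames, [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5)])
print(hr.shape)                            # (16, 16)
```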

    Video content analysis for intelligent forensics

    The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, either for real-time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely: (1) moving object detection and recognition; (2) correction of colours in video frames and recognition of the colours of moving objects; (3) make and model recognition of vehicles and identification of their type; and (4) detection and recognition of text information in outdoor scenes.

    To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex backgrounds. The object detection part of the framework relies on a background modelling technique and a novel post-processing step in which the contours of the foreground regions (i.e. moving objects) are refined by classifying edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles and background. The proposed feature descriptor captures the texture information present in the silhouette of foreground objects.

    To address the second issue, a framework for the correction and recognition of the true colours of objects in videos is presented, with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects across multiple frames. The proposed framework is specifically designed to perform robustly on videos of poor quality caused by surrounding illumination, camera sensor imperfections and artefacts due to high compression.

    In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As part of this work, a novel feature representation technique for the distinctive representation of vehicle images has been developed. The technique uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of vehicles. The proposed method is insensitive to minor in-plane rotation and skew within the image, and the framework can be extended to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain.

    The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image to identify text regions. Apart from detection, the colour information is also used to segment characters within words. The recognition of identified characters is performed using shape features and supervised learning. Finally, a lexicon-based alignment procedure is adopted to finalize the recognition of strings present in word images.

    Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique surpasses well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals. The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique across various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets reveal the potential of the proposed scheme for accurate detection and recognition of text in the wild.
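
    As a generic illustration of the background-modelling stage that the first part of the thesis builds on (not the thesis's own detector or its contour-refinement step), the sketch below extracts moving-object candidates with OpenCV's MOG2 subtractor; the file name and area threshold are placeholders.

```python
# Generic background-modelling baseline (OpenCV MOG2) for the moving object
# detection stage; the thesis adds edge-segment contour refinement on top of
# such a model, which is not reproduced here. File name is a placeholder.
import cv2

cap = cv2.VideoCapture("surveillance.mp4")
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                        detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                          # 255 = foreground, 127 = shadow
    mask = cv2.threshold(mask, 200, 255,            # drop the shadow label
                         cv2.THRESH_BINARY)[1]
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,   # suppress isolated noise
                            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    movers = [c for c in contours if cv2.contourArea(c) > 200]  # area gate (assumed)
    print(len(movers), "moving-object candidates in this frame")
cap.release()
```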

    Probabilistic multi-modal depth estimation based on camera–LiDAR sensor fusion

    Multi-modal depth estimation is one of the key challenges for endowing autonomous machines with robust robotic perception capabilities. There have been outstanding advances in the development of uni-modal depth estimation techniques based on either monocular cameras, because of their rich resolution, or LiDAR sensors, due to the precise geometric data they provide. However, each of these suffers from some inherent drawbacks, such as high sensitivity to changes in illumination conditions in the case of cameras and limited resolution for the LiDARs. Sensor fusion can be used to combine the merits and compensate for the downsides of these two kinds of sensors. Nevertheless, current fusion methods work at a high level. They process the sensor data streams independently and combine the high-level estimates obtained for each sensor. In this paper, we tackle the problem at a low level, fusing the raw sensor streams, thus obtaining depth estimates which are both dense and precise, and can be used as a unified multi-modal data source for higher-level estimation problems. This work proposes a conditional random field model with multiple geometry and appearance potentials. It seamlessly represents the problem of estimating dense depth maps from camera and LiDAR data. The model can be optimized efficiently using the conjugate gradient squared algorithm. The proposed method was evaluated and compared with the state of the art using the commonly used KITTI benchmark dataset.
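
    The optimisation step named in the abstract can be illustrated on a toy 1D problem: for a Gaussian CRF with quadratic data and smoothness potentials, the MAP depth solves a sparse linear system, which can be handed to SciPy's conjugate gradient squared solver (scipy.sparse.linalg.cgs). The weights, LiDAR pattern and scanline length below are invented, not the paper's model.

```python
# Toy 1-D version of the optimisation step: for a Gaussian CRF with quadratic
# LiDAR data terms and pairwise smoothness terms, the MAP depth solves a
# sparse linear system A d = b, handed here to SciPy's conjugate gradient
# squared solver. All weights and sizes are invented, not the paper's model.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cgs

n = 100                                  # pixels along one scanline
w_data, w_smooth = 10.0, 1.0

z = np.zeros(n)                          # sparse LiDAR returns on a depth ramp
lidar_idx = np.arange(0, n, 10)
z[lidar_idx] = 5.0 + 0.1 * lidar_idx

w = np.zeros(n)                          # data weight only where LiDAR hit
w[lidar_idx] = w_data
W = sp.diags(w)

D = sp.diags([1, -1], [0, 1], shape=(n - 1, n))   # finite-difference operator
A = (W + w_smooth * (D.T @ D)).tocsr()            # CRF precision matrix
b = W @ z

d, info = cgs(A, b)                      # info == 0 -> converged
print(info, d[:5])                       # dense depth interpolating the returns
```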