89,848 research outputs found

    Scalable Joint Detection and Segmentation of Surgical Instruments with Weak Supervision

    Get PDF
    Computer vision based models, such as object segmentation, detection and tracking, have the potential to assist surgeons intra-operatively and improve the quality and outcomes of minimally invasive surgery. Different work streams towards instrument detection include segmentation, bounding box localisation and classification. While segmentation models offer much more granular results, bounding box annotations are easier to annotate at scale. To leverage the granularity of segmentation approaches with the scalability of bounding box-based models, a multi-task model for joint bounding box detection and segmentation of surgical instruments is proposed. The model consists of a shared backbone and three independent heads for the tasks of classification, bounding box regression, and segmentation. Using adaptive losses together with simple yet effective weakly-supervised label inference, the proposed model use weak labels to learn to segment surgical instruments with a fraction of the dataset requiring segmentation masks. Results suggest that instrument detection and segmentation tasks share intrinsic challenges and jointly learning from both reduces the burden of annotating masks at scale. Experimental validation shows that the proposed model obtain comparable results to that of single-task state-of-the-art detector and segmentation models, while only requiring a fraction of the dataset to be annotated with masks. Specifically, the proposed model obtained 0.81 weighted average precision (wAP) and 0.73 mean intersection-over-union (IOU) in the Endovis2018 dataset with 1% annotated masks, while performing joint detection and segmentation at more than 20 frames per second

    Pedestrian Attribute Recognition: A Survey

    Full text link
    Recognizing pedestrian attributes is an important task in computer vision community due to it plays an important role in video surveillance. Many algorithms has been proposed to handle this task. The goal of this paper is to review existing works using traditional methods or based on deep learning networks. Firstly, we introduce the background of pedestrian attributes recognition (PAR, for short), including the fundamental concepts of pedestrian attributes and corresponding challenges. Secondly, we introduce existing benchmarks, including popular datasets and evaluation criterion. Thirdly, we analyse the concept of multi-task learning and multi-label learning, and also explain the relations between these two learning algorithms and pedestrian attribute recognition. We also review some popular network architectures which have widely applied in the deep learning community. Fourthly, we analyse popular solutions for this task, such as attributes group, part-based, \emph{etc}. Fifthly, we shown some applications which takes pedestrian attributes into consideration and achieve better performance. Finally, we summarized this paper and give several possible research directions for pedestrian attributes recognition. The project page of this paper can be found from the following website: \url{https://sites.google.com/view/ahu-pedestrianattributes/}.Comment: Check our project page for High Resolution version of this survey: https://sites.google.com/view/ahu-pedestrianattributes

    The More You Know: Using Knowledge Graphs for Image Classification

    Full text link
    One characteristic that sets humans apart from modern learning-based computer vision algorithms is the ability to acquire knowledge about the world and use that knowledge to reason about the visual world. Humans can learn about the characteristics of objects and the relationships that occur between them to learn a large variety of visual concepts, often with few examples. This paper investigates the use of structured prior knowledge in the form of knowledge graphs and shows that using this knowledge improves performance on image classification. We build on recent work on end-to-end learning on graphs, introducing the Graph Search Neural Network as a way of efficiently incorporating large knowledge graphs into a vision classification pipeline. We show in a number of experiments that our method outperforms standard neural network baselines for multi-label classification.Comment: CVPR 201
    • …
    corecore