15 research outputs found

    Detection-aided medical image segmentation using deep learning

    A fully automatic technique for segmenting the liver and localizing its unhealthy tissue is a convenient tool for diagnosing hepatic diseases and assessing the response to treatment. In this thesis we propose a method to segment the liver and its lesions from Computed Tomography (CT) scans, as well as other anatomical structures and organs of the human body. We use Convolutional Neural Networks (CNNs), which have produced good results in a variety of tasks, including medical imaging. The lesion segmentation network is a cascaded architecture that first focuses on the liver region and then segments the lesion within it. Moreover, we train a detector to localize the lesions and keep only those pixels of the segmentation output where a lesion is detected. The segmentation architecture is based on DRIU (Maninis, 2016), a Fully Convolutional Network (FCN) with side outputs that operate on feature maps of different resolutions, so that the final prediction benefits from the multi-scale information learned by different stages of the network. Our pipeline is 2.5D, as the input to the network is a stack of consecutive CT slices. We also study different ways of exploiting the liver segmentation to delineate the lesion. The main focus of this work is using the detector to localize the lesions, as we demonstrate that it helps remove false positives triggered by the segmentation network. The benefit of placing a detector on top of the segmentation is that the detector acquires a more global view of the health of the liver tissue than the segmentation network, whose output is pixel-wise and is not forced to take a global decision over a whole liver patch. We show experiments on the LiTS dataset for lesion and liver segmentation. To demonstrate the generality of the segmentation network, we also segment several anatomical structures from the Visceral dataset.
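
    The detection-aided filtering step described in this abstract (keeping only the segmentation pixels where a lesion is detected) can be illustrated with a minimal sketch. It assumes binary masks stored as nested lists and detections given as axis-aligned boxes; the function name `filter_mask_by_detections` and the box format are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

Mask = List[List[int]]
# Assumed box format: (row_min, col_min, row_max, col_max), max bounds exclusive.
Box = Tuple[int, int, int, int]


def filter_mask_by_detections(seg_mask: Mask, boxes: List[Box]) -> Mask:
    """Keep a segmentation pixel only if it lies inside at least one detected box."""
    height = len(seg_mask)
    width = len(seg_mask[0]) if height else 0
    # Binary map of the area covered by any detection, clipped to the image.
    keep = [[0] * width for _ in range(height)]
    for r0, c0, r1, c1 in boxes:
        for r in range(max(r0, 0), min(r1, height)):
            for c in range(max(c0, 0), min(c1, width)):
                keep[r][c] = 1
    # Pixel-wise AND of the segmentation output and the detection map.
    return [[seg_mask[r][c] & keep[r][c] for c in range(width)] for r in range(height)]


# Toy example: two segmented blobs, but the detector fires only on the top-left one.
seg = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
detections = [(0, 0, 2, 2)]
filtered = filter_mask_by_detections(seg, detections)
```

    In this toy case the bottom-right blob has no supporting detection and is discarded as a likely false positive, which is the effect the abstract attributes to the detector.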

    Hierarchical object detection with deep reinforcement learning

    We present a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. The key idea is to focus on the parts of the image that contain richer information and zoom in on them. We train an intelligent agent that, given an image window, decides where to focus attention among five predefined region candidates (smaller windows). This procedure is iterated, providing a hierarchical image analysis. We compare two candidate proposal strategies to guide the object search: with and without overlap. Moreover, our work compares two strategies for extracting features from a convolutional neural network for each region proposal: one that computes new feature maps for each region proposal, and another that computes feature maps for the whole image and later crops them for each region proposal. Experiments indicate better results for the overlapping candidate proposal strategy and a loss of performance for the cropped image features due to the loss of spatial resolution. We argue that, while this loss seems unavoidable when working with large numbers of object candidates, the much smaller number of region proposals generated by our reinforcement learning agent makes it feasible to extract features for each location without sharing convolutional computation among regions.
    Postprint (published version)
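
    As an illustration of the five predefined region candidates, the sketch below generates five sub-windows from a parent window. The layout (four corners plus a centre window) and the `scale` parameter are assumptions made for this example; the abstract only states that there are five smaller windows, compared with and without overlap.

```python
from typing import List, Tuple

Window = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def five_candidates(window: Window, scale: float = 0.5) -> List[Window]:
    """Return five sub-windows of `window`: four corners and the centre.

    With scale == 0.5 the corner windows tile the parent without overlapping
    each other; a larger scale (e.g. 0.75) makes neighbouring windows overlap.
    """
    x0, y0, x1, y1 = window
    w, h = (x1 - x0) * scale, (y1 - y0) * scale
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    return [
        (x0, y0, x0 + w, y0 + h),                          # top-left
        (x1 - w, y0, x1, y0 + h),                          # top-right
        (x0, y1 - h, x0 + w, y1),                          # bottom-left
        (x1 - w, y1 - h, x1, y1),                          # bottom-right
        (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2),  # centre
    ]


no_overlap = five_candidates((0, 0, 100, 100), scale=0.5)
with_overlap = five_candidates((0, 0, 100, 100), scale=0.75)
```

    Applying `five_candidates` again to whichever window the agent selects yields the iterated, hierarchical zoom that the abstract describes.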

    Budget-aware semi-supervised semantic and instance segmentation

    Methods that move towards less supervised scenarios are key for image segmentation, as dense labels demand significant human intervention. Generally, the annotation burden is mitigated by labeling datasets with weaker forms of supervision, e.g. image-level labels or bounding boxes. Another option is semi-supervised settings, which commonly leverage a few strong annotations and a large amount of unlabeled or weakly labeled data. In this paper, we revisit semi-supervised segmentation schemes and significantly narrow down the annotation budget (in terms of total labeling time of the training set) compared to previous approaches. With a very simple pipeline, we demonstrate that at low annotation budgets, semi-supervised methods outperform weakly supervised ones by a wide margin for both semantic and instance segmentation. Our approach also outperforms previous semi-supervised works at a much reduced labeling cost. We present results for the Pascal VOC benchmark and unify weakly and semi-supervised approaches by considering the total annotation budget, thus allowing a fairer comparison between methods.
    Peer reviewed. Postprint (author's final draft)
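
    The idea of comparing methods by total labeling time can be made concrete with a small arithmetic sketch. The per-image labeling times and image counts below are hypothetical placeholders chosen for illustration, not figures from the paper, and `total_budget_seconds` is an invented helper name.

```python
from typing import Dict


def total_budget_seconds(counts: Dict[str, int], seconds_per_item: Dict[str, float]) -> float:
    """Total annotation budget: sum over annotation types of count * time per item."""
    return sum(n * seconds_per_item[kind] for kind, n in counts.items())


# Hypothetical per-image labeling times in seconds (placeholders, not from the paper).
COST = {"pixel_mask": 240.0, "image_label": 20.0}

# A semi-supervised setup: a few strongly labeled images, the rest unlabeled (free).
semi_supervised = {"pixel_mask": 100, "image_label": 0}
# A weakly supervised setup: many images with cheap image-level labels.
weakly_supervised = {"pixel_mask": 0, "image_label": 1200}

semi_cost = total_budget_seconds(semi_supervised, COST)
weak_cost = total_budget_seconds(weakly_supervised, COST)
```

    Under these placeholder costs both setups consume the same budget, so their accuracies could be compared on an equal footing, which is the kind of comparison the abstract argues for.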

    RVOS: end-to-end recurrent network for video object segmentation

    Multiple-object video object segmentation is a challenging task, especially in the zero-shot case, when no object mask is given at the initial frame and the model has to find the objects to be segmented along the sequence. In our work, we propose a Recurrent network for multiple object Video Object Segmentation (RVOS) that is fully end-to-end trainable. Our model incorporates recurrence in two different domains: (i) the spatial, which allows discovering the different object instances within a frame, and (ii) the temporal, which allows keeping the coherence of the segmented objects over time. We train RVOS for zero-shot video object segmentation and are the first to report quantitative results for the DAVIS-2017 and YouTube-VOS benchmarks. Further, we adapt RVOS for one-shot video object segmentation by using the masks obtained in previous time steps as inputs to be processed by the recurrent module. Our model reaches results comparable to state-of-the-art techniques on the YouTube-VOS benchmark and outperforms all previous video object segmentation methods not using online learning on the DAVIS-2017 benchmark. Moreover, our model achieves faster inference runtimes than previous methods, reaching 44 ms/frame on a P100 GPU.
    Peer reviewed. Postprint (published version)

    A closer look at referring expressions for video object segmentation

    The task of Language-guided Video Object Segmentation (LVOS) aims at generating binary masks for an object referred to by a linguistic expression. When this expression unambiguously describes an object in the scene, it is called a referring expression (RE). Our work argues that existing benchmarks used for LVOS are mainly composed of trivial cases, in which referents can be identified with simple phrases. Our analysis relies on a new categorization of the referring expressions in the DAVIS-2017 and Actor-Action datasets into trivial and non-trivial REs, where the non-trivial REs are further annotated with seven RE semantic categories. We leverage these data to analyze the performance of RefVOS, a novel neural network that obtains competitive results for the task of language-guided image segmentation and state-of-the-art results for LVOS. Our study indicates that the major challenges for the task are related to understanding motion and static actions.
    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was partially supported by the projects PID2019-107255GB-C22 and PID2020-117142GB-I00 funded by MCIN/AEI/10.13039/501100011033 Spanish Ministry of Science, and the grant 2017-SGR-1414 of the Government of Catalonia. This work was also partially supported by the project RTI2018-095232-B-C22 funded by the Spanish Ministry of Science, Innovation and Universities.
    Peer reviewed. Postprint (published version)

    The Liver Tumor Segmentation Benchmark (LiTS)

    In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018. The image dataset is diverse and contains primary and secondary tumors with varied sizes and appearances and various lesion-to-background levels (hyper-/hypo-dense), created in collaboration with seven hospitals and research institutions. Seventy-five submitted liver and liver tumor segmentation algorithms were trained on a set of 131 computed tomography (CT) volumes and were tested on 70 unseen test images acquired from different patients. We found that no single algorithm performed best for both liver and liver tumors in the three events. The best liver segmentation algorithm achieved a Dice score of 0.963, whereas, for tumor segmentation, the best algorithms achieved Dice scores of 0.674 (ISBI 2017), 0.702 (MICCAI 2017), and 0.739 (MICCAI 2018). Retrospectively, we performed additional analysis on liver tumor detection and revealed that not all top-performing segmentation algorithms worked well for tumor detection. The best liver tumor detection method achieved a lesion-wise recall of 0.458 (ISBI 2017), 0.515 (MICCAI 2017), and 0.554 (MICCAI 2018), indicating the need for further research. LiTS remains an active benchmark and resource for research, e.g., contributing the liver-related segmentation tasks in http://medicaldecathlon.com/. In addition, both data and online evaluation are accessible via https://competitions.codalab.org/competitions/17094.
    Bjoern Menze is supported through the DFG funding (SFB 824, subproject B12) and a Helmut-Horten-Professorship for Biomedical Informatics by the Helmut-Horten-Foundation. Florian Kofler is supported by Deutsche Forschungsgemeinschaft (DFG) through TUM International Graduate School of Science and Engineering (IGSSE), GSC 81. An Tang was supported by the Fonds de recherche du Québec en Santé and Fondation de l’association des radiologistes du Québec (FRQS-ARQ 34939 Clinical Research Scholarship – Junior 2 Salary Award). Hongwei Bran Li is supported by Forschungskredit (Grant No. FK-21-125) from University of Zurich.
    Peer reviewed.
    Article signed by 109 authors: Patrick Bilic 1,a,b, Patrick Christ 1,a,b, Hongwei Bran Li 1,2,∗,b, Eugene Vorontsov 3,a,b, Avi Ben-Cohen 5,a, Georgios Kaissis 10,12,15,a, Adi Szeskin 18,a, Colin Jacobs 4,a, Gabriel Efrain Humpire Mamani 4,a, Gabriel Chartrand 26,a, Fabian Lohöfer 12,a, Julian Walter Holch 29,30,69,a, Wieland Sommer 32,a, Felix Hofmann 31,32,a, Alexandre Hostettler 36,a, Naama Lev-Cohain 38,a, Michal Drozdzal 34,a, Michal Marianne Amitai 35,a, Refael Vivanti 37,a, Jacob Sosna 38,a, Ivan Ezhov 1, Anjany Sekuboyina 1,2, Fernando Navarro 1,76,78, Florian Kofler 1,13,57,78, Johannes C. Paetzold 15,16, Suprosanna Shit 1, Xiaobin Hu 1, Jana Lipková 17, Markus Rempfler 1, Marie Piraud 57,1, Jan Kirschke 13, Benedikt Wiestler 13, Zhiheng Zhang 14, Christian Hülsemeyer 1, Marcel Beetz 1, Florian Ettlinger 1, Michela Antonelli 9, Woong Bae 73, Míriam Bellver 43, Lei Bi 61, Hao Chen 39, Grzegorz Chlebus 62,64, Erik B.
Dam 72, Qi Dou 41, Chi-Wing Fu 41, Bogdan Georgescu 60, Xavier Giró-i-Nieto 45, Felix Gruen 28, Xu Han 77, Pheng-Ann Heng 41, Jürgen Hesser 48,49,50, Jan Hendrik Moltz 62, Christian Igel 72, Fabian Isensee 69,70, Paul Jäger 69,70, Fucang Jia 75, Krishna Chaitanya Kaluva 21, Mahendra Khened 21, Ildoo Kim 73, Jae-Hun Kim 53, Sungwoong Kim 73, Simon Kohl 69, Tomasz Konopczynski 49, Avinash Kori 21, Ganapathy Krishnamurthi 21, Fan Li 22, Hongchao Li 11, Junbo Li 8, Xiaomeng Li 40, John Lowengrub 66,67,68, Jun Ma 54, Klaus Maier-Hein 69,70,7, Kevis-Kokitsi Maninis 44, Hans Meine 62,65, Dorit Merhof 74, Akshay Pai 72, Mathias Perslev 72, Jens Petersen 69, Jordi Pont-Tuset 44, Jin Qi 56, Xiaojuan Qi 40, Oliver Rippel 74, Karsten Roth 47, Ignacio Sarasua 51,12, Andrea Schenk 62,63, Zengming Shen 59,60, Jordi Torres 46,43, Christian Wachinger 51,12,1, Chunliang Wang 42, Leon Weninger 74, Jianrong Wu 25, Daguang Xu 71, Xiaoping Yang 55, Simon Chun-Ho Yu 58, Yading Yuan 52, Miao Yue 20, Liping Zhang 58, Jorge Cardoso 9, Spyridon Bakas 19,23,24, Rickmer Braren 6,12,30,a, Volker Heinemann 33,a, Christopher Pal 3,a, An Tang 27,a, Samuel Kadoury 3,a, Luc Soler 36,a, Bram van Ginneken 4,a, Hayit Greenspan 5,a, Leo Joskowicz 18,a, Bjoern Menze 1,2,a // 1 Department of Informatics, Technical University of Munich, Germany; 2 Department of Quantitative Biomedicine, University of Zurich, Switzerland; 3 Ecole Polytechnique de Montréal, Canada; 4 Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands; 5 Department of Biomedical Engineering, Tel-Aviv University, Israel; 6 German Cancer Consortium (DKTK), Germany; 7 Pattern Analysis and Learning Group, Department of Radiation Oncology, Heidelberg University Hospital, Heidelberg, Germany; 8 Philips Research China, Philips China Innovation Campus, Shanghai, China; 9 School of Biomedical Engineering & Imaging Sciences, King’s College London, London, UK; 10 Institute for AI in Medicine, Technical University 
of Munich, Germany; 11 Department of Computer Science, Guangdong University of Foreign Studies, China; 12 Institute for diagnostic and interventional radiology, Klinikum rechts der Isar, Technical University of Munich, Germany; 13 Institute for diagnostic and interventional neuroradiology, Klinikum rechts der Isar,Technical University of Munich, Germany; 14 Department of Hepatobiliary Surgery, the Affiliated Drum Tower Hospital of Nanjing University Medical School, China; 15 Department of Computing, Imperial College London, London, United Kingdom; 16 Institute for Tissue Engineering and Regenerative Medicine, Helmholtz Zentrum München, Neuherberg, Germany; 17 Brigham and Women’s Hospital, Harvard Medical School, USA; 18 School of Computer Science and Engineering, the Hebrew University of Jerusalem, Israel; 19 Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, PA, USA; 20 CGG Services (Singapore) Pte. Ltd., Singapore; 21 Medical Imaging and Reconstruction Lab, Department of Engineering Design, Indian Institute of Technology Madras, India; 22 Sensetime, Shanghai, China; 23 Department of Radiology, Perelman School of Medicine, University of Pennsylvania, USA; 24 Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA, USA; 25 Tencent Healthcare (Shenzhen) Co., Ltd, China; 26 The University of Montréal Hospital Research Centre (CRCHUM) Montréal, Québec, Canada; 27 Department of Radiology, Radiation Oncology and Nuclear Medicine, University of Montréal, Canada; 28 Institute of Control Engineering, Technische Universität Braunschweig, Germany; 29 Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; 30 Comprehensive Cancer Center Munich, Munich, Germany; 31 Department of General, Visceral and Transplantation Surgery, University Hospital, LMU Munich, Germany; 32 Department of Radiology, University Hospital, LMU Munich, Germany; 33 Department of 
Hematology/Oncology & Comprehensive Cancer Center Munich, LMU Klinikum Munich, Germany; 34 Polytechnique Montréal, Mila, QC, Canada; 35 Department of Diagnostic Radiology, Sheba Medical Center, Tel Aviv university, Israel; 36 Department of Surgical Data Science, Institut de Recherche contre les Cancers de l’Appareil Digestif (IRCAD), France; 37 Rafael Advanced Defense System, Israel; 38 Department of Radiology, Hadassah University Medical Center, Jerusalem, Israel; 39 Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, China; 40 Department of Electrical and Electronic Engineering, The University of Hong Kong, China; 41 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China; 42 Department of Biomedical Engineering and Health Systems, KTH Royal Institute of Technology, Sweden; 43 Barcelona Supercomputing Center, Barcelona, Spain; 44 Eidgenössische Technische Hochschule Zurich (ETHZ), Zurich, Switzerland; 45 Signal Theory and Communications Department, Universitat Politecnica de Catalunya, Catalonia, Spain; 46 Universitat Politecnica de Catalunya, Catalonia, Spain; 47 University of Tuebingen, Germany; 48 Mannheim Institute for Intelligent Systems in Medicine, department of Medicine Mannheim, Heidelberg University, Germany; 49 Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Germany; 50 Central Institute for Computer Engineering (ZITI), Heidelberg University, Germany; 51 Department of Child and Adolescent Psychiatry, Ludwig-Maximilians-Universität, Munich, Germany; 52 Department of Radiation Oncology, Icahn School of Medicine at Mount Sinai, NY, USA; 53 Department of Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, South Korea; 54 Department of Mathematics, Nanjing University of Science and Technology, China; 55 Department of Mathematics, Nanjing University, China; 56 School of Information and Communication 
Engineering, University of Electronic Science and Technology of China, China; 57 Helmholtz AI, Helmholtz Zentrum München, Neuherberg, Germany; 58 Department of Imaging and Interventional Radiology, Chinese University of Hong Kong, Hong Kong, China; 59 Beckman Institute, University of Illinois at Urbana-Champaign, USA; 60 Siemens Healthineers, USA; 61 School of Computer Science, the University of Sydney, Australia; 62 Fraunhofer MEVIS, Bremen, Germany; 63 Institute for Diagnostic and Interventional Radiology, Hannover Medical School, Hannover, Germany; 64 Diagnostic Image Analysis Group, Radboud University Medical Center, Nijmegen, The Netherlands; 65 Medical Image Computing Group, FB3, University of Bremen, Germany; 66 Departments of Mathematics, Biomedical Engineering, University of California, Irvine, USA; 67 Center for Complex Biological Systems, University of California, Irvine, USA; 68 Chao Family Comprehensive Cancer Center, University of California, Irvine, USA; 69 Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany; 70 Helmholtz Imaging, Germany; 71 NVIDIA, Santa Clara, CA, USA; 72 Department of Computer Science, University of Copenhagen, Denmark; 73 Kakao Brain, Republic of Korea; 74 Institute of Imaging & Computer Vision, RWTH Aachen University, Germany; 75 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China; 76 Department of Radiation Oncology and Radiotherapy, Klinikum rechts der Isar, Technical University of Munich, Germany; 77 Department of Computer Science, UNC Chapel Hill, USA; 78 TranslaTUM - Central Institute for Translational Cancer Research, Technical University of Munich, Germany
    Postprint (published version)
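
    The LiTS abstract above reports results as Dice scores. As a reminder of what that metric measures, here is a minimal sketch of the standard Dice coefficient, 2|P ∩ T| / (|P| + |T|), for flattened binary masks. Whether the benchmark averages Dice per case or computes it globally is not stated in the abstract, and the handling of two empty masks below is an assumed convention.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient between two flattened binary masks (values 0 or 1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


prediction = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
score = dice_score(prediction, reference)  # 2 * 2 / (3 + 3)
```

    A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground voxel.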

    Recurrent instance segmentation

    No full text

    Image and video object segmentation in low supervision scenarios

    Computer vision plays a key role in Artificial Intelligence because of the rich semantic information contained in pixels and the ubiquity of cameras nowadays. Multimedia content is on the rise as social networks have a strong impact on our society and access to the internet becomes more widespread. This context allows the gathering of large datasets, which have fostered great advances in the computer vision field thanks to deep neural networks. These models can effectively exploit large amounts of data to reach a high expressive power. Since the breakthrough of Imagenet, a large dataset for image classification, most computer vision tasks have benefited from deep neural networks. Among the different tasks in the computer vision field, locating objects in images and videos is a central one, as it has many applications in autonomous driving, surveillance, image and video editing, medical diagnosis and biometrics, among others. Localization of objects can be obtained with bounding boxes around the target objects, or with accurate pixel-level masks that delineate the instances. The latter is a more challenging task, but fundamental for certain applications where the edges of objects need to be determined. The main task addressed in this thesis is instance segmentation, which consists of, given an image or video, providing pixel-level masks for each instance of certain semantic object classes. In order to train a segmentation model, current solutions rely on large amounts of pixel-wise annotations, which demand significant human effort to collect. Furthermore, expert knowledge is needed to gather certain annotations, such as labels for medical images. Consequently, there is great interest in systems that work with less demanding forms of supervision, such as weakly or semi-supervised pipelines. Besides, in some segmentation tasks, human effort is needed not only for training the models, but also at inference. In semi-automatic systems, user input may be required as guidance to start the system. One example is the task of one-shot Video Object Segmentation (osVOS), which expects the end user to provide a pixel-level mask for each object to be tracked in the first frame of the video. The model must then predict the segmentation masks of the tracked objects for the remaining frames. These initialization cues are crucial for high accuracy, but they are arduous to obtain. An alternative is models that depend on weaker, more user-friendly input signals. This thesis explores different supervision scenarios for the instance segmentation task, distinguishing between supervision during training and at inference, and focusing on low-supervision setups. In the first part of the thesis we present a novel recurrent architecture for video object segmentation that is end-to-end trainable in a fully supervised setup and does not require any post-processing step, i.e., the output of the model directly solves the addressed task. The second part of the thesis aims at lowering the annotation cost, in terms of labeling time, needed to train image segmentation models. We explore semi-supervised pipelines and show results when a very limited budget is available. The third part of the dissertation attempts to alleviate the supervision required by semi-automatic systems at inference time. In particular, we focus on semi-supervised video object segmentation, which typically requires generating a binary mask for each instance to be tracked. In contrast, we present a model for language-guided video object segmentation, which identifies the object to segment with a natural language expression.
We study current benchmarks, propose a novel categorization of referring expressions for video, and identify the main challenges posed by the video task.
    Computer vision plays a key role in artificial intelligence because of the rich semantic content of pixels and the ubiquity of cameras today. Multimedia content is growing exponentially because social networks have a great influence on our society and internet access is increasingly widespread. This context allows the collection of large datasets that drive advances thanks to deep learning networks. Since the creation of Imagenet, a large-scale dataset for image classification, many vision tasks have benefited from deep learning networks. Among the different vision tasks, locating objects in images and videos is one of the most relevant, as it has many applications in areas such as autonomous driving, video surveillance, image and video editing, medical image diagnosis, and biometrics, among others. Object localization can be solved with what are commonly called bounding boxes, or with pixel-level segmentations, the latter being a more complex task. In this thesis we investigate instance segmentation, which consists of, given an image or a video, predicting segmentations of each instance of certain semantic categories that appears in it. To train a segmentation model, current solutions train on large datasets with pixel-level annotations. These annotations are very costly to obtain, as they require a great deal of manual work. Moreover, certain annotations require the intervention of experts, for example for medical images. For these reasons there is great interest both in systems that can be trained with simpler forms of annotation and in semi-supervised systems. In some tasks the annotation effort is required not only for the training data but also at test time. In certain semi-automatic models the user must provide some kind of annotation for the system to work. An example of this type of task is semi-supervised video object segmentation, where the system needs as input a pixel-level mask for each object to be segmented in the first frame of the video. The model then predicts segmentations for the rest of the video. This initialization is essential for obtaining accurate masks, but it is very costly. An alternative is to work with signals that are easier to obtain. This thesis explores different levels of supervision for the instance segmentation task, distinguishing between supervision at training and at inference. Specifically, our goal is to work with little supervision. In the first part of the thesis we present a recurrent architecture for video object segmentation that can be trained end-to-end in a fully supervised way and requires no post-processing, that is, the output of the model directly solves the final task. The second part focuses on reducing the cost of annotating datasets to train image segmentation models. We explore semi-supervised architectures and present results when only a very limited annotation budget is available. The third part of the thesis focuses on reducing the level of supervision of semi-automatic systems at inference. Specifically, we investigate semi-supervised video object segmentation, a task that traditionally requires the user to indicate which objects to segment with pixel-level masks. In contrast, we present a model that uses natural language. We study current datasets and propose a categorization of language expressions in order to identify the major challenges.
    Postprint (published version)