
    Dynamically balanced online random forests for interactive scribble-based segmentation

    Interactive scribble-and-learning-based segmentation is attractive for its good performance and reduced number of user interactions. Scribbles for foreground and background are often imbalanced, and with the arrival of new scribbles the imbalance ratio may change considerably. Failing to deal with imbalanced training data and a changing imbalance ratio may reduce segmentation sensitivity and accuracy. We propose a generic Dynamically Balanced Online Random Forest (DyBa ORF) to deal with these problems, combining a dynamically balanced online Bagging method with a tree growing and shrinking strategy to update the random forests. We validated DyBa ORF on UCI machine learning data sets and applied it to two different clinical applications: 2D segmentation of the placenta from fetal MRI and of adult lungs from radiographic images. Experiments show that it outperforms traditional ORF in dealing with imbalanced data with a changing imbalance ratio, while maintaining comparable accuracy and higher efficiency than its offline counterpart. Our results demonstrate that DyBa ORF is more suitable than existing ORF for learning-based interactive image segmentation.
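    The balancing idea behind dynamically balanced online Bagging can be illustrated in a few lines: each incoming sample is shown to every tree a Poisson-distributed number of times, and scaling the Poisson rate by the inverse frequency of the sample's class counteracts imbalance as new scribbles arrive. This is a minimal illustrative sketch, not the authors' DyBa ORF implementation; the helper names and the inverse-frequency rule are assumptions.

```python
import math
import random
from collections import Counter

def poisson_draw(lam, rng):
    # Sample from Poisson(lam) with Knuth's method (assumes lam > 0).
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def balanced_poisson_rate(labels, new_label, base_lambda=1.0, rng=None):
    """Return a class-balanced Poisson rate for an incoming sample and the
    number of times one tree would train on it. Hypothetical helper: the
    rate is base_lambda scaled by the inverse frequency of the class."""
    rng = rng or random.Random(0)
    counts = Counter(labels)
    counts[new_label] += 1  # include the incoming sample itself
    total = sum(counts.values())
    # rarer classes get a higher rate, so trees see them more often
    rate = base_lambda * total / (len(counts) * counts[new_label])
    return rate, poisson_draw(rate, rng)
```

With 90 background and 10 foreground scribble pixels seen so far, a new foreground sample gets a rate above 1 (replicated more often) and a new background sample a rate below 1, so each tree's effective training set stays roughly balanced as the imbalance ratio drifts.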

    ECONet: Efficient Convolutional Online Likelihood Network for Scribble-based Interactive Segmentation

    Automatic segmentation of lung lesions associated with COVID-19 in CT images requires a large amount of annotated volumes. Annotations mandate expert knowledge and are time-intensive to obtain through fully manual segmentation methods. Additionally, lung lesions have large inter-patient variations, with some pathologies having a visual appearance similar to healthy lung tissue. This poses a challenge when applying existing semi-automatic interactive segmentation techniques to data labelling. To address these challenges, we propose an efficient convolutional neural network (CNN) that can be learned online while the annotator provides scribble-based interaction. To accelerate learning from only the samples labelled through user interactions, a patch-based approach is used for training the network. Moreover, we use a weighted cross-entropy loss to address the class imbalance that may result from user interactions. During online inference, the learned network is applied to the whole input volume using a fully convolutional approach. We compare our proposed method with the state of the art using synthetic scribbles and show that it outperforms existing methods on the task of annotating lung lesions associated with COVID-19, achieving a 16% higher Dice score while reducing execution time by 3× and requiring 9,000 fewer scribble-labelled voxels. Due to the online learning aspect, our approach adapts quickly to user input, resulting in high-quality segmentation labels. Source code for ECONet is available at: https://github.com/masadcv/ECONet-MONAILabel. Comment: Accepted at MIDL 202
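    The weighted cross-entropy idea can be stated compactly: each labelled voxel's log-loss is scaled by a per-class weight (for instance the inverse label frequency), so a handful of minority-class scribbles is not drowned out by the majority class. A sketch under assumptions; ECONet's exact loss and weighting scheme may differ:

```python
import numpy as np

def weighted_cross_entropy(probs, targets, class_weights):
    """Weighted cross-entropy over scribble-labelled samples.
    probs: (N, C) softmax outputs; targets: (N,) integer labels;
    class_weights: (C,) per-class weights, e.g. inverse label frequency."""
    eps = 1e-12  # guard against log(0)
    n = probs.shape[0]
    picked = probs[np.arange(n), targets]  # probability of the true class
    w = class_weights[targets]             # weight attached to each sample
    return float(-(w * np.log(picked + eps)).sum() / w.sum())
```

Raising the weight of a class on which the network is less confident raises the loss, which is exactly the pressure against class imbalance that the abstract describes.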

    Minimally Interactive Segmentation with Application to Human Placenta in Fetal MR Images

    Placenta segmentation from fetal Magnetic Resonance (MR) images is important for fetal surgical planning. However, accurate segmentation results are difficult to achieve with automatic methods, due to sparse acquisition, inter-slice motion, and the widely varying position and shape of the placenta among pregnant women. Interactive methods have been widely used to obtain more accurate and robust results. A good interactive segmentation method should achieve high accuracy, minimize user interactions with low variability among users, and be computationally fast. Exploiting recent advances in machine learning, I explore a family of new interactive methods for placenta segmentation from fetal MR images. I investigate the combination of user interactions with learning from a single image or from a large set of images. For learning from a single image, I propose novel Online Random Forests to efficiently leverage user interactions for the segmentation of 2D and 3D fetal MR images. I also investigate co-segmentation of multiple volumes of the same patient with 4D Graph Cuts. For learning from a large set of images, I first propose a deep learning-based framework that combines user interactions with Convolutional Neural Networks (CNNs) based on geodesic distance transforms to achieve accurate segmentation and good interactivity. I then propose image-specific fine-tuning to make CNNs adaptive to different individual images and able to segment previously unseen objects. Experimental results show that the proposed algorithms outperform traditional interactive segmentation methods in terms of accuracy and interactivity. Therefore, they might be suitable for segmentation of the placenta in planning systems for fetal and maternal surgery, and for rapid characterization of the placenta from MR images. I also demonstrate that they can be applied to the segmentation of other organs from 2D and 3D images.
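    The geodesic distance transform that couples the user interactions to the CNN can be sketched as a Dijkstra pass over the image grid: the distance from the scribble seeds grows with both the spatial step and the intensity change along the path, so it stays small inside homogeneous regions. A minimal 2D sketch of the idea, not the thesis implementation (the particular cost mix is an assumption):

```python
import heapq

def geodesic_distance(image, seeds):
    """Geodesic distance from scribble seeds on a 2D intensity grid.
    Dijkstra over 4-neighbours; each step costs 1 plus the absolute
    intensity difference, so paths prefer homogeneous regions."""
    h, w = len(image), len(image[0])
    INF = float("inf")
    dist = [[INF] * w for _ in range(h)]
    pq = []
    for r, c in seeds:
        dist[r][c] = 0.0
        heapq.heappush(pq, (0.0, r, c))
    while pq:
        d, r, c = heapq.heappop(pq)
        if d > dist[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                step = 1.0 + abs(image[nr][nc] - image[r][c])
                if d + step < dist[nr][nc]:
                    dist[nr][nc] = d + step
                    heapq.heappush(pq, (d + step, nr, nc))
    return dist
```

Crossing a strong intensity edge costs far more than moving within a flat region, which is what makes the transform useful for propagating scribble labels to similar neighbouring tissue.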

    Knee cartilage segmentation using multi purpose interactive approach

    Interactive models incorporate expert interpretation and automated segmentation. However, cartilage has a complicated structure, the indistinct tissue contrast in magnetic resonance images of the knee complicates image review, and existing interactive methods are sensitive to various technical problems, such as the bi-label segmentation problem, the shortcut problem, and image noise. Moreover, the redundancy issue caused by non-cartilage labelling has never been tackled. Therefore, Bi-Bezier Curve Contrast Enhancement is developed to improve the visual quality of magnetic resonance images by considering brightness preservation and contrast enhancement control. Then, a Multipurpose Interactive Tool is developed to handle users’ interaction through a Label Insertion Point approach. An Approximate Non-Cartilage Labelling system is developed to generate computerized non-cartilage labels, while preserving cartilage for expert labelling. Both computerized and interactive labels initialize a Random Walks based segmentation model. To evaluate the contrast enhancement techniques, the Measure of Enhancement (EME), Absolute Mean Brightness Error (AMBE), and Feature Similarity Index (FSIM) are used. The results suggest that Bi-Bezier Curve Contrast Enhancement outperforms existing methods in terms of contrast enhancement control (EME = 41.44±1.06), brightness distortion (AMBE = 14.02±1.29), and image quality (FSIM = 0.92±0.02). Besides, implementation of the Approximate Non-Cartilage Labelling model has demonstrated significant efficiency improvements in segmenting normal cartilage (61s±8s, P = 3.52 × 10⁻⁵) and diseased cartilage (56s±16s, P = 1.4 × 10⁻⁴). Finally, the proposed labelling model has high Dice values (normal: 0.94±0.022, P = 1.03 × 10⁻⁹; abnormal: 0.92±0.051, P = 4.94 × 10⁻⁶) and is found to be beneficial to the interactive model (+0.12).
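    Of the three evaluation metrics, AMBE is the simplest to state: the absolute difference between the mean brightness of the original and the enhanced image, so lower values indicate better brightness preservation. A minimal sketch of that definition (grayscale images as 2D lists are an assumption of this illustration):

```python
def ambe(original, enhanced):
    """Absolute Mean Brightness Error between two equally sized
    grayscale images: |mean(original) - mean(enhanced)|."""
    def mean(img):
        pixels = [p for row in img for p in row]
        return sum(pixels) / len(pixels)
    return abs(mean(original) - mean(enhanced))
```

An enhancement that brightens every pixel by a constant shifts the mean by that constant and scores poorly, while one that redistributes contrast around the same mean scores near zero.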

    Generalizations of the Multicut Problem for Computer Vision

    Graph decomposition has always been a very important concept in machine learning and computer vision. Many tasks, such as image and mesh segmentation, community detection in social networks, object tracking, and human pose estimation, can be formulated as graph decomposition problems. The multicut problem in particular is a popular model for optimizing over the decompositions of a given graph. Its main advantage is that no prior knowledge about the number of components or their sizes is required. However, it has several limitations, which we address in this thesis. First, the multicut problem only allows specifying a cost or reward for putting two direct neighbours into distinct components, which limits the expressiveness of the cost function. We introduce special edges into the graph that allow defining a cost or reward for putting any two vertices into distinct components, while preserving the original set of feasible solutions. We show that this considerably improves the quality of image and mesh segmentations. Second, the multicut problem is notoriously NP-hard for general graphs, which limits its application to small super-pixel graphs. We define and implement two primal feasible heuristics to solve the problem. They do not provide any guarantees on the runtime or the quality of solutions, but in practice show good convergence behaviour. We perform an extensive comparison on multiple graphs of different sizes and properties. Third, we extend the multicut framework by introducing node labels, so that we can jointly optimize for graph decomposition and node classification by means of exactly the same optimization algorithm, thus eliminating the need to hand-tune optimizers for a particular task. To prove its universality, we applied it to diverse computer vision tasks, including human pose estimation, multiple object tracking, and instance-aware semantic segmentation.
We show that we can improve the results over the prior art using exactly the same data as in the original works. Finally, we employ multicuts in two applications: 1) a client-server tool for interactive video segmentation, where, after pre-processing of the video, a user draws strokes on several frames and a time-coherent segmentation of the entire video is performed on the fly; and 2) a method for simultaneous segmentation and tracking of living cells in microscopy data. This task is challenging as cells split, and our algorithm accounts for this by creating parental hierarchies. We also present results on multiple model fitting: we find models in data heavily corrupted by noise by using higher-order multicuts to find the components that define these models. We introduce an interesting extension that allows our optimization to pick better hyperparameters for each discovered model. In summary, this thesis extends the multicut problem in different directions, proposes algorithms for its optimization, and applies it to novel data and settings.
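    The multicut objective the thesis builds on can be stated compactly: given real-valued edge costs, the cost of a decomposition is the total cost of the edges whose endpoints land in different components; negative costs reward cutting, positive costs reward joining, and no prior on the number of components is needed. A toy evaluator (not an optimizer) makes that concrete:

```python
def multicut_cost(edges, component):
    """Multicut objective for a given graph decomposition.
    edges: iterable of (u, v, cost); component: dict node -> component id.
    An edge contributes its cost exactly when it is cut, i.e. when its
    endpoints lie in different components."""
    return sum(cost for u, v, cost in edges if component[u] != component[v])
```

For a triangle with costs a-b = -2, b-c = 3, and a-c = 1, separating a from {b, c} cuts the a-b and a-c edges for a total of -1, which beats leaving the graph in one piece (cost 0).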

    Analysis and Synthesis of Interactive Video Sprites

    In this thesis, we explore how video, an extremely compelling medium that is traditionally consumed passively, can be transformed into interactive experiences, and what is preventing content creators from using it for this purpose. Film captures extremely rich and dynamic information but, due to the sheer amount of data and the drastic change in content appearance over time, is problematic to work with. Content creators are willing to invest time and effort to design and capture video, so why not to manipulate and interact with it? We hypothesize that people can help and be helped by automatic video processing and synthesis algorithms when they are given the right tools. Computer games are a very popular interactive medium where players engage with dynamic content in compelling and intuitive ways. The first contribution of this thesis is an in-depth exploration of the modes of interaction that enable game-like video experiences. Through active discussions with game developers, we identify both how to assist content creators and how their creations can be dynamically interacted with by players. We present concepts, explore algorithms, and design tools that together enable interactive video experiences. Our findings concerning processing videos and interacting with filmed content come together in this thesis' second major contribution. We present a new medium of expression where video elements can be looped, merged, and triggered interactively. Static-camera videos are converted into loopable sequences that can be controlled in real time in response to simple end-user requests. We present novel algorithms and interactive tools that enable our new medium of expression. Our human-in-the-loop system gives the user progressively more creative control over the video content as they invest more effort, and artists help us evaluate it.
Monocular, static-camera videos are a good fit for looping algorithms, but they have been limited to two-dimensional applications, as pixels are reshuffled in space and time on the image plane. The final contribution of this thesis breaks through this barrier by allowing users to interact with filmed objects in a three-dimensional manner. Our novel object tracking algorithm extends existing 2D bounding box trackers with 3D information, such as a well-fitting bounding volume, which in turn enables a new breed of interactive video experiences. The filmed content becomes a three-dimensional playground, as users are free to move the virtual camera or the tracked objects and see them from novel viewpoints.

    Interactive computer vision through the Web

    Computer vision is the computational science aiming at reproducing and improving the ability of human vision to understand its environment. In this thesis, we focus on two fields of computer vision, namely image segmentation and visual odometry, and we show the positive impact that interactive Web applications provide on each. The first part of this thesis focuses on image annotation and segmentation. We introduce the image annotation problem and the challenges it brings for large, crowdsourced datasets. Many interactions have been explored in the literature to help segmentation algorithms. The most common consist in designating contours, bounding boxes around objects, or interior and exterior scribbles. When crowdsourcing, annotation tasks are delegated to a non-expert public, sometimes on cheaper devices such as tablets. In this context, we conducted a user study showing the advantages of the outlining interaction over scribbles and bounding boxes. Another challenge of crowdsourcing is the distribution medium. While evaluating an interaction in a small user study does not require a complex setup, distributing an annotation campaign to thousands of potential users is a different matter. Thus, we describe how the Elm programming language helped us build a reliable image annotation Web application. A highlights tour of its functionalities and architecture is provided, as well as a guide on how to deploy it to crowdsourcing services such as Amazon Mechanical Turk. The application is completely open-source and available online. In the second part of this thesis, we present our open-source direct visual odometry library. In that endeavor, we provide an evaluation of other open-source RGB-D camera tracking algorithms and show that our approach performs as well as the currently available alternatives. The visual odometry problem relies on geometry tools and optimization techniques traditionally requiring much processing power to perform at real-time frame rates.
Since we aspire to run those algorithms directly in the browser, we review past and present technologies enabling high-performance computation on the Web. In particular, we detail how to target the new WebAssembly standard from the C++ and Rust programming languages. Our library was started from scratch in the Rust programming language, which then allowed us to easily port it to WebAssembly. Thanks to this property, we are able to showcase a visual odometry Web application with multiple types of interaction available. A timeline enables one-dimensional navigation along the video sequence. Pairs of image points can be picked on two 2D thumbnails of the image sequence to realign cameras and correct drift. Colors are also used to identify parts of the 3D point cloud, which can be selected to reinitialize camera positions. Combining these interactions enables improvements in the tracking and 3D point reconstruction results.

    Segmentation mutuelle d'objets d'intĂ©rĂȘt dans des sĂ©quences d'images stĂ©rĂ©o multispectrales

    The automated video surveillance systems currently deployed around the world are still quite far, in terms of capabilities, from the ones that have inspired countless science fiction works over the past few years. One of the reasons behind this lag in development is the lack of low-level tools that allow raw image data to be processed directly in the field. This preprocessing is used to reduce the amount of information transferred to centralized servers that have to interpret the captured visual content for further use. The identification of objects of interest in raw images based on motion is an example of a preprocessing step that might be required by a large system. However, in a surveillance context, the preprocessing method can seldom rely on an appearance or shape model to recognize these objects, since their exact nature cannot be known in advance. This complicates the elaboration of low-level image processing methods. In this thesis, we present different methods that detect and segment objects of interest from video sequences in a fully unsupervised fashion. We first explore monocular video segmentation approaches based on background subtraction. These approaches are based on the idea that the background of an observed scene can be modeled over time, and that any drastic variation in appearance that is not predicted by the model actually reveals the presence of an intruding object. The main challenge that must be met by background subtraction methods is that their model should be able to adapt to dynamic changes in scene conditions. The designed methods must also remain sensitive to the emergence of new objects of interest despite this increased robustness to predictable dynamic scene behaviors. We propose two methods that introduce different modeling techniques to improve background appearance description in an illumination-invariant way, and that analyze local background persistence to improve the detection of temporarily stationary objects. We also introduce new feedback mechanisms used to adjust the hyperparameters of our methods based on the observed dynamics of the scene and the quality of the generated output.
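    The background-subtraction principle described above can be sketched as a running-average model: each pixel whose deviation from the model exceeds a threshold is flagged as foreground, and the model then adapts toward the new frame so it follows gradual scene changes. A minimal single-channel sketch; the thesis methods use far richer models and feedback-controlled hyperparameters:

```python
def background_subtract(model, frame, alpha=0.05, threshold=30):
    """One step of running-average background subtraction.
    model, frame: 2D lists of gray levels. Returns (foreground mask,
    updated model). alpha controls how fast the model adapts; threshold
    sets the deviation at which a pixel is declared foreground."""
    mask, new_model = [], []
    for m_row, f_row in zip(model, frame):
        # a pixel is foreground when it deviates strongly from the model
        mask.append([abs(f - m) > threshold for m, f in zip(m_row, f_row)])
        # the model then drifts toward the observed frame
        new_model.append([(1 - alpha) * m + alpha * f
                          for m, f in zip(m_row, f_row)])
    return mask, new_model
```

The tension the abstract describes is visible even in this sketch: a large alpha quickly absorbs intruding objects into the background (losing temporarily stationary objects), while a small alpha reacts poorly to illumination changes, hence the feedback mechanisms proposed in the thesis.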

    Ecosystemic Evolution Feeded by Smart Systems

    Information Society is advancing along a route of ecosystemic evolution. ICT and Internet advancements, together with the progression of the systemic approach to the enhancement and application of Smart Systems, are grounding such an evolution. The needed approach is therefore expected to evolve by increasingly fitting the basic requirements of a significant general enhancement of human and social well-being, within all spheres of life (public, private, professional). This implies enhancing and exploiting the net-living virtual space, to make it a virtuous, beneficial integration of the real-life space. Meanwhile, the contextual evolution of smart cities aims at strongly empowering that ecosystemic approach by enhancing and diffusing net-living benefits over our own lived territory, while also incisively targeting a new, stable socio-economic local development, according to social, ecological, and economic sustainability requirements. This territorial focus matches a new glocal vision, which enables a more effective diffusion of benefits in terms of well-being, thus moderating the current global vision primarily fed by a global-scale market development view. Basic technological advancements thus have to be pursued at the system level. They include system architecting for the virtualization of functions, data integration and sharing, flexible basic service composition, and end-service personalization viability, for the operation and interoperation of smart systems, supporting effective net-living advancements in all application fields. Increasing, and essentially mandatory, importance must also be given to human–technical and social–technical factors, as well as to the associated need to empower the cross-disciplinary approach for related research and innovation.
The prospected ecosystemic impact also implies proactive social participation, as well as coping with possible negative effects of net-living in terms of social exclusion and isolation, which require incisive actions for a conformal socio-cultural development. In this regard, the speed, continuity, and expected long-term duration of innovation processes, pushed by basic technological advancements, make ecosystemic requirements stricter. This evolution also requires a new approach, targeting development of the needed basic and vocational education for net-living, which is to be considered as an engine for the development of the related ‘new living know-how’, as well as of the conformal ‘new making know-how’.