    An image retrieval framework for real-time endoscopic image retargeting

    Purpose: Serial endoscopic examinations of a patient are important for early diagnosis of malignancies in the gastrointestinal tract. However, retargeting for optical biopsy is challenging due to extensive tissue variations between examinations, requiring the method to be tolerant to these changes whilst enabling real-time retargeting. Method: This work presents an image retrieval framework for inter-examination retargeting. We propose both a novel image descriptor tolerant of long-term tissue changes and a novel real-time descriptor matching method. The descriptor is based on histograms generated from regional intensity comparisons over multiple scales, offering stability over long-term appearance changes at the higher levels, whilst remaining discriminative at the lower levels. The matching method then learns a hashing function using random forests, to compress the descriptor string and allow for fast image comparison by a simple Hamming distance metric. Results: A dataset containing 13 in vivo gastrointestinal videos was collected from six patients, representing serial examinations of each patient and including videos captured at significant time intervals. Precision-recall for retargeting shows that our new descriptor outperforms a number of alternative descriptors, whilst our hashing method outperforms a number of alternative hashing approaches. Conclusion: We have proposed a novel framework for optical biopsy in serial endoscopic examinations. A new descriptor, combined with a novel hashing method, achieves state-of-the-art retargeting, with validation on in vivo videos from six patients. Real-time performance also allows for practical integration without disturbing the existing clinical workflow.
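
The final comparison step described above, ranking stored images by Hamming distance between compact binary codes, can be illustrated with a minimal sketch. It assumes that each image has already been hashed to a fixed-length binary code, stored here as a Python integer. The learned random-forest hashing function is not reproduced, and the names `hamming_distance` and `retrieve`, as well as the 8-bit example codes, are hypothetical.

```python
from typing import List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query: int, database: List[int], top_k: int = 3) -> List[Tuple[int, int]]:
    """Return (index, distance) pairs for the top_k codes closest to the query.

    Ties are broken by database index so the ranking is deterministic.
    """
    scored = [(i, hamming_distance(query, code)) for i, code in enumerate(database)]
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical 8-bit codes standing in for hashed image descriptors.
database = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
query = 0b10110110

print(retrieve(query, database, top_k=2))
```

Because the distance reduces to an XOR followed by a bit count, each comparison is cheap, which is consistent with the real-time requirement stated in the abstract.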

    Vision-based retargeting for endoscopic navigation

    Endoscopy is a standard procedure for visualising the human gastrointestinal tract. With the advances in biophotonics, imaging techniques such as narrow band imaging, confocal laser endomicroscopy, and optical coherence tomography can be combined with normal endoscopy to assist the early diagnosis of diseases such as cancer. In the past decade, optical biopsy has emerged as an effective tool for tissue analysis, allowing in vivo and in situ assessment of pathological sites with real-time feature-enhanced microscopic images. However, the non-invasive nature of optical biopsy leads to an intra-examination retargeting problem, which is associated with the difficulty of re-localising a biopsied site consistently throughout the whole examination. In addition to intra-examination retargeting, retargeting of a pathological site is even more challenging across examinations, due to tissue deformation and changing tissue morphologies and appearances. The purpose of this thesis is to address both the intra- and inter-examination retargeting problems associated with optical biopsy. We propose a novel vision-based framework for intra-examination retargeting. The proposed framework is based on combining visual tracking and detection with online learning of the appearance of the biopsied site. Furthermore, a novel cascaded detection approach based on random forests and structured support vector machines is developed to achieve efficient retargeting. To provide reliable inter-examination retargeting, this thesis formulates the task as an image retrieval problem, for which an online scene association approach is proposed to summarise an endoscopic video collected in the first examination into distinctive scenes. A hashing-based approach is then used to learn the intrinsic representations of these scenes, such that retargeting can be achieved in subsequent examinations by retrieving the relevant images using the learnt representations. 
For performance evaluation of the proposed frameworks, extensive phantom, ex vivo and in vivo experiments have been conducted, with results demonstrating the robustness and potential clinical value of the methods proposed.

    Efficient video indexing for monitoring disease activity and progression in the upper gastrointestinal tract

    Endoscopy is a routine imaging technique used for both diagnosis and minimally invasive surgical treatment. While the endoscopy video contains a wealth of information, tools to capture this information for the purpose of clinical reporting are rather poor. To date, endoscopists do not have access to tools that enable them to browse the video data in an efficient and user-friendly manner. Fast and reliable video retrieval methods could, for example, allow them to review data from previous exams and therefore improve their ability to monitor disease progression. Deep learning provides new avenues for compressing and indexing video in an extremely efficient manner. In this study, we propose to use an autoencoder for efficient video compression and fast retrieval of video images. To boost the accuracy of video image retrieval and to address data variability such as multi-modality and view-point changes, we propose the integration of a Siamese network. We demonstrate that our approach is competitive in retrieving images from 3 large-scale videos of 3 different patients, using query samples from their previous diagnoses. Quantitative validation shows that the combined approach yields an overall improvement of 5% and 8% over classical and variational autoencoders, respectively. Comment: Accepted at IEEE International Symposium on Biomedical Imaging (ISBI), 201
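
As a rough illustration of how a Siamese network can be encouraged to place images of the same site close together and different sites far apart, the sketch below computes a standard contrastive loss on a pair of embedding vectors. This is an assumption for illustration only: the abstract does not state which loss the authors use, and the function names, the margin value and the toy two-dimensional embeddings are all hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a: Sequence[float], b: Sequence[float],
                     same: bool, margin: float = 1.0) -> float:
    """Contrastive loss for one pair of embeddings.

    Matching pairs are penalised by their squared distance; non-matching
    pairs are penalised only if they lie closer together than `margin`.
    """
    d = euclidean(a, b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy embeddings; in practice these would come from the two network branches.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=True))
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False))
```

At retrieval time, embeddings trained in this way can be compared directly by distance, so a query frame is matched to the nearest stored frames.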

    Tracking and Mapping in Medical Computer Vision: A Review

    As computer vision algorithms are becoming more capable, their applications in clinical systems will become more pervasive. These applications include diagnostics such as colonoscopy and bronchoscopy, guiding biopsies and minimally invasive interventions and surgery, automating instrument motion and providing image guidance using pre-operative scans. Many of these applications depend on the specific visual nature of medical scenes and require designing and applying algorithms to perform in this environment. In this review, we provide an update to the field of camera-based tracking and scene mapping in surgery and diagnostics in medical computer vision. We begin by describing our review process, which results in a final list of 515 papers that we cover. We then give a high-level summary of the state of the art and provide relevant background for those who need tracking and mapping for their clinical applications. Next, we review datasets provided in the field and the clinical needs therein. We then delve in depth into the algorithmic side and summarize recent developments, which should be especially useful for algorithm designers and for those looking to understand the capability of off-the-shelf methods. We focus on algorithms for deformable environments while also reviewing the essential building blocks in rigid tracking and mapping, since there is a large amount of crossover in methods. Finally, we discuss the current state of the tracking and mapping methods along with needs for future algorithms, needs for quantification, and the viability of clinical applications in the field. We conclude that new methods need to be designed or combined to support clinical applications in deformable environments, and that more focus needs to be put into collecting datasets for training and evaluation. Comment: 31 pages, 17 figures

    Surgical Data Science - from Concepts toward Clinical Translation

    Recent developments in data science in general and machine learning in particular have transformed the way experts envision the future of surgery. Surgical Data Science (SDS) is a new research field that aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data. While an increasing number of data-driven approaches and clinical applications have been studied in the fields of radiological and clinical data science, translational success stories are still lacking in surgery. In this publication, we shed light on the underlying reasons and provide a roadmap for future advances in the field. Based on an international workshop involving leading researchers in the field of SDS, we review current practice, key achievements and initiatives as well as available standards and tools for a number of topics relevant to the field, namely (1) infrastructure for data acquisition, storage and access in the presence of regulatory constraints, (2) data annotation and sharing and (3) data analytics. We further complement this technical perspective with (4) a review of currently available SDS products and the translational progress from academia and (5) a roadmap for faster clinical translation and exploitation of the full potential of SDS, based on an international multi-round Delphi process.

    Visual and Camera Sensors

    This book includes 13 papers published in the Special Issue ("Visual and Camera Sensors") of the journal Sensors. The goal of this Special Issue was to invite high-quality, state-of-the-art research papers dealing with challenging issues in visual and camera sensors.

    Automated retrieval and extraction of training course information from unstructured web pages

    Web Information Extraction (WIE) is the discipline dealing with the discovery, processing and extraction of specific pieces of information from semi-structured or unstructured web pages. The World Wide Web comprises billions of web pages, and there is much need for systems that will locate, extract and integrate the acquired knowledge into organisations' practices. Some commercial, automated web extraction software packages exist; however, their success comes from heavily involving their users in the process of finding the relevant web pages, preparing the system to recognise items of interest on these pages and manually dealing with the evaluation and storage of the extracted results. This research has explored WIE, specifically with regard to the automation of the extraction and validation of online training information. The work also includes research and development in the area of automated Web Information Retrieval (WIR), more specifically in Web Searching (or Crawling) and Web Classification. Different technologies were considered; after much consideration, Naïve Bayes Networks were chosen as the most suitable for the development of the classification system. The extraction part of the system used Genetic Programming (GP) for the generation of web extraction solutions. Specifically, GP was used to evolve Regular Expressions, which were then used to extract specific training course information from the web, such as course names, prices, dates and locations. The experimental results indicate that all three aspects of this research perform very well, with the Web Crawler outperforming existing crawling systems, the Web Classifier performing with an accuracy of over 95% and a precision of over 98%, and the Web Extractor achieving an accuracy of over 94% for the extraction of course titles and an accuracy of just under 67% for the extraction of other course attributes such as dates, prices and locations. 
Furthermore, the overall work is of great significance to the sponsoring company, as it simplifies and improves the existing time-consuming, labour-intensive and error-prone manual techniques, as will be discussed in this thesis. The prototype developed in this research works in the background and requires very little, often no, human assistance.
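
To make the extraction step concrete, the sketch below shows how regular expressions can pull prices and dates out of a line of course text. The patterns here are written by hand purely for illustration; they are not the expressions evolved by Genetic Programming in the thesis, and the names `PRICE_RE`, `DATE_RE` and `extract_course_fields`, as well as the sample text, are hypothetical.

```python
import re
from typing import Dict, List

# Hand-written illustrative patterns (assumptions, not the evolved expressions).
# Price: a currency symbol, digits with optional thousands separators, optional pence/cents.
PRICE_RE = re.compile(r"[£$€]\s?\d+(?:,\d{3})*(?:\.\d{2})?")

# Date: day number, full English month name, four-digit year.
DATE_RE = re.compile(
    r"\b\d{1,2}\s+(?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December)\s+\d{4}\b"
)


def extract_course_fields(text: str) -> Dict[str, List[str]]:
    """Return all price and date strings found in a piece of page text."""
    return {
        "prices": PRICE_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }


sample = "Project Management Basics, London, 14 March 2011, £450.00 per delegate"
print(extract_course_fields(sample))
```

Fixed patterns like these break easily when page layouts vary, which is one plausible motivation for evolving the expressions automatically rather than writing them by hand.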

    Generation of Artificial Image and Video Data for Medical Deep Learning Applications

    In recent years, neural networks have achieved remarkable results in event recognition for medical image and video analysis. However, it has repeatedly become apparent that there is a general lack of data. This shortage concerns not only the number of available datasets but also the number of individual samples, that is, independent images and videos, within existing datasets. This in turn reduces the accuracy with which a neural network recognises events. In the medical domain in particular, it is not straightforward to extend datasets or to acquire new ones. The reasons are manifold. On the one hand, legal concerns can prevent the publication of data. On the other hand, a disease may occur only very rarely, so that there is no opportunity to capture the data. A further problem is that the data usually belong to a very specific domain, so that they can mostly be annotated only by experts. Annotation, however, is time-consuming and therefore expensive. Existing data augmentation methods can often be applied meaningfully only to image data and, for videos for example, do not produce data that are sufficiently independent in time. New methods are therefore needed with which video datasets can be extended retrospectively or synthetic data can be generated. This dissertation presents two newly developed methods and applies them to three medical examples from the field of surgery. The first method is the so-called workflow augmentation method, which makes it possible to augment semantic information in a video, for example the events of a surgical workflow. 
The method additionally allows balancing, for example of the surgical phases or surgical instruments that occur in the video dataset. Applying the method to two different datasets, of cataract operations and laparoscopic cholecystectomies, demonstrated its effectiveness. The accuracy of instrument recognition by a neural network during cataract surgery was increased by 2.8% to 93.5% compared with established methods. For surgical phase recognition in cholecystectomy, accuracy was even increased by 8.7% to 96.96% compared with an earlier study. Both studies impressively demonstrate the potential of the workflow augmentation method. The second method presented is based on a generative adversarial network (GAN). This approach is very promising when only very little data or very few datasets are available. A neural network is used to generate new photorealistic images. In this dissertation, a so-called cycle generative adversarial network (CycleGAN) is used. CycleGANs mostly perform an image-to-image translation. It is additionally possible to attach further conditions to the transformation. In the third example, the CycleGAN was used to estimate a passport-style photograph of a patient after cranio-maxillofacial corrective surgery, using a preoperative portrait photograph and the 3D surgical planning mask. Realistic, lifelike images were generated without any medical data being used to train the GAN. Instead, synthetically generated data were used for training. 
In conclusion, the methods developed in this work are able to partially overcome the lack of samples and datasets, and thereby achieved better recognition performance of neural networks. In future, the developed methods can be used to design better medical assistance systems based on artificial intelligence that further support physicians in clinical routine, for example in diagnosis, therapy or image-guided interventions, which leads to a reduction in clinical workload and thus to an improvement in patient safety.