    Vision-based and marker-less surgical tool detection and tracking: a review of the literature

    In recent years, tremendous progress has been made in surgical practice, for example with Minimally Invasive Surgery (MIS). To overcome the challenges of remote eye-to-hand manipulation, robotic and computer-assisted systems have been developed. Real-time knowledge of the pose of surgical tools with respect to the surgical camera and the underlying anatomy is a key ingredient of such systems. In this paper, we present a review of the literature dealing with vision-based and marker-less surgical tool detection. This paper makes three primary contributions: (1) identification and analysis of the data-sets used for developing and testing detection algorithms, (2) an in-depth comparison of surgical tool detection methods, from the feature extraction process to the model learning strategy, highlighting existing shortcomings, and (3) an analysis of the validation techniques employed to obtain detection performance results and to compare surgical tool detectors. The papers included in the review were selected through PubMed and Google Scholar searches using the keywords “surgical tool detection”, “surgical tool tracking”, “surgical instrument detection” and “surgical instrument tracking”, limiting results to the year range 2000-2015. Our study shows that, despite significant progress over the years, the lack of established surgical tool data-sets and of a reference format for performance assessment and method ranking is preventing faster improvement.

    Knowledge-driven entity recognition and disambiguation in biomedical text

    Entity recognition and disambiguation (ERD) for the biomedical domain are notoriously difficult problems due to the variety of entities and their often long names in many variations. Existing works focus heavily on the molecular level in two ways. First, they target scientific literature as the input text genre. Second, they target single, highly specialized entity types such as chemicals, genes, and proteins. However, a wealth of biomedical information is also buried in the vast universe of Web content. In order to fully utilize all the information available, there is a need to tap into Web content as an additional input. Moreover, there is a need to cater for other entity types such as symptoms and risk factors since Web content focuses on consumer health. The goal of this thesis is to investigate ERD methods that are applicable to all entity types in scientific literature as well as Web content. In addition, we focus on under-explored aspects of the biomedical ERD problems -- scalability, long noun phrases, and out-of-knowledge base (OOKB) entities. This thesis makes four main contributions, all of which leverage knowledge in UMLS (Unified Medical Language System), the largest and most authoritative knowledge base (KB) of the biomedical domain. The first contribution is a fast dictionary lookup method for entity recognition that maximizes throughput while balancing the loss of precision and recall. The second contribution is a semantic type classification method targeting common words in long noun phrases. We develop a custom set of semantic types to capture word usages; besides biomedical usage, these types also cope with non-biomedical usage and the case of generic, non-informative usage. The third contribution is a fast heuristics method for entity disambiguation in MEDLINE abstracts, again maximizing throughput but this time maintaining accuracy. The fourth contribution is a corpus-driven entity disambiguation method that addresses OOKB entities. 
The method first captures the entities expressed in a corpus as latent representations that comprise in-KB and OOKB entities alike before performing entity disambiguation.
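
As an illustration of what dictionary-lookup entity recognition can look like, the following is a minimal sketch of generic greedy longest-match lookup over whitespace tokens. It is not the method developed in the thesis: the toy dictionary, the concept identifiers (which are not real UMLS codes), and the function name `find_entities` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: lower-cased token tuples -> concept identifier.
# The identifiers are made up for illustration and are not real UMLS codes.
TOY_DICTIONARY: Dict[Tuple[str, ...], str] = {
    ("heart", "attack"): "CONCEPT_1",
    ("heart",): "CONCEPT_2",
    ("high", "blood", "pressure"): "CONCEPT_3",
}
MAX_LEN = max(len(key) for key in TOY_DICTIONARY)


def find_entities(text: str) -> List[Tuple[int, int, str]]:
    """Return (start_token, end_token, concept_id) using greedy longest match."""
    tokens = text.lower().split()
    matches: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in TOY_DICTIONARY:
                matches.append((i, i + n, TOY_DICTIONARY[span]))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return matches
```

Preferring the longest span means that "heart attack" is reported as one entity rather than as the shorter entry "heart". A hash-based lookup of this kind is fast, but it trades away recall for name variants that are not in the dictionary.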

    Data integration support for offshore decommissioning waste management

    Offshore oil and gas platforms have a design life of about 25 years, whereas the techniques and tools used for managing their data are constantly evolving. Therefore, data captured about platforms during their lifetimes will be in varying forms. Additionally, because many stakeholders are involved with a facility over its life cycle, the information representation of its components varies. These challenges make data integration difficult. Over the years, the application of data integration technology in the oil and gas industry has focused on meeting the needs of asset life cycle stages other than decommissioning, because most assets are only now reaching the end of their design lives. Currently, limited work has been done on integrating life cycle data for offshore decommissioning purposes, and reports by industry stakeholders underscore this need. This thesis proposes a method for the integration of the common data types relevant in oil and gas decommissioning. The key features of the method are that it (i) ensures semantic homogeneity using knowledge representation languages (Semantic Web) and domain-specific reference data (ISO 15926); and (ii) allows stakeholders to continue to use their current applications. Prototypes of the framework have been implemented using open source software applications, and their performance has been measured. The work of this thesis has been motivated by the business case of reusing offshore decommissioning waste items. The framework developed is generic and can be applied whenever there is a need to integrate and query disparate data involving oil and gas assets. The prototypes presented show how the data management challenges associated with assessing the suitability of decommissioned offshore facility items for reuse can be addressed. The performance of the prototypes shows that significant time and effort are saved compared to the state-of-the-art solution. 
The ability to do this effectively and efficiently during decommissioning will advance the oil and gas industry’s transition toward a circular economy and help save on cost.

    Interpreting and Answering Keyword Queries using Web Knowledge Bases

    Many keyword queries issued to Web search engines target information about real world entities, and interpreting these queries over Web knowledge bases can allow a search system to provide exact answers to keyword queries. Such an ability provides a useful service to end users, as their information need can be directly addressed and they need not scour textual results for the desired information. However, not all keyword queries can be addressed by even the most comprehensive knowledge base, and therefore equally important is the problem of recognizing when a reference knowledge base is not capable of modelling the keyword query's intention. This may be due to lack of coverage of the knowledge base or lack of expressiveness in the underlying query representation formalism. This thesis presents an approach to computing structured representations of keyword queries over a reference knowledge base. Keyword queries are annotated with occurrences of semantic constructs by learning a sequential labelling model from an annotated Web query log. Frequent query structures are then mined from the query log and are used along with the annotations to map keyword queries into a structured representation over the vocabulary of a reference knowledge base. The proposed approach exploits coarse linguistic structure in keyword queries, and combines it with rich structured query representations of information needs. As an intermediate representation formalism, a novel query language is proposed that blends keyword search with structured query processing over large Web knowledge bases. The formalism for structured keyword queries combines the flexibility of keyword search with the expressiveness of structured queries. A solution to the resulting disambiguation problem caused by introducing keywords as primitives in a structured query language is presented. 
Expressions in our proposed language are rewritten using the vocabulary of the knowledge base, and different possible rewritings are ranked based on their syntactic relationship to the keywords in the query as well as their semantic coherence in the underlying knowledge base. The problem of ranking knowledge base entities returned as a query result is also explored from the perspective of personalized result ranking. User interest models based on entity types are learned from a Web search session by cross-referencing clicks on URLs with known entity homepages. The user interest model is then used to effectively rerank answer lists for a given user. A methodology for evaluating entity-based search engines is also proposed and empirically evaluated.
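
To make the idea of an entity-type interest model concrete, the following is a minimal sketch that counts the entity types behind clicked homepages and uses the resulting distribution to reorder an answer list. It is an illustration under stated assumptions, not the model from the thesis: the URL-to-type table, the example URLs, and the function names `build_interest_model` and `rerank` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical lookup from known entity homepages to entity types.
HOMEPAGE_TYPES: Dict[str, str] = {
    "http://example.org/band-a": "musician",
    "http://example.org/band-b": "musician",
    "http://example.org/film-c": "film",
}


def build_interest_model(clicked_urls: List[str]) -> Dict[str, float]:
    """Estimate a distribution over entity types from clicked homepages."""
    counts = Counter(
        HOMEPAGE_TYPES[url] for url in clicked_urls if url in HOMEPAGE_TYPES
    )
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def rerank(
    answers: List[Dict[str, str]], interests: Dict[str, float]
) -> List[Dict[str, str]]:
    """Stable sort: answers whose type the user prefers move to the front."""
    return sorted(answers, key=lambda a: -interests.get(a["type"], 0.0))
```

Because Python's `sorted` is stable, answers with equal interest scores keep their original relative order, so the underlying ranking is preserved wherever the session gives no evidence.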

    Professional Search in Pharmaceutical Research

    In the mid-1990s, visiting libraries as a means of retrieving the latest literature was still a common necessity among professionals. Nowadays, professionals simply access information by ‘googling’. Indeed, the name of the Web search engine market leader “Google” has become a synonym for searching and retrieving information. Despite the increased popularity of search as a method for retrieving relevant information, search engines at the workplace still do not deliver satisfying results to professionals. Search engines, for instance, ignore that the relevance of answers (the satisfaction of a searcher’s needs) depends not only on the query (the information request) and the document corpus, but also on the working context (the user’s personal needs, education, etc.). In effect, an answer that is appropriate for one user might not be appropriate for another, even though the query and the document corpus are the same for both. Personalization services that address this context are therefore becoming more and more popular and are an active field of research. This is only one of several challenges encountered in ‘professional search’: how can the working context of the searcher be incorporated into the ranking process; how can unstructured free-text documents be enriched with semantic information so that the information need can be expressed precisely at query time; how and to what extent can a company’s knowledge be exploited for search purposes; and how should data from distributed sources be accessed through a single entry point? This thesis is devoted to ‘professional search’, i.e. search at the workplace, especially in industrial research and development. We contribute by compiling and developing several approaches for facing the challenges mentioned above. 
The approaches are implemented in the prototype YASA (Your Adaptive Search Agent), which provides meta-search, adaptive ranking of search results, and guided navigation, and which uses domain knowledge to drive the search processes. YASA is deployed in the pharmaceutical research department of Roche, a major pharmaceutical company, in Penzberg, where the applied methods were empirically evaluated. Being confronted with mostly unstructured free-text documents and having barely any explicit metadata at hand, we faced a serious challenge: incorporating semantics (i.e. formal knowledge representation) into the search process can only be as good as the underlying data. Nonetheless, we are able to demonstrate that this issue can be largely compensated for by incorporating automatic metadata extraction techniques. The metadata we were able to extract automatically was not perfectly accurate, nor did the ontology we applied contain particularly “rich semantics”. Even so, our results show that the little semantics incorporated into the search process suffices to achieve a significant improvement in search and retrieval. We thus contribute to the research field of context-based search by incorporating the working context into the search process, an area that has not yet been well studied.

    Automated histopathological analyses at scale

    Thesis: S.M., Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2017. Histopathology is the microscopic examination of processed human tissues to diagnose conditions like cancer, tuberculosis, anemia and myocardial infarctions. The diagnostic procedure is, however, very tedious, time-consuming and prone to misinterpretation. It also requires highly trained pathologists, making it unsuitable for large-scale screening in resource-constrained settings, where experts are scarce and expensive. In this thesis, we present a software system for automated screening, backed by deep learning algorithms. This cost-effective, easily scalable solution can be operated by minimally trained health workers and would extend the reach of histopathological analyses to settings such as rural villages, mass-screening camps and mobile health clinics. With metastatic breast cancer as our primary case study, we describe how the system could be used to test for the presence of a tumor and to determine the precise location of a lesion as well as the severity stage of a patient. We examine how the algorithms are combined into an end-to-end pipeline for use by hospitals, doctors and clinicians on a Software as a Service (SaaS) model. Finally, we discuss potential deployment strategies for the technology, as well as an analysis of the market and distribution chain in the specific case of the current Indian healthcare ecosystem. By Mrinal Mohit.

    Towards Context-free Information Importance Estimation

    The amount of information contained in heterogeneous text documents such as news articles, blogs, social media posts, scientific articles, discussion forums, and microblogging platforms is already huge and is going to increase further. It is not possible for humans to cope with this flood of information, so that important information can neither be found nor be utilized. This situation is unfortunate since information is the key driver in many areas of society in the present Information Age. Hence, developing automatic means that can assist people to handle the information overload is crucial. Developing methods for automatic estimation of information importance is an essential step towards this goal. The guiding hypothesis of this work is that prior methods for automatic information importance estimation are inherently limited because they are based on merely correlated signals that are, however, not causally linked with information importance. To resolve this issue, we lay in this work the foundations for a fundamentally new approach for importance estimation. The key idea of context-free information importance estimation is to equip machine learning models with world knowledge so that they can estimate information importance based on causal reasons. In the first part of this work, we lay the theoretical foundations for context-free information importance estimation. First, we discuss how the abstract concept of information importance can be formally defined. So far, a formal definition of this concept is missing in the research community. We close this gap by discussing two information importance definitions, which equate the importance of information with its impact on the behavior and the impact on the course of life of the information recipients, respectively. Second, we discuss how information importance estimation abilities can be assessed. Usually, this is done by performing automatic summarization of text documents. 
However, we find that this approach is not ideal. Instead, we propose to consider ranking, regression, and preference prediction tasks as alternatives in future work. Third, we deduce context-free information importance estimation as a logical consequence of the previously introduced importance definitions. We find that reliable importance estimation, in particular for heterogeneous text documents, is only possible with context-free methods. In the second part, we develop the first machine learning models based on the idea of context-free information importance estimation. To this end, we first tackle the lack of suitable datasets for training and testing machine learning models. In particular, large and heterogeneous datasets for investigating automatic summarization of multiple source documents are missing, because their construction is complicated and costly. To solve this problem, we present a simple and cost-efficient corpus construction approach and demonstrate its applicability by creating new multi-document summarization datasets. Second, we develop a new machine learning approach for context-free information importance estimation, implement a concrete realization, and demonstrate its advantages over contextual importance estimators. Third, we develop a new method to evaluate automatic summarization methods. Previous works are based on expensive reference summaries and unreliable semantic comparisons of text documents. In contrast, our approach uses cheap pairwise preference annotations and a much simpler sentence-level similarity estimation. This work lays the foundations for context-free information importance estimation. We hope that future research will explore whether this fundamentally new type of information importance estimation can eventually lead to human-level information importance estimation abilities.
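
One simple way to use pairwise preference annotations is to measure how often an importance scorer orders an annotated pair the same way the annotator did. The sketch below shows only that agreement measure; it is an assumption-laden illustration, not the evaluation method developed in the thesis, and the function name `preference_agreement` and the (preferred, other) pair format are hypothetical.

```python
from typing import Callable, List, Tuple


def preference_agreement(
    score: Callable[[str], float],
    preferences: List[Tuple[str, str]],
) -> float:
    """Fraction of (preferred, other) pairs that the scorer orders correctly."""
    if not preferences:
        raise ValueError("at least one preference pair is required")
    correct = sum(
        1 for better, worse in preferences if score(better) > score(worse)
    )
    return correct / len(preferences)
```

Each pair lists the text the annotator judged more important first. Ties count as disagreements here, which is a design choice of this sketch; counting them as half-correct would be an equally defensible alternative.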
