550 research outputs found

    Joint Modeling of Chest Radiographs and Radiology Reports for Pulmonary Edema Assessment

    Full text link
    We propose and demonstrate a novel machine learning algorithm that assesses pulmonary edema severity from chest radiographs. While large publicly available datasets of chest radiographs and free-text radiology reports exist, only limited numerical edema severity labels can be extracted from radiology reports, which is a significant obstacle to learning image classification models for this task. To take advantage of the rich information present in the radiology reports, we develop a neural network model that is trained on both images and free text but assesses pulmonary edema severity from chest radiographs alone at inference time. Our experimental results suggest that joint image-text representation learning improves the performance of pulmonary edema assessment compared to a supervised model trained on images only. We also show how the joint model can use the text to explain its image classifications. To the best of our knowledge, our approach is the first to leverage free-text radiology reports to improve image model performance in this application. Our code is available at https://github.com/RayRuizhiLiao/joint_chestxray. Comment: The first two authors contributed equally. To be published in the proceedings of MICCAI 2020.
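    The joint training idea above lends itself to a compact illustration: an image encoder and a text encoder share an embedding space, an alignment term ties paired radiographs and reports together, and only the image branch is needed at inference. The sketch below is a minimal PyTorch rendition under assumed toy settings (a small CNN, mean-pooled token embeddings, 4 severity levels, a 0.5 loss weight); it is not the paper's architecture, which lives at the linked repository.

```python
# Minimal sketch of joint image-text representation learning (PyTorch).
# Encoder sizes, projection dimension, and loss weighting are illustrative
# assumptions, not the configuration used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointModel(nn.Module):
    def __init__(self, embed_dim=128, num_severities=4):
        super().__init__()
        # Stand-in image encoder: a small CNN over 1-channel radiographs.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        # Stand-in text encoder: mean-pooled learned token embeddings.
        self.token_embed = nn.Embedding(10000, embed_dim)
        self.classifier = nn.Linear(embed_dim, num_severities)

    def forward(self, image, report_tokens=None):
        img_z = F.normalize(self.image_encoder(image), dim=-1)
        logits = self.classifier(img_z)  # image-only path used at inference
        if report_tokens is None:
            return logits, None
        txt_z = F.normalize(self.token_embed(report_tokens).mean(dim=1), dim=-1)
        # Alignment loss pulls paired image/report embeddings together.
        align_loss = (1 - (img_z * txt_z).sum(dim=-1)).mean()
        return logits, align_loss

model = JointModel()
images = torch.randn(8, 1, 224, 224)       # batch of chest radiographs
tokens = torch.randint(0, 10000, (8, 64))  # tokenized reports
labels = torch.randint(0, 4, (8,))         # limited severity labels
logits, align_loss = model(images, tokens)
loss = F.cross_entropy(logits, labels) + 0.5 * align_loss
loss.backward()
```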

    Several categories of Large Language Models (LLMs): A Short Survey

    Full text link
    Large Language Models (LLMs) have become effective tools for natural language processing and have been used in many different fields. This survey offers a succinct summary of various LLM subcategories, emphasizing recent developments and efforts for several kinds of LLMs, including task-based financial LLMs, multilingual LLMs, biomedical and clinical LLMs, vision-language LLMs, and code language models. It gives a general summary of the methods, attributes, datasets, transformer models, and comparison metrics applied in each category. Furthermore, it highlights unresolved problems in the development of chatbots and virtual assistants, such as improving natural language processing, enhancing chatbot intelligence, and resolving moral and legal dilemmas. The purpose of this study is to provide readers, developers, academics, and users interested in LLM-based chatbots and virtual intelligent assistant technologies with useful information and future directions.

    Distributed knowledge based clinical auto-coding system

    Get PDF
    Codification of free-text clinical narratives has long been recognised as beneficial for secondary uses such as funding, insurance claim processing, and research. In recent years, many researchers have studied the use of Natural Language Processing (NLP) and related Machine Learning (ML) methods and techniques to resolve the problem of manual coding of clinical narratives. Most studies have focused on classification systems relevant to the U.S., and there is a scarcity of studies relevant to Australian classification systems such as ICD-10-AM and ACHI. We therefore aim to develop a knowledge-based clinical auto-coding system that utilises appropriate NLP and ML techniques to assign ICD-10-AM and ACHI codes to clinical records, while adhering to both local coding standards (Australian Coding Standard) and international guidelines that are continuously updated and validated.
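    As a rough illustration of the kind of NLP/ML pipeline such an auto-coding system builds on, the sketch below treats code assignment as multi-label text classification with TF-IDF features and one-vs-rest logistic regression. The narratives and the ICD-10-AM codes (J18.9, E11.9) are invented toy examples, and the method is a generic baseline rather than the system described in the abstract.

```python
# Minimal sketch of multi-label ICD code assignment from clinical text,
# using TF-IDF features and one-vs-rest logistic regression. All example
# narratives and code labels below are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

narratives = [
    "Patient admitted with community acquired pneumonia, treated with antibiotics.",
    "Type 2 diabetes mellitus, poorly controlled; insulin regimen adjusted.",
    "Pneumonia with underlying type 2 diabetes mellitus.",
]
codes = [["J18.9"], ["E11.9"], ["J18.9", "E11.9"]]  # hypothetical labels

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(codes)  # binary indicator matrix, one column per code

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(narratives, y)

pred = clf.predict(["Admitted for pneumonia; history of diabetes."])
print(mlb.inverse_transform(pred))  # recovered code set for the new narrative
```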

    A Foundation LAnguage-Image model of the Retina (FLAIR): Encoding expert knowledge in text supervision

    Full text link
    Foundation vision-language models are currently transforming computer vision and are on the rise in medical imaging, fueled by their very promising generalization capabilities. However, initial attempts to transfer this new paradigm to medical imaging have shown less impressive performance than observed in other domains, due to the significant domain shift and the complex, expert domain knowledge inherent to medical-imaging tasks. Motivated by the need for domain-expert foundation models, we present FLAIR, a pre-trained vision-language model for universal retinal fundus image understanding. To this end, we compiled 37 open-access, mostly categorical fundus imaging datasets from various sources, with up to 97 different target conditions and 284,660 images. We integrate expert domain knowledge in the form of descriptive textual prompts during both pre-training and zero-shot inference, enhancing the less informative categorical supervision of the data. This textual expert knowledge, compiled from the relevant clinical literature and community standards, describes the fine-grained features of the pathologies as well as the hierarchies and dependencies between them. We report comprehensive evaluations, which illustrate the benefit of integrating expert knowledge and the strong generalization capabilities of FLAIR under difficult scenarios with domain shifts or unseen categories. When adapted with a lightweight linear probe, FLAIR outperforms fully trained, dataset-focused models, especially in few-shot regimes. Interestingly, FLAIR outperforms larger-scale, more generalist image-language models by a large margin, which emphasizes the potential of embedding expert domain knowledge and the limitations of generalist models in medical imaging. Comment: The pre-trained model is available at: https://github.com/jusiro/FLAIR
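    Prompt-driven zero-shot inference of the kind FLAIR performs can be summarized in a few lines: normalize the image and prompt embeddings, score them by cosine similarity, and softmax over conditions. The sketch below uses random stand-in embeddings and paraphrased example prompts; the actual encoders, prompts, and temperature belong to the released model and are assumptions here.

```python
# Minimal sketch of prompt-based zero-shot inference in the CLIP/FLAIR style.
# Embeddings are random stand-ins; the prompts merely paraphrase the kind of
# expert knowledge described in the abstract and are not the paper's prompts.
import torch
import torch.nn.functional as F

def zero_shot_classify(image_embedding, prompt_embeddings, temperature=0.07):
    """Score one fundus image against one expert prompt per condition."""
    img = F.normalize(image_embedding, dim=-1)
    txt = F.normalize(prompt_embeddings, dim=-1)
    logits = img @ txt.T / temperature  # cosine similarities as logits
    return logits.softmax(dim=-1)

prompts = [
    "no diabetic retinopathy: no microaneurysms or haemorrhages",
    "mild diabetic retinopathy: microaneurysms only",
    "proliferative diabetic retinopathy: neovascularisation present",
]
# Stand-in embeddings; a real system would use the trained encoders.
image_embedding = torch.randn(1, 512)
prompt_embeddings = torch.randn(len(prompts), 512)
probs = zero_shot_classify(image_embedding, prompt_embeddings)
print(dict(zip(prompts, probs.squeeze(0).tolist())))
```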

    Bodies of Seeing: A video ethnography of academic x-ray image interpretation training and professional vision in undergraduate radiology and radiography education

    Get PDF
    This thesis reports on a UK-based video ethnography of academic x-ray image interpretation training across two undergraduate courses in radiology and radiography. By studying the teaching and learning practices of the classroom, I initially explore the professional vision of x-ray image interpretation and how its relation to normal radiographic anatomy founds the practice of being ‘critical’. This criticality accomplishes a faculty of perceptual norms that is coded and organised, and therefore a specific radiological vision. Professionals’ commitment to the cognitivist rhetoric of ‘looking at’/‘pattern recognition’ builds this critical perception, a perception that deepens in organisation when professionals endorse a ‘systematic approach’ that mediates matter-of-fact thoroughness and offers a helpful critical commentary towards the image. I then explore how x-ray image interpretation is constituted in case presentations. During training, x-ray images are treated with suspicion, as misleading, and are aligned with a commitment to discursive contexts of ‘missed abnormality’, ‘interpretive risk’, and ‘technical error’. The image is thereby constructed as ambiguous, such that what is shown cannot be taken at face value. This interconnects with re-enacted ideals around ‘seeing clearly’, which are explained through the teaching practices and material world of the academic setting and through how, when misinterpretation is established, the ambiguity of the image is reduced by embodied gestures and technoscientific knowledge. By making this correction, the ambiguous image is re-enacted and the misinterpretation of image content is explained. To conclude, I highlight how the professional vision of academic x-ray image interpretation prepares students for the workplace, shapes the classificatory interpretation of (ab)normal anatomy, manages ambiguity through embodied expectations and bodily norms, and cultivates body-machine relations.

    Interfaces for Modular Surgical Planning and Assistance Systems

    Get PDF
    Modern surgery of the 21st century relies in many respects on computers or, in a wider sense, digital data processing. Department administration, OR scheduling, billing, and, with increasing pervasion, patient data management are performed with the aid of so-called Surgical Information Systems (SIS) or, more generally, Hospital Information Systems (HIS). Computer Assisted Surgery (CAS) summarizes techniques which assist a surgeon in the preparation and conduct of surgical interventions. Today still predominantly based on radiology images, these techniques include the preoperative determination of an optimal surgical strategy and intraoperative systems which aim at increasing the accuracy of surgical manipulations. CAS is a relatively young field of computer science. One of the unsolved "teething troubles" of CAS is the absence of technical standards for the interconnectivity of CAS systems. Current CAS systems are usually "islands of information" with no connection to other devices within the operating room or to hospital-wide information systems. Several workshop reports and individual publications point out that this situation leads to ergonomic, logistic, and economic limitations in hospital work. Perioperative processes are prolonged by the manual installation and configuration of an increasing number of technical devices. Intraoperatively, much of the surgeons' attention is absorbed by the requirement to monitor and operate these systems. The need for open infrastructures which enable the integration of CAS devices from different vendors, in order to exchange information as well as commands among these devices through a network, has been identified by numerous experts with backgrounds in medicine as well as engineering. This thesis contains two approaches to the integration of CAS systems:
    - For perioperative data exchange, the specification of new data structures as an amendment to the existing DICOM standard for radiology image management is presented. The extension of DICOM towards surgical applications allows for the seamless integration of surgical planning and reporting systems into DICOM-based Picture Archiving and Communication Systems (PACS), as they are installed in most hospitals for the exchange and long-term archival of patient images and image-related patient data.
    - For the integration of intraoperatively used CAS devices, such as navigation systems, video image sources, or biosensors, the concept of a surgical middleware is presented. A C++ class library, the TiCoLi, is presented which facilitates the configuration of ad-hoc networks among the modules of a distributed CAS system as well as the exchange of data streams, singular data objects, and commands between these modules. The TiCoLi is the first software library for a surgical field of application to implement all of these services.
    To demonstrate the suitability of the presented specifications and their implementation, two modular CAS applications are presented which utilize the proposed DICOM extensions for the perioperative exchange of surgical planning data, as well as the TiCoLi for establishing an intraoperative network of autonomous, yet not independent, CAS modules.
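    To make the first approach concrete, the sketch below shows how surgical planning data could be attached to a DICOM object using pydicom. The private block, its attributes, and all values are hypothetical illustrations; the thesis specifies its own DICOM supplement structures, which are not reproduced here.

```python
# Minimal sketch of carrying surgical planning data in a DICOM object with
# pydicom. Standard attributes plus a hypothetical private block are used;
# this does not reproduce the thesis's actual DICOM supplement definitions.
from pydicom.dataset import Dataset
from pydicom.uid import generate_uid

ds = Dataset()
ds.PatientName = "Doe^Jane"          # illustrative patient
ds.PatientID = "SURG-0001"           # illustrative identifier
ds.StudyInstanceUID = generate_uid()
ds.SeriesInstanceUID = generate_uid()
ds.SOPInstanceUID = generate_uid()
ds.Modality = "OT"                   # "other" as a stand-in modality

# Hypothetical private block for planning attributes (odd group number,
# as DICOM requires for private data elements).
block = ds.private_block(0x000B, "ACME Surgical Planning", create=True)
block.add_new(0x01, "DS", "5.0")                    # planned margin in mm
block.add_new(0x02, "LO", "Left hepatectomy, v2")   # plan label

# A real system would serialize with ds.save_as(...) and push to a PACS.
print(ds)
```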

    Multimodal Integration for Natural Language Classification and Generation

    Get PDF
    Multimodal integration is a framework for building models that can accept information from different types of modalities. Owing to the recent success of the Transformer model and of pre-training/fine-tuning techniques, Vision-and-Language Pre-training Models (VL-PMs) have been heavily investigated and have achieved state-of-the-art results in various Vision-and-Language downstream tasks, such as Visual Question Answering, Image-Text Matching, and Image Captioning. However, most previous studies focus on improving the performance of the models and only provide code accessible for research purposes. Several open-source libraries exist, such as the Natural Language Toolkit, OpenCV, and HuggingFace, which combine and standardise the available models and tools for easy access, but applying these libraries still requires expertise in both deep learning and programming. Moreover, there has been no recent research aimed at establishing user-friendly multimodal question-answering platforms for non-deep-learning users. Therefore, the question of how state-of-the-art multimodal models can be easily applied by professionals in other domains remains open. A second challenge lies in less-common domains: general multimodal domains such as street views, landscapes, and indoor scenes have been extensively studied with current VL-PMs, while specific domains like medicine, geography, and esports have garnered less attention. Owing to the difficulties of data collection, there are few publicly available multimodal datasets in these domains, and those that exist tend to be small. This scarcity poses challenges for model training. Consequently, the questions of how to collect a comprehensive multimodal dataset in the esports domain and how to improve domain-specific multimodal models also remain open. The main focus of this thesis is therefore integrating multimodal information for natural language classification and generation tasks by addressing these two challenges.
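    The gap the thesis points at, between state-of-the-art multimodal models and non-deep-learning users, is narrowest where a library exposes a one-call interface. The sketch below shows a visual question answering query through the HuggingFace transformers pipeline; the checkpoint choice (dandelin/vilt-b32-finetuned-vqa) and the placeholder image path are illustrative assumptions, not artifacts of the thesis.

```python
# Minimal sketch of one-call visual question answering via the HuggingFace
# transformers pipeline API. The checkpoint is a publicly available ViLT
# model chosen for illustration; the image path is a placeholder.
from transformers import pipeline

vqa = pipeline("visual-question-answering",
               model="dandelin/vilt-b32-finetuned-vqa")

answers = vqa(image="match_screenshot.png",   # placeholder local image
              question="How many players are visible?")
print(answers[0]["answer"], answers[0]["score"])
```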

    Automatic extraction of robotic surgery actions from text and kinematic data

    Get PDF
    The latest generation of robotic systems is becoming increasingly autonomous due to technological advancements and artificial intelligence. The medical field, particularly surgery, is also interested in these technologies, because automation would benefit surgeons and patients. While the research community is active in this direction, commercial surgical robots do not currently operate autonomously due to the risks involved in dealing with human patients: it is still considered safer to rely on human surgeons' intelligence for decision-making. To become more autonomous and credible, robots must therefore possess human-like intelligence, including various reasoning capabilities and extensive knowledge. Indeed, as current research in the field demonstrates, one of the most critical aspects in developing autonomous systems is the acquisition and management of knowledge. In particular, a surgical robot must base its actions on solid procedural surgical knowledge to operate autonomously, safely, and expertly. This thesis investigates different possibilities for automatically extracting and managing knowledge from text and kinematic data. In the first part, we investigated the possibility of extracting procedural surgical knowledge from real intervention descriptions available in textbooks and academic papers in the robotic-surgery domain, by exploiting Transformer-based pre-trained language models. In particular, we released SurgicBERTa, a RoBERTa-based pre-trained language model for surgical literature understanding. It has been used to detect procedural sentences in books and to extract procedural elements from them. Then, through several use cases, we explored the possibility of translating written instructions into logical rules usable for robotic planning. Since not all the knowledge required for automating a procedure is written in texts, we introduced the concept of surgical commonsense, showing how it relates to different autonomy levels. In the second part of the thesis, we analyzed surgical procedures at a lower level of granularity, showing how each surgical gesture is associated with a given combination of kinematic data.
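    A minimal version of the procedural-sentence detection step might look like the sketch below, which scores sentences with a RoBERTa-style sequence classifier. Here roberta-base with an untrained two-label head stands in for SurgicBERTa, whose actual checkpoint identifier and label scheme are not given in the abstract.

```python
# Minimal sketch of procedural-sentence detection with a RoBERTa-style
# classifier. "roberta-base" is a stand-in for SurgicBERTa; the two-label
# scheme (0 = descriptive, 1 = procedural) is an assumption.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)  # classification head is newly initialized

sentences = [
    "The gallbladder is retracted cephalad to expose Calot's triangle.",
    "Robotic surgery has grown rapidly over the last decade.",
]
batch = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**batch).logits.softmax(dim=-1)
# Per-sentence probability of the "procedural" label; meaningful only after
# fine-tuning (or after loading the released SurgicBERTa weights).
print(probs[:, 1])
```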