54 research outputs found

    Foundation, Implementation and Evaluation of the MorphoSaurus System: Subword Indexing, Lexical Learning and Word Sense Disambiguation for Medical Cross-Language Information Retrieval

    In everyday medical practice, which involves a great deal of documentation and search work, most textually encoded information is now available electronically. This makes the development of powerful methods for efficient retrieval a priority. Judged from the perspective of medical sublanguage, common text retrieval systems lack morphological functionality (inflection, derivation, and compounding), lexical-semantic functionality, and the ability to analyze large document collections across languages. This doctoral thesis presents the theoretical foundations of the MorphoSaurus system (an acronym for morpheme thesaurus). Its methodological core is a thesaurus organized around morphemes of medical expert and lay language, whose entries are linked across languages by semantic relations. Building on this, a procedure is presented that segments (complex) words into morphemes, which are then replaced by language-independent, concept-class-like symbols. The resulting representation is the basis for cross-language, morpheme-oriented text retrieval. In addition to the core technology, a method for the automatic acquisition of lexicon entries is presented, by which existing morpheme lexicons are extended to further languages. Taking cross-language phenomena into account then leads to a novel method for resolving semantic ambiguities. The performance of morpheme-oriented text retrieval is tested empirically in extensive, standardized evaluations and compared with common approaches.
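
The segmentation step described in this abstract (split words into morphemes, then replace each morpheme with a language-independent class symbol) can be illustrated with a minimal sketch. This is not the MorphoSaurus implementation: the lexicon entries, the filler list, the class identifiers such as `#stomach`, and the longest-match-with-backtracking strategy are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional

# Toy subword lexicon: subword -> language-independent class identifier.
# All entries and identifiers are hypothetical, not taken from MorphoSaurus.
LEXICON: Dict[str, str] = {
    "gastr": "#stomach",
    "magen": "#stomach",
    "stomach": "#stomach",
    "itis": "#inflamm",
    "entzuend": "#inflamm",
    "inflamm": "#inflamm",
}

# Affixes and linking elements that carry no concept of their own (assumed).
FILLERS = {"o", "s", "ung", "ation"}


def segment(word: str) -> Optional[List[str]]:
    """Split a word completely into known subwords and fillers.

    Tries the longest matching prefix first and backtracks if the
    remainder cannot be segmented. Returns None if no full split exists.
    """
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        piece = word[:end]
        if piece in LEXICON or piece in FILLERS:
            rest = segment(word[end:])
            if rest is not None:
                return [piece] + rest
    return None


def to_concepts(text: str) -> List[str]:
    """Map whitespace-separated tokens to class identifiers.

    Tokens that cannot be segmented are kept as lowercased strings.
    """
    out: List[str] = []
    for token in text.split():
        pieces = segment(token)
        if pieces is None:
            out.append(token.lower())
        else:
            out.extend(LEXICON[p] for p in pieces if p in LEXICON)
    return out


if __name__ == "__main__":
    for query in ("gastritis", "Magenentzuendung", "stomach inflammation"):
        print(query, "->", to_concepts(query))
```

Under this toy lexicon, the English and German queries reduce to the same sequence of identifiers, which is the property that makes matching across languages possible in a morpheme-oriented index.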

    Evidence in Practice – A Pilot Study Leveraging Companion Animal and Equine Health Data from Primary Care Veterinary Clinics in New Zealand

    Veterinary practitioners have extensive knowledge of animal health from their day-to-day observations of clinical patients. There have been several recent initiatives to capture these data from electronic medical records for use in national surveillance systems and clinical research. In response, an approach to surveillance has been evolving that leverages existing computerized veterinary practice management systems to capture animal health data recorded by veterinarians. Work in the United Kingdom within the VetCompass program utilizes routinely recorded clinical data with the addition of further standardized fields. The current study describes a prototype system that was developed based on this approach. In a 4-week pilot study in New Zealand, clinical data on presentation reasons and diagnoses from a total of 344 patient consults were extracted from two veterinary clinics into a dedicated database and analyzed at the population level. New Zealand companion animal and equine veterinary practitioners were engaged to test the feasibility of this national practice-based health information and data system. Strategies to ensure continued engagement and submission of quality data by participating veterinarians were identified, as were important considerations for transitioning the pilot program to a sustainable large-scale and multi-species surveillance system that has the capacity to securely manage big data. The results further emphasized the need for a high degree of usability and smart interface design to make such a system work effectively in practice. The geospatial integration of data from multiple clinical practices into a common operating picture can be used to establish the baseline incidence of disease in New Zealand companion animal and equine populations, detect unusual trends that may indicate an emerging disease threat or welfare issue, improve the management of endemic and exotic infectious diseases, and support research activities. This pilot project is an important step toward developing a national surveillance system for companion animals and equines that moves beyond emerging infectious disease detection to provide important animal health information that can be used by a wide range of stakeholder groups, including participating veterinary practices.

    Modeling controlled vocabularies using OODBs and multilevel area diagrams

    A Controlled Vocabulary (CV) is a software system of domain knowledge that consolidates and unifies the terminology of a large application domain. With a common, centralized CV, costly and time-consuming translations between pairs of organizations and pairs of software systems can be eliminated. Unfortunately, the more knowledge we put into a CV, the harder it is to understand and maintain. In this dissertation, a comprehensive theoretical methodology for modeling CVs using Object-Oriented Database (OODB) technology is presented. We present two methods for representing a semantic network CV as an equivalent OODB, which we call an Object-Oriented Vocabulary Repository (OOVR). The first method, based on a structural analysis and partitioning of the CV, yields an OODB with a very concise schema, referred to as the OOVR schema. Due to its compact size, the schema can be displayed on one or a few computer screens and serves as an aid for comprehending and maintaining the CV. A program called the Object-Oriented Vocabulary Repository Generator (OOVR Generator) has been built to automatically generate an OOVR for a given semantic network CV. Our second method results in a larger schema, which, however, serves as an important tool for browsing and navigating through a CV. The OODB schemas created by both methods provide important abstract views of CVs. We have also defined a new type of semantic relationship, called IS-A', in the context of an OOVR representation. The IS-A' relationships are defined on OOVR schemas to reflect certain important IS-A relationships in the underlying CV. The two OOVR representations exhibit several interesting theoretical characteristics, which are formally proven in this dissertation. To provide an environment with several abstract views of a CV, we also define a paradigm called Multilevel Area Diagrams (MLADs). An MLAD is a collection of different partitions of increasing detail and decreasing abstraction derived from a CV. Users can browse at one level and then switch to another level to continue their navigation. Examples of browsing sessions are presented to show that the MLAD paradigm provides processing capabilities beyond those of a traditional object-oriented representation of a vocabulary.

    Challenges in the Analysis of Mass-Throughput Data: A Technical Commentary from the Statistical Machine Learning Perspective

    Sound data analysis is critical to the success of modern molecular medicine research that involves the collection and interpretation of mass-throughput data. The novelty and high dimensionality of such datasets pose a series of nontrivial data analysis problems. This technical commentary discusses the problems of over-fitting, error estimation, the curse of dimensionality, causal versus predictive modeling, integration of heterogeneous types of data, and the lack of standard protocols for data analysis. We attempt to shed light on the nature and causes of these problems and to outline viable methodological approaches to overcome them.

    Machine understanding surgical actions from intervention procedure textbooks

    The automatic extraction of procedural surgical knowledge from surgery manuals, academic papers, or other high-quality textual resources is of the utmost importance for developing knowledge-based clinical decision support systems, for automatically executing some steps of a procedure, and for summarizing the procedural information spread throughout the texts in a structured form usable as a study resource by medical students. In this work, we propose a first benchmark on extracting detailed surgical actions from available intervention procedure textbooks and papers. We frame the problem as a Semantic Role Labeling task. Exploiting a manually annotated dataset, we apply different Transformer-based information extraction methods. Starting from the RoBERTa and BioMedRoBERTa pre-trained language models, we first investigate a zero-shot scenario and compare the results obtained with a full fine-tuning setting. We then introduce a new ad-hoc surgical language model, named SurgicBERTa, pre-trained on a large collection of surgical materials, and compare it with the previous ones. In the assessment, we explore different dataset splits (one in-domain and two out-of-domain) and also investigate the effectiveness of the approach in a few-shot learning scenario. Performance is evaluated on three correlated sub-tasks: predicate disambiguation, semantic argument disambiguation, and predicate-argument disambiguation. Results show that fine-tuning a pre-trained domain-specific language model achieves the highest performance on all splits and on all sub-tasks. All models are publicly released.