3 research outputs found

    The application of medical terminologies to free-text in routine databases using the example of strategies to reduce infant mortality

    Background: The infant mortality rate (IMR), a key indicator of the quality of a healthcare system, has remained at approximately 3.5‰ in Germany for the past 10 years. Generic quality indicators (QIs), in use in Germany since 2010, contribute substantially to this good value but do not seem able to reduce the IMR further. The neonatal mortality rate (NMR) accounts for 65-70% of the IMR. The presented approach therefore proposes single-case analysis of neonatal deaths on the basis of medical records. Most electronic medical records still contain large amounts of free-text data. Semantic evaluation of such data requires that the data be coded with adequate classifications or transformed into a knowledge-based database.
    Methods: The Nordic-Baltic classification (NBC) was used to detect avoidable neonatal deaths. It was applied to a sample of 1,968 neonatal death records, which represent over 90% of all neonatal deaths in East Berlin from 1973 to 1989. At the time, a special commission of experts had assessed every case for preventability on the basis of the complete perinatal and clinical data. The developed approach allows databases accessible via SQL (Structured Query Language) to be searched directly with semantic queries, without further transformation. To this end, three components were developed: (I) an extension of SQL named Ontology-SQL (O-SQL) that allows semantic expressions to be used; (II) a framework that uses a standard terminology server to annotate database tables containing free text; and (III) a parser that rewrites O-SQL expressions to SQL, so that semantic queries can be passed directly to the database server.
    Results: The NBC was used to select the group of cases with a high potential for avoidance. The selected group comprised 6.0% of all cases, and 60.4% of the cases within that group had in fact been judged avoidable or conditionally avoidable. Automatic detection of malformations achieved an F1 score of 0.94. In addition, the generalizability of the approach was demonstrated with several semantic queries, with F1 scores between 0.91 and 0.98.
    Conclusion: The results show that the presented method can be applied automatically and is a powerful, highly sensitive and specific tool for selecting potentially avoidable neonatal deaths and thus for supporting efficient single-case analysis. The seamless connection of ontologies with standard database technologies is an important component of unstructured data analysis. The developed technology can be readily applied to current data and supports the increasingly important field of translational research.
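The idea of rewriting a semantic condition into plain SQL over annotated tables can be illustrated with a small sketch. This is not the O-SQL syntax or parser from the work itself, which the abstract does not show. The toy ontology `PARENT`, the `annotations` table, the `records` table, and the helper names `descendants` and `rewrite` are all hypothetical, chosen only to show how a concept-level condition could expand into an ordinary `IN (...)` clause that any SQL database can execute.

```python
import sqlite3

# Hypothetical toy ontology: child concept -> parent concept.
PARENT = {
    "heart_defect": "malformation",
    "neural_tube_defect": "malformation",
    "spina_bifida": "neural_tube_defect",
    "sepsis": "infection",
}

def descendants(concept):
    """Return the concept plus every concept below it in PARENT, sorted."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return sorted(found)

def rewrite(table, concept):
    """Expand a semantic condition into plain SQL text plus its parameters."""
    codes = descendants(concept)
    placeholders = ", ".join("?" for _ in codes)
    sql = (
        f"SELECT DISTINCT r.id FROM {table} AS r "
        f"JOIN annotations AS a ON a.record_id = r.id "
        f"WHERE a.concept IN ({placeholders}) ORDER BY r.id"
    )
    return sql, codes

# Assumed layout: free-text rows plus a separate table of concept annotations,
# standing in for the output of a terminology server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, diagnosis TEXT);
    CREATE TABLE annotations (record_id INTEGER, concept TEXT);
    INSERT INTO records VALUES
        (1, 'open spina bifida'),
        (2, 'neonatal sepsis'),
        (3, 'ventricular septal defect');
    INSERT INTO annotations VALUES
        (1, 'spina_bifida'), (2, 'sepsis'), (3, 'heart_defect');
""")

# "All records annotated with any kind of malformation" becomes ordinary SQL.
sql, params = rewrite("records", "malformation")
matching_ids = [row[0] for row in conn.execute(sql, params)]
print(matching_ids)
```

Because the expansion happens before the query reaches the database, the stored data needs no transformation beyond the annotation table, which is the property the abstract emphasizes.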

    Studies on User Intent Analysis and Mining

    Predicting users' goals can be extremely useful in e-commerce, online entertainment, information retrieval, and many other online services and applications. In this thesis, we study the task of user intent understanding, aiming to bridge the gap between what users express to online services and the goals behind those expressions. As far as we know, most existing user intent studies focus on the web search and social media domains, and other areas remain understudied. For example, as people rely more and more on cellphones in daily life, the information needs they express to mobile devices and related services are increasing dramatically, yet there are few studies of user intent mining on mobile devices. Intentions when using mobile devices also differ from those when using a web search engine or a social network, so existing user intent work cannot be applied directly to this area. Besides, users' intents are not stable but change over time, and different interests influence each other. Modeling such dynamic user interests can help to understand and predict user intent accurately, but there is little existing work in this area. Moreover, user intent can be expressed explicitly or implicitly. Implicit intent expression is closer to natural human language and is also of great value to recognize and mine.
    To study these challenges further, we first try to answer the question “What is user intent?” Drawing on a number of previous studies, we give our definition: “User intent is a task-specific, predefined or latent concept, topic or knowledge-base that is under an expression from a user who is trying to express his goal of information or service need.” We then focus on the scenario of a user operating a cellphone while driving and study user intent in this domain. As far as we know, this is the first user intent analysis and categorization in this domain. We also build a dataset of user inputs with their intent categories and attributes through crowdsourcing and careful manual curation. With the user intent taxonomy and dataset in hand, we perform user intent classification and intent attribute recognition with supervised machine learning models. To classify the intent of a user query, we use a convolutional neural network to build a multi-class classifier. We then use a sequence labeling method to recognize the intent attributes in the query. The experimental results show that our proposed method outperforms several baseline models in precision, recall, and F-score. In addition, we study implicit user intent mining from web search log data. Using a Restricted Boltzmann Machine, we exploit the correlation between query and click information to learn the latent intent behind a user's web search. We propose a user intent prediction model for online discussion forums based on a multivariate Hawkes process, which dynamically models how user intentions change and interact over time. The method models both the internal and external factors behind users' motivations to respond in online forums and also incorporates the time decay of users' interests. We also present a data visualization method that uses an enriched domain ontology to highlight domain-specific words and entity relations within an article.
    Ph.D., Information Studies -- Drexel University, 201
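For readers unfamiliar with multivariate Hawkes processes, the following minimal sketch shows how past events raise the current event rate and how that influence fades over time. The abstract does not state the kernel or parameterization used in the thesis; the exponential decay kernel, the function name `intensity`, and all numeric values below are assumptions chosen for illustration only.

```python
import math

def intensity(i, t, events, mu, alpha, beta):
    """Conditional intensity of dimension i at time t.

    events: list of (time, dimension) pairs observed so far
    mu[i]: baseline rate of dimension i
    alpha[i][j]: how strongly an event in dimension j excites dimension i
    beta: exponential decay rate (assumed kernel, not taken from the thesis)
    """
    total = mu[i]
    for t_k, j in events:
        if t_k < t:  # only events strictly in the past contribute
            total += alpha[i][j] * math.exp(-beta * (t - t_k))
    return total

# Hypothetical two-dimensional example, e.g. two interacting user interests.
mu = [0.1, 0.2]
alpha = [[0.5, 0.3],
         [0.0, 0.4]]
beta = 1.0
events = [(1.0, 0), (2.0, 1)]

rate_before = intensity(0, 0.5, events, mu, alpha, beta)  # no past events yet
rate_after = intensity(0, 3.0, events, mu, alpha, beta)   # both events have occurred
print(rate_before, rate_after)
```

The diagonal entries of `alpha` capture self-excitation within one dimension, while the off-diagonal entries capture influence between dimensions, which is one common way to express that different interests affect each other.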