    Automatic Population of Structured Reports from Narrative Pathology Reports

    Structured pathology reports have a number of advantages: they help ensure the accuracy and completeness of pathology reporting, and they make it easier for referring doctors to glean pertinent information. The goal of this thesis is to extract pertinent information from free-text pathology reports, automatically populate structured reports for cancer diseases, and identify the commonalities and differences in processing principles needed to obtain maximum accuracy. Three pathology corpora were annotated with entities and the relationships between them: a melanoma corpus, a colorectal cancer corpus and a lymphoma corpus. A supervised machine-learning approach based on conditional random fields (CRF) learners was developed to recognise medical entities in the corpora. Feature engineering yielded the best feature configurations, which boosted the F-scores significantly, from 4.2% to 6.8%, on the training sets. Without proper negation and uncertainty detection, the quality of the structured reports would be diminished, so negation and uncertainty detection modules were built to handle this problem; they obtained overall F-scores ranging from 76.6% to 91.0% on the test sets. A relation extraction system was presented to extract four relations from the lymphoma corpus. The system achieved very good performance on the training set, with a 100% F-score obtained by the rule-based module and a 97.2% F-score attained by the support vector machine classifier. Rule-based approaches were used to generate the structured outputs and populate predefined templates; this rule-based system attained over 97% F-scores on the training sets. Finally, a pipeline system assembling all the components described above was implemented. It achieved promising results in end-to-end evaluations, with F-scores of 86.5%, 84.2% and 78.9% on the melanoma, colorectal cancer and lymphoma test sets respectively.
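    As an illustration of the entity-recognition step described above, the sketch below trains a conditional random fields tagger over token-level features. It is a minimal example assuming the sklearn-crfsuite package; the toy sentence, BIO labels and feature set are invented for illustration and are not the thesis's actual feature configuration.

```python
# Minimal sketch (assumed library: sklearn-crfsuite) of a CRF entity
# recogniser for pathology text; the sentence, BIO labels and features
# are invented for illustration.
import sklearn_crfsuite

def token_features(sent, i):
    """Simple orthographic and contextual features for token i."""
    word = sent[i]
    feats = {
        "word.lower": word.lower(),
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
    }
    if i > 0:
        feats["prev.lower"] = sent[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(sent) - 1:
        feats["next.lower"] = sent[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats

# Toy training data: one tokenised sentence with hypothetical BIO labels.
sentences = [["Breslow", "thickness", "is", "2.1", "mm", "."]]
labels = [["B-Measure", "I-Measure", "O", "B-Value", "I-Value", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
y = labels

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, y)
print(crf.predict(X))
```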

    Mapping of electronic health records in Spanish to the unified medical language system metathesaurus

    This work presents a preliminary approach to annotating Spanish electronic health records with concepts from the Unified Medical Language System (UMLS) Metathesaurus. The prototype uses Apache Lucene to index the Metathesaurus and generate mapping candidates from the input text, and relies on UKB to resolve ambiguities. The tool was evaluated by measuring its agreement with MetaMap on two English-Spanish parallel corpora, one consisting of titles and abstracts of papers in the clinical domain, and the other of excerpts from real electronic health records.
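    The candidate-generation step can be pictured with the simplified sketch below, which indexes terminology strings and proposes concept candidates for spans of input text by token overlap. It is only an analogue of what the Lucene-based prototype does: the (CUI, term) pairs are invented, and neither Lucene indexing nor UKB disambiguation is reproduced here.

```python
# Simplified analogue of candidate generation: index term strings and
# propose CUI candidates for text spans by token overlap. The tiny
# terminology below is invented for illustration.
from collections import defaultdict

# Hypothetical (CUI, term) pairs standing in for the Metathesaurus.
terminology = [
    ("C0011849", "diabetes mellitus"),
    ("C0020538", "hipertension arterial"),
    ("C0004238", "fibrilacion auricular"),
]

# Inverted index: token -> set of CUIs whose term contains that token.
index = defaultdict(set)
for cui, term in terminology:
    for tok in term.split():
        index[tok].add(cui)

def candidates(text, max_ngram=3):
    """Return mapping candidates (span, CUI, score) ranked by token overlap."""
    tokens = text.lower().split()
    results = []
    for n in range(max_ngram, 0, -1):
        for i in range(len(tokens) - n + 1):
            span = tokens[i:i + n]
            hits = defaultdict(int)
            for tok in span:
                for cui in index.get(tok, ()):
                    hits[cui] += 1
            for cui, score in hits.items():
                results.append((" ".join(span), cui, score / n))
    return sorted(results, key=lambda r: -r[2])

print(candidates("paciente con diabetes mellitus tipo 2")[:3])
```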

    Advanced Knowledge Technologies at the Midterm: Tools and Methods for the Semantic Web

    The University of Edinburgh and research sponsors are authorised to reproduce and distribute reprints and on-line copies for their purposes notwithstanding any copyright annotation hereon. The views and conclusions contained herein are the author's and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of other parties.

    In a celebrated essay on the new electronic media, Marshall McLuhan wrote in 1962: "Our private senses are not closed systems but are endlessly translated into each other in that experience which we call consciousness. Our extended senses, tools, technologies, through the ages, have been closed systems incapable of interplay or collective awareness. Now, in the electric age, the very instantaneous nature of co-existence among our technological instruments has created a crisis quite new in human history. Our extended faculties and senses now constitute a single field of experience which demands that they become collectively conscious. Our technologies, like our private senses, now demand an interplay and ratio that makes rational co-existence possible. As long as our technologies were as slow as the wheel or the alphabet or money, the fact that they were separate, closed systems was socially and psychically supportable. This is not true now when sight and sound and movement are simultaneous and global in extent." (McLuhan 1962, p.5, emphasis in original)

    Over forty years later, the seamless interplay that McLuhan demanded between our technologies is still barely visible. McLuhan's predictions of the spread, and increased importance, of electronic media have of course been borne out, and the worlds of business, science and knowledge storage and transfer have been revolutionised. Yet the integration of electronic systems as open systems remains in its infancy.

    Advanced Knowledge Technologies (AKT) aims to address this problem: to create a view of knowledge and its management across its lifecycle, and to research and create the services and technologies that such unification will require. Half way through its six-year span, the results are beginning to come through, and this paper explores some of the services, technologies and methodologies that have been developed. We hope to give a sense of the potential for the next three years, to discuss the insights and lessons learnt in the first phase of the project, and to articulate the challenges and issues that remain.

    The WWW provided the original context that made the AKT approach to knowledge management (KM) possible. When AKT was initially proposed in 1999, it brought together an interdisciplinary consortium with the technological breadth and complementarity to create the conditions for a unified approach to knowledge across its lifecycle. The combination of this expertise, and the time and space afforded the consortium by the IRC structure, suggested the opportunity for a concerted effort to develop an approach to advanced knowledge technologies based on the WWW as a basic infrastructure.

    The technological context of AKT altered for the better in the short period between the development of the proposal and the beginning of the project itself, with the emergence of the Semantic Web (SW), which foresaw much more intelligent manipulation and querying of knowledge.
    The opportunities that the SW provided, e.g. for more intelligent retrieval, put AKT at the centre of information technology innovation and knowledge management services; the AKT skill set would clearly be central to the exploitation of those opportunities.

    The SW, as an extension of the WWW, provides an interesting set of constraints on the knowledge management services AKT tries to provide. As a medium for the semantically-informed coordination of information, it has suggested a number of ways in which the objectives of AKT can be achieved, most obviously through the provision of knowledge management services delivered over the web, as opposed to the creation and provision of technologies to manage knowledge.

    AKT is working on the assumption that many web services will be developed and provided for users. The KM problem in the near future will be one of deciding which services are needed and of coordinating them. Many of these services will be largely or entirely legacies of the WWW, and so the capabilities of the services will vary. As well as providing useful KM services in their own right, AKT will aim to exploit this opportunity by reasoning over services, brokering between them, and providing essential meta-services for SW knowledge service management.

    Ontologies will be a crucial tool for the SW. The AKT consortium brings a great deal of expertise on ontologies together, and ontologies were always going to be a key part of the strategy. All kinds of knowledge sharing and transfer activities will be mediated by ontologies, and ontology management will be an important enabling task. Different applications will need to cope with inconsistent ontologies, or with the problems that will follow the automatic creation of ontologies (e.g. the merging of pre-existing ontologies to create a third). Ontology mapping, and the elimination of conflicts of reference, will be important tasks. All of these issues are discussed along with our proposed technologies.

    Similarly, specifications of tasks will be used for the deployment of knowledge services over the SW, but in general it cannot be expected that in the medium term there will be standards for task (or service) specifications. The brokering meta-services that are envisaged will have to deal with this heterogeneity.

    The emerging picture of the SW is one of great opportunity, but it will not be a well-ordered, certain or consistent environment. It will comprise many repositories of legacy data, outdated and inconsistent stores, and requirements for common understandings across divergent formalisms. There is clearly a role for standards to play in bringing much of this context together, and AKT is playing a significant role in these efforts. But standards take time to emerge, they take political power to enforce, and they have been known to stifle innovation (in the short term). AKT is keen to understand the balance between principled inference and statistical processing of web content. Logical inference on the Web is tough: complex queries using traditional AI inference methods bring most distributed computer systems to their knees. Do we set up semantically well-behaved areas of the Web? Is any part of the Web in which semantic hygiene prevails interesting enough to reason in? These and many other questions need to be addressed if we are to provide effective knowledge technologies for our content on the web.

    Type Theories and Lexical Networks: using Serious Games as the basis for Multi-Sorted Typed Systems

    In this paper, we show how a rich lexico-semantic network built using the serious game JeuxDeMots can help us ground our semantic ontologies, as well as different sorts of information, when doing formal semantics with rich or modern type theories (type theories within the tradition of Martin-Löf). We discuss the domain of base types, adjectival and verbal types, and hyperonymy/hyponymy relations, as well as more advanced issues like homophony and polysemy. We show how one can take advantage of this wealth of information in a formal compositional semantics framework. This is a way to sidestep the problem of deciding how your type ontology should look once you have moved to a many-sorted type system. Furthermore, we show how this kind of information can be extracted from JeuxDeMots and inserted into a proof assistant like Coq in order to perform reasoning tasks using modern type-theoretic semantics.
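    As a rough illustration of how lexical-network relations might feed a proof assistant, the sketch below turns a few hyperonymy edges into Coq-style declarations, encoding each hyponym-hypernym link as a coercion. The triples and the coercion-based encoding are assumptions for illustration, not the paper's actual JeuxDeMots export or type-assignment rules.

```python
# Illustrative only: convert hyperonymy edges (as one might extract from a
# JeuxDeMots export) into declarations for a multi-sorted type system,
# emitted as Coq-style text. The edges and encoding are assumed.
hyperonymy = [
    ("dog", "animal"),
    ("cat", "animal"),
    ("animal", "physical_object"),
]

def to_coq(edges):
    lines = []
    sorts = {s for pair in edges for s in pair}
    # Declare one base sort per lexical-network node.
    for sort in sorted(sorts):
        lines.append(f"Parameter {sort} : Set.")
    # Encode each hyponym <= hypernym link as an injection used as a coercion.
    for hypo, hyper in edges:
        lines.append(f"Parameter {hypo}_to_{hyper} : {hypo} -> {hyper}.")
        lines.append(f"Coercion {hypo}_to_{hyper} : {hypo} >-> {hyper}.")
    return "\n".join(lines)

print(to_coq(hyperonymy))
```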

    Domain-specific word embeddings for ICD-9-CM classification

    In this work we evaluate domain-specific embedding models induced from textual resources in the medical domain. The International Classification of Diseases (ICD) is a standard, broadly used classification system that codes a large number of specific diseases, symptoms, injuries and medical procedures into numerical classes. Assigning a code to a clinical case means classifying that case into one or more particular discrete classes, thereby allowing further statistical studies and automated calculations. The possibility of having a discrete code instead of text in natural language is intuitively a great advantage for data processing systems. The use of such a classification is becoming increasingly important for, but not limited to, economic and policy-making purposes. Experiments show that domain-specific word embeddings, instead of general-purpose ones, improve classifiers in terms of frequency similarities between words.
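    A minimal sketch of the idea, assuming gensim and scikit-learn: induce word embeddings from an in-domain corpus and average them into document vectors for an ICD-9-CM classifier. The corpus, the example codes and the hyperparameters are placeholders rather than those used in the work above.

```python
# Sketch: train domain-specific embeddings on clinical text and use averaged
# word vectors as document features for an ICD-9-CM classifier.
# Corpus, labels and hyperparameters are illustrative placeholders.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

corpus = [
    "patient admitted with acute myocardial infarction".split(),
    "type 2 diabetes mellitus with renal complications".split(),
]
labels = ["410", "250"]  # illustrative ICD-9-CM codes

# Train embeddings on the in-domain corpus instead of using a general model.
w2v = Word2Vec(corpus, vector_size=50, window=5, min_count=1, epochs=50)

def doc_vector(tokens):
    """Average the word vectors of the tokens present in the vocabulary."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.wv.vector_size)

X = np.vstack([doc_vector(doc) for doc in corpus])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```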

    Sistemas interativos e distribuĂ­dos para telemedicina

    During the last decades, healthcare organizations have increasingly relied on information technologies to improve their services. More recently, partly driven by the financial crisis, reforms in the health sector have encouraged new telemedicine solutions that optimize the use of human resources and equipment. Technologies such as cloud computing, mobile computing and web systems have been important to the success of these new telemedicine applications: emerging distributed-computing capabilities make it easier to connect medical communities and to promote telemedicine services and real-time collaboration, while mobile devices facilitate remote work anytime and anywhere. In addition, many features that have become commonplace in social networks, such as data sharing, message exchange, discussion forums and videoconferencing, have the potential to foster collaboration in the health sector. The main objective of this research work was to investigate more agile computational solutions that promote the sharing of clinical data and facilitate the creation of collaborative workflows in radiology. By exploring current web and mobile computing technologies, we designed a ubiquitous solution for medical image visualization and developed a collaborative system for radiology based on cloud computing technology. Along the way, we investigated methodologies for text mining, semantic representation and content-based image retrieval. Finally, to ensure patient privacy and streamline data sharing in collaborative environments, we propose a machine learning methodology to anonymize medical images.

    Word Sense Disambiguation for clinical abbreviations

    Abbreviations are extensively used in patients' electronic health records (EHRs) and other medical documentation, accounting for 30-50% of the words in clinical narrative. More than 197,000 unique medical abbreviations have been found in clinical text, and their meanings vary depending on the context in which they are used. Since data in electronic health records may be shared across health information systems (hospitals, primary care centres, etc.) as well as with other organisations such as insurance companies, determining the correct meaning of each abbreviation is essential to avoid misunderstandings. Clinical abbreviations have the specific characteristic that they do not follow any standard rules of formation, which makes it complicated to compile them and their corresponding meanings. Furthermore, there is an added difficulty in working with clinical data for privacy reasons, even though such data are essential for developing and testing algorithms. Word sense disambiguation (WSD) is an essential task in natural language processing (NLP) applications such as information extraction, chatbots and summarization systems, among others. WSD aims to identify the correct meaning of an ambiguous word, i.e. one that has more than one meaning; disambiguating clinical abbreviations is a type of lexical-sample WSD task. Previous research has adopted supervised, unsupervised and knowledge-based (KB) approaches to disambiguate clinical abbreviations. This thesis proposes a classification model that, apart from disambiguating well-known abbreviations, also disambiguates rare and unseen abbreviations using the most recent deep neural network architectures for language modelling. The resources and disambiguation models used for clinical abbreviation disambiguation are surveyed, and the different classification approaches used to disambiguate clinical abbreviations are investigated. Since computers do not directly understand text, different data representations were implemented to capture the meaning of words, and the evaluation measures used to assess the algorithms are also discussed. As solutions to clinical WSD, we explored static word-embedding representations for 13 English clinical abbreviations of the UMN data set (from the University of Minnesota), testing traditional supervised machine learning algorithms separately for each abbreviation. Moreover, we fine-tuned a transformer-based pretrained model as a single multi-class classifier for the whole data set (75 abbreviations of the UMN data set); the aim of implementing one multi-class classifier is to predict rare and unseen abbreviations, which are common in clinical narrative. Additionally, further experiments were conducted on a different type of abbreviation (scientific abbreviations and acronyms) using a hybrid approach that combines supervised and knowledge-based methods. Most previous work builds a separate classifier for each clinical abbreviation and tends to leverage different data resources to overcome the data acquisition bottleneck; however, those models are restricted to disambiguating terms seen in the training data. Based on our results, transfer learning by fine-tuning a transformer-based model can predict rare and unseen abbreviations.
A remaining challenge for future work is to improve the model so that it can automate the disambiguation of clinical abbreviations in run-time systems by implementing self-supervised learning models.
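    In the spirit of the multi-class approach described above, the sketch below fine-tunes a pretrained transformer as a single sequence classifier over (context, abbreviation sense) pairs, assuming the Hugging Face transformers and PyTorch libraries. The model name, toy examples and sense inventory are placeholders; the UMN data set itself is not reproduced here.

```python
# Hedged sketch: fine-tune a pretrained transformer as one multi-class
# classifier over abbreviation senses. Model name, examples and senses
# are placeholders, not the thesis's actual data or configuration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

senses = ["RA:rheumatoid arthritis", "RA:right atrium"]   # toy sense labels
texts = [
    "joint pain consistent with RA on methotrexate",
    "thrombus noted in the RA on echocardiogram",
]
labels = torch.tensor([0, 1])

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(senses)
)

batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
optim = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for _ in range(3):  # a few illustrative epochs on the toy batch
    out = model(**batch, labels=labels)
    out.loss.backward()
    optim.step()
    optim.zero_grad()

model.eval()
with torch.no_grad():
    pred = model(**batch).logits.argmax(dim=-1)
print([senses[i] for i in pred])
```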

    Quantitative Multimodal Mapping Of Seizure Networks In Drug-Resistant Epilepsy

    Over 15 million people worldwide suffer from localization-related drug-resistant epilepsy. These patients are candidates for targeted surgical therapies such as surgical resection, laser thermal ablation, and neurostimulation. While seizure localization is needed prior to surgical intervention, this process is challenging, invasive, and often inconclusive. In this work, I aim to exploit the power of multimodal high-resolution imaging and intracranial electroencephalography (iEEG) data to map seizure networks in drug-resistant epilepsy patients, with a focus on minimizing invasiveness. Given compelling evidence that epilepsy is a disease of distorted brain networks as opposed to well-defined focal lesions, I employ a graph-theoretical approach to map structural and functional brain networks and identify putative targets for removal. The first section focuses on mesial temporal lobe epilepsy (TLE), the most common type of localization-related epilepsy. Using high-resolution structural and functional 7T MRI, I demonstrate that noninvasive neuroimaging-based network properties within the medial temporal lobe can serve as useful biomarkers for TLE cases in which conventional imaging and volumetric analysis are insufficient. The second section expands to all forms of localization-related epilepsy. Using iEEG recordings, I provide a framework for the utility of interictal network synchrony in identifying candidate resection zones, with the goal of reducing the need for prolonged invasive implants. In the third section, I generate a pipeline for integrated analysis of iEEG and MRI networks, paving the way for future large-scale studies that can effectively harness synergy between different modalities. This multimodal approach has the potential to provide fundamental insights into the pathology of an epileptic brain, robustly identify areas of seizure onset and spread, and ultimately inform clinical decision making
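    The graph-theoretical idea can be illustrated with the short sketch below: build a weighted network from pairwise iEEG channel synchrony and rank nodes by weighted degree. It assumes numpy and networkx; the random connectivity matrix stands in for real correlation or coherence values, and the metric shown is not necessarily the one used in the dissertation.

```python
# Illustrative sketch: build a functional network from pairwise iEEG channel
# synchrony and rank nodes by strength (weighted degree). The random matrix
# stands in for real connectivity values.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n_channels = 8
conn = np.abs(rng.standard_normal((n_channels, n_channels)))
conn = (conn + conn.T) / 2          # symmetrise the matrix
np.fill_diagonal(conn, 0.0)         # no self-connections

G = nx.from_numpy_array(conn)       # weighted undirected graph

# Node strength (weighted degree) as a simple synchrony-based ranking.
strength = dict(G.degree(weight="weight"))
ranked = sorted(strength, key=strength.get, reverse=True)
print("channels ranked by interictal synchrony:", ranked)
```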