
    Hypersensitivity Adverse Event Reporting in Clinical Cancer Trials: Barriers and Potential Solutions to Studying Severe Events on a Population Level

    Abstract of a dissertation by Christina Eldredge, The University of Wisconsin-Milwaukee, 2020, under the supervision of Professor Timothy Patrick. Clinical cancer trial interventions are associated with hypersensitivity events (HEs), which are recorded in the national clinical trial registry, ClinicalTrials.gov, and are publicly available. These data could potentially be leveraged to study predictors of HEs and identify at-risk patients who may benefit from desensitization therapies to prevent these potentially life-threatening reactions. However, variation in investigator reporting methods is a barrier to aggregating and analyzing the data. The National Cancer Institute developed the Common Terminology Criteria for Adverse Events (CTCAE) classification system to address this barrier. This study analyzes how comprehensively CTCAE describes severe HEs in clinical cancer trials in comparison to other systems and terminologies. An XML parser was used to extract readable text from adverse event tables, and queries of the parsed data elements identified immune disorder events associated with biological and chemotherapy interventions. A data subset of severe anaphylactic and anaphylactoid events was created and analyzed. Between September 20, 1999 and March 2018, 1,331 clinical trials reported 13,088 immune disorder events; 2,409 (18.4%) of these were recorded as “serious” events. In the severe subset, MedDRA terminology and the CTCAE or CTC classification systems were used to describe HEs; however, a large number of studies did not specify the system. The CTCAE term “anaphylaxis” was miscoded as “other (not including serious)” in 76.2% of events. The CTCAE severity grade levels were not used to describe any of the severe events, and the majority of terms did not include the allergen; therefore, in dual- or multi-drug therapies, the etiologic agent was not identifiable. Furthermore, collection methods were not specified in 76% of events. CTCAE was therefore not found to improve the ability to capture event etiology or severity for anaphylactic and anaphylactoid events in cancer clinical trials. Potential solutions for improving CTCAE HE description include adopting terms with a low percentage of HE severity miscoding (e.g., “anaphylactic reaction”) and terms that include drugs, biological agents, and/or drug classes, improving the study of anaphylaxis etiology and incidence in multi-drug cancer therapy and thereby making a significant impact on patient safety.
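
    The extraction step described above can be illustrated with a short sketch. This is a minimal example assuming the legacy ClinicalTrials.gov study-record XML layout, in which organ-system classes appear as category titles (e.g. "Immune system disorders") and event terms as sub-titles; the element names are assumptions for illustration, not details confirmed by the abstract.

    # Minimal sketch: pull immune-system adverse events out of one
    # ClinicalTrials.gov study-record XML file. Element names follow the
    # legacy public XML schema as an assumption; adjust to the real schema.
    import xml.etree.ElementTree as ET

    def extract_immune_events(xml_path):
        """Return (event term, table) pairs for immune system disorder events."""
        events = []
        reported = ET.parse(xml_path).getroot().find("reported_events")
        if reported is None:
            return events
        # Both the "serious" and "other" adverse event tables are scanned, so
        # the seriousness label travels with each extracted term.
        for table_name in ("serious_events", "other_events"):
            table = reported.find(table_name)
            if table is None:
                continue
            for category in table.iter("category"):
                title_el = category.find("title")  # organ-system class
                title = title_el.text if title_el is not None else ""
                if "immune" not in (title or "").lower():
                    continue
                for event in category.iter("event"):
                    term_el = event.find("sub_title")  # reported event term
                    if term_el is not None and term_el.text:
                        events.append((term_el.text.strip(), table_name))
        return events

    Queries like those in the study would then filter these pairs, for example keeping anaphylactic and anaphylactoid terms and comparing how often they land in the serious versus the "other" table.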

    A retrospective look at the predictions and recommendations from the 2009 AMIA Policy Meeting: Did we see EHR-related clinician burnout coming?

    Clinicians often attribute much of their burnout experience to use of the electronic health record, the adoption of which was greatly accelerated by the Health Information Technology for Economic and Clinical Health Act of 2009. That same year, AMIA's Policy Meeting focused on possible unintended consequences associated with rapid implementation of electronic health records, generating 17 potential consequences and 15 recommendations to address them. At the 2020 annual meeting of the American College of Medical Informatics (ACMI), ACMI fellows participated in a modified Delphi process to assess the accuracy of the 2009 predictions and the response to the recommendations. Among the findings, the fellows concluded that the degree of clinician burnout and its contributing factors, such as increased documentation requirements, were significantly underestimated. Conversely, problems related to identity theft and fraud were overestimated. Only 3 of the 15 recommendations were judged more than half-addressed.

    Automated Detection of Substance-Use Status and Related Information from Clinical Text

    This study aims to develop and evaluate an automated system for extracting information related to patient substance use (smoking, alcohol, and drugs) from unstructured clinical text (medical discharge records). The authors propose a four-stage system for extracting substance-use status and related attributes (type, frequency, amount, quit-time, and period). The first stage uses a keyword search technique to detect sentences related to substance use and to exclude unrelated records. In the second stage, an extension of the NegEx negation detection algorithm is developed and employed to detect negated records. The third stage identifies the temporal status of the substance use by applying windowing and chunking methodologies. Finally, the fourth stage uses regular expressions, syntactic patterns, and keyword search techniques to extract the substance-use attributes. The proposed system achieves an F1-score of up to 0.99 for identifying substance-use-related records, 0.98 for detecting negation status, and 0.94 for identifying temporal status. Moreover, F1-scores of up to 0.98, 0.98, 1.00, 0.92, and 0.98 are achieved for the extraction of the amount, frequency, type, quit-time, and period attributes, respectively. Natural language processing (NLP) and rule-based techniques are employed efficiently to extract substance-use status and attributes, and the proposed system detects both over sentence-level and document-level data. Results show that the proposed system outperforms the compared state-of-the-art substance-use identification system on an unseen dataset, demonstrating its generalisability.
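
    The first two stages can be sketched as follows. The keyword and negation-trigger lists below are abbreviated stand-ins for illustration, not the study's full lexicons, and the token window size is an assumed parameter.

    import re

    # Abbreviated lexicons for illustration only; the study's complete keyword
    # and trigger lists are not reproduced in the abstract.
    SUBSTANCE_KEYWORDS = {"smoke", "smoker", "smoking", "tobacco", "alcohol",
                          "etoh", "cocaine", "drug use"}
    NEGATION_TRIGGERS = {"no", "denies", "denied", "never", "negative for"}

    def substance_sentences(text):
        """Stage 1: keep only sentences mentioning a substance-use keyword."""
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            lowered = sentence.lower()
            if any(kw in lowered for kw in SUBSTANCE_KEYWORDS):
                yield sentence

    def is_negated(sentence, window=5):
        """Stage 2: NegEx-style check - a negation trigger within a few
        tokens before the first substance keyword marks it as negated."""
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if any(kw.split()[0] == tok.strip(".,") for kw in SUBSTANCE_KEYWORDS):
                scope_tokens = tokens[max(0, i - window):i]
                scope = " ".join(scope_tokens)
                return any(
                    trig in scope_tokens if " " not in trig else trig in scope
                    for trig in NEGATION_TRIGGERS
                )
        return False

    print(is_negated("Patient denies tobacco use."))  # True: negated mention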

    Safeguarding Privacy Through Deep Learning Techniques

    Over the last few years, there has been a growing need to meet minimum security and privacy requirements. Both public and private companies have had to comply with increasingly stringent standards, such as the ISO 27000 family, and with the various laws governing the management of personal data. The huge amount of data to be managed has demanded an enormous effort from employees who, in the absence of automatic techniques, have had to work tirelessly to achieve certification objectives. Unfortunately, because of the sensitive information contained in the documentation related to these problems, it is difficult, if not impossible, to obtain material for research and study purposes on which to experiment with new ideas and techniques aimed at automating these processes, perhaps exploiting current advances in the scientific community around ontologies and artificial intelligence for data management. To bypass this problem, data related to the medical world were examined, which, largely for important reasons related to individual health, have gradually become more freely accessible over time. This choice does not affect the generality of the proposed methods, which can be reapplied to the most diverse fields in which privacy-sensitive information must be managed.

    THaW publications

    In 2013, the National Science Foundation's Secure and Trustworthy Cyberspace program awarded a Frontier grant to a consortium of four institutions, led by Dartmouth College, to enable trustworthy cybersystems for health and wellness. As of this writing, the Trustworthy Health and Wellness (THaW) project's bibliography includes more than 130 significant publications produced with support from the THaW grant; these publications document the progress made on many fronts by the THaW research team. The collection includes dissertations, theses, journal papers, conference papers, workshop contributions, and more. The bibliography is organized as a Zotero library, which provides ready access to citation materials and abstracts and associates each work with a URL where it may be found, a cluster (category), several content tags, and a brief annotation summarizing the work's contribution. For more information about THaW, visit thaw.org.

    Word Sense Disambiguation for clinical abbreviations

    Abbreviations are extensively used in electronic health records (EHRs) and other medical documentation, accounting for 30-50% of the words in clinical narrative. More than 197,000 unique medical abbreviations are found in clinical text, and their meanings vary depending on the context in which they are used. Since data in electronic health records can be shared across health information systems (hospitals, primary care centers, etc.) as well as with other systems such as those of insurance companies, it is essential to determine the correct meaning of abbreviations to avoid misunderstandings. Clinical abbreviations have the specific characteristic of not following any standard rules of creation, which makes it complicated to catalogue abbreviations and their corresponding meanings. Furthermore, there is the added difficulty of working with clinical data, which for privacy reasons are hard to obtain even though they are essential for developing and testing algorithms. Word sense disambiguation (WSD) is an essential task in natural language processing (NLP) applications such as information extraction, chatbots, and summarization systems, among others. WSD aims to identify the correct meaning of an ambiguous word that has more than one meaning; disambiguating clinical abbreviations is a type of lexical-sample WSD task. Previous research adopted supervised, unsupervised, and knowledge-based (KB) approaches to disambiguate clinical abbreviations. This thesis proposes a classification model that, in addition to disambiguating well-known abbreviations, also disambiguates rare and unseen abbreviations using the most recent deep neural network architectures for language modeling. The thesis surveys the resources and disambiguation models encountered in clinical abbreviation disambiguation and investigates the different classification approaches that have been used. Considering that computers do not directly understand text, different data representations were implemented to capture the meaning of words, and the evaluation measures used to assess the algorithms are also discussed. As solutions to clinical WSD, we explored static word-embedding representations of 13 English clinical abbreviations from the UMN data set (from the University of Minnesota), testing traditional supervised machine learning algorithms separately for each abbreviation. Moreover, we fine-tuned a transformer-based pretrained model as a multi-class classifier for the whole data set (75 abbreviations of the UMN data set). The aim of implementing a single multi-class classifier is to predict the rare and unseen abbreviations that are common in clinical narrative. Additionally, other experiments were conducted for a different type of abbreviation (scientific abbreviations and acronyms) by defining a hybrid approach composed of supervised and knowledge-based methods. Most previous works built a separate classifier for each clinical abbreviation, leveraging different data resources to overcome the data acquisition bottleneck; however, those models were restricted to disambiguating terms seen in the training data. Our results show that transfer learning by fine-tuning a transformer-based model can predict rare and unseen abbreviations.
    A remaining challenge for future work is to improve the model to automate the disambiguation of clinical abbreviations in run-time systems by implementing self-supervised learning models. (Doctoral Programme in Computer Science and Technology, Universidad Carlos III de Madrid. Committee: Israel González Carrasco, chair; Leonardo Campillos Llanos, secretary; Ana María García Serrano, member.)
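
    As a sketch of the multi-class fine-tuning setup, the snippet below uses the Hugging Face transformers library; the base model name, the toy label set, and the bracket-marking of the ambiguous abbreviation in context are illustrative assumptions rather than the thesis's exact configuration.

    # Minimal sketch: one multi-class transformer classifier over all
    # abbreviation senses, instead of one classifier per abbreviation.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    import torch

    # Toy label set; the UMN data set covers 75 abbreviations.
    senses = ["AB: abortion", "AB: antibody",
              "RA: rheumatoid arthritis", "RA: right atrium"]
    label2id = {s: i for i, s in enumerate(senses)}

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(senses))

    # One training example: the ambiguous abbreviation is marked in context,
    # so a single classifier can serve every abbreviation, including rare ones.
    text = "Pt with [RA] on methotrexate, joints swollen."
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    labels = torch.tensor([label2id["RA: rheumatoid arthritis"]])

    outputs = model(**enc, labels=labels)
    outputs.loss.backward()  # an optimizer step would follow during training

    Because the label space is shared, senses of an abbreviation that is rare in training can still be predicted from the fine-tuned model's contextual representations, which is the property the thesis exploits.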

    Terminology Services: Standard Terminologies to Control Medical Vocabulary. “Words are Not What they Say but What they Mean”

    Data entry is an obstacle to the usability of electronic health record (EHR) applications and to their acceptance by physicians, who prefer to document using “free text”. Natural language is vast and rich in detail, but at the same time it is ambiguous: it depends heavily on context and uses jargon and acronyms. Healthcare information systems should capture clinical data in a structured and preferably coded format. This is crucial for data exchange between health information systems, epidemiological analysis, quality and research, clinical decision support systems, administrative functions, etc. To address this point, numerous terminological systems for the systematic recording of clinical data have been developed. These systems interrelate the concepts of a particular domain and provide references to related terms along with possible definitions and codes. The purpose of terminology services is to represent facts that happen in the real world through managed databases; this shared, computable representation is what enables semantic interoperability, meaning that different systems understand the information they are processing through the use of codes from clinical terminologies. Standard terminologies make it possible to control medical vocabulary. But how do we do this, and what do we need? Terminology services are a fundamental piece of health data management in healthcare environments.
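
    A toy sketch of the coding step behind semantic interoperability: free-text entries from different systems resolve to one shared concept code. The mini terminology table and code values below are invented for illustration; a real terminology service would draw on a standard such as SNOMED CT.

    # Toy terminology service: synonyms and jargon map to one concept code, so
    # two systems exchanging the code "know" they mean the same concept.
    # Table and code values are invented for illustration (not real SNOMED CT).
    TERMINOLOGY = {
        "myocardial infarction": "C0001",
        "heart attack": "C0001",       # lay synonym, same concept
        "mi": "C0001",                 # clinical jargon/abbreviation
        "hypertension": "C0002",
        "high blood pressure": "C0002",
    }

    def encode(free_text_term):
        """Resolve a free-text entry to its concept code, or flag for review."""
        return TERMINOLOGY.get(free_text_term.strip().lower(), "UNMAPPED")

    assert encode("Heart attack") == encode("MI") == "C0001"
    print(encode("chest pain"))  # UNMAPPED -> needs terminology curation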