75 research outputs found

    Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records

    Unstructured information in electronic health records provides an invaluable resource for medical research. To protect the confidentiality of patients and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to the unavailability of labeled data, most existing research is constrained to English medical text and little is known about the generalizability of de-identification methods across languages and domains. In this study, we construct a varied dataset consisting of the medical records of 1260 patients by sampling data from nine institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains. Our experiments show that an existing rule-based method specifically developed for the Dutch language fails to generalize to this new data. Furthermore, a state-of-the-art neural architecture performs strongly across languages and domains, even with limited training data. Compared to feature-based and rule-based methods, the neural method requires significantly less configuration effort and domain knowledge. We make all code and pre-trained de-identification models available to the research community, allowing practitioners to apply them to their datasets and enabling future benchmarks. Comment: Proceedings of the 1st ACM WSDM Health Search and Data Mining Workshop (HSDM2020), 202

    Citicoline for treating people with acute ischemic stroke (Review)

    Background: Stroke is one of the leading causes of long‐lasting disability and mortality and its global burden has increased in the past two decades. Several therapies have been proposed for the recovery from, and treatment of, ischemic stroke. One of them is citicoline. This review assessed the benefits and harms of citicoline for treating patients with acute ischemic stroke. Objectives: To assess the clinical benefits and harms of citicoline compared with placebo or any other control for treating people with acute ischemic stroke. Search methods: We searched the Cochrane Stroke Group Trials Register, CENTRAL, MEDLINE Ovid, Embase Ovid, and LILACS until 29 January 2020. We searched the World Health Organization Clinical Trials Search Portal and ClinicalTrials.gov. We also reviewed reference lists of the retrieved publications and review articles, and searched the websites of the US Food and Drug Administration (FDA) and European Medicines Agency (EMA). Selection criteria: We included randomized controlled trials (RCTs) in any setting including participants with acute ischemic stroke. Trials were eligible for inclusion if they compared citicoline versus placebo or no intervention. Data collection and analysis: We selected RCTs, assessed the risk of bias in seven domains, and extracted data in duplicate. Our primary outcomes of interest were all‐cause mortality and the degree of disability or dependence in daily activities at 90 days. We estimated risk ratios (RRs) for dichotomous outcomes. We measured statistical heterogeneity using the I² statistic. We conducted our analyses using the fixed‐effect and random‐effects model meta‐analyses. We assessed the overall quality of evidence for six pre‐specified outcomes using the GRADE approach. Main results: We identified 10 RCTs including 4281 participants. In all these trials, citicoline was given either orally, intravenously, or a combination of both compared with placebo or standard care therapy. 
Citicoline doses ranged between 500 mg and 2000 mg per day. We assessed all the included trials as having high risk of bias. Drug companies sponsored six trials. A pooled analysis of eight trials indicates there may be little or no difference in all‐cause mortality comparing citicoline with placebo (17.3% versus 18.5%; RR 0.94, 95% CI 0.83 to 1.07; I² = 0%; low‐quality evidence due to risk of bias). Four trials showed that citicoline may not increase the proportion of patients with a moderate or lower degree of disability or dependence compared with placebo, according to the Rankin Scale (21.72% versus 19.23%; RR 1.11, 95% CI 0.97 to 1.26; I² = 1%; low‐quality evidence due to risk of bias). Meta‐analysis of three trials indicates there may be little or no difference in serious cardiovascular adverse events comparing citicoline with placebo (8.83% versus 7.77%; RR 1.04, 95% CI 0.84 to 1.29; I² = 0%; low‐quality evidence due to risk of bias). Overall, either serious or non‐serious adverse events – central nervous system, gastrointestinal, musculoskeletal, etc. – were poorly reported and harms may have been underestimated. Four trials suggested that citicoline results in no difference in functional recovery, according to the Barthel Index, compared with placebo (32.78% versus 30.70%; RR 1.03, 95% CI 0.94 to 1.13; I² = 24%; low‐quality evidence due to risk of bias). Citicoline may not increase the proportion of patients with a minor impairment (according to ≤ 1 scores in the National Institutes of Health Stroke Scale) (5 trials, 24.31% versus 22.44%; RR 1.08, 95% CI 0.96 to 1.21; I² = 27%, low‐quality evidence due to risk of bias). None of the included trials reported data on quality of life. A pre‐planned Trial Sequential Analysis suggested that no more trials may be needed for the primary outcomes. 
Authors' conclusions: This review assessed the clinical benefits and harms of citicoline compared with placebo or any other standard treatment for people with acute ischemic stroke. The findings of the review suggest there may be little to no difference between citicoline and its controls regarding all‐cause mortality, disability or dependence in daily activities, functional recovery, neurological function and severe adverse events, based on low‐certainty evidence. None of the included trials assessed quality of life and the safety profile of citicoline remains unknown. The available evidence is of low quality due to either limitations in the design or execution of the trials.
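
As a rough illustration of the risk ratios quoted in this abstract, the sketch below divides the event proportion in the citicoline group by the proportion in the placebo group. The function name `crude_risk_ratio` is hypothetical and not taken from the review. The review's RRs are pooled meta-analytic estimates that weight each trial, so a crude ratio of the overall percentages is only an approximation and need not match the reported values.

```python
def crude_risk_ratio(treatment_pct: float, control_pct: float) -> float:
    """Return the unweighted ratio of two event proportions given in percent.

    This is NOT the pooled estimate used in a meta-analysis; it ignores
    per-trial weighting and gives no confidence interval.
    """
    if control_pct <= 0:
        raise ValueError("control_pct must be positive")
    return treatment_pct / control_pct


# Percentages as quoted in the abstract (citicoline versus placebo).
mortality = crude_risk_ratio(17.3, 18.5)     # reported pooled RR: 0.94
disability = crude_risk_ratio(21.72, 19.23)  # reported pooled RR: 1.11

print(round(mortality, 2), round(disability, 2))
```

For all‐cause mortality the crude ratio rounds to the reported 0.94, whereas for the disability outcome it rounds to 1.13 rather than the reported 1.11, which shows why the crude figure is only a sanity check.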

    Expression of gender in the human voice: investigating the “gender code”

    We can easily and reliably identify the gender of an unfamiliar interlocutor over the telephone. This is because our voice is “sexually dimorphic”: men typically speak with a lower fundamental frequency (F0 - lower pitch) and lower vocal tract resonances (ΔF – “deeper” timbre) than women. While the biological bases of these differences are well understood, and mostly down to size differences between men and women, very little is known about the extent to which we can play with these differences to accentuate or de-emphasise our perceived gender, masculinity and femininity in a range of social roles and contexts. The general aim of this thesis is to investigate the behavioural basis of gender expression in the human voice in both children and adults. More specifically, I hypothesise that, on top of the biologically determined sexual dimorphism, humans use a “gender code” consisting of vocal gestures (global F0 and ΔF adjustments) aimed at altering the gender attributes conveyed by their voice. In order to test this hypothesis, I first explore how acoustic variation of sexually dimorphic acoustic cues (F0 and ΔF) relates to physiological differences in pre-pubertal speakers (vocal tract length) and adult speakers (body height and salivary testosterone levels), and show that voice gender variation cannot be solely explained by static, biologically determined differences in vocal apparatus and body size of speakers. Subsequently, I show that both children and adult speakers can spontaneously modify their voice gender by lowering (raising) F0 and ΔF to masculinise (feminise) their voice, a key ability for the hypothesised control of voice gender. Finally, I investigate the interplay between voice gender expression and social context in relation to cultural stereotypes. 
I report that listeners spontaneously integrate stereotypical information in the auditory and visual domain to make stereotypical judgments about children’s gender and that adult actors manipulate their gender expression in line with stereotypical gendered notions of homosexuality. Overall, this corpus of data supports the existence of a “gender code” in human nonverbal vocal communication. This “gender code” provides not only a methodological framework with which to empirically investigate variation in voice gender and its role in expressing gender identity, but also a unifying theoretical structure to understand the origins of such variation from both evolutionary and social perspectives.

    Using machine learning for automated de-identification and clinical coding of free text data in electronic medical records

    The widespread adoption of Electronic Medical Records (EMRs) in hospitals continues to increase the amount of patient data that are digitally stored. Although the primary use of the EMR is to support patient care by making all relevant information accessible, governments and health organisations are looking for ways to unleash the potential of these data for secondary purposes, including clinical research, disease surveillance and automation of healthcare processes and workflows. EMRs include large quantities of free text documents that contain valuable information. The greatest challenges in using the free text data in EMRs include the removal of personally identifiable information and the extraction of relevant information for specific tasks such as clinical coding. Machine learning-based automated approaches can potentially address these challenges. This thesis aims to explore and improve the performance of machine learning models for automated de-identification and clinical coding of free text data in EMRs, as captured in hospital discharge summaries, and facilitate the applications of these approaches in real-world use cases. It does so by 1) implementing an end-to-end de-identification framework using an ensemble of deep learning models; 2) developing a web-based system for de-identification of free text (DEFT) with an interactive learning loop; 3) proposing and implementing a hierarchical label-wise attention transformer model (HiLAT) for explainable International Classification of Diseases (ICD) coding; and 4) investigating the use of extreme multi-label long text transformer-based models for automated ICD coding. The key findings include: 1) An end-to-end framework using an ensemble of deep learning base-models achieved excellent performance on the de-identification task. 2) A new web-based de-identification software system (DEFT) can be readily and easily adopted by data custodians and researchers to perform de-identification of free text in EMRs. 
3) A novel domain-specific transformer-based model (HiLAT) achieved state-of-the-art (SOTA) results for predicting ICD codes on a Medical Information Mart for Intensive Care (MIMIC-III) dataset comprising the discharge summaries (n=12,808) that are coded with at least one of the 50 most frequent diagnosis and procedure codes. In addition, the label-wise attention scores for the tokens in the discharge summary presented a potential explainability tool for checking the face validity of ICD code predictions. 4) An optimised transformer-based model, PLM-ICD, achieved the latest SOTA results for ICD coding on all the discharge summaries of the MIMIC-III dataset (n=59,652). The segmentation method, which split the long text consecutively into multiple small chunks, addressed the problem of applying transformer-based models to long text datasets. However, using transformer-based models on extremely large label sets needs further research. These findings demonstrate that the de-identification and clinical coding tasks can benefit from the application of machine learning approaches, present practical tools for implementing these approaches, and highlight priorities for further research.
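
The segmentation idea mentioned in this abstract, splitting a long text consecutively into small chunks, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `split_into_chunks`, the whitespace tokenisation and the chunk size are all hypothetical, and a real transformer pipeline would count subword tokens rather than words.

```python
from typing import List


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a token sequence into consecutive, non-overlapping chunks.

    Every chunk has at most `chunk_size` tokens; only the last chunk
    may be shorter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Hypothetical example text; whitespace splitting stands in for a real tokenizer.
summary = "patient admitted with chest pain and discharged after two days"
chunks = split_into_chunks(summary.split(), chunk_size=4)
for chunk in chunks:
    print(chunk)
```

Each chunk can then be encoded separately by a model with a fixed input length, with the per-chunk outputs combined afterwards; how that combination is done is outside this sketch.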

    Text Mining and Gene Expression Analysis Towards Combined Interpretation of High Throughput Data

    Microarrays can capture gene expression activity for thousands of genes simultaneously and thus make it possible to analyze cell physiology and disease processes at the molecular level. The interpretation of microarray gene expression experiments profits from knowledge of the analyzed genes and proteins and the biochemical networks in which they play a role. The trend is towards the development of data analysis methods that integrate diverse data types. Currently, the most comprehensive biomedical knowledge source is a large repository of free text articles. Text mining makes it possible to automatically extract and use information from texts. This thesis addresses two key aspects, biomedical text mining and gene expression data analysis, with the focus on providing high-quality methods and data that contribute to the development of integrated analysis approaches. The work is structured in three parts. Each part begins by providing the relevant background, and each chapter describes the developed methods as well as applications and results. Part I deals with biomedical text mining: Chapter 2 summarizes the relevant background of text mining; it describes text mining fundamentals, important text mining tasks, applications and particularities of text mining in the biomedical domain, and evaluation issues. In Chapter 3, a method for generating high-quality gene and protein name dictionaries is described. The analysis of the generated dictionaries revealed important properties of individual nomenclatures and the used databases (Fundel and Zimmer, 2006). The dictionaries are publicly available via a Wiki, a web service, and several client applications (Szugat et al., 2005). In Chapter 4, methods for the dictionary-based recognition of gene and protein names in texts and their mapping onto unique database identifiers are described. These methods make it possible to extract information from texts and to integrate text-derived information with data from other sources. 
Three named entity identification systems have been set up, two of them building upon the previously existing tool ProMiner (Hanisch et al., 2003). All of them have shown very good performance in the BioCreAtIvE challenges (Fundel et al., 2005a; Hanisch et al., 2005; Fundel and Zimmer, 2007). In Chapter 5, a new method for relation extraction (Fundel et al., 2007) is presented. It was applied to the largest collection of biomedical literature abstracts, and thus a comprehensive network of human gene and protein relations has been generated. A classification approach (Küffner et al., 2006) can be used to specify relation types further, e.g., as activating, direct physical, or gene regulatory relation. Part II deals with gene expression data analysis: Gene expression data needs to be processed so that differentially expressed genes can be identified. Gene expression data processing consists of several sequential steps. Two important steps are normalization, which aims at removing systematic variances between measurements, and quantification of differential expression by p-value and fold change determination. Numerous methods exist for these tasks. Chapter 6 describes the relevant background of gene expression data analysis; it presents the biological and technical principles of microarrays and gives an overview of the most relevant data processing steps. Finally, it provides a short introduction to osteoarthritis, which is the focus of the analyzed gene expression data sets. In Chapter 7, quality criteria for the selection of normalization methods are described, and a method for the identification of differentially expressed genes is proposed, which is appropriate for data with large intensity variances between spots representing the same gene (Fundel et al., 2005b). 
Furthermore, a system is described that selects an appropriate combination of feature selection method and classifier, and thus identifies genes which lead to good classification results and show consistent behavior in different sample subgroups (Davis et al., 2006). The analysis of several gene expression data sets dealing with osteoarthritis is described in Chapter 8. This chapter contains the biomedical analysis of relevant disease processes and distinct disease stages (Aigner et al., 2006a), and a comparison of various microarray platforms and osteoarthritis models. Part III deals with integrated approaches and thus provides the connection between parts I and II: Chapter 9 gives an overview of different types of integrated data analysis approaches, with a focus on approaches that integrate gene expression data with manually compiled data, large-scale networks, or text mining. In Chapter 10, a method for the identification of genes which are consistently regulated and have a coherent literature background (Küffner et al., 2005) is described. This method indicates how gene and protein name identification and gene expression data can be integrated to return clusters which contain genes that are relevant for the respective experiment together with literature information that supports interpretation. Finally, Chapter 11 presents ideas on how the described methods can contribute to current research and possible future directions.

    The Golgi complex as a regulatory platform for DNA Damage Response pathways

    DNA damage response (DDR) is a well-characterised process; however, the majority of studies so far focus specifically on nuclear events, while the cytoplasmic and endomembrane response to DNA damage remains largely unexplored. Although little experimental work to date suggests a role for the Golgi complex in the cellular DDR, a few scattered studies have alluded to this function (Farber-Katz et al., 2014). In line with this, previous work from our lab identified a network of 15 proteins that function in various distinct DNA repair pathways, along with cell cycle and other regulatory proteins, that localise to the Golgi and the nucleus. A first in-depth characterisation of the dual-localising DDR protein RAD51C has shed some light on the potential role the Golgi complex might play in DDR (Galea et al., 2022). This work has revealed that the Golgi localisation of RAD51C is dependent on the Golgin Giantin, and is strongly reduced upon DNA damage, coincident with its increase at nuclear DNA damage sites. The loss of this Giantin-RAD51C interaction by siRNA-mediated depletion of Giantin leads to genomic instability and aberrant DNA repair. In light of these findings and the wide array of DDR proteins identified at the Golgi, I hypothesised that similar regulatory mechanisms would be present for various other DNA repair pathways. To assess the role of the DDR Golgi population in DNA repair response, various types of DNA lesions were induced utilising DNA damaging agents, while monitoring for any change in distribution patterns of the investigated DDR proteins. The dual-localised DDR proteins were found to change in localisation in a bi-directional manner, showing distribution from the Golgi to the nucleus or from the nucleus to the Golgi depending on the DNA damage induced and based on the proteins’ role in specific DDR pathways. 
A sub-organelle distribution analysis of DDR proteins within the Golgi was then performed, speculating that specific sub-organelle distribution could be important for the activation and regulation of these proteins at the organelle. The dual-localised DDR proteins were found to be heterogeneously distributed throughout the Golgi stacks and their position on the organelle was found to be correlated to their function in DDR. Additionally, this work identified several Golgins acting as anchors and regulators of these DDR proteins at the Golgi. Furthermore, proteomic-scale interaction analysis of DDR and Golgin family proteins revealed new interactions between the Golgin and DDR proteomes, suggesting that the network of interactions between the Golgi proteins and DDR proteins might be more extensive than previously found. Altogether, the results described in this work demonstrate the presence of a complex relationship between the Golgi and DDR, identifying new interacting pathways between Golgi proteins, in particular Golgins, and the DDR. These findings expand on our current knowledge of how DDR is regulated, showcasing the Golgi as a regulatory hub for DNA damage response. They mark the beginning of a new direction in the DNA damage repair field, shifting from a nuclear-centric to a more global picture of how DNA damage response is regulated, opening up possibilities for finding new therapeutic targets.

    Overcoming the Loss of Nonverbal Cues Encountered by the Adventitiously Blind: Reconstructing Relationships and Identity

    In this study, couples shared their experiences adjusting to one member's loss of sight. Through interviews, their narratives expressed their values, actions, inactions, successes, failures, needs, obstacles, and feelings. Participants explained their standpoint/perspective about vision loss, when it happened, how it affected them, how they reacted and responded, how in hindsight they thought they should have responded, and how they reconstructed a shared interpersonal relationship. Narratives about situations and events after the loss of sight revealed descriptions of their relationships and interactions with each other and other people in their circle. Through constant comparative analysis the individual narratives were compared: within a single interview, between interviews within the same group, between interviews from different groups, in pairs at the level of the couple, and finally between couples. Eventually, the following related overarching themes emerged: communication, interdependency, identity (personal, enacted, and relational), learning curve, and deliberate living. In each of these themes, there were two distinct perspectives: a blind perspective and a sighted perspective. The analysis revealed that in many ways the couples adjusted to events and crises, as most successful couples do, but often they had to be inventive and creative to bridge the gaps created by the loss of a common and, in many cases, independent form of communication. Without the convenience of visual nonverbal cues to either be a complete message or complete the message, participants found ways to restore interpersonal communication and allow them to grow together. At some point, their identity changed, or a new identity emerged. In all of the participants’ cases, the new identity did not become the primary identity, just another part of them as a whole. 
In summary, the participants presented themes, identities, and coping strategies which allowed them to continue as a couple when facing a life-changing event: becoming blind, or becoming the sighted partner of a blind person.

    Good Research Practice in Non-Clinical Pharmacology and Biomedicine

    This open access book, published under a CC BY 4.0 license in the PubMed-indexed book series Handbook of Experimental Pharmacology, provides up-to-date information on best practice to improve experimental design and quality of research in non-clinical pharmacology and biomedicine.