12,914 research outputs found

    Race in the Life Sciences: An Empirical Assessment, 1950-2000

    The mainstream narrative regarding the evolution of race as an idea in the scientific community is that biological understandings of race dominated throughout the nineteenth and twentieth centuries up until World War II, after which a social constructionist approach is thought to have taken hold. Many believe that the horrific outcomes of the most notorious applications of biological race—eugenics and the Holocaust—moved scientists away from thinking that race reflects inherent differences and toward an understanding that race is a largely social, cultural, and political phenomenon. This understanding of the evolution of race as a scientific idea informed the way that many areas of law conceptualize human equality, including civil rights, human rights, and constitutional law. This Article provides one of the first large-scale empirical assessments of publications in peer-reviewed biomedical and life science journals to examine whether biological theories of race actually lost credibility in the life sciences after World War II. We find that biological theories of race transformed yet persisted in the dominant academic discourse up through modern times—a finding that contradicts the central narrative that the life sciences became “color-blind” or “post-racial” several decades ago. The continued salience of biological race in the life sciences suggests that more attention needs to be paid to the questionable assumptions driving this research on biological race and its potential spillover effects, i.e., how persisting claims of biological race in the scientific literature might reconstitute its significance in law and society in a manner that may be harmful to racial minorities

    Toward a process theory of entrepreneurship: revisiting opportunity identification and entrepreneurial actions

    This dissertation studies the early development of new ventures and small businesses, following the entrepreneurship process from initial ideas to viable ventures. I unpack the micro-foundations of entrepreneurial actions and examine how new ventures use quality signals in their investor communications to finance their growth. The dissertation comprises two qualitative papers and one quantitative study. The qualitative papers employ an inductive multiple-case approach, covering seven medical equipment manufacturers (new ventures) in a nascent market context (the mobile health industry) across six U.S. states, together with a secondary data analysis, to understand the emergence of opportunities and the early development of new ventures. The quantitative chapter covers 770 IPOs in U.S. manufacturing industries and investigates the legitimation strategies that young ventures use to gain resources from targeted resource-holders.

    Extreme multi-label deep neural classification of Spanish health records according to the International Classification of Diseases

    111 p. This work addresses clinical text mining, a field of Natural Language Processing applied to the biomedical domain. The goal is to automate the task of medical coding. Electronic health records (EHRs) are documents that contain clinical information about a patient's health. The diagnoses and medical procedures recorded in the electronic health record are coded according to the International Classification of Diseases (ICD). Indeed, the ICD is the basis for compiling international health statistics and the standard for reporting diseases and health conditions. From a machine learning perspective, the goal is to solve an extreme multi-label text classification problem, since each health record is assigned multiple ICD codes from a set of more than 70,000 diagnostic terms. Substantial resources are devoted to medical coding, a laborious task that is currently performed manually. EHRs are lengthy narratives, and medical coders review the records written by physicians and assign the corresponding ICD codes. The texts are technical, because physicians use specialized medical jargon, yet they are also rich in abbreviations, acronyms, and spelling errors, because physicians write the records during actual clinical practice. To address the automatic classification of health records, we investigate and develop a set of deep learning text classification techniques

    Ontology-Based Clinical Information Extraction Using SNOMED CT

    Extracting and encoding clinical information captured in unstructured clinical documents with standard medical terminologies is vital to enable secondary use of clinical data from practice. SNOMED CT is the most comprehensive medical ontology, with broad types of concepts and detailed relationships, and it has been widely used for many clinical applications. However, few studies have investigated the use of SNOMED CT in clinical information extraction. In this dissertation research, we developed a fine-grained information model based on SNOMED CT and built novel information extraction systems to recognize clinical entities and identify their relations, as well as to encode them to SNOMED CT concepts. Our evaluation shows that such ontology-based information extraction systems using SNOMED CT could achieve state-of-the-art performance, indicating the ontology's potential in clinical natural language processing

    Supporting the Billing Process in Outpatient Medical Care: Automated Medical Coding Through Machine Learning

    Reimbursement in medical care implies significant administrative effort for medical staff. To bill the treatments or services provided, diagnosis and treatment codes must be assigned to patient records using standardized healthcare classification systems, which is a time-consuming and error-prone task. In contrast to the ICD diagnosis codes used in most countries for inpatient care reimbursement, outpatient medical care often involves different reimbursement schemes. Following the Action Design Research methodology, we developed an NLP-based machine learning artifact in close collaboration with a general practitioner’s office in Germany, leveraging a dataset of over 5,600 patients with more than 63,000 billing codes. For the code prediction of the most problematic treatments and for a complete code prediction task, we achieved F1-scores of 93.60% and 78.22%, respectively. Over three iterations, we derived five meta-requirements leading to three design principles for an automated coding system to support the reimbursement of outpatient medical care
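
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-score can be computed for multi-label code prediction. It assumes micro-averaging over all (record, code) pairs; the abstract does not say which averaging the authors used, and the function name `micro_f1` and the placeholder codes are hypothetical.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over per-record sets of codes.

    Assumption: micro-averaging is chosen for illustration only.
    """
    # Count true positives, false positives, and false negatives
    # across all records before computing precision and recall.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder billing codes, not taken from any real scheme.
gold_codes = [{"A", "B"}, {"C"}]
pred_codes = [{"A"}, {"C", "D"}]
score = micro_f1(gold_codes, pred_codes)
```

In this toy case two codes are predicted correctly, one is spurious, and one is missed, so precision and recall are both 2/3.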

    Knowledge Augmentation in Language Models to Overcome Domain Adaptation and Scarce Data Challenges in Clinical Domain

    The co-existence of two conditions in the healthcare domain, “the massive amount of unstructured text data that humanity produces” and “the scarcity of sufficient training data to train language models,” has greatly increased the need for intelligent tools and techniques to process, interpret, and extract different types of knowledge from the data. My research goal in this thesis is to develop intelligent methods and models that better interpret human language and sentiment automatically, particularly its structure and semantics, in order to solve multiple higher-level Natural Language Processing (NLP) downstream tasks and beyond. The thesis spans six chapters and is divided into two parts based on its contributions. The first part centers on best practices for modeling data and injecting domain knowledge to enrich data semantics, applied to several classification tasks in the healthcare domain and beyond. Its contribution is to reduce training time, improve the performance of classification models, and use world knowledge as a source of domain knowledge when working with limited training data. The second part introduces AnnoMI, a one-of-a-kind, high-quality dataset of Motivational Interviewing (MI), followed by an experimental benchmarking analysis of AnnoMI. Its contribution is a publicly accessible dataset of Motivational Interviewing and methods to overcome data scarcity in complex domains (such as mental health). The thesis is organized as follows. The first chapter provides a high-level introduction to the tools and techniques applied in the thesis. The second chapter presents optimal methods for (i) feature selection, (ii) eliminating irrelevant and superfluous attributes from the dataset, (iii) data preprocessing, and (iv) advanced data representation (word embeddings and bag-of-words) to model data. The third chapter introduces the language model (LM) K-LM, a combination of Generative Pretrained Transformer 2 (GPT-2) and Bidirectional Encoder Representations from Transformers (BERT) that uses knowledge graphs to inject domain knowledge for domain adaptation tasks. The goal of this chapter is to reduce training time and improve the performance of classification models when working with limited training data. The fourth chapter introduces the high-quality dataset of expert-annotated MI (AnnoMI), comprising 133 therapy session transcriptions distributed over 44 topics (including smoking cessation, anxiety management, and weight loss), and provides an in-depth analysis of the dataset. The fifth chapter presents the experimental analysis with AnnoMI, which includes (i) augmentation techniques to generate data and (ii) fairness and bias assessments of the Classical Machine Learning (CML) and Deep Learning (DL) approaches employed to develop reliable classification models. Finally, the sixth chapter presents the conclusions and outcomes of the work in this thesis. The scientific contributions of this thesis include solutions to the challenges of scarce training data in complex domains and of domain adaptation in LMs. The practical contributions are data resources and a language model for a range of quantitative and qualitative NLP applications. Keywords: Natural Language Processing, Domain Adaptation, Motivational Interviewing, AI Fairness and Bias, Data Augmentation, GPT, BERT, Healthcare

    HiCu: Leveraging Hierarchy for Curriculum Learning in Automated ICD Coding

    There are several opportunities for automation in healthcare that can improve clinician throughput. One example is assistive tools that document diagnosis codes as clinicians write notes. We study the automation of medical code prediction using curriculum learning, a training strategy for machine learning models that gradually increases the difficulty of the learning tasks from easy to hard. One of the challenges in curriculum learning is the design of curricula, i.e., the sequential design of tasks that gradually increase in difficulty. We propose Hierarchical Curriculum Learning (HiCu), an algorithm that uses graph structure in the space of outputs to design curricula for multi-label classification. We create curricula for multi-label classification models that predict ICD diagnosis and procedure codes from natural language descriptions of patients. By leveraging the hierarchy of ICD codes, which groups diagnosis codes based on various organ systems in the human body, we find that our proposed curricula improve the generalization of neural network-based predictive models across recurrent, convolutional, and transformer-based architectures. Our code is available at https://github.com/wren93/HiCu-ICD. Comment: To appear at Machine Learning for Healthcare Conference (MLHC 2022)
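
The idea of deriving a curriculum from a label hierarchy can be illustrated with a short sketch. This is not the HiCu algorithm itself, only a simplified illustration under stated assumptions: the hierarchy is approximated by string prefixes of ICD-style codes, and the helper names `ancestor_at_depth` and `build_curriculum` are hypothetical. Each stage relabels the records with coarser ancestors, so a model could be trained on the coarse stages first and the full codes last.

```python
from typing import Dict, List, Set


def ancestor_at_depth(code: str, depth: int) -> str:
    """Return a coarse ancestor of an ICD-style code.

    Assumption: the hierarchy is approximated by string prefixes.
    Depth 1 is the category before the dot; each further depth
    keeps one more digit after the dot.
    """
    category, _, detail = code.partition(".")
    if depth <= 1 or not detail:
        return category
    return f"{category}.{detail[:depth - 1]}"


def build_curriculum(
    labels: Dict[str, Set[str]], max_depth: int
) -> List[Dict[str, Set[str]]]:
    """Build one label mapping per depth, from coarse to fine."""
    stages = []
    for depth in range(1, max_depth + 1):
        stage = {
            note: {ancestor_at_depth(code, depth) for code in codes}
            for note, codes in labels.items()
        }
        stages.append(stage)
    return stages


# Toy records with illustrative codes.
labels = {"note_a": {"428.0", "428.21"}, "note_b": {"250.00"}}
stages = build_curriculum(labels, max_depth=3)
```

In the first stage both codes of `note_a` collapse into the single category `428`, which makes the early task easier; the final stage recovers the original label sets.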

    Low-Code/No-Code Artificial Intelligence Platforms for the Health Informatics Domain

    In the contemporary health informatics space, Artificial Intelligence (AI) has become a necessity for the timely extraction of actionable knowledge. Low-Code/No-Code (LCNC) AI platforms enable domain experts to leverage the value that AI has to offer by lowering the technical skills overhead. We develop domain-specific, service-oriented platforms in the context of two subdomains of health informatics. In this work we address the core principles and architectures of these platforms, whose functionality we are constantly extending. Our work conforms to best practices with respect to the integration and interoperability of external services and provides process orchestration in an LCNC model-driven fashion. We chose the CINCO product DIME and a bespoke tool developed in CINCO Cloud as the underlying infrastructure for our LCNC platforms, which address the requirements of our two application domains: public health and biomedical research. For public health, we provide an environment for building AI-driven web applications for the automated evaluation of Web-based Health Information (WBHI). For biomedical research, we provide an AI-driven workflow environment for the computational analysis of highly-plexed tissue images. We extended both underlying application stacks to support the AI service functionality needed to address the requirements of the two application domains. The two case studies presented outline the methodology of developing these platforms through co-design with experts in the respective domains. Moving forward, we anticipate increasingly re-using components, which will reduce the development overhead of extending our existing platforms or developing new applications in similar domains