
    Learning Eligibility in Cancer Clinical Trials using Deep Neural Networks

    Interventional cancer clinical trials are generally too restrictive: patients are often excluded on the basis of comorbidity, past or concomitant treatments, or age. The efficacy and safety of new treatments for patients with these characteristics are therefore not defined. In this work, we built a model to automatically predict whether short clinical statements are considered inclusion or exclusion criteria. We used protocols from cancer clinical trials available in public registries from the last 18 years to train word embeddings, and we constructed a dataset of 6M short free texts labeled as eligible or not eligible. A text classifier was then trained using deep neural networks, with the pre-trained word embeddings as inputs, to predict whether a short free-text statement describing clinical information is considered eligible. We additionally analyzed the semantic reasoning captured by the learned word-embedding representations and were able to identify treatments for one type of tumor that are analogous to the drugs used to treat other tumors. We show that representation learning using deep neural networks can be successfully leveraged to extract medical knowledge from clinical trial protocols, potentially assisting practitioners when prescribing treatments.
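    A minimal sketch of the embedding-analogy idea, assuming the gensim library and an illustrative corpus file; the file name, hyperparameters, and drug/tumor query terms are assumptions for illustration, not results reported by the paper:

        # Train word embeddings on short eligibility statements, then
        # query a drug analogy across tumor types.
        from gensim.models import Word2Vec

        # Assumed corpus: one short free-text criterion per line.
        with open("criteria_corpus.txt", encoding="utf-8") as f:
            sentences = [line.lower().split() for line in f if line.strip()]

        model = Word2Vec(sentences, vector_size=300, window=5,
                         min_count=5, workers=4)

        # Analogy in the spirit of the paper: "imatinib is to leukemia
        # as ? is to melanoma" (hypothetical terms).
        print(model.wv.most_similar(positive=["imatinib", "melanoma"],
                                    negative=["leukemia"], topn=5))

    The trained vectors would then serve as the input layer of the eligibility classifier the abstract describes.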

    Evaluating BERT Embeddings for Text Classification in Bio-Medical Domain to Determine Eligibility of Patients in Clinical Trials

    Clinical trials are studies conducted by researchers to assess the impact of new medicines on human health, in terms of efficacy and, most importantly, safety. For any advancement in the field of medicine, it is very important that clinical trials are conducted ethically and supported by scientific evidence. Not everyone who volunteers to participate in a clinical trial is allowed to undergo it: age, comorbidity, and other health issues can be major factors in deciding whether a patient's profile is suitable for the trial. Both the profiles selected for a trial and those excluded from it should be documented. The research behind this work, conducted over a long period, covers trials of 15,000 cancer drugs; keeping track of so many trials and their outcomes, and formulating standard health guidelines from them, is easier said than done. This paper discusses text classification, one of the primary assessment tasks in Natural Language Processing (NLP). It is among the most common problems in NLP, but it becomes complex in a specialized domain such as biomedicine, which features quite a few jargon terms pertaining to the medical field. This paper proposes a framework with two major components: a transformer architecture that produces embeddings, coupled with a text classifier. We show that pre-trained embeddings generated by BERT (Bidirectional Encoder Representations from Transformers) perform efficiently and achieve a better F1-score and accuracy than the current benchmark, which uses embeddings trained on the same dataset. The main contribution of this paper is a framework that can be extended to different biomedical problems; the design can also be reused in other domains via fine-tuning. The framework additionally supports optimization techniques such as mixed precision, dynamic padding, and uniform-length batching, which improve performance by up to 3x on GPUs (Graphics Processing Units) and by 60% on TPUs (Tensor Processing Units).
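    A minimal sketch of the two-component framework, assuming the Hugging Face transformers library; the model name, example texts, and single linear classifier are illustrative assumptions:

        # Produce BERT sentence embeddings with dynamic padding, then
        # feed them to a lightweight classifier.
        import torch
        from transformers import AutoTokenizer, AutoModel

        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        bert = AutoModel.from_pretrained("bert-base-uncased").eval()

        texts = ["history of congestive heart failure",
                 "age 18 years or older"]

        # Dynamic padding: pad only to the longest sequence in this
        # batch, not to a fixed maximum length.
        batch = tokenizer(texts, padding="longest", truncation=True,
                          return_tensors="pt")

        with torch.no_grad():
            # Use the [CLS] token vector as a fixed-size embedding.
            cls = bert(**batch).last_hidden_state[:, 0, :]

        classifier = torch.nn.Linear(cls.size(-1), 2)  # eligible / not
        logits = classifier(cls)

    Uniform-length batching would additionally sort examples by token count so each batch pads to a similar length, and mixed precision can be layered on top with torch.autocast; both are performance optimizations that leave the predictions essentially unchanged.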

    Enhancing reuse of structured eligibility criteria and supporting their relaxation

    Patient recruitment is one of the most important barriers to the successful completion of clinical trials, and thus to obtaining evidence about new methods for prevention, diagnostics, and treatment. The reason is that recruitment is effort-intensive: it requires identifying candidate patients for the trial (the population under study) and verifying, for each patient, whether the eligibility criteria are met. The work described in this paper aims to support the comparison of populations under study in different trials and the design of eligibility criteria for new trials. We do this by introducing structured eligibility criteria, which enhance the reuse of criteria across trials. We developed a method for automated structuring of criteria from text. Additionally, structured eligibility criteria allow us to suggest relaxations of criteria that remove potentially unnecessarily restrictive conditions, thereby increasing the recruitment potential and generalizability of a trial.

    Our method for automated structuring of criteria enables us to identify related conditions and to compare their restrictiveness. The comparison is based on the general meaning of criteria, comprising commonly occurring contextual patterns, medical concepts, and constraining values. These are identified automatically using our pattern-detection algorithm, state-of-the-art ontology annotators, and semantic taggers. The comparison uses predefined relations between the patterns, concept equivalences defined in medical ontologies, and threshold values. The result is a library of structured eligibility criteria that can be browsed using fine-grained queries. Furthermore, we developed visualizations for the library that enable intuitive navigation of the relations between trials, criteria, and concepts. These visualizations expose interesting co-occurrences and correlations, potentially enhancing meta-research.

    The method for criteria structuring processes only certain types of criteria, which results in low recall (18%) but high precision for the relations identified between the criteria (94%). Analysis of the approach from the medical perspective revealed that it can be beneficial for supporting trial design, though more research is needed.
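    A minimal sketch of pattern-based structuring and relaxation under stated assumptions: a single hypothetical pattern stands in for the pattern library, and the example criteria and threshold comparison are illustrative only:

        # Extract (concept, relation, value) triples from criterion
        # text, then compare restrictiveness via the threshold value.
        import re

        PATTERN = re.compile(
            r"(?P<concept>[a-z ]+?)\s*"
            r"(?P<relation>greater than|less than|at least|at most)\s*"
            r"(?P<value>\d+(\.\d+)?)",
            re.IGNORECASE)

        def structure(criterion):
            m = PATTERN.search(criterion)
            if not m:
                return None  # only certain criterion types are handled
            return (m["concept"].strip().lower(),
                    m["relation"].lower(), float(m["value"]))

        a = structure("Creatinine less than 1.5")
        b = structure("creatinine less than 2.0")

        # Same concept and relation: the higher "less than" threshold
        # admits more patients, so criterion b is a relaxation of a.
        if a and b and a[:2] == b[:2] and a[1] == "less than":
            print("more permissive:", max(a, b, key=lambda t: t[2]))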

    Advanced Methods for Entity Linking in the Life Sciences

    The amount of available knowledge increases rapidly with the growing number of data sources. However, the autonomy of data sources, and the heterogeneity that results from it, prevents comprehensive data analysis and applications. Data integration aims to overcome this heterogeneity by unifying different data sources and enriching unstructured data. Enrichment consists of several subtasks, among them the annotation process, which links document phrases to terms of a standardized vocabulary. Annotated documents enable effective retrieval methods, comparability of different documents, and comprehensive data analysis, such as finding adverse drug effects based on patient data. A vocabulary enables comparability through standardized terms; an ontology can also serve as a vocabulary, but is additionally defined by concepts, relationships, and logical constraints.

    The annotation process is applicable in different domains, but generic and specialized domains differ with respect to it. This thesis emphasizes these differences and addresses the identified challenges. Whereas the majority of annotation approaches focus on the evaluation of general domains such as Wikipedia, this thesis evaluates the developed annotation approaches on case report forms, the medical documents used to examine clinical trials. Natural language poses various challenges, such as expressing similar meanings with different phrases. The proposed annotation method, AnnoMap, takes this fuzziness of natural language into account. A further challenge is the reuse of verified annotations: existing annotations represent knowledge that can be reused in further annotation processes, and AnnoMap includes a reuse strategy that utilizes verified annotations to link new documents to appropriate concepts. Due to the broad spectrum of areas in the biomedical domain, many different tools exist, and they perform differently depending on the domain. This thesis therefore proposes a combination approach that unifies results from different tools: it uses existing tool results to build a classification model that classifies new annotations as correct or incorrect. The results show that the reuse strategy and the machine-learning-based combination improve annotation quality compared to existing approaches focusing on the biomedical domain.
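    A minimal sketch of fuzzy phrase-to-concept matching in the spirit of AnnoMap, assuming a toy vocabulary; the concept identifiers, terms, and similarity threshold are illustrative, and the actual method is considerably richer:

        # Link a document phrase to the closest vocabulary term by
        # string similarity, tolerating spelling variation.
        from difflib import SequenceMatcher

        # Toy standardized vocabulary (identifiers are illustrative).
        vocabulary = {
            "C0020538": "hypertension",
            "C0011849": "diabetes mellitus",
            "C0018802": "congestive heart failure",
        }

        def annotate(phrase, threshold=0.7):
            """Return (concept_id, term, score) for the best match, or None."""
            scored = [(cid, term,
                       SequenceMatcher(None, phrase.lower(), term).ratio())
                      for cid, term in vocabulary.items()]
            best = max(scored, key=lambda t: t[2])
            return best if best[2] >= threshold else None

        print(annotate("diabetis mellitus"))  # tolerates the misspelling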
    A further part of data integration is entity resolution, which builds unified knowledge bases from different data sources. A data source consists of a set of records characterized by attributes, and the goal of entity resolution is to identify records that represent the same real-world entity. Many methods focus on linking data sources whose records are characterized by attributes; only a few can handle graph-structured knowledge bases or consider temporal aspects. Temporal aspects are essential for identifying the same entities across different time intervals, since entities and their attributes change over time under certain conditions. Moreover, records can be related to other records, so that a small graph structure exists for each record, and these small graphs can be linked to each other if they represent the same entity. This thesis proposes an entity resolution approach for census data consisting of person records from different time intervals; the approach also considers the graph structure over persons given by family relationships.

    To achieve high-quality results, current methods apply machine-learning techniques that classify record pairs as matches or non-matches. The classification relies on a model generated from training data, in this case a set of record pairs labeled as duplicates or not. Because generating such training data is time-consuming, active learning techniques are relevant for reducing the number of required training examples. The entity resolution method for temporal graph-structured data improves on previous collective entity resolution approaches, and the developed active learning approach achieves results comparable to supervised learning methods while outperforming other limited-budget active learning methods. Besides the entity resolution approach, the thesis introduces the concept of evolution operators for communities. These operators express the dynamics of communities and individuals; for instance, they can state that two communities merged or split over time, and they allow observing the history of individuals.

    Overall, the presented annotation approaches generate high-quality annotations for medical forms. These annotations enable comprehensive analysis across different data sources as well as accurate queries. The proposed entity resolution approaches improve on existing ones, contributing to the construction of high-quality knowledge graphs and to data analysis tasks.
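    A minimal sketch of the machine-learning-based record-pair classification described above, assuming scikit-learn; the records, similarity features, and model choice are illustrative assumptions rather than the thesis's actual method:

        # Turn a record pair into similarity features, then train a
        # classifier on labeled pairs (1 = same person, 0 = different).
        from difflib import SequenceMatcher
        from sklearn.ensemble import RandomForestClassifier

        def sim(a, b):
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        def features(r1, r2):
            return [sim(r1["name"], r2["name"]),
                    sim(r1["birthplace"], r2["birthplace"]),
                    abs(r1["birthyear"] - r2["birthyear"]) / 10.0]

        pairs = [
            (({"name": "John Smith", "birthplace": "Leeds", "birthyear": 1850},
              {"name": "Jon Smith", "birthplace": "Leeds", "birthyear": 1850}), 1),
            (({"name": "John Smith", "birthplace": "Leeds", "birthyear": 1850},
              {"name": "Mary Jones", "birthplace": "Cork", "birthyear": 1862}), 0),
        ]
        X = [features(r1, r2) for (r1, r2), label in pairs]
        y = [label for _, label in pairs]

        clf = RandomForestClassifier(n_estimators=50).fit(X, y)

    An active learning variant would instead ask a human to label only the pairs the current model is least certain about.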