707 research outputs found

    A Frame-Based NLP System for Cancer-Related Information Extraction.

    We propose a frame-based natural language processing (NLP) method that extracts cancer-related information from clinical narratives. We focus on three frames: cancer diagnosis, cancer therapeutic procedure, and tumor description. We utilize a deep learning-based approach, bidirectional Long Short-term Memory (LSTM) Conditional Random Field (CRF), which uses both character and word embeddings. The system consists of two constituent sequence classifiers: a frame identification (lexical unit) classifier and a frame element classifier. The classifier achieves an

    Enhance Representation Learning of Clinical Narrative with Neural Networks for Clinical Predictive Modeling

    Medicine is undergoing a technological revolution. Understanding human health from clinical data poses major technical and practical challenges, which calls for methods that can handle large, complex, and noisy data. These methods are particularly necessary for natural language data from clinical narratives and notes, which contain some of the richest information on a patient. Meanwhile, deep neural networks have achieved superior performance in a wide variety of natural language processing (NLP) tasks because of their capacity to encode meaningful but abstract representations and to learn the entire task end-to-end. In this thesis, I investigate representation learning of clinical narratives with deep neural networks through a number of tasks ranging from clinical concept extraction and clinical note modeling to patient-level language representation. I present methods that use representation learning with neural networks to support understanding of clinical text documents. I first introduce the notion of representation learning from natural language processing and patient data modeling. Then, I investigate word-level representation learning to improve clinical concept extraction from clinical notes. I present two works on learning word representations and evaluate them on extracting important concepts from clinical notes. The first study focuses on cancer-related information, and the second study evaluates shared-task data. Both studies aim to automatically extract important entities from clinical notes. Next, I present a series of deep neural networks that encode hierarchical, longitudinal, and contextual information for modeling a series of clinical notes. I also evaluate the models by predicting clinical outcomes of interest, including mortality, length of stay, and phenotype predictions. Finally, I propose a novel representation learning architecture to develop a generalized and transferable language representation at the patient level. I also identify pre-training tasks appropriate for constructing a generalizable language representation. The main focus is to improve the predictive performance for phenotypes when only limited data is available, which is a challenging task. Overall, this dissertation addresses issues in natural language processing for medicine, including clinical text classification and modeling. These studies show major barriers to understanding large-scale clinical notes. It is believed that developing deep representation learning methods for distilling enormous amounts of heterogeneous data into patient-level language representations will improve evidence-based clinical understanding. The approach of solving these issues by learning representations could be used across clinical applications despite noisy data. I conclude that considering different linguistic components in natural language and sequential information between clinical events is important. Such results have implications beyond the immediate context of predictions and suggest future directions for clinical machine learning research to improve clinical outcomes. This could be a starting point for future phenotyping methods based on natural language processing that construct patient-level language representations to improve clinical predictions. While significant progress has been made, many open questions remain, so I will highlight a few works to demonstrate promising directions

    Histopathology Image Analysis and NLP for Digital Pathology

    Information technologies based on machine learning (ML) with quantitative imaging and texts are playing an essential role, particularly in general medicine and oncology. Deep learning (DL) in particular has demonstrated significant breakthroughs in Computer Vision and NLP, which could enhance disease detection and the establishment of efficient treatments. Furthermore, considering the large number of people with cancer and the substantial volume of data generated during cancer treatment, there is significant interest in the use of AI to improve oncologic care. In digital pathology, high-resolution microscope images of tissue samples are stored along with written medical reports in databases that are used by pathologists. The diagnosis is made through tissue analysis of the biopsy sample and is written as a brief unstructured report, which is stored as free text in Electronic Medical Record (EMR) systems. For the transition towards digitization of medical records to achieve its maximum benefits, these reports must be accessible and usable by medical practitioners, so that they can easily understand them and precisely identify the disease. Histopathology images are the basis of diagnosis and study of diseases of the tissues; analysing them helps identify the disease's location and classify the type of cancer. Recently, due to the abundant accumulation of whole-slide images (WSIs), there has been an increased demand for effective and efficient gigapixel image analysis, such as computer-aided diagnosis using DL techniques. Also, due to the high diversity of shapes and structures in WSIs, it is not possible to use conventional DL techniques for classification. Though computer-aided diagnosis using DL has good prediction accuracy, in the medical domain there is a need to explain the prediction of the model in order to gain a better understanding beyond standard quantitative performance evaluation. This thesis presents three different findings. Firstly, I provide a comparative analysis of various transformer models (BioBERT, Clinical BioBERT, and BioMed-RoBERTa) and TF-IDF, and the results demonstrate the effectiveness of various word embedding techniques for pathology reports in the classification task. Secondly, with the help of slide labels of WSIs, I classify them into their disease types, with an architecture having an attention mechanism and instance-level clustering. Finally, I introduce a method to fuse the features of the pathology reports and the features of their respective images. I investigate the effect of combining the features when classifying histopathology images and their respective reports simultaneously. This proved to be better than the individual classification tasks, achieving an accuracy of 95.73%

    Construction of machine learning-based models for cancer outcomes in low and lower-middle income countries: A scoping review

    Background: The impact and utility of machine learning (ML)-based prediction tools for cancer outcomes, including assistive diagnosis, risk stratification, and adjunctive decision-making, have been largely described and realized in high-income and upper-middle-income countries. However, statistical projections have estimated higher cancer incidence and mortality risks in low and lower-middle-income countries (LLMICs). Therefore, this review aimed to evaluate the utilization, model construction methods, and degree of implementation of ML-based models for cancer outcomes in LLMICs. Methods: PubMed/Medline, Scopus, and Web of Science databases were searched, and articles describing the use of ML-based models for cancer among local populations in LLMICs between 2002 and 2022 were included. A total of 140 articles from 22,516 citations that met the eligibility criteria were included in this study. Results: ML-based models from LLMICs were more often based on traditional ML algorithms than on deep or deep hybrid learning. We found that the construction of ML-based models was skewed to particular LLMICs such as India, Iran, Pakistan, and Egypt, with a paucity of applications in sub-Saharan Africa. Moreover, models for breast, head and neck, and brain cancer outcomes were frequently explored. Many models were deemed suboptimal according to the Prediction model Risk of Bias Assessment tool (PROBAST) due to sample size constraints and technical flaws in ML modeling, even though their performance accuracy ranged from 0.65 to 1.00. While the development and internal validation were described for all models included (n=137), only 4.4% (6/137) have been validated in independent cohorts and 0.7% (1/137) have been assessed for clinical impact and efficacy. Conclusion: Overall, the application of ML for modeling cancer outcomes in LLMICs is increasing. However, model development is largely unsatisfactory. We recommend model retraining using larger sample sizes, intensified external validation practices, and increased impact assessment studies using randomized controlled trial design

    Mutational drivers of a dysfunctional local immune response in resected non-small cell lung cancer (NSCLC) patients

    Background: Patients with KEAP1 and STK11 alterations have shown poor response to immunotherapy in non-small cell lung cancer (NSCLC) due to unknown underlying mechanisms. In a sub-study of the TNM-I trial (NCT03299478), we discovered that lung adenocarcinomas (LUAD) with concurrent KEAP1 and STK11 mutations exhibit predominantly non-inflamed immunological features, potentially contributing to immunotherapy resistance (PMID: 37100205). However, it is unclear whether single mutations or co-mutations drive this phenomenon. Methods: Among 215 patients (stage I-IIIA) who underwent genomic profiling, tumor tissue from 23 LUAD patients with STK11 and KEAP1 mutations was included in this thesis. NanoString gene expression analysis with the nCounter PanCancer IO 360™ Panel was performed and analyzed. Gene expression and metagene changes were compared across single versus co-mutations. Results: 44% (n = 10) of the cohort had co-mutations, while 56% (n = 13) had a single mutation in either KEAP1 or STK11. In STK11 vs co-mutation, pathway analysis revealed up-regulation of genes associated with adaptive immunity. Specifically, B cells were generally upregulated (p-adj < 0.05) in STK11-altered cases. In KEAP1 vs co-mutation, matrix remodeling and metastasis pathways were highly enriched, with the highest fold changes for MMP7 and MMP9 (5.19 and 3.34, respectively; p-adj < 0.05). Additionally, we found up-regulation of chemoresistance pathways in KEAP1-mutated patients (p-adj < 0.05). In STK11 vs KEAP1, NF-kappaB was the most altered pathway. Conclusion: KEAP1 mutation is the main driver of the non-inflamed phenotype in LUAD compared to STK11 mutation, and it contributes to a more aggressive disease through activation of metastatic pathways and chemoresistance features. These results need to be validated in larger datasets

    Revealing effects of psychosocial factors of cancer patients

    Abstract. This research presents methodologies applied on different platforms to extract both social and psychosocial factors that might be related to cancer, by applying natural language processing tools to text from platforms such as social media and other online forums. We also present the challenges associated with each platform and the corresponding tools used on it. From text mining to text analysis and then data visualisation, this research compares different analysis methods and outputs. We discuss many tools that were tested, used, or modified in order to achieve such analysis. We also obtained interesting findings for the medical fields to explore and research further. We developed a modular system that clinicians and medical experts can use to analyse similar forums. (Finnish title: Syöpäpotilaiden psykososiaalisten tekijöiden vaikutusten paljastaminen.)

    The immune microenvironment in mantle cell lymphoma : Targeted liquid and spatial proteomic analyses

    The complex interplay of the tumour and immune cells affects tumour growth, progression, and response to treatment. Restoration of an effective immune response forms the basis of onco-immunology, which further enabled the development of immunotherapy. In the era of precision medicine, pin-pointing patient biological heterogeneity, especially in relation to the patient-specific immune microenvironment, is a necessity for the discovery of novel biomarkers and for the development of patient stratification tools for targeted therapeutics. Mantle cell lymphoma (MCL) is a rare and aggressive subtype of B-cell lymphoma with poor survival and high relapse rates. Previous investigations of MCL have largely focused on the tumour itself, and explorations of the immune microenvironment have been limited. This thesis, with its five included papers, investigates multiple aspects of the immune microenvironment with respect to proteomic analysis performed on tissue and liquid biopsies of diagnostic and relapsed/refractory (R/R) MCL cohorts. Analyses based on liquid biopsies (serum) in particular are relevant for aggressive cases such as in relapse, where invasive procedures for extracting tissue are not recommended. Thus, papers I-II probe the possibility of using serum for treatment- and outcome-associated biomarker discovery in R/R MCL, using a targeted affinity-based protein microarray platform quantifying immune-regulatory and tumor-secretory proteins in sera. The analysis performed in paper I using pre-treatment samples identifies an 11-plex biomarker signature (RIS, relapsed immune signature) associated with overall survival. Further integration of RIS with the mantle cell lymphoma international prognostic index (MIPI) led to the development of the MIPIris index for the stratification of R/R MCL into three risk groups. Moreover, longitudinal analysis can be important in understanding how patients respond to treatment, and this can further guide therapeutic interventions. Thus, paper II is a follow-up study wherein longitudinal analyses were performed on paired samples collected at pre-treatment (baseline) and after three months of chemo-immunotherapy (on-treatment). We show how genetic aberrations can influence systemic profiles, and thus integrating genetic information can be crucial for treatment selection. Furthermore, we observe that the inter-patient heterogeneity associated with absolute values can be circumvented by using velocity of change to capture general changes over time in groups of patients. Thus, using velocity of change in serum proteins between pre- and on-treatment samples identified response biomarkers associated with minimal residual disease and progression. While exploratory analysis using high-dimensional omics-based data can be important for accelerating discovery, translating such information for clinical utility is a necessity. Thus, in paper III, we show how serum quantification can be used as a complement to tissue-identified prognostic biomarkers, and this can enable faster clinical implementation. The presence of CD163+ M2-like macrophages has been shown to be associated with poor outcome in MCL tissues. We show that higher sCD163 levels in sera, quantified using ELISA, are also associated with poor outcome in diagnostic and relapsed MCL. Furthermore, we suggest a cut-off for sCD163 levels that can be used for clinical utility. Further exploration of the dynamic interplay of the tumour immune microenvironment is now possible using spatially resolved omics for tissue-based analysis. Thus, in papers IV and V, we analyse cell-type-specific proteomic data collected from tumour and immune cells using the GeoMx™ digital spatial profiler. In paper IV, we show that the presence as well as the spatial localization of CD163+ macrophages with respect to tumour regions impacts macrophage phenotypic profiles. Further modulation in the profile of surrounding tumour and T-cells is observed when macrophages are present in the vicinity. Based on this analysis, we suggest the MAPK pathway as a potential therapeutic target in tumours with CD163+ macrophages. Immune composition can be defined not just by the type of cells, but also with respect to frequency and spatial localization, and this is explored in paper V with respect to T-cell subtypes. Thus, in paper V, we optimized a workflow of multiplexed immunofluorescence image segmentation that allowed us to extract cell metrics for four subtypes of CD3+ T-cells. Using this data, we show that higher infiltration of T-cells is associated with a positive outcome in MCL. Moreover, by combining image-derived metrics with cell-specific spatial omics data, we were able to identify an immunosuppressive microenvironment associated with highly infiltrated tumours and suggest new potential targets of immunotherapy with respect to IDO1, GITR and STING. In conclusion, this thesis explores the systemic and tumor-associated immune microenvironment in MCL, for defining patient heterogeneity, developing methods of patient stratification, and identifying novel and actionable biomarkers

    Hidden Treasures in “Ancient” Microarrays: Gene-Expression Portrays Biology and Potential Resistance Pathways of Major Lung Cancer Subtypes and Normal Tissue

    Objective: Novel statistical methods and increasingly more accurate gene annotations can transform “old” biological data into a renewed source of knowledge with potential clinical relevance. Here, we provide an in silico proof-of-concept by extracting novel information from a high-quality mRNA expression dataset, originally published in 2001, using state-of-the-art bioinformatics approaches. Methods: The dataset consists of histologically defined cases of lung adenocarcinoma (AD), squamous (SQ) cell carcinoma, small-cell lung cancer, carcinoid, metastasis (breast and colon AD), and normal lung specimens (203 samples in total). A battery of statistical tests was used for identifying differential gene expressions, diagnostic and prognostic genes, enriched gene ontologies, and signaling pathways. Results: Our results showed that gene expressions faithfully recapitulate immunohistochemical subtype markers, as chromogranin A in carcinoids, cytokeratin 5, p63 in SQ, and TTF1 in non-squamous types. Moreover, biological information with putative clinical relevance was revealed as potentially novel diagnostic genes for each subtype with specificity 93–100% (AUC = 0.93–1.00). Cancer subtypes were characterized by (a) differential expression of treatment target genes as TYMS, HER2, and HER3 and (b) overrepresentation of treatment-related pathways like cell cycle, DNA repair, and ERBB pathways. The vascular smooth muscle contraction, leukocyte trans-endothelial migration, and actin cytoskeleton pathways were overexpressed in normal tissue. Conclusion: Reanalysis of this public dataset displayed the known biological features of lung cancer subtypes and revealed novel pathways of potentially clinical importance. The findings also support our hypothesis that even old omics data of high quality can be a source of significant biological information when appropriate bioinformatics methods are used

    Systems analyses of the Fabry kidney transcriptome and its response to enzyme replacement therapy identified and cross-validated enzyme replacement therapy-resistant targets amenable to drug repurposing

    Fabry disease is a rare disorder caused by variations in the alpha-galactosidase gene. To a degree, Fabry disease is manageable via enzyme replacement therapy (ERT). By understanding the molecular basis of Fabry nephropathy (FN) and ERT's long-term impact, here we aimed to provide a framework for selection of potential disease biomarkers and drug targets. We obtained biopsies from eight control individuals and two independent FN cohorts comprising 16 individuals taken prior to and after up to ten years of ERT, and performed RNAseq analysis. Combining pathway-centered analyses with network-science allowed computation of transcriptional landscapes from four nephron compartments and their integration with existing proteome and drug-target interactome data. Comparing these transcriptional landscapes revealed high inter-cohort heterogeneity. Kidney compartment transcriptional landscapes comprehensively reflected differences in FN cohort characteristics. With exception of a few aspects, in particular arteries, early ERT in patients with classical Fabry could lastingly revert FN gene expression patterns to closely match that of control individuals. Pathways nonetheless consistently altered in both FN cohorts pre-ERT were mostly in glomeruli and arteries and related to the same biological themes. While keratinization-related processes in glomeruli were sensitive to ERT, a majority of alterations, such as transporter activity and responses to stimuli, remained dysregulated or reemerged despite ERT. Inferring an ERT-resistant genetic module of expressed genes identified 69 drugs for potential repurposing matching the proteins encoded by 12 genes. Thus, we identified and cross-validated ERT-resistant gene product modules that, when leveraged with external data, allowed estimating their suitability as biomarkers to potentially track disease course or treatment efficacy and potential targets for adjunct pharmaceutical treatment

    Food, Nutrition, Physical Activity, and the Prevention of Cancer: a Global Perspective

    This Report has a number of inter-related general purposes. One is to explore the extent to which food, nutrition, physical activity, and body composition modify the risk of cancer, and to specify which factors are most important. To the extent that environmental factors such as food, nutrition, and physical activity influence the risk of cancer, it is a preventable disease. The Report specifies recommendations based on solid evidence which, when followed, will be expected to reduce the incidence of cancer