187 research outputs found

    Pancancer analysis of DNA methylation-driven genes using MethylMix.

    Aberrant DNA methylation is an important mechanism that contributes to oncogenesis. Yet, few algorithms exist that exploit this vast dataset to identify hypo- and hypermethylated genes in cancer. We developed a novel computational algorithm called MethylMix to identify differentially methylated genes that are also predictive of transcription. We apply MethylMix to 12 individual cancer sites, and additionally combine all cancer sites in a pancancer analysis. We discover pancancer hypo- and hypermethylated genes and identify novel methylation-driven subgroups with clinical implications. MethylMix analysis on combined cancer sites reveals 10 pancancer clusters reflecting new similarities across malignantly transformed tissues

    Reliably Filter Drug-Induced Liver Injury Literature With Natural Language Processing and Conformal Prediction

    Drug-induced liver injury describes the adverse effects of drugs that damage the liver. Life-threatening results were also reported in severe cases. Therefore, liver toxicity is an important assessment for new drug candidates. These reports are documented in research papers that contain preliminary in vitro and in vivo experiments. Conventionally, data extraction from publications relies on resource-demanding manual labeling, which restricts the efficiency of the information extraction. The development of natural language processing techniques enables the automatic processing of biomedical texts. Herein, based on around 28,000 papers (titles and abstracts) provided by the Critical Assessment of Massive Data Analysis challenge, this study benchmarked model performances on filtering liver-damage-related literature. Among five text embedding techniques, the model using term frequency-inverse document frequency (TF-IDF) and logistic regression outperformed others with an accuracy of 0.957 on the validation set. Furthermore, an ensemble model with similar overall performances was developed with a logistic regression model on the predicted probability given by separate models with different vectorization techniques. The ensemble model achieved a high accuracy of 0.954 and an F1 score of 0.955 in the hold-out validation data in the challenge. Moreover, important words in positive/negative predictions were identified via model interpretation. The prediction reliability was quantified with conformal prediction, which provides users with a control over the prediction uncertainty. Overall, the ensemble model and TF-IDF model reached satisfactory classification results, which can be used by researchers to rapidly filter literature that describes events related to liver injury induced by medications
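    The TF-IDF weighting named in the abstract above can be illustrated with a minimal sketch. It assumes whitespace tokenization and the textbook weighting (term frequency times log(N / document frequency)); the function name `tfidf` and the example titles are hypothetical, and the study's actual vectorizer settings and logistic regression classifier are not reproduced here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumption: textbook TF-IDF, i.e. (count / doc length) * log(N / df).
    Library implementations often add smoothing and normalization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Hypothetical titles, for illustration only.
vectors = tfidf(["drug induced liver injury", "drug dosage study"])
```

    A term that occurs in every document (here "drug") receives weight zero, so the classifier would have to rely on terms that distinguish one document from another.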

    A Rapid Segmentation-Insensitive "Digital Biopsy" Method for Radiomic Feature Extraction: Method and Pilot Study Using CT Images of Non-Small Cell Lung Cancer.

    Quantitative imaging approaches compute features within images' regions of interest. Segmentation is rarely completely automatic, requiring time-consuming editing by experts. We propose a new paradigm, called "digital biopsy," that allows for the collection of intensity- and texture-based features from these regions at least 1 order of magnitude faster than the current manual or semiautomated methods. A radiologist reviewed automated segmentations of lung nodules from 100 preoperative volume computed tomography scans of patients with non-small cell lung cancer, and manually adjusted the nodule boundaries in each section, to be used as a reference standard, requiring up to 45 minutes per nodule. We also asked a different expert to generate a digital biopsy for each patient using a paintbrush tool to paint a contiguous region of each tumor over multiple cross-sections, a procedure that required an average of <3 minutes per nodule. We simulated additional digital biopsies using morphological procedures. Finally, we compared the features extracted from these digital biopsies with our reference standard using intraclass correlation coefficient (ICC) to characterize robustness. Comparing the reference standard segmentations to our digital biopsies, we found that 84/94 features had an ICC >0.7; comparing erosions and dilations, using a sphere of 1.5-mm radius, of our digital biopsies to the reference standard segmentations resulted in 41/94 and 53/94 features, respectively, with ICCs >0.7. We conclude that many intensity- and texture-based features remain consistent between the reference standard and our method while substantially reducing the amount of operator time required

    Topological image modification for object detection and topological image processing of skin lesions

    We propose a new method based on Topological Data Analysis (TDA) consisting of Topological Image Modification (TIM) and Topological Image Processing (TIP) for object detection. Through this newly introduced method, we artificially destruct irrelevant objects, and construct new objects with known topological properties in irrelevant regions of an image. This ensures that we are able to identify the important objects in relevant regions of the image. We do this by means of persistent homology, which allows us to simultaneously select appropriate thresholds, as well as the objects corresponding to these thresholds, and separate them from the noisy background of an image. This leads to a new image, processed in a completely unsupervised manner, from which one may more efficiently extract important objects. We demonstrate the usefulness of this proposed method for topological image processing through a case-study of unsupervised segmentation of the ISIC 2018 skin lesion images. Code for this project is available on https://bitbucket.org/ghentdatascience/topimgprocess

    Development and characterization of protein kinase B/AKT isoform-specific nanobodies

    The serine/threonine protein kinase AKT is frequently over-activated in cancer and is associated with poor prognosis. As a central node in the PI3K/AKT/mTOR pathway, which regulates various processes considered to be hallmarks of cancer, this kinase has become a prime target for cancer therapy. However, AKT has proven to be a highly complex target as it comes in three isoforms (AKT1, AKT2 and AKT3) which are highly homologous, yet non-redundant. The isoform-specific functions of the AKT kinases can be dependent on context (i.e. different types of cancer) and even opposed to one another. To date, there is no isoform-specific inhibitor available and no alternative to genetic approaches to study the function of a single AKT isoform. We have developed and characterized nanobodies that specifically interact with the AKT1 or AKT2 isoforms. These new tools should enable future studies of AKT1 and AKT2 isoform-specific functions. Furthermore, for both isoforms we obtained a nanobody that interferes with the AKT-PIP3-interaction, an essential step in the activation of the kinase. The nanobodies characterized in this study are a new stepping stone towards unravelling AKT isoform-specific signalling

    An AKT2-specific nanobody that targets the hydrophobic motif induces cell cycle arrest, autophagy and loss of focal adhesions in MDA-MB-231 cells

    The AKT kinase family is a high-profile target for cancer therapy. Despite their high degree of homology, the three AKT isoforms (AKT1, AKT2 and AKT3) are non-redundant and can even have opposing functions. Small-molecule AKT inhibitors affect all three isoforms, which severely limits their usefulness as research tools or therapeutics. Using AKT2-specific nanobodies, we examined the function of endogenous AKT2 in breast cancer cells. Two AKT2 nanobodies (Nb8 and Nb9) modulate AKT2 and reduce MDA-MB-231 cell viability/proliferation. Nb8 binds the AKT2 hydrophobic motif and reduces IGF-1-induced phosphorylation of this site. This nanobody also affects the phosphorylation and/or expression levels of a wide range of proteins downstream of AKT, resulting in a G0/G1 cell cycle arrest, the induction of autophagy, a reduction in focal adhesion count and loss of stress fibers. While cell cycle progression is likely to be regulated by more than one isoform, our results indicate that the effects on both autophagy and the cytoskeleton are specific to AKT2. By using an isoform-specific nanobody, we were able to map a part of the AKT2 pathway. Our results confirm AKT2 and the hydrophobic motif as targets for cancer therapy. Nb8 can be used as a research tool to study AKT2 signalling events and aid in the design of an AKT2-specific inhibitor.

    Reliability-based cleaning of noisy training labels with inductive conformal prediction in multi-modal biomedical data mining

    Accurately labeling biomedical data presents a challenge. Traditional semi-supervised learning methods often under-utilize available unlabeled data. To address this, we propose a novel reliability-based training data cleaning method employing inductive conformal prediction (ICP). This method capitalizes on a small set of accurately labeled training data and leverages ICP-calculated reliability metrics to rectify mislabeled data and outliers within vast quantities of noisy training data. The efficacy of the method is validated across three classification tasks within distinct modalities: filtering drug-induced-liver-injury (DILI) literature with title and abstract, predicting ICU admission of COVID-19 patients through CT radiomics and electronic health records, and subtyping breast cancer using RNA-sequencing data. Varying levels of noise to the training labels were introduced through label permutation. Results show significant enhancements in classification performance: accuracy enhancement in 86 out of 96 DILI experiments (up to 11.4%), AUROC and AUPRC enhancements in all 48 COVID-19 experiments (up to 23.8% and 69.8%), and accuracy and macro-average F1 score improvements in 47 out of 48 RNA-sequencing experiments (up to 74.6% and 89.0%). Our method offers the potential to substantially boost classification performance in multi-modal biomedical machine learning tasks. Importantly, it accomplishes this without necessitating an excessive volume of meticulously curated training data
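    The idea of using inductive conformal prediction p-values to flag and correct noisy labels can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it uses a single shared calibration set, and the function names and the two thresholds (`doubt_below`, `trust_above`) are hypothetical.

```python
def icp_p_value(calibration_scores, score):
    """Smoothed share of calibration nonconformity scores >= `score`."""
    at_least_as_strange = sum(1 for s in calibration_scores if s >= score)
    return (at_least_as_strange + 1) / (len(calibration_scores) + 1)


def clean_label(noisy_label, scores_by_label, calibration_scores,
                doubt_below=0.2, trust_above=0.8):
    """Relabel a sample only when its given label looks implausible
    and another label looks highly plausible.

    Assumption: both thresholds are illustrative, not values from the paper.
    """
    p_values = {
        label: icp_p_value(calibration_scores, score)
        for label, score in scores_by_label.items()
    }
    best = max(p_values, key=p_values.get)
    if (best != noisy_label
            and p_values[noisy_label] < doubt_below
            and p_values[best] > trust_above):
        return best, p_values
    return noisy_label, p_values


# Nonconformity scores from a small, accurately labeled calibration set (made up).
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
# Nonconformity of one noisy sample under each candidate label (made up).
scores = {"pos": 0.95, "neg": 0.05}
new_label, p_values = clean_label("pos", scores, calibration)
```

    A low p-value means the sample looks unusual under that label relative to the trusted calibration data, which is what makes the p-value usable as a reliability measure.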

    Improved Microarray-Based Decision Support with Graph Encoded Interactome Data

    In the past, microarray studies have been criticized due to noise and the limited overlap between gene signatures. Prior biological knowledge should therefore be incorporated as side information in models based on gene expression data to improve the accuracy of diagnosis and prognosis in cancer. As prior knowledge, we investigated interaction and pathway information from the human interactome on different aspects of biological systems. By exploiting the properties of kernel methods, relations between genes with similar functions but active in alternative pathways could be incorporated in a support vector machine classifier based on spectral graph theory. Using 10 microarray data sets, we first reduced the number of data sources relevant for multiple cancer types and outcomes. Three sources on metabolic pathway information (KEGG), protein-protein interactions (OPHID) and miRNA-gene targeting (microRNA.org) outperformed the other sources with regard to the considered class of models. Both fixed and adaptive approaches were subsequently considered to combine the three corresponding classifiers. Averaging the predictions of these classifiers performed best and was significantly better than the model based on microarray data only. These results were confirmed on 6 validation microarray sets, with a significantly improved performance in 4 of them. Integrating interactome data thus improves classification of cancer outcome for the investigated microarray technologies and cancer types. Moreover, this strategy can be incorporated in any kernel method or non-linear version of a non-kernel method

    Toward more accurate and generalizable brain deformation estimators for traumatic brain injury detection with unsupervised domain adaptation

    Machine learning head models (MLHMs) are developed to estimate brain deformation for early detection of traumatic brain injury (TBI). However, overfitting to simulated impacts and the lack of generalizability caused by distributional shift across head impact datasets hinder the broad clinical application of current MLHMs. We propose brain deformation estimators that integrate unsupervised domain adaptation with a deep neural network to predict whole-brain maximum principal strain (MPS) and MPS rate (MPSR). With 12,780 simulated head impacts, we performed unsupervised domain adaptation on on-field head impacts (302 college football (CF) impacts and 457 mixed martial arts (MMA) impacts) using domain regularized component analysis (DRCA) and cycle-GAN-based methods. The new model improved the MPS/MPSR estimation accuracy, with the DRCA method significantly outperforming other domain adaptation methods in prediction accuracy (p<0.001): MPS RMSE: 0.027 (CF) and 0.037 (MMA); MPSR RMSE: 7.159 (CF) and 13.022 (MMA). On two further hold-out test sets with 195 college football impacts and 260 boxing impacts, the DRCA model significantly outperformed the baseline model without domain adaptation in MPS and MPSR estimation accuracy (p<0.001). The DRCA domain adaptation reduces the MPS/MPSR estimation error to well below TBI thresholds, enabling accurate brain deformation estimation to detect TBI in future clinical applications.

    A kernel-based integration of genome-wide data for clinical decision support

    Background: Although microarray technology allows the investigation of the transcriptomic make-up of a tumor in one experiment, the transcriptome does not completely reflect the underlying biology due to alternative splicing, post-translational modifications, as well as the influence of pathological conditions (for example, cancer) on transcription and translation. This increases the importance of fusing more than one source of genome-wide data, such as the genome, transcriptome, proteome, and epigenome. The current increase in the amount of available omics data emphasizes the need for a methodological integration framework. Methods: We propose a kernel-based approach for clinical decision support in which many genome-wide data sources are combined. Integration occurs within the patient domain at the level of kernel matrices before building the classifier. As supervised classification algorithm, a weighted least squares support vector machine is used. We apply this framework to two cancer cases, namely, a rectal cancer data set containing microarray and proteomics data and a prostate cancer data set containing microarray and genomics data. For both cases, multiple outcomes are predicted. Results: For the rectal cancer outcomes, the highest leave-one-out (LOO) areas under the receiver operating characteristic curves (AUC) were obtained when combining microarray and proteomics data gathered during therapy and ranged from 0.927 to 0.987. For prostate cancer, all four outcomes had a better LOO AUC when combining microarray and genomics data, ranging from 0.786 for recurrence to 0.987 for metastasis. Conclusions: For both cancer sites the prediction of all outcomes improved when more than one genome-wide data set was considered. This suggests that integrating multiple genome-wide data sources increases the predictive performance of clinical decision support models. This emphasizes the need for comprehensive multi-modal data. We acknowledge that, in a first phase, this will substantially increase costs; however, this is a necessary investment to ultimately obtain cost-efficient models usable in patient tailored therapy.
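    Integration "at the level of kernel matrices" can be illustrated with a small sketch: each data source yields a patient-by-patient kernel matrix, and the matrices are combined before any classifier is trained. The linear kernel, the equal weighting, and all names and values below are assumptions for illustration; the abstract does not specify them, and the weighted least squares support vector machine itself is not shown.

```python
def linear_kernel(samples):
    """Patient-by-patient matrix of dot products for one data source."""
    return [
        [sum(a * b for a, b in zip(x, y)) for y in samples]
        for x in samples
    ]


def combine_kernels(kernels, weights=None):
    """Weighted sum of equally sized kernel matrices (equal weights by default)."""
    if weights is None:
        weights = [1.0 / len(kernels)] * len(kernels)
    size = len(kernels[0])
    return [
        [
            sum(w * k[i][j] for w, k in zip(weights, kernels))
            for j in range(size)
        ]
        for i in range(size)
    ]


# Two patients measured in two hypothetical sources with different feature counts.
microarray = [[1.0, 0.0], [0.0, 1.0]]
proteomics = [[2.0], [4.0]]
combined = combine_kernels(
    [linear_kernel(microarray), linear_kernel(proteomics)]
)
```

    Because every kernel matrix has one row and one column per patient, sources with different numbers of features can be combined without aligning their feature spaces.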