Search CORE

852 research outputs found

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

One-class models for validation of miRNAs and ERBB2 gene interactions based on sequence features for breast cancer scenarios

Author: Gutiérrez Cárdenas Juan Manuel
Wan Zenghui
Publication venue: 'Stellenbosch University'
Publication date: 01/01/2021
Field of study

One challenge in miRNA–genes–diseases interaction studies is that it is challenging to find labeled data that indicate a positive or negative relationship between miRNA and genes. The use of one-class classification methods shows a promising path for validating them. We have applied two one-class classification methods, Isolation Forest and One-class SVM, to validate miRNAs interactions with the ERBB2 gene present in breast cancer scenarios using features extracted via sequence-binding. We found that the One-class SVM outperforms the Isolation Forest model, with values of sensitivity of 80.49% and a specificity of 86.49% showing results that are comparable to previous studies. Additionally, we have demonstrated that the use of features extracted from a sequence-based approach (considering miRNA and gene sequence binding characteristics) and one-class models have proven to be a feasible method for validating these genetic molecule interactions

Repositorio Institucional Ulima

Algorithms for pre-microrna classification and a GPU program for whole genome comparison

Author: Zhong Ling
Publication venue: Digital Commons @ NJIT
Publication date: 31/08/2016
Field of study

MicroRNAs (miRNAs) are non-coding RNAs with approximately 22 nucleotides that are derived from precursor molecules. These precursor molecules or pre-miRNAs often fold into stem-loop hairpin structures. However, a large number of sequences with pre-miRNA-like hairpin can be found in genomes. It is a challenge to distinguish the real pre-miRNAs from other hairpin sequences with similar stem-loops (referred to as pseudo pre-miRNAs). The first part of this dissertation presents a new method, called MirID, for identifying and classifying microRNA precursors. MirID is comprised of three steps. Initially, a combinatorial feature mining algorithm is developed to identify suitable feature sets. Then, the feature sets are used to train support vector machines to obtain classification models, based on which classifier ensemble is constructed. Finally, an AdaBoost algorithm is adopted to further enhance the accuracy of the classifier ensemble. Experimental results on a variety of species demonstrate the good performance of the proposed approach, and its superiority over existing methods. In the second part of this dissertation, A GPU (Graphics Processing Unit) program is developed for whole genome comparison. The goal for the research is to identify the commonalities and differences of two genomes from closely related organisms, via multiple sequencing alignments by using a seed and extend technique to choose reliable subsets of exact or near exact matches, which are called anchors. A rigorous method named Smith-Waterman search is applied for the anchor seeking, but takes days and months to map millions of bases for mammalian genome sequences. With GPU programming, which is designed to run in parallel hundreds of short functions called threads, up to 100X speed up is achieved over similar CPU executions

Digital Commons @ New Jersey Institute of Technology (NJIT)

Algebraic shortcuts for leave-one-out cross-validation in supervised network inference

Author: Airola Antti
De Baets Bernard
Pahikkala Tapio
Stock Michiel
Waegeman Willem
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2020
Field of study

Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to either predict new interactions in a given network or to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might dramatically differ between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models

Ghent University Academic Bibliography

Recommended from our members

Integration of Genome Scale Data for Identifying New Biomarkers in Colon Cancer: Integrated Analysis of Transcriptomics and Epigenomics Data from High Throughput Technologies in Order to Identifying New Biomarkers Genes for Personalised Targeted Therapies for Patients Suffering from Colon Cancer

Author: Hassan Aamir Ul
Publication venue: Faculty of Engineering and Informatics
Publication date: 01/01/2017
Field of study

Colorectal cancer is the third most common cancer and the leading cause of cancer deaths in Western industrialised countries. Despite recent advances in the screening, diagnosis, and treatment of colorectal cancer, an estimated 608,000 people die every year due to colon cancer. Our current knowledge of colorectal carcinogenesis indicates a multifactorial and multi-step process that involves various genetic alterations and several biological pathways. The identification of molecular markers with early diagnostic and precise clinical outcome in colon cancer is a challenging task because of tumour heterogeneity. This Ph.D.-thesis presents the molecular and cellular mechanisms leading to colorectal cancer. A systematical review of the literature is conducted on Microarray Gene expression profiling, gene ontology enrichment analysis, microRNA and system Biology and various bioinformatics tools. We aimed this study to stratify a colon tumour into molecular distinct subtypes, identification of novel diagnostic targets and prediction of reliable prognostic signatures for clinical practice using microarray expression datasets. We performed an integrated analysis of gene expression data based on genetic, epigenetic and extensive clinical information using unsupervised learning, correlation and functional network analysis. As results, we identified 267-gene and 124-gene signatures that can distinguish normal, primary and metastatic tissues, and also involved in important regulatory functions such as immune-response, lipid metabolism and peroxisome proliferator-activated receptors (PPARs) signalling pathways. For the first time, we also identify miRNAs that can differentiate between primary colon from metastatic and a prognostic signature of grade and stage levels, which can be a major contributor to complex transcriptional phenotypes in a colon tumour

Bradford Scholars

An Explainable Approach for Lung Cancer Classification and Integrative Survival Analysis using Omics Data

Author: Bernardo Manuel Faria Ramos
Publication venue
Publication date: 10/02/2020
Field of study

Repositório Aberto da Universidade do Porto

Pathway-Based Multi-Omics Data Integration for Breast Cancer Diagnosis and Prognosis.

Author: Huang Sijia
Publication venue: University of Hawaiʻi at Mānoa
Publication date: 01/12/2017
Field of study

Ph.D. Thesis. University of Hawaiʻi at Mānoa 2017

ScholarSpace at University of Hawai'i at Manoa

Network-Based Biomarker Discovery : Development of Prognostic Biomarkers for Personalized Medicine by Integrating Data and Prior Knowledge

Author: Cun Yupeng
Publication venue: Universitäts- und Landesbibliothek Bonn
Publication date
Field of study

Advances in genome science and technology offer a deeper understanding of biology while at the same time improving the practice of medicine. The expression profiling of some diseases, such as cancer, allows for identifying marker genes, which could be able to diagnose a disease or predict future disease outcomes. Marker genes (biomarkers) are selected by scoring how well their expression levels can discriminate between different classes of disease or between groups of patients with different clinical outcome (e.g. therapy response, survival time, etc.). A current challenge is to identify new markers that are directly related to the underlying disease mechanism

bonndoc – Der Publikationsserver der Universität Bonn

Medical Image Analytics (Radiomics) with Machine/Deeping Learning for Outcome Modeling in Radiation Oncology

Author: Wei Lise
Publication venue
Publication date: 01/01/2020
Field of study

Image-based quantitative analysis (radiomics) has gained great attention recently. Radiomics possesses promising potentials to be applied in the clinical practice of radiotherapy and to provide personalized healthcare for cancer patients. However, there are several challenges along the way that this thesis will attempt to address. Specifically, this thesis focuses on the investigation of repeatability and reproducibility of radiomics features, the development of new machine/deep learning models, and combining these for robust outcomes modeling and their applications in radiotherapy. Radiomics features suffer from robustness issues when applied to outcome modeling problems, especially in head and neck computed tomography (CT) images. These images tend to contain streak artifacts due to patients’ dental implants. To investigate the influence of artifacts for radiomics modeling performance, we firstly developed an automatic artifact detection algorithm using gradient-based hand-crafted features. Then, comparing the radiomics models trained on ‘clean’ and ‘contaminated’ datasets. The second project focused on using hand-crafted radiomics features and conventional machine learning methods for the prediction of overall response and progression-free survival for Y90 treated liver cancer patients. By identifying robust features and embedding prior knowledge in the engineered radiomics features and using bootstrapped LASSO to select robust features, we trained imaging and dose based models for the desired clinical endpoints, highlighting the complementary nature of this information in Y90 outcomes prediction. Combining hand-crafted and machine learnt features can take advantage of both expert domain knowledge and advanced data-driven approaches (e.g., deep learning). Thus, we proposed a new variational autoencoder network framework that modeled radiomics features, clinical factors, and raw CT images for the prediction of intrahepatic recurrence-free and overall survival for hepatocellular carcinoma (HCC) patients in this third project. The proposed approach was compared with widely used Cox proportional hazard model for survival analysis. Our proposed methods achieved significant improvement in terms of the prediction using the c-index metric highlighting the value of advanced modeling techniques in learning from limited and heterogeneous information in actuarial prediction of outcomes. Advances in stereotactic radiation therapy (SBRT) has led to excellent local tumor control with limited toxicities for HCC patients, but intrahepatic recurrence still remains prevalent. As an extension of the third project, we not only hope to predict the time to intrahepatic recurrence, but also the location where the tumor might recur. This will be clinically beneficial for better intervention and optimizing decision making during the process of radiotherapy treatment planning. To address this challenging task, firstly, we proposed an unsupervised registration neural network to register atlas CT to patient simulation CT and obtain the liver’s Couinaud segments for the entire patient cohort. Secondly, a new attention convolutional neural network has been applied to utilize multimodality images (CT, MR and 3D dose distribution) for the prediction of high-risk segments. The results showed much improved efficiency for obtaining segments compared with conventional registration methods and the prediction performance showed promising accuracy for anticipating the recurrence location as well. Overall, this thesis contributed new methods and techniques to improve the utilization of radiomics for personalized radiotherapy. These contributions included new algorithm for detecting artifacts, a joint model of dose with image heterogeneity, combining hand-crafted features with machine learnt features for actuarial radiomics modeling, and a novel approach for predicting location of treatment failure.PHDApplied PhysicsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/163092/1/liswei_1.pd

Deep Blue Documents at the University of Michigan