Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
An evaluation of DNA-damage response and cell-cycle pathways for breast cancer classification
Accurate subtyping or classification of breast cancer is important for
ensuring proper treatment of patients and also for understanding the molecular
mechanisms driving this disease. While there have been several gene signatures
proposed in the literature to classify breast tumours, these signatures show
very low overlaps, different classification performance, and not much relevance
to the underlying biology of these tumours. Here we evaluate DNA-damage
response (DDR) and cell cycle pathways, which are critical pathways implicated
in a considerable proportion of breast tumours, for their usefulness and
ability in breast tumour subtyping. We think that subtyping breast tumours
based on these two pathways could lead to vital insights into molecular
mechanisms driving these tumours. Here, we performed a systematic evaluation of
DDR and cell-cycle pathways for subtyping of breast tumours into the five known
intrinsic subtypes. The homologous recombination (HR) pathway showed the best
performance in subtyping breast tumours, indicating that HR genes are strongly
involved in all breast tumours. Comparisons of pathway-based signatures with two
standard gene signatures supported the use of known pathways for breast tumour
subtyping. Further, the evaluation of these standard gene signatures showed
that breast tumour subtyping, prognosis and survival estimation are all closely
related. Finally, we constructed an all-inclusive super-signature by combining
(union of) all genes and performing a stringent feature selection, and found it
to be reasonably accurate and robust in classification as well as prognostic
value. Adopting DDR and cell cycle pathways for breast tumour subtyping
achieved robust and accurate results, and constructing a
super-signature that contains a feature-selected mix of genes from these
molecular pathways as well as clinical aspects is valuable in clinical
practice. Comment: 28 pages, 7 figures, 6 tables
Stable Feature Selection for Biomarker Discovery
Feature selection techniques have been used as the workhorse in biomarker
discovery applications for a long time. Surprisingly, the stability of feature
selection with respect to sampling variations has long been under-considered.
Only recently has this issue received increasing attention.
In this article, we review existing stable feature selection methods for
biomarker discovery using a generic hierarchical framework. We have two
objectives: (1) providing an overview of this new yet fast-growing topic as a
convenient reference; (2) categorizing existing methods under an expandable
framework for future research and development.
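One common way to quantify the stability this abstract refers to is to rerun a selector on bootstrap resamples and average the pairwise Jaccard similarity of the selected feature sets. The sketch below illustrates that idea only and is not a method taken from the article. The selector `top_k_by_mean_difference`, the toy data, and all parameter values are assumptions made up for the example.

```python
import random
from itertools import combinations
from statistics import mean


def top_k_by_mean_difference(X, y, k):
    """Hypothetical selector: rank features by |mean(class 1) - mean(class 0)|."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(X, y, k, n_resamples=20, seed=0):
    """Average pairwise Jaccard similarity of feature sets across bootstrap resamples.

    Assumes binary labels 0/1 and that both classes occur in the data.
    """
    rng = random.Random(seed)
    n = len(X)
    subsets = []
    while len(subsets) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if 0 not in ys or 1 not in ys:
            continue  # the selector needs samples from both classes
        Xs = [X[i] for i in idx]
        subsets.append(top_k_by_mean_difference(Xs, ys, k))
    return mean([jaccard(a, b) for a, b in combinations(subsets, 2)])


# Toy data (assumed): feature 0 separates the classes, features 1-4 are noise.
toy_rng = random.Random(1)
y = [i % 2 for i in range(20)]
X = [[10 * label + toy_rng.random()] + [toy_rng.random() for _ in range(4)]
     for label in y]

stability = selection_stability(X, y, k=1)
print(f"stability (k=1): {stability:.2f}")
```

A score near 1 means the same features are selected regardless of which samples are drawn, while a score near 0 means the selection is driven mostly by sampling variation.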
Elephant Search with Deep Learning for Microarray Data Analysis
Despite a plethora of research on microarray gene expression
data analysis, it remains challenging for researchers to analyze
large and complex gene expression data effectively and efficiently. The feature
(gene) selection method is of paramount importance for understanding the
differences in biological and non-biological variation between samples. In
order to address this problem, a novel elephant search (ES) based optimization
is proposed to select the best gene expressions from the large volume of microarray
data. Further, a promising machine learning method is envisioned to leverage
such high-dimensional and complex microarray datasets, extracting hidden
patterns to make meaningful predictions and the most accurate
classification. In particular, stochastic gradient descent based deep learning
(DL) with softmax activation function is then used on the reduced features
(genes) for better classification of different samples according to their gene
expression levels. The experiments are carried out on nine of the most popular cancer
microarray gene selection datasets, obtained from the UCI Machine Learning
Repository. The empirical results obtained by the proposed elephant search
based deep learning (ESDL) approach are compared with those of the most recently published
article to assess its suitability for future bioinformatics research. Comment: 12 pages, 5 tables
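The classification step described in this abstract, a softmax output trained with stochastic gradient descent on a reduced set of features, can be illustrated with a much simpler stand-in. The sketch below is a single-layer softmax regression, not the authors' deep network and not the elephant search optimizer. The toy "reduced features", the learning rate, and the epoch count are assumptions chosen for the example.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(W, b, x):
    """Linear scores, one per class."""
    return [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(b))]


def train_softmax_sgd(X, y, n_classes, lr=0.1, epochs=500, seed=0):
    """Fit softmax regression with per-sample SGD on the cross-entropy loss."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a fresh random order each epoch
        for i in order:
            p = softmax(logits_for(W, b, X[i]))
            for c in range(n_classes):
                # Cross-entropy gradient w.r.t. logit c: p_c - 1[y == c]
                g = p[c] - (1.0 if c == y[i] else 0.0)
                for j in range(d):
                    W[c][j] -= lr * g * X[i][j]
                b[c] -= lr * g
    return W, b


def predict(W, b, x):
    """Return the class with the highest score."""
    scores = logits_for(W, b, x)
    return max(range(len(scores)), key=lambda c: scores[c])


# Toy "reduced features" (assumed): two selected genes, three sample classes.
X = [[0.0, 0.0], [0.1, 0.1], [2.0, 0.0], [2.1, 0.1], [0.0, 2.0], [0.1, 2.1]]
y = [0, 0, 1, 1, 2, 2]

W, b = train_softmax_sgd(X, y, n_classes=3)
predictions = [predict(W, b, x) for x in X]
print(predictions)
```

In a pipeline like the one described, the columns of `X` would be the genes retained by the selection step, and a deeper network would replace the single linear layer.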
Integrative analysis of the inter-tumoral heterogeneity of triple-negative breast cancer.
Triple-negative breast cancers (TNBC) lack estrogen and progesterone receptors and HER2 amplification, and are resistant to therapies that target these receptors. Tumors from TNBC patients are heterogeneous in their genetic variations, tumor histology, and clinical outcomes. We used high-throughput genomic data for TNBC patients (n = 137) from TCGA to characterize inter-tumor heterogeneity. Similarity network fusion (SNF)-based integrative clustering combining gene expression, miRNA expression, and copy number variation revealed three distinct patient clusters. Integrating multiple types of data resulted in more distinct clusters than analyses with a single data type. Whereas most TNBCs are classified by PAM50 as basal subtype, one of the clusters was enriched in the non-basal PAM50 subtypes, exhibited more aggressive clinical features and had a distinctive signature of oncogenic mutations, miRNAs and expressed genes. Our analyses provide a new classification scheme for TNBC based on multiple omics datasets and provide insight into molecular features that underlie TNBC heterogeneity.