369 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    Towards a more efficient and cost-sensitive extreme learning machine: A state-of-the-art review of recent trend

    Get PDF
    In spite of the prominence of extreme learning machine model, as well as its excellent features such as insignificant intervention for learning and model tuning, the simplicity of implementation, and high learning speed, which makes it a fascinating alternative method for Artificial Intelligence, including Big Data Analytics, it is still limited in certain aspects. These aspects must be treated to achieve an effective and cost-sensitive model. This review discussed the major drawbacks of ELM, which include difficulty in determination of hidden layer structure, prediction instability and Imbalanced data distributions, the poor capability of sample structure preserving (SSP), and difficulty in accommodating lateral inhibition by direct random feature mapping. Other drawbacks include multi-graph complexity, global memory size, one-by-one or chuck-by-chuck (a block of data), global memory size limitation, and challenges with big data. The recent trend proposed by experts for each drawback is discussed in detail towards achieving an effective and cost-sensitive mode

    GECKO: a complete large-scale gene expression analysis platform

    Get PDF
    BACKGROUND: Gecko (Gene Expression: Computation and Knowledge Organization) is a complete, high-capacity centralized gene expression analysis system, developed in response to the needs of a distributed user community. RESULTS: Based on a client-server architecture, with a centralized repository of typically many tens of thousands of Affymetrix scans, Gecko includes automatic processing pipelines for uploading data from remote sites, a data base, a computational engine implementing ~ 50 different analysis tools, and a client application. Among available analysis tools are clustering methods, principal component analysis, supervised classification including feature selection and cross-validation, multi-factorial ANOVA, statistical contrast calculations, and various post-processing tools for extracting data at given error rates or significance levels. On account of its open architecture, Gecko also allows for the integration of new algorithms. The Gecko framework is very general: non-Affymetrix and non-gene expression data can be analyzed as well. A unique feature of the Gecko architecture is the concept of the Analysis Tree (actually, a directed acyclic graph), in which all successive results in ongoing analyses are saved. This approach has proven invaluable in allowing a large (~ 100 users) and distributed community to share results, and to repeatedly return over a span of years to older and potentially very complex analyses of gene expression data. CONCLUSIONS: The Gecko system is being made publicly available as free software . In totality or in parts, the Gecko framework should prove useful to users and system developers with a broad range of analysis needs

    Explainable Artificial Intelligence based Ensemble Machine Learning for Ovarian Cancer Stratification using Electronic Health Records

    Get PDF
    The purpose of this study is to show how ensemble learning-driven machine learning algorithms outperform individual machine learning algorithms at predicting ovarian cancer on a biomarker dataset. Additionally, this study provides model explanations using explainable Artificial Intelligence methods, The method involved gathering and combining 49 risk factors from 349 patients. We hypothesize that ensemble machine learning systems are superior to individual Machine Learning systems in predicting ovarian cancer. The Machine Learning system consists of five individual Machine Learning and five ensemble Machine Learning systems were trained using K-10 cross validation protocols. These training models were then used to predict the development of benign ovarian tumors and ovarian cancer tumors patients. The AUC and Accuracy metrics for ensemble machine learning increased by 19% and 16%. The MCC and Kappa scores for ensemble Machine Learning also increased over individual machine learning by 29% and 33%, respectively. As a result, we draw the conclusion that ensembled-based algorithms outperform individual machine learning in terms of ovarian carcinoma prediction

    Systems Analytics and Integration of Big Omics Data

    Get PDF
    A “genotype"" is essentially an organism's full hereditary information which is obtained from its parents. A ""phenotype"" is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome

    SMOTified-GAN for class imbalanced pattern classification problems

    Get PDF
    Class imbalance in a dataset is a major problem for classifiers that results in poor prediction with a high true positive rate (TPR) but a low true negative rate (TNR) for a majority positive training dataset. Generally, the pre-processing technique of oversampling of minority class(es) are used to overcome this deficiency. Our focus is on using the hybridization of Generative Adversarial Network (GAN) and Synthetic Minority Over-Sampling Technique (SMOTE) to address class imbalanced problems. We propose a novel two-phase oversampling approach involving knowledge transfer that has the synergy of SMOTE and GAN. The unrealistic or overgeneralized samples of SMOTE are transformed into realistic distribution of data by GAN where there is not enough minority class data available for GAN to process them by itself effectively. We named it SMOTified-GAN as GAN works on pre-sampled minority data produced by SMOTE rather than randomly generating the samples itself. The experimental results prove the sample quality of minority class(es) has been improved in a variety of tested benchmark datasets. Its performance is improved by up to 9\% from the next best algorithm tested on F1-score measurements. Its time complexity is also reasonable which is around O(N2d2T) for a sequential algorithm
    • …
    corecore