9,549 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration both poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    Study and Observation of the Variation of Accuracies of KNN, SVM, LMNN, ENN Algorithms on Eleven Different Datasets from UCI Machine Learning Repository

    Machine learning enables computers to learn from data without being explicitly programmed [1, 2]. Machine learning can be classified as supervised or unsupervised learning. In supervised learning, computers learn a function that maps an input to an output based on training input-output pairs [3]. Efficient and widely used supervised learning algorithms include K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Large Margin Nearest Neighbor (LMNN), and Extended Nearest Neighbor (ENN). The main contribution of this paper is to implement these learning algorithms on eleven different datasets from the UCI Machine Learning Repository and observe how the accuracy of each algorithm varies across the datasets. Analyzing the accuracy of the algorithms gives a rough idea of the relationship between the machine learning algorithms and data dimensionality. All the algorithms are implemented in MATLAB. From these accuracy observations, KNN, SVM, LMNN, and ENN can be compared by their performance on each dataset. Comment: To be published in the 4th IEEE International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT 2018).
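    The abstract above compares classifiers by test accuracy. As a rough illustration of the simplest of them, the following is a minimal KNN sketch in Python (the paper itself reports using Matlab). The function names `knn_predict` and `accuracy` and the toy data are assumptions made for illustration; they are not taken from the paper, and none of the UCI datasets are used here.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to `query`."""
    # Euclidean distance from the query to every training point.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest points and let them vote.
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Hypothetical toy data: two well-separated clusters in two dimensions.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_X = [(0.05, 0.1), (1.0, 0.95)]
test_y = ["a", "b"]

print(accuracy(train_X, train_y, test_X, test_y, k=3))
```

    Repeating the same accuracy computation for each classifier and each dataset would give the kind of per-dataset comparison the abstract describes; SVM, LMNN, and ENN are not sketched here.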

    Adversarial Unsupervised Representation Learning for Activity Time-Series

    Sufficient physical activity and restful sleep play a major role in the prevention and cure of many chronic conditions. Being able to proactively screen and monitor such chronic conditions would be a big step forward for overall health. The rapid increase in the popularity of wearable devices provides a significant new data source, making it possible to track the user's lifestyle in real time. In this paper, we propose a novel unsupervised representation learning technique called activity2vec that learns and "summarizes" the discrete-valued activity time-series. It learns the representations with three components: (i) the co-occurrence and magnitude of the activity levels in a time-segment, (ii) the neighboring context of the time-segment, and (iii) promoting subject-invariance with adversarial training. We evaluate our method on four disorder prediction tasks using linear classifiers. Empirical evaluation demonstrates that our proposed method scales and performs better than many strong baselines. The adversarial regime helps improve the generalizability of our representations by promoting subject-invariant features. We also show that using the representations at the level of a day works best, since human activity is structured in terms of daily routines. Comment: Accepted at AAAI'19. arXiv admin note: text overlap with arXiv:1712.0952

    A comparative analysis on diagnosis of diabetes mellitus using different approaches: A survey

    Diabetes mellitus is commonly known as diabetes. It is one of the most prevalent chronic diseases: a World Health Organization (WHO) report shows that the number of diabetes patients rose from 108 million in 1980 to 422 million in 2014. Early diagnosis of diabetes is important because the disease can cause other conditions, including kidney failure, stroke, blindness, heart attacks, and lower limb amputation. Different diabetes diagnosis models are found in the literature, but a survey is still needed to analyze which model is best. This paper performs a literature review of diabetes diagnosis approaches using Artificial Intelligence (neural networks, machine learning, deep learning, hybrid methods, and/or stacked-integrated use of different machine learning algorithms). More than thirty-five papers that focus on diabetes diagnosis approaches have been shortlisted. Different datasets are available online for the diagnosis of diabetes. The Pima Indian Diabetes Dataset (PIDD) is the most commonly used for diabetes prediction. In contrast with other datasets, it has key factors which play an important role in diabetes diagnosis. This survey also sheds light on the weaknesses of the existing approaches that make them less appropriate for diabetes diagnosis. Among artificial intelligence techniques, deep learning is widespread, and in medical research, heart rate is getting more attention. Deep learning combined with other algorithms can give better results in diabetes diagnosis, and heart rate should be used for other cardiac disease diagnoses.

    An Optimized Recursive General Regression Neural Network Oracle for the Prediction and Diagnosis of Diabetes

    Diabetes is a serious, chronic disease whose number of cases and prevalence have risen over the past few decades. It can lead to serious complications and can increase the overall risk of dying prematurely. Data-oriented prediction models have become effective tools for medical decision-making and diagnosis, and the use of machine learning in medicine has increased substantially. This research introduces the Recursive General Regression Neural Network Oracle (R-GRNN Oracle) and applies it to the Pima Indians Diabetes dataset for the prediction and diagnosis of diabetes. The R-GRNN Oracle (Bani-Hani, 2017) is an enhancement to the GRNN Oracle developed by Masters et al. in 1998; the recursive model consists of two oracles, one within the other. Several classifiers, along with the R-GRNN Oracle and the GRNN Oracle, are applied to the dataset: Support Vector Machine (SVM), Multilayer Perceptron (MLP), Probabilistic Neural Network (PNN), Gaussian Naïve Bayes (GNB), K-Nearest Neighbor (KNN), and Random Forest (RF). A Genetic Algorithm (GA) was used for feature selection as well as for the hyperparameter optimization of SVM and MLP, and Grid Search (GS) was used to optimize the hyperparameters of KNN and RF. The performance metrics accuracy, AUC, sensitivity, and specificity were recorded for each classifier. The R-GRNN Oracle achieved the highest accuracy, AUC, and sensitivity (81.14%, 86.03%, and 63.80%, respectively), while the optimized MLP had the highest specificity (89.71%).
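    Three of the metrics reported above (accuracy, sensitivity, and specificity) follow directly from the counts of a binary confusion matrix. The sketch below shows how they might be computed; the function name `binary_metrics` and the toy labels are hypothetical and do not reproduce the paper's data or results. AUC is omitted because it requires ranked prediction scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, and specificity from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / len(pairs),
        # Share of actual positives that were detected (true positive rate).
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of actual negatives that were correctly rejected (true negative rate).
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical labels: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

    Reporting sensitivity and specificity separately matters on a dataset like this one, where the two classes are not equally represented: a classifier can post a respectable accuracy while still missing a large share of positive cases, which is consistent with the gap between the accuracy and sensitivity figures quoted in the abstract.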