4,101 research outputs found

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

Resampling methods for parameter-free and robust feature selection with mutual information

Author: Andreas Hahn
Battiti
Bellmann
Benoudjit
Bonnlander
Conrad
Craddock
D. François
Dijck
Diks
F. Rossi
Fleuret
Frank
François
Friedman
Fung
Good
Guyon
Guyon
Hammer
Hild
Hoffman
Hummel
Kraskov
Kwak
Kwak
M. Verleysen
Nicolaou
Opdyke
Purushothaman
Rossi
Rossi
Rossi
Scott
Stefansson
V. Wertz
Verikas
Zhou
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

Combining the mutual information criterion with a forward feature selection strategy offers a good trade-off between optimality of the selected feature subset and computation time. However, it requires setting the parameter(s) of the mutual information estimator and determining when to halt the forward procedure. These two choices are difficult to make because, as the dimensionality of the subset increases, the estimation of the mutual information becomes less and less reliable. This paper proposes using two resampling methods, K-fold cross-validation and the permutation test, to address both issues. The resampling methods provide information about the variance of the estimator, which can then be used to set the parameter automatically and to calculate a threshold for stopping the forward procedure. The procedure is illustrated on a synthetic dataset as well as on real-world examples.
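
As a rough illustration of the stopping rule described in this abstract, the Python sketch below runs a greedy forward search and uses a permutation test to decide when to halt. It is not the authors' implementation: it swaps in a simple count-based (plug-in) mutual information estimate for discrete features, which has no parameter to tune, so the K-fold step for setting the estimator parameter is left out. The function names, the 0.95 quantile, the number of permutations, and the choice to shuffle the candidate feature are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information estimate (in nats) for two discrete sequences."""
    n = len(xs)
    joint_counts = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (x_counts[x] * y_counts[y]))
        for (x, y), c in joint_counts.items()
    )


def forward_select(features, target, n_perm=200, quantile=0.95, seed=0):
    """Greedy forward selection with a permutation-based stopping rule.

    `features` maps a feature name to a list of discrete values.
    The names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    selected, remaining = [], list(features)
    while remaining:
        base = [features[name] for name in selected]

        def joint(column):
            # One tuple per sample: already selected features plus the candidate.
            return list(zip(*base, column))

        # Candidate whose addition gives the highest estimated MI with the target.
        best = max(
            remaining,
            key=lambda name: mutual_information(joint(features[name]), target),
        )
        observed = mutual_information(joint(features[best]), target)

        # Null distribution: shuffle only the candidate, which breaks its link
        # to the target while keeping the selected features intact.
        shuffled = list(features[best])
        null = []
        for _ in range(n_perm):
            rng.shuffle(shuffled)
            null.append(mutual_information(joint(shuffled), target))
        null.sort()
        threshold = null[min(int(quantile * n_perm), n_perm - 1)]

        if observed <= threshold:
            break  # the gain is not distinguishable from chance, so stop
        selected.append(best)
        remaining.remove(best)
    return selected
```

Calling `forward_select({"a": col_a, "b": col_b}, labels)` returns the names of the retained features in the order they were added. Because the plug-in estimate is biased upward as the joint subset grows, comparing against a permuted baseline computed at the same dimensionality is what keeps the stopping decision meaningful in this sketch.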

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

DIAL UCLouvain

sEMG based Techniques to Detect and Predict Localised Muscle Fatigue

Author: Al-Mulla MR
Colley MJ
Sepulveda F
Publication venue: 'IntechOpen'
Publication date: 01/01/2011

University of Essex Research Repository

Selecting Negative Samples for PPI Prediction Using Hierarchical Clustering Methodology

Author: H. Pomares
I. Rojas
J. Herrera
J. M. Urquiza
J. P. Florido
O. Valenzuela
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2012

Protein-protein interactions (PPIs) play a crucial role in cellular processes. In the present work, a new approach is proposed to construct a PPI predictor by training a support vector machine model through a mutual information filter-wrapper parallel feature selection algorithm and an iterative, hierarchical clustering to select a relevant negative training set. Using a selected suboptimal set of features, the constructed support vector machine model is able to classify PPIs with high accuracy in any positive and negative datasets.

Crossref

Directory of Open Access Journals

Forecasting day-ahead electricity prices in Europe: the importance of considering market integration

Author: De Ridder Fjo
De Schutter Bart
Lago Jesus
Vrancx Peter
Publication venue: 'Elsevier BV'
Publication date: 07/12/2017

Motivated by the increasing integration among electricity markets, in this paper we propose two different methods to incorporate market integration in electricity price forecasting and to improve the predictive performance. First, we propose a deep neural network that considers features from connected markets to improve the predictive accuracy in a local market. To measure the importance of these features, we propose a novel feature selection algorithm that, by using Bayesian optimization and functional analysis of variance, evaluates the effect of the features on the algorithm performance. In addition, using market integration, we propose a second model that, by simultaneously predicting prices from two markets, improves the forecasting accuracy even further. As a case study, we consider the electricity market in Belgium and the improvements in forecasting accuracy when using various French electricity features. We show that the two proposed models lead to improvements that are statistically significant. In particular, due to market integration, the predictive accuracy is improved from 15.7% to 12.5% sMAPE (symmetric mean absolute percentage error). In addition, we show that the proposed feature selection algorithm is able to perform a correct assessment, i.e., to discard the irrelevant features.
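
For readers unfamiliar with the sMAPE metric quoted in this abstract, the following Python sketch computes one common form of it: the mean of |y - f| divided by (|y| + |f|) / 2, expressed as a percentage. Several variants of sMAPE exist, and the abstract does not state which one the paper uses, so the exact formula, the function name `smape`, and the handling of zero denominators are assumptions.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.

    Assumed definition: mean of |y - f| / ((|y| + |f|) / 2), times 100.
    Terms where both values are zero contribute 0 (an illustrative choice).
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, f in zip(actual, forecast):
        denom = (abs(y) + abs(f)) / 2
        total += 0.0 if denom == 0 else abs(y - f) / denom
    return 100 * total / len(actual)
```

Under this definition the metric is bounded between 0 and 200, and a lower value indicates a better forecast, which is why a move from 15.7% to 12.5% counts as an improvement.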

arXiv.org e-Print Archive

An Integrative Approach for the Functional Analysis of Metagenomic Studies

Author: A Belanche
A Gonzalez
D McDonald
H Mark
M Sokolova
P Hugenholtz
PJ Turnbaugh
R Roehe
SB Kotsiantis
SC Schuster
T Prakash
T Thomas
V Jonsson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/07/2017

Crossref

Ulster University's Research Portal