4,101 research outputs found
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Resampling methods for parameter-free and robust feature selection with mutual information
Combining the mutual information criterion with a forward feature selection
strategy offers a good trade-off between optimality of the selected feature
subset and computation time. However, it requires to set the parameter(s) of
the mutual information estimator and to determine when to halt the forward
procedure. These two choices are difficult to make because, as the
dimensionality of the subset increases, the estimation of the mutual
information becomes less and less reliable. This paper proposes to use
resampling methods, a K-fold cross-validation and the permutation test, to
address both issues. The resampling methods bring information about the
variance of the estimator, information which can then be used to automatically
set the parameter and to calculate a threshold to stop the forward procedure.
The procedure is illustrated on a synthetic dataset as well as on real-world
examples
Selecting Negative Samples for PPI Prediction Using Hierarchical Clustering Methodology
Protein-protein interactions (PPIs) play a crucial role in cellular processes. In the present work, a new approach is proposed to construct a PPI predictor training a support vector machine model through a mutual information filter-wrapper parallel feature selection algorithm and an iterative and hierarchical clustering to select a relevance negative training set. By means of a selected
suboptimum set of features, the constructed support vector machine model is able to classify PPIs with high accuracy in any positive and negative datasets
Forecasting day-ahead electricity prices in Europe: the importance of considering market integration
Motivated by the increasing integration among electricity markets, in this
paper we propose two different methods to incorporate market integration in
electricity price forecasting and to improve the predictive performance. First,
we propose a deep neural network that considers features from connected markets
to improve the predictive accuracy in a local market. To measure the importance
of these features, we propose a novel feature selection algorithm that, by
using Bayesian optimization and functional analysis of variance, evaluates the
effect of the features on the algorithm performance. In addition, using market
integration, we propose a second model that, by simultaneously predicting
prices from two markets, improves the forecasting accuracy even further. As a
case study, we consider the electricity market in Belgium and the improvements
in forecasting accuracy when using various French electricity features. We show
that the two proposed models lead to improvements that are statistically
significant. Particularly, due to market integration, the predictive accuracy
is improved from 15.7% to 12.5% sMAPE (symmetric mean absolute percentage
error). In addition, we show that the proposed feature selection algorithm is
able to perform a correct assessment, i.e. to discard the irrelevant features
- …