Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
Adaptive imputation of missing values for incomplete pattern classification
In the classification of incomplete patterns, the missing values can either
play a crucial role in determining the class or have little or no influence on
the classification results, depending on the context. We
propose a credal classification method for incomplete patterns with adaptive
imputation of missing values based on belief function theory. First, we try
to classify the object (incomplete pattern) based only on the available
attribute values. As the underlying principle, we assume that the missing
information is not crucial for the classification if a specific class for the
object can be found using only the available information. In this case, the
object is committed to this particular class. However, if the object cannot be
classified without ambiguity, the missing values play a key role
in achieving an accurate classification. In this case, the missing values are
imputed based on the K-nearest neighbor (K-NN) and self-organizing map (SOM)
techniques, and the edited pattern with the imputed values is then classified. The
(original or edited) pattern is respectively classified according to each
training class, and the classification results represented by basic belief
assignments are fused with proper combination rules for making the credal
classification. The object is allowed to belong with different masses of belief
to the specific classes and meta-classes (which are particular disjunctions of
several single classes). The credal classification captures the
uncertainty and imprecision of classification well, and effectively reduces the
rate of misclassifications thanks to the introduction of meta-classes. The
effectiveness of the proposed method with respect to other classical methods is
demonstrated through several experiments using artificial and real data sets.
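
The two-stage idea in this abstract (classify on the available attributes first, impute only when the result is ambiguous) can be illustrated with a much simplified sketch. It is not the authors' method: belief functions, the SOM step and meta-classes are all omitted, and a nearest-centroid rule stands in for the classifier. The function names, the `ambiguity` ratio threshold and the toy data are assumptions made for illustration only.

```python
import math

def partial_distance(x, y):
    """Euclidean distance over the attributes observed (not None) in x."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y) if a is not None))

def class_centroids(train_X, train_y):
    """Mean vector of each training class (training patterns are complete)."""
    groups = {}
    for x, label in zip(train_X, train_y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in groups.items()}

def knn_impute(x, train_X, k):
    """Fill each missing attribute with its mean over the k nearest training patterns."""
    nearest = sorted(train_X, key=lambda t: partial_distance(x, t))[:k]
    return [sum(t[j] for t in nearest) / len(nearest) if v is None else v
            for j, v in enumerate(x)]

def classify(x, train_X, train_y, k=3, ambiguity=0.8):
    """Return (label, imputed_flag). Assumes at least two training classes."""
    cents = class_centroids(train_X, train_y)
    dists = sorted((partial_distance(x, c), label) for label, c in cents.items())
    (d_best, best), (d_second, _) = dists[0], dists[1]
    # Stage 1: accept the class if it is clearly closer than the runner-up
    # (hypothetical ratio test, not the paper's belief-based criterion).
    if d_second > 0 and d_best / d_second < ambiguity:
        return best, False
    # Stage 2: ambiguous, so impute the missing values and classify again.
    filled = knn_impute(x, train_X, k)
    best = min(cents, key=lambda label: partial_distance(filled, cents[label]))
    return best, True

# Toy data: the two classes differ only in the second attribute.
train_X = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0],
           [1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
train_y = ["a", "a", "a", "b", "b", "b"]

clear = classify([None, 9.5], train_X, train_y)      # informative attribute present
ambiguous = classify([2.0, None], train_X, train_y)  # informative attribute missing
```

In the toy data the first query is settled without imputation, while the second triggers the imputation stage. The method described in the abstract would instead be able to commit such an object to a meta-class rather than forcing a single label, which this sketch does not attempt.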
Predicting Clinical Events by Combining Static and Dynamic Information Using Recurrent Neural Networks
In clinical data sets we often find static information (e.g. patient gender,
blood type, etc.) combined with sequences of data that are recorded during
multiple hospital visits (e.g. medications prescribed, tests performed, etc.).
Recurrent Neural Networks (RNNs) have proven to be very successful for
modelling sequences of data in many areas of Machine Learning. In this work we
present an approach based on RNNs, specifically designed for the clinical
domain, that combines static and dynamic information in order to predict future
events. We work with a database collected in the Charité Hospital in Berlin
that contains complete information concerning patients that underwent a kidney
transplantation. After the transplantation three main endpoints can occur:
rejection of the kidney, loss of the kidney and death of the patient. Our goal
is to predict, based on information recorded in the Electronic Health Record of
each patient, whether any of those endpoints will occur within the next six or
twelve months after each visit to the clinic. We compared the different types of
RNNs that we developed for this work with a model based on a Feedforward
Neural Network and a Logistic Regression model. We found that the RNN that we
developed based on Gated Recurrent Units provides the best performance for this
task. We also used the same models for a second task, i.e., next event
prediction, and found that here the model based on a Feedforward Neural Network
outperformed the other models. Our hypothesis is that long-term dependencies
are not as relevant in this task.
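
As a rough illustration of combining static and dynamic inputs with a Gated Recurrent Unit, the sketch below runs an untrained forward pass in NumPy. The abstract does not give the architecture, so several choices are assumptions: the static features go through a single tanh layer, that representation is concatenated with the final GRU state, and three sigmoid outputs stand for the three endpoints. All names, layer sizes and random weights are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(n_dyn, n_static, n_hidden, n_out, rng):
    """Small random weights; a real model would learn these from data."""
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)
    zeros = lambda n: np.zeros(n)
    return {
        "Wz": w(n_hidden, n_dyn), "Uz": w(n_hidden, n_hidden), "bz": zeros(n_hidden),
        "Wr": w(n_hidden, n_dyn), "Ur": w(n_hidden, n_hidden), "br": zeros(n_hidden),
        "Wh": w(n_hidden, n_dyn), "Uh": w(n_hidden, n_hidden), "bh": zeros(n_hidden),
        "Ws": w(n_hidden, n_static), "bs": zeros(n_hidden),   # static-feature layer
        "Wo": w(n_out, 2 * n_hidden), "bo": zeros(n_out),     # output layer
    }

def gru_step(p, x, h):
    """One GRU update: update gate z, reset gate r, candidate state h_cand."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_cand

def predict(p, static, visits):
    """Probability of each endpoint given static features and a visit sequence."""
    h = np.zeros(p["Uz"].shape[0])
    for x in visits:                      # one feature vector per clinic visit
        h = gru_step(p, x, h)
    s = np.tanh(p["Ws"] @ static + p["bs"])
    return sigmoid(p["Wo"] @ np.concatenate([h, s]) + p["bo"])

rng = np.random.default_rng(0)
params = init_params(n_dyn=5, n_static=2, n_hidden=4, n_out=3, rng=rng)
static = np.array([1.0, 0.0])             # e.g. encoded gender and blood type
visits = rng.normal(size=(6, 5))          # six visits, five dynamic features each
probs = predict(params, static, visits)   # three values, one per endpoint
```

Because the weights are random, the outputs carry no clinical meaning. The sketch only shows how the per-visit sequence and the per-patient static vector can feed one prediction.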
The effect of missing values using genetic programming on evolvable diagnosis
Medical databases usually contain missing values due to the policy of
reducing stress and harm to the patient. In practice, missing values have been a
problem mainly due to the necessity to evaluate mathematical equations obtained
by genetic programming. The solution to this problem is to use fill-in methods to
estimate the missing values. This paper analyses three fill-in methods: (1) attribute
means, (2) conditional means, and (3) random number generation. The methods
are evaluated using sensitivity, specificity, and entropy to explain the exchange in
knowledge of the results. The results are illustrated based on the breast cancer
database. Conditional means produced the best fill-in experimental results.
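
The three fill-in methods named above can be sketched for a single numeric attribute as follows. The abstract does not define them precisely, so two readings are assumed here: "conditional means" is taken as the mean of the attribute within the record's class, and "random number generation" as a uniform draw within the observed range. The function names and toy values are hypothetical, and the sensitivity and specificity helper uses the standard confusion-matrix definitions.

```python
import random
from statistics import mean

def fill_attribute_mean(values):
    """Replace None with the mean of all observed values of the attribute."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]

def fill_conditional_mean(values, labels):
    """Replace None with the mean of the attribute within the same class (assumed reading)."""
    by_class = {}
    for v, y in zip(values, labels):
        if v is not None:
            by_class.setdefault(y, []).append(v)
    overall = mean(v for v in values if v is not None)
    class_means = {y: mean(vs) for y, vs in by_class.items()}
    # Fall back to the overall mean if a class has no observed value.
    return [class_means.get(y, overall) if v is None else v
            for v, y in zip(values, labels)]

def fill_random(values, rng):
    """Replace None with a uniform draw from the observed range (assumed reading)."""
    observed = [v for v in values if v is not None]
    lo, hi = min(observed), max(observed)
    return [rng.uniform(lo, hi) if v is None else v for v in values]

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

values = [1.0, None, 3.0, 10.0, None, 14.0]   # toy attribute with two gaps
labels = [0, 0, 0, 1, 1, 1]                   # toy class labels

by_mean = fill_attribute_mean(values)
by_class = fill_conditional_mean(values, labels)
by_random = fill_random(values, random.Random(0))
```

On this toy column the attribute mean fills both gaps with 7.0, while the class-conditional mean fills them with 2.0 and 12.0, which shows why the conditional variant can preserve class structure that the global mean blurs.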