138 research outputs found
HypeRS: Building a Hypergraph-driven ensemble Recommender System
Recommender systems are designed to predict user preferences over collections
of items. These systems process users' previous interactions to decide which
items should be ranked higher to satisfy their desires. An ensemble recommender
system can achieve great recommendation performance by effectively combining
the decisions generated by individual models. In this paper, we propose a novel
ensemble recommender system that combines predictions made by different models
into a unified hypergraph ranking framework. This is the first time that
hypergraph ranking has been employed to model an ensemble of recommender
systems. Hypergraphs are generalizations of graphs where multiple vertices can
be connected via hyperedges, efficiently modeling high-order relations. We
differentiate real and predicted connections between users and items by
assigning different hyperedge weights to individual recommender systems. We
perform experiments using four datasets from the fields of movie, music and
news media recommendation. The obtained results show that the ensemble
hypergraph ranking method generates more accurate recommendations compared to
the individual models and a weighted hybrid approach. The assignment of
different hyperedge weights to the ensemble hypergraph further improves the
performance compared to a setting with identical hyperedge weights
Deep tree-ensembles for multi-output prediction
Recently, deep neural networks have expanded the state-of-art in various
scientific fields and provided solutions to long standing problems across
multiple application domains. Nevertheless, they also suffer from weaknesses
since their optimal performance depends on massive amounts of training data and
the tuning of an extended number of parameters. As a countermeasure, some
deep-forest methods have been recently proposed, as efficient and low-scale
solutions. Despite that, these approaches simply employ label classification
probabilities as induced features and primarily focus on traditional
classification and regression tasks, leaving multi-output prediction
under-explored. Moreover, recent work has demonstrated that tree-embeddings are
highly representative, especially in structured output prediction. In this
direction, we propose a novel deep tree-ensemble (DTE) model, where every layer
enriches the original feature set with a representation learning component
based on tree-embeddings. In this paper, we specifically focus on two
structured output prediction tasks, namely multi-label classification and
multi-target regression. We conducted experiments using multiple benchmark
datasets and the obtained results confirm that our method provides superior
results to state-of-the-art methods in both tasks
A machine learning based framework to identify and classify long terminal repeat retrotransposons
Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-LEARNER, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework towards LTR retrotransposons, a particular type of TEs characterized by having long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: REPEATMASKER, CENSOR and LTRDIGEST. In contrast to these methods, TE-LEARNER is the first to incorporate machine learning techniques, outperforming these methods in terms of predictive performance , while able to learn models and make predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above method could find, and we investigated TE-LEARNER'S predictions which did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE
Predicting adverse long-term neurocognitive outcomes after pediatric intensive care unit admission
Background and objective: Critically ill children may suffer from impaired neurocognitive functions years after ICU (intensive care unit) discharge. To assess neurocognitive functions, these children are subjected to a fixed sequence of tests. Undergoing all tests is, however, arduous for former pediatric ICU patients, resulting in interrupted evaluations where several neurocognitive deficiencies remain undetected. As a solution, we propose using machine learning to predict the optimal order of tests for each child, reducing the number of tests required to identify the most severe neurocognitive deficiencies. Methods: We have compared the current clinical approach against several machine learning methods, mainly multi-target regression and label ranking methods. We have also proposed a new method that builds several multi-target predictive models and combines the outputs into a ranking that prioritizes the worse neurocognitive outcomes. We used data available at discharge, from children who participated in the PEPaNIC-RCT trial (ClinicalTrials.gov-NCT01536275), as well as data from a 2-year follow-up study. The institutional review boards at each participating site have also approved this follow-up study (ML8052; NL49708.078; Pro00038098). Results: Our proposed method managed to outperform other machine learning methods and also the current clinical practice. Precisely, our method reaches approximately 80% precision when considering top-4 outcomes, in comparison to 65% and 78% obtained by the current clinical practice and the state-of-the-art method in label ranking, respectively. Conclusions: Our experiments demonstrated that machine learning can be competitive or even superior to the current testing order employed in clinical practice, suggesting that our model can be used to severely reduce the number of tests necessary for each child. Moreover, the results indicate that possible long-term adverse outcomes are already predictable as early as at ICU discharge. Thus, our work can be seen as the first step to allow more personalized follow-up after ICU discharge leading to preventive care rather than curative.</p
Predicting adverse long-term neurocognitive outcomes after pediatric intensive care unit admission
Background and objective: Critically ill children may suffer from impaired neurocognitive functions years after ICU (intensive care unit) discharge. To assess neurocognitive functions, these children are subjected to a fixed sequence of tests. Undergoing all tests is, however, arduous for former pediatric ICU patients, resulting in interrupted evaluations where several neurocognitive deficiencies remain undetected. As a solution, we propose using machine learning to predict the optimal order of tests for each child, reducing the number of tests required to identify the most severe neurocognitive deficiencies. Methods: We have compared the current clinical approach against several machine learning methods, mainly multi-target regression and label ranking methods. We have also proposed a new method that builds several multi-target predictive models and combines the outputs into a ranking that prioritizes the worse neurocognitive outcomes. We used data available at discharge, from children who participated in the PEPaNIC-RCT trial (ClinicalTrials.gov-NCT01536275), as well as data from a 2-year follow-up study. The institutional review boards at each participating site have also approved this follow-up study (ML8052; NL49708.078; Pro00038098). Results: Our proposed method managed to outperform other machine learning methods and also the current clinical practice. Precisely, our method reaches approximately 80% precision when considering top-4 outcomes, in comparison to 65% and 78% obtained by the current clinical practice and the state-of-the-art method in label ranking, respectively. Conclusions: Our experiments demonstrated that machine learning can be competitive or even superior to the current testing order employed in clinical practice, suggesting that our model can be used to severely reduce the number of tests necessary for each child. Moreover, the results indicate that possible long-term adverse outcomes are already predictable as early as at ICU discharge. Thus, our work can be seen as the first step to allow more personalized follow-up after ICU discharge leading to preventive care rather than curative.</p
Predicting gene function using hierarchical multi-label decision tree ensembles
<p>Abstract</p> <p>Background</p> <p><it>S. cerevisiae</it>, <it>A. thaliana </it>and <it>M. musculus </it>are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability.</p> <p>Results</p> <p>We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use.</p> <p>Conclusions</p> <p>Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.</p
Complex aggregates in relational learning
In relational learning, one learns patterns from relational databases, which usually contain multiple tables that are interconnected via relations. Thus, an example for which a prediction is to be given may be related to a set of objects that are possibly relevant for that prediction.
Relational classifiers differ with respect to how they handle these sets: some use
properties of the set as a whole (using aggregation), some refer to
properties of specific individuals, however, most classifiers do not combine both. This imposes an undesirable bias on these learners. This dissertation describes a learning approach that
avoids this bias, using complex aggregates, i.e., aggregates that impose selection conditions on the set to aggregate on.status: publishe
- …