
    Robustness of Random Forest-based gene selection methods

    Gene selection is an important part of microarray data analysis because it provides information that can lead to a better mechanistic understanding of an investigated phenomenon. At the same time, gene selection is very difficult because of the noisy nature of microarray data. As a consequence, gene selection is often performed with machine learning methods. The Random Forest method is particularly well suited for this purpose. In this work, four state-of-the-art Random Forest-based feature selection methods were compared in a gene selection context. The analysis focused on the stability of selection because, although it is necessary for determining the significance of results, it is often ignored in similar studies. The comparison of post-selection accuracy in the validation of Random Forest classifiers revealed that all investigated methods were equivalent in this context. However, the methods substantially differed with respect to the number of selected genes and the stability of selection. Of the analysed methods, the Boruta algorithm predicted the most genes as potentially important. The post-selection classifier error rate, which is a frequently used measure, was found to be a potentially deceptive measure of gene selection quality. When the number of consistently selected genes was considered, the Boruta algorithm was clearly the best. Although it was also the most computationally intensive method, the Boruta algorithm's computational demands could be reduced to levels comparable to those of other algorithms by replacing the Random Forest importance with a comparable measure from Random Ferns (a similar but simplified classifier). Despite their design assumptions, the minimal-optimal selection methods were found to select a high fraction of false positives.
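    A minimal sketch of this kind of analysis, assuming the `boruta` Python package (BorutaPy) together with scikit-learn; the synthetic data and the bootstrap-based stability count are illustrative stand-ins for microarray data and for the paper's actual protocol:

```python
# Sketch: Boruta all-relevant gene selection plus a simple stability check.
# Synthetic data stands in for a microarray (few samples, many genes).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

X, y = make_classification(n_samples=120, n_features=200,
                           n_informative=10, random_state=0)

def boruta_select(X, y, seed):
    """Return the set of features Boruta confirms as relevant."""
    rf = RandomForestClassifier(n_jobs=-1, max_depth=5, random_state=seed)
    sel = BorutaPy(rf, n_estimators='auto', random_state=seed)
    sel.fit(X, y)
    return set(np.where(sel.support_)[0])

# Stability: how often is each gene selected across bootstrap resamples?
rng = np.random.default_rng(0)
n_rounds = 5
counts = np.zeros(X.shape[1])
for r in range(n_rounds):
    idx = rng.choice(len(y), size=len(y), replace=True)
    for gene in boruta_select(X[idx], y[idx], seed=r):
        counts[gene] += 1

stable = np.flatnonzero(counts >= 0.8 * n_rounds)  # consistently selected
print(f"{len(stable)} genes selected in at least 80% of resamples")
```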

    Cortical topography of intracortical inhibition influences the speed of decision making

    The neocortex contains orderly topographic maps; however, their functional role remains controversial. Theoretical studies have suggested a role in minimizing computational costs, whereas empirical studies have focused on spatial localization. Using a tactile multiple-choice reaction time (RT) task before and after the induction of perceptual learning through repetitive sensory stimulation, we extend the framework of cortical topographies by demonstrating that the topographic arrangement of intracortical inhibition contributes to the speed of human perceptual decision-making processes. RTs differ among fingers, displaying an inverted U-shaped function. Simulations using neural fields show the inverted U-shaped RT distribution as an emergent consequence of lateral inhibition. Weakening inhibition through learning shortens RTs, which is modeled through topographically reorganized inhibition. Whereas changes in decision making are often attributed to higher cortical areas, our data show that the spatial layout of interaction processes within representational maps contributes to selection and decision-making processes.
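    A toy numerical sketch (not the paper's neural field model) of how lateral inhibition alone can produce an inverted U-shaped RT profile across fingers; the Gaussian interaction kernel, drift rate, and inhibition gain are assumptions made for illustration:

```python
# Toy model: five finger representations on a cortical strip inhibit each
# other with a range set by a Gaussian kernel.  Middle fingers have
# neighbours on both sides, receive more inhibition, accumulate evidence
# more slowly, and therefore show the longest RTs (an inverted U).
import numpy as np

positions = np.arange(5)   # finger representations along the map
sigma = 1.2                # inhibition range (assumed)
gain = 0.25                # inhibition strength; learning would reduce this

def inhibition_received(i):
    others = positions[positions != i]
    return np.sum(np.exp(-(others - i) ** 2 / (2 * sigma ** 2)))

inhib = np.array([inhibition_received(i) for i in positions])
drift = 1.0 - gain * inhib      # evidence accumulation rate per finger
rt = 1.0 / drift                # time to reach a fixed decision threshold
print(np.round(rt, 3))          # peaks for the middle finger
```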

    Identifying Risk Factors for Severe Childhood Malnutrition by Boosting Additive Quantile Regression

    Ordinary linear and generalized linear regression models relate the mean of a response variable to a linear combination of covariate effects and, as a consequence, focus on average properties of the response. Analyzing childhood malnutrition in developing or transition countries based on such a regression model implies that the estimated effects describe the average nutritional status. However, it is of even greater interest to analyze quantiles of the response distribution, such as the 5% or 10% quantile, that relate to the risk of extreme malnutrition in children. In this paper, we analyze data on childhood malnutrition collected in the 2005/2006 India Demographic and Health Survey based on a semiparametric extension of quantile regression models where nonlinear effects are included in the model equation, leading to additive quantile regression. The variable selection and model choice problems associated with estimating an additive quantile regression model are addressed by a novel boosting approach. Based on this rather general class of statistical learning procedures for empirical risk minimization, we develop, evaluate and apply a boosting algorithm for quantile regression. Our proposal allows for data-driven determination of the amount of smoothness required for the nonlinear effects and combines model selection with an automatic variable selection property. The results of our empirical evaluation suggest that boosting is an appropriate tool for estimation in linear and additive quantile regression models and helps to identify previously unknown risk factors for childhood malnutrition.
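    As a rough stand-in for the approach, the sketch below fits the 5% conditional quantile with scikit-learn's tree-based gradient boosting (pinball loss) on synthetic data; the paper's own algorithm is component-wise boosting with smooth base-learners, which this example does not reproduce:

```python
# Gradient boosting for a lower conditional quantile: the pinball loss
# targets the 5% quantile rather than the conditional mean.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(1000, 3))
noise = rng.normal(0.0, 0.3 + 0.2 * np.abs(X[:, 2]))  # heteroscedastic
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + noise

q05 = GradientBoostingRegressor(loss="quantile", alpha=0.05,
                                n_estimators=300, learning_rate=0.05)
q05.fit(X, y)
# Roughly 5% of observations should fall below the fitted quantile.
print(np.mean(y < q05.predict(X)))
```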

    Anergy in self-directed B lymphocytes from a statistical mechanics perspective

    The ability of the adaptive immune system to discriminate between self and non-self mainly stems from the ontogenic clonal deletion of lymphocytes expressing strong binding affinity with self-peptides. However, some self-directed lymphocytes may evade selection and still be harmless due to a mechanism called clonal anergy. For B lymphocytes, two major explanations for anergy have developed over three decades: according to the "Varela theory", anergy stems from a proper orchestration of the whole B-repertoire, such that self-reactive clones, due to intensive interactions and feedback from other clones, display more inertia in mounting a response. On the other hand, according to the "two-signal model", which prevails nowadays, self-reacting cells are not stimulated by helper lymphocytes, and the absence of such signaling yields anergy. The first result we present, achieved through disordered statistical mechanics, shows that helper cells do not prompt the activation and proliferation of a certain sub-group of B cells, which turn out to be just those interacting broadly; this merges the two approaches (in particular, the Varela theory is then contained within the two-signal model). As a second result, we outline a minimal topological architecture for the B-world in which highly connected clones are self-directed as a natural consequence of ontogenetic learning; this provides a mathematical framework for the Varela perspective. As a consequence of these two achievements, clonal deletion and clonal anergy can be seen as two interplaying aspects of the same phenomenon.
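    A toy simulation, under assumed random couplings, of the intuition behind the first result: the more helper signals a B clone receives, the more those signals cancel, leaving broadly connected (self-directed) clones only weakly stimulated:

```python
# Each B clone receives excitatory (+1) or inhibitory (-1) signals from
# its helper contacts; the per-contact net field shrinks like 1/sqrt(k)
# with connectivity k, so highly connected clones sit near zero field.
import numpy as np

rng = np.random.default_rng(1)
n_helpers, n_b = 1000, 200
degrees = rng.integers(5, n_helpers, size=n_b)  # connectivity per B clone

net_field = np.array([rng.choice([-1.0, 1.0], size=k).mean()
                      for k in degrees])

# |net field| falls with connectivity: broad interaction -> anergy.
print(np.corrcoef(degrees, np.abs(net_field))[0, 1])  # clearly negative
```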

    Multi-classifier prediction of knee osteoarthritis progression from incomplete imbalanced longitudinal data

    Conventional inclusion criteria used in osteoarthritis clinical trials are not very effective in selecting patients who would benefit from a therapy being tested. Typically, the majority of selected patients show no or limited disease progression during a trial period. As a consequence, the effect of the tested treatment cannot be observed, and the efforts and resources invested in running the trial are not rewarded. This could be avoided if the selection criteria were more predictive of future disease progression. In this article, we formulated the patient selection problem as a multi-class classification task, with classes based on clinically relevant measures of progression (over a time scale typical for clinical trials). Using data from two long-term knee osteoarthritis studies, OAI and CHECK, we tested multiple algorithms and learning process configurations (including multi-classifier approaches, cost-sensitive learning, and feature selection) to identify the best performing machine learning models. We examined the behaviour of the best models with respect to prediction errors and the impact of the features used, to confirm their clinical relevance. We found that the model-based selection outperforms the conventional inclusion criteria, reducing by 20-25% the number of selected patients who show no progression. This result might lead to more efficient clinical trials.
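    An illustrative scikit-learn pipeline in the spirit of the configurations tested, combining feature selection with cost-sensitive learning on a synthetic, imbalanced three-class problem; the OAI/CHECK variables and the paper's exact model choices are not reproduced here:

```python
# Feature selection + a cost-sensitive classifier, evaluated with a
# metric that does not reward ignoring the rare (progressing) classes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Dominant "no progression" class plus two rarer progression classes.
X, y = make_classification(n_samples=1500, n_features=60, n_informative=15,
                           n_classes=3, weights=[0.7, 0.2, 0.1],
                           random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),
    ("clf", RandomForestClassifier(n_estimators=300,
                                   class_weight="balanced",  # cost-sensitive
                                   random_state=0)),
])
scores = cross_val_score(pipe, X, y, scoring="balanced_accuracy", cv=5)
print(scores.mean())
```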

    Video quality prediction under time-varying loads

    We are on the cusp of an era where we can responsively and adaptively predict future network performance from network device statistics in the Cloud. To make this happen, regression-based models have been applied to learn mappings between the kernel metrics of a machine in a service cluster and service quality metrics on a client machine. The path ahead requires the ability to adaptively parametrize learning algorithms for arbitrary problems and to increase computation speed. We consider methods to adaptively parametrize regularization penalties, coupled with methods for compensating for the effects of the time-varying loads present in the system, namely load-adjusted learning. The time-varying nature of networked systems gives rise to the need for faster learning models to manage them; paradoxically, the models that have been applied have not explicitly accounted for this time-varying nature. Consequently, previous studies have reported that the learning problems were ill-conditioned; the practical, undesirable consequence of this is variability in prediction quality. Subset selection has been proposed as a solution. We highlight the shortcomings of subset selection. We demonstrate that load-adjusted learning, using a suitable adaptive regularization function, outperforms current subset selection approaches by 10% and reduces computation.
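    A minimal sketch of load-adjusted learning with an adaptively chosen ridge penalty; normalising the kernel metrics by the concurrent load and selecting the penalty by cross-validation over a grid are illustrative assumptions, not the paper's exact formulation:

```python
# Kernel metrics scale with a time-varying load; dividing the load out
# (load adjustment) restores a well-conditioned regression, and RidgeCV
# picks the regularization penalty adaptively from the training window.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
n, p = 2000, 20
load = 1.0 + rng.gamma(2.0, 1.0, size=n)           # time-varying load
X_raw = rng.normal(size=(n, p)) * load[:, None]    # raw kernel metrics
w = rng.normal(size=p)
y = (X_raw / load[:, None]) @ w + rng.normal(0, 0.1, size=n)  # service quality

X_adj = X_raw / load[:, None]                      # load-adjusted features

model = RidgeCV(alphas=np.logspace(-4, 2, 25)).fit(X_adj[:1500], y[:1500])
print(model.alpha_, model.score(X_adj[1500:], y[1500:]))  # chosen penalty, R^2
```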