Robustness of Random Forest-based gene selection methods
Gene selection is an important part of microarray data analysis because it
provides information that can lead to a better mechanistic understanding of an
investigated phenomenon. At the same time, gene selection is very difficult
because of the noisy nature of microarray data. As a consequence, gene
selection is often performed with machine learning methods. The Random Forest
method is particularly well suited for this purpose. In this work, four
state-of-the-art Random Forest-based feature selection methods were compared in
a gene selection context. The analysis focused on the stability of selection
because, although it is necessary for determining the significance of results,
it is often ignored in similar studies.
The comparison of post-selection accuracy in the validation of Random Forest
classifiers revealed that all investigated methods were equivalent in this
context. However, the methods substantially differed with respect to the number
of selected genes and the stability of selection. Of the analysed methods, the
Boruta algorithm predicted the most genes as potentially important.
The post-selection classifier error rate, which is a frequently used measure,
was found to be a potentially deceptive measure of gene selection quality. When
the number of consistently selected genes was considered, the Boruta algorithm
was clearly the best. Although it was also the most computationally intensive
method, the Boruta algorithm's computational demands could be reduced to levels
comparable to those of other algorithms by replacing the Random Forest
importance with a comparable measure from Random Ferns (a similar but
simplified classifier). Despite their design assumptions, the minimal optimal
selection methods were found to select a high fraction of false positives.
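The core mechanism behind Boruta, comparing each real feature's Random Forest importance against randomized "shadow" copies of the features, can be sketched as follows. This is a minimal single-pass illustration on synthetic data using scikit-learn, not the full iterative Boruta algorithm or the implementation evaluated here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# Shadow features: column-wise permutations of the real features, so they
# keep each feature's marginal distribution but carry no signal about y.
X_shadow = np.apply_along_axis(rng.permutation, 0, X)
X_aug = np.hstack([X, X_shadow])

rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_aug, y)
real_imp = rf.feature_importances_[:10]
shadow_imp = rf.feature_importances_[10:]

# A real feature is tentatively important if it beats the best shadow.
selected = np.where(real_imp > shadow_imp.max())[0]
print(selected)
```

In the full algorithm this comparison is repeated over many forest runs, with a statistical test deciding which features are confirmed, rejected, or left tentative.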
Cortical topography of intracortical inhibition influences the speed of decision making
The neocortex contains orderly topographic maps; however, their functional role remains controversial. Theoretical studies have suggested a role in minimizing computational costs, whereas empirical studies have focused on spatial localization. Using a tactile multiple-choice reaction time (RT) task before and after the induction of perceptual learning through repetitive sensory stimulation, we extend the framework of cortical topographies by demonstrating that the topographic arrangement of intracortical inhibition contributes to the speed of human perceptual decision-making processes. RTs differ among fingers, displaying an inverted U-shaped function. Simulations using neural fields show the inverted U-shaped RT distribution as an emergent consequence of lateral inhibition. Weakening inhibition through learning shortens RTs, which is modeled through topographically reorganized inhibition. Whereas changes in decision making are often regarded as an outcome of higher cortical areas, our data show that the spatial layout of interaction processes within representational maps contributes to selection and decision-making processes.
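The claim that lateral inhibition alone can yield an inverted U-shaped RT profile across a row of finger representations can be illustrated with a toy rate model. All parameters and the threshold-crossing readout below are illustrative assumptions, not the neural-field model used in the paper:

```python
import numpy as np

def reaction_times(n=5, w=0.6, sigma=1.5, inp=1.0, theta=0.4,
                   dt=0.01, steps=5000):
    """Time for each of n units ("fingers") to reach threshold under
    distance-decaying lateral inhibition."""
    idx = np.arange(n)
    # Inhibitory coupling decays with cortical distance; no self-term.
    W = w * np.exp(-np.abs(idx[:, None] - idx[None, :]) / sigma)
    np.fill_diagonal(W, 0.0)
    x = np.zeros(n)
    rt = np.full(n, np.nan)
    for t in range(1, steps + 1):
        x = x + dt * (-x + inp - W @ np.maximum(x, 0.0))
        crossed = (x >= theta) & np.isnan(rt)
        rt[crossed] = t * dt
    return rt

rt = reaction_times()
# Central units receive the most lateral inhibition and cross threshold
# last, giving an inverted U-shaped RT profile across the row of units.
print(rt)
```

Edge units have fewer strongly coupled neighbours, so they feel less net inhibition and respond faster, which is the qualitative effect the paper attributes to the topography of intracortical inhibition.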
Identifying Risk Factors for Severe Childhood Malnutrition by Boosting Additive Quantile Regression
Ordinary linear and generalized linear regression models relate the mean of a response variable to a linear combination of covariate effects and, as a consequence, focus on average properties of the response. Analyzing childhood malnutrition in developing or transition countries based on such a regression model implies that the estimated effects describe the average nutritional status. However, it is of even greater interest to analyze quantiles of the response distribution, such as the 5% or 10% quantile, that relate to the risk of children for extreme malnutrition. In this paper, we analyze data on childhood malnutrition collected in the 2005/2006 India Demographic and Health Survey based on a semiparametric extension of quantile regression models where nonlinear effects are included in the model equation, leading to additive quantile regression. The variable selection and model choice problems associated with estimating an additive quantile regression model are addressed by a novel boosting approach. Based on this rather general class of statistical learning procedures for empirical risk minimization, we develop, evaluate and apply a boosting algorithm for quantile regression. Our proposal allows for data-driven determination of the amount of smoothness required for the nonlinear effects and combines model selection with an automatic variable selection property. The results of our empirical evaluation suggest that boosting is an appropriate tool for estimation in linear and additive quantile regression models and helps to identify yet unknown risk factors for childhood malnutrition.
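The central idea, gradient boosting on the quantile (pinball) loss to estimate a low quantile such as the 5% quantile, can be sketched with scikit-learn on synthetic heteroscedastic data. Note this sketch uses tree base learners rather than the smooth additive base learners developed in the paper:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0, 3, n)
# Heteroscedastic response: low quantiles are a nonlinear function of x.
y = np.sin(2 * x) + (0.2 + 0.3 * x) * rng.standard_normal(n)

# Boost the pinball loss targeting the 5% quantile directly.
gbr = GradientBoostingRegressor(loss="quantile", alpha=0.05,
                                n_estimators=300, max_depth=2,
                                learning_rate=0.05, random_state=0)
gbr.fit(x.reshape(-1, 1), y)

pred = gbr.predict(x.reshape(-1, 1))
coverage = float(np.mean(y >= pred))  # should be close to 0.95
print(round(coverage, 3))
```

Because the 5% quantile is targeted directly, roughly 95% of the responses should lie above the fitted curve, rather than half of them as in mean regression.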
Anergy in self-directed B lymphocytes from a statistical mechanics perspective
The ability of the adaptive immune system to discriminate between self and
non-self mainly stems from the ontogenic clonal-deletion of lymphocytes
expressing strong binding affinity with self-peptides. However, some
self-directed lymphocytes may evade selection and still be harmless due to a
mechanism called clonal anergy. As for B lymphocytes, two major explanations
for anergy developed over three decades: according to "Varela theory", it stems
from a proper orchestration of the whole B-repertoire, in such a way that
self-reactive clones, due to intensive interactions and feed-back from other
clones, display more inertia to mount a response. On the other hand, according
to the "two-signal model", which now prevails, self-reactive cells
are not stimulated by helper lymphocytes and the absence of such signaling
yields anergy. The first result we present, achieved through disordered
statistical mechanics, shows that helper cells do not prompt the activation and
proliferation of a certain sub-group of B cells, which turn out to be just
those interacting broadly; this merges the two approaches (in
particular, Varela theory is then contained within the two-signal model). As a
second result, we outline a minimal topological architecture for the B-world,
where highly connected clones are self-directed as a natural consequence of an
ontogenetic learning; this provides a mathematical framework for the Varela
perspective. As a consequence of these two results, clonal deletion and
clonal anergy can also be seen as two interplaying aspects of the same
phenomenon.
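A toy numerical reading of the merged picture (purely illustrative; the threshold, coupling strength, and random network below are assumptions, not the paper's disordered statistical-mechanics model): if every clone receives the same helper signal but also inhibitory inertia proportional to its connectivity in the B-B network, a highly connected (self-directed) clone stays below the activation threshold, i.e. it is anergic:

```python
import numpy as np

rng = np.random.default_rng(0)
n_clones = 50
# Sparse random B-B interaction network; clone 0 is made highly connected.
A = (rng.random((n_clones, n_clones)) < 0.05).astype(float)
A[0, :] = 1.0
A[:, 0] = 1.0
np.fill_diagonal(A, 0.0)

helper_signal = 1.0          # identical helper input to every clone
J = 0.08                     # inhibitory coupling strength (illustrative)
degree = A.sum(axis=1)

# Net drive on each clone: helper signal minus connectivity-borne inertia.
drive = helper_signal - J * degree
activated = drive > 0.5      # activation threshold (illustrative)

# The broadly interacting clone 0 carries too much inertia to respond,
# while most weakly connected clones activate under the same signal.
print(bool(activated[0]), int(degree[0]))
```

This is only a caricature of the result that helper signalling fails to activate precisely the broadly interacting clones, but it shows how connectivity alone can silence a clone without deleting it.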
Multi-classifier prediction of knee osteoarthritis progression from incomplete imbalanced longitudinal data
Conventional inclusion criteria used in osteoarthritis clinical trials are
not very effective in selecting patients who would benefit from a therapy being
tested. Typically majority of selected patients show no or limited disease
progression during a trial period. As a consequence, the effect of the tested
treatment cannot be observed, and the efforts and resources invested in running
the trial are not rewarded. This could be avoided, if selection criteria were
more predictive of the future disease progression.
In this article, we formulated the patient selection problem as a multi-class
classification task, with classes based on clinically relevant measures of
progression (over a time scale typical for clinical trials). Using data from
two long-term knee osteoarthritis studies OAI and CHECK, we tested multiple
algorithms and learning process configurations (including multi-classifier
approaches, cost-sensitive learning, and feature selection), to identify the
best performing machine learning models. We examined the behaviour of the best
models, with respect to prediction errors and the impact of used features, to
confirm their clinical relevance. We found that the model-based selection
outperforms the conventional inclusion criteria, reducing by 20-25% the number
of patients who show no progression. This result might lead to more efficient
clinical trials.

Comment: 22 pages, 12 figures, 10 tables
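A generic sketch of the kind of learning setup described, multi-class prediction with cost-sensitive reweighting on imbalanced classes, using scikit-learn on synthetic data. The three-class label standing in for progression categories is an assumption for illustration; this is not the OAI/CHECK data or the authors' full pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Imbalanced three-class problem standing in for progression categories
# such as "no / slow / fast progression".
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           n_classes=3, weights=[0.7, 0.2, 0.1],
                           random_state=0)

# Cost-sensitive learning: weight classes inversely to their frequency so
# the rare progressor classes are not ignored during training.
clf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                             random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
print(round(scores.mean(), 3))
```

Balanced accuracy is used instead of plain accuracy because, with a 70% majority class, a classifier that predicts "no progression" for everyone would otherwise look deceptively good.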
Video quality prediction under time-varying loads
We are on the cusp of an era where we can responsively and adaptively predict future network performance from network device statistics in the Cloud. To make this happen, regression-based models have been applied to learn mappings between the kernel metrics of a machine in a service cluster and service quality metrics on a client machine. The path ahead requires the ability to adaptively parametrize learning algorithms for arbitrary problems and to increase computation speed. We consider methods to adaptively parametrize regularization penalties, coupled with methods for compensating for the effects of the time-varying loads present in the system, namely load-adjusted learning. The time-varying nature of networked systems gives rise to the need for faster learning models to manage them; paradoxically, the models that have been applied have not explicitly accounted for this time-varying nature. Consequently, previous studies have reported that the learning problems were ill-conditioned; the practical, undesirable consequence of this is variability in prediction quality. Subset selection has been proposed as a solution. We highlight the shortcomings of subset selection. We demonstrate that load-adjusted learning, using a suitable adaptive regularization function, outperforms current subset selection approaches by 10% and reduces computation.
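Choosing a regularization penalty adaptively from data, rather than discarding features by subset selection, can be sketched with ridge regression on a deliberately ill-conditioned synthetic problem. The data, feature counts, and the simple cross-validated penalty search below are illustrative assumptions, not the paper's load-adjusted scheme:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n, p = 400, 60
X = rng.standard_normal((n, p))
# Near-collinear "kernel metric" columns make the problem ill-conditioned.
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)
y = X[:, :10] @ rng.standard_normal(10) + 0.5 * rng.standard_normal(n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Adaptive regularization: the ridge penalty is chosen from the data.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X_tr, y_tr)

# Subset-selection baseline: keep the k "best" features, then fit OLS.
subset = make_pipeline(SelectKBest(f_regression, k=10),
                       LinearRegression()).fit(X_tr, y_tr)

print(round(ridge.score(X_te, y_te), 3), round(subset.score(X_te, y_te), 3))
```

The design point is that regularization keeps all near-collinear predictors while controlling variance, whereas subset selection must drop columns and so can lose complementary signal carried by correlated metrics.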