Universality of Bayesian mixture predictors
The problem is that of sequential probability forecasting for finite-valued
time series. The data is generated by an unknown probability distribution over
the space of all one-way infinite sequences. It is known that this measure
belongs to a given set C, but the latter is completely arbitrary (uncountably
infinite, without any structure given). The performance is measured with
asymptotic average log loss. In this work it is shown that the minimax
asymptotic performance is always attainable, and it is attained by a convex
combination of countably many measures from the set C (a Bayesian mixture).
This was previously only known for the case when the best achievable asymptotic
error is 0. This also contrasts with previous results that show that in the
non-realizable case all Bayesian mixtures may be suboptimal, while there is a
predictor that achieves the optimal performance.
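As a rough illustration of the objects named in this abstract, the sketch below
computes the average log loss of a Bayesian mixture predictor on a binary
sequence. It is a toy under stated assumptions, not the construction from the
paper: the set C is replaced by a small finite family of i.i.d. Bernoulli
measures, the prior is uniform, and the function names (mixture_log_loss,
model_log_loss) are hypothetical.

```python
import math


def mixture_log_loss(sequence, thetas, prior=None):
    """Average log loss (nats per symbol) of a Bayesian mixture predictor.

    Assumption: each component is an i.i.d. Bernoulli(theta) measure with
    0 < theta < 1, standing in for the measures of the set C.
    """
    k = len(thetas)
    weights = [1.0 / k] * k if prior is None else list(prior)
    total = 0.0
    for x in sequence:
        # Probability each component assigns to the observed symbol.
        likes = [t if x == 1 else 1.0 - t for t in thetas]
        # The mixture forecast is the weighted average of the components.
        p = sum(w * l for w, l in zip(weights, likes))
        total += -math.log(p)
        # Bayes update: posterior weight is proportional to weight * likelihood.
        weights = [w * l / p for w, l in zip(weights, likes)]
    return total / len(sequence)


def model_log_loss(sequence, theta):
    """Average log loss of a single Bernoulli(theta) component."""
    return sum(-math.log(theta if x == 1 else 1.0 - theta)
               for x in sequence) / len(sequence)


if __name__ == "__main__":
    seq = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1] * 50   # illustrative data only
    thetas = [0.2, 0.5, 0.8]                    # hypothetical finite family
    print("mixture:", mixture_log_loss(seq, thetas))
    print("best component:", min(model_log_loss(seq, t) for t in thetas))
```

With a uniform prior over k components, the mixture's cumulative log loss
exceeds that of the best component by at most ln k, so the gap in average loss
vanishes as the sequence grows. The abstract concerns the much harder setting
of an arbitrary, possibly uncountable set C.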
Reconciling modern machine learning practice and the bias-variance trade-off
Breakthroughs in machine learning are rapidly changing science and society,
yet our fundamental understanding of this technology has lagged far behind.
Indeed, one of the central tenets of the field, the bias-variance trade-off,
appears to be at odds with the observed behavior of methods used in modern
machine learning practice. The bias-variance trade-off implies that a model
should balance under-fitting and over-fitting: rich enough to express
underlying structure in data, simple enough to avoid fitting spurious patterns.
However, in modern practice, very rich models such as neural networks are
trained to exactly fit (i.e., interpolate) the data. Classically, such models
would be considered over-fit, and yet they often obtain high accuracy on test
data. This apparent contradiction has raised questions about the mathematical
foundations of machine learning and their relevance to practitioners.
In this paper, we reconcile the classical understanding and the modern
practice within a unified performance curve. This "double descent" curve
subsumes the textbook U-shaped bias-variance trade-off curve by showing how
increasing model capacity beyond the point of interpolation results in improved
performance. We provide evidence for the existence and ubiquity of double
descent for a wide spectrum of models and datasets, and we posit a mechanism
for its emergence. This connection between the performance and the structure of
machine learning models delineates the limits of classical analyses, and has
implications for both the theory and practice of machine learning.
Ecological Traits Fail to Consistently Predict Moth Species Persistence in Managed Forest Stands
Species traits have been used as predictors of species extinction and colonization probabilities in fragmented landscapes. Thus far, trait-based analytical frameworks have been less commonly employed as predictive tools for species persistence following a disturbance. I tested whether life history traits, dietary traits, and functional traits were correlated with moth species persistence probabilities in forest stands subjected to varying levels of timber harvest. Three harvest treatments were used: control stands (unharvested since 1960), shelterwood cut stands (15% canopy removed), and patch cut stands (80% standing bole removed). Logistic regression models were built to assess whether species persistence probabilities were a function of species traits; separate models were constructed for each level of timber harvest treatment. Species persistence probabilities were mainly a function of pre-harvest abundances. Species traits had idiosyncratic effects on species persistence depending on the level of timber harvest employed. These results suggest that species traits may indirectly influence how moth species assemblages change as a result of forest management by determining pre-harvest abundance rather than persistence per se. The absence of significant trait effects on persistence probabilities may also reflect prior reduction in species trait space. That is, the range of species trait combinations sampled in this study was much narrower than that observed in historically unlogged eastern deciduous forest systems. Thus, the lack of significant trait-persistence correlations observed here might indicate historic extinctions of species from prior logging events that have not been offset by post-harvest recovery of original species assemblages.
Probabilistic and fuzzy reasoning in simple learning classifier systems
This paper is concerned with the general stimulus-response problem as addressed by a variety of simple learning classifier systems (CSs). We suggest a theoretical model from which the assessment of uncertainty emerges as primary concern. A number of representation schemes borrowing from fuzzy logic theory are reviewed, and some connections with a well-known neural architecture revisited. In pursuit of the uncertainty measuring goal, usage of explicit probability distributions in the action part of classifiers is advocated. Some ideas supporting the design of a hybrid system incorporating Bayesian learning on top of the CS basic algorithm are sketched.