Multivariate Analysis from a Statistical Point of View
Multivariate Analysis is an increasingly common tool in experimental high
energy physics; however, many of the common approaches were borrowed from other
fields. We clarify what the goal of a multivariate algorithm should be for the
search for a new particle and compare different approaches. We also translate
the Neyman-Pearson theory into the language of statistical learning theory.
Comment: Talk from PhyStat2003, Stanford, CA, USA, September 2003; 4 pages, LaTeX, 1 EPS figure. PSN WEJT00.
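As a concrete illustration of the Neyman-Pearson view (our own toy sketch, not from the paper): when the signal and background densities are known, the most powerful test at any fixed false-positive rate simply thresholds the likelihood ratio. All densities and the threshold below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy search: "signal" and "background" events are 1-D Gaussians with known
# densities. The Neyman-Pearson lemma says the most powerful test at any
# fixed false-positive rate thresholds the ratio p_signal(x) / p_background(x).
mu_sig, mu_bkg, sigma = 1.0, 0.0, 1.0

def likelihood_ratio(x):
    """p_signal(x) / p_background(x) for equal-variance Gaussians."""
    log_lr = (x * (mu_sig - mu_bkg) - 0.5 * (mu_sig**2 - mu_bkg**2)) / sigma**2
    return np.exp(log_lr)

# Draw test samples and classify by thresholding the ratio at t = 1
x_sig = rng.normal(mu_sig, sigma, 10_000)
x_bkg = rng.normal(mu_bkg, sigma, 10_000)
efficiency = np.mean(likelihood_ratio(x_sig) > 1.0)   # true-positive rate
fake_rate = np.mean(likelihood_ratio(x_bkg) > 1.0)    # false-positive rate
print(f"signal efficiency {efficiency:.3f}, background fake rate {fake_rate:.3f}")
```

Any multivariate classifier whose output is a monotone function of this ratio is equally good for the search, which is one way of phrasing the paper's point about what the goal of the algorithm should be.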
Fast rates in statistical and online learning
The speed with which a learning algorithm converges as it is presented with
more data is a central problem in machine learning: a fast rate of
convergence means less data is needed for the same level of performance. The
pursuit of fast rates in online and statistical learning has led to the
discovery of many conditions in learning theory under which fast learning is
possible. We show that most of these conditions are special cases of a single,
unifying condition that comes in two forms: the central condition for 'proper'
learning algorithms that always output a hypothesis in the given model, and
stochastic mixability for online algorithms that may make predictions outside
of the model. We show that under surprisingly weak assumptions both conditions
are, in a certain sense, equivalent. The central condition has a
re-interpretation in terms of convexity of a set of pseudoprobabilities,
linking it to density estimation under misspecification. For bounded losses, we
show how the central condition enables a direct proof of fast rates and we
prove its equivalence to the Bernstein condition, itself a generalization of
the Tsybakov margin condition, both of which have played a central role in
obtaining fast rates in statistical learning. Yet, while the Bernstein
condition is two-sided, the central condition is one-sided, making it more
suitable to deal with unbounded losses. In its stochastic mixability form, our
condition generalizes both a stochastic exp-concavity condition identified by
Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying
conditions thus provide a substantial step towards a characterization of fast
rates in statistical learning, similar to how classical mixability
characterizes constant regret in the sequential prediction with expert advice
setting.
Comment: 69 pages, 3 figures.
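For concreteness, the central condition can be written as a one-sided exponential-moment inequality; the following is our paraphrase of its standard form, with $Z$ a data point and $\ell_f(Z)$ the loss of hypothesis $f$:

```latex
% eta-central condition: some f* in the model F dominates every f in an
% exponential-moment sense (one-sided: only one tail of the excess loss
% \ell_f - \ell_{f^*} is constrained).
\exists\, \eta > 0,\ f^* \in F:\qquad
\mathbb{E}_{Z}\!\left[\, e^{-\eta\left(\ell_f(Z) - \ell_{f^*}(Z)\right)} \right] \;\le\; 1
\quad \text{for all } f \in F.
```

By Jensen's inequality this implies $\mathbb{E}[\ell_{f^*}] \le \mathbb{E}[\ell_f]$ for all $f$, so $f^*$ is risk-optimal in the model; but unlike the two-sided Bernstein condition, it constrains only one tail of the excess loss, which is what makes it better suited to unbounded losses.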
What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation
One of the most important aspects of any machine learning paradigm is
how it scales according to problem size and complexity. Using a task
with known optimal training error, and a pre-specified maximum number
of training updates, we investigate the convergence of the
backpropagation algorithm with respect to a) the complexity of the
required function approximation, b) the size of the network in
relation to the size required for an optimal solution, and c) the
degree of noise in the training data. In general, for a) the solution
found is worse when the function to be approximated is more complex,
for b) oversize networks can result in lower training and
generalization error, and for c) the use of committee or ensemble
techniques can be more beneficial as the amount of noise in the
training data is increased. For the experiments we performed, we do
not obtain the optimal solution in any case. We further support the
observation that larger networks can produce better training and
generalization error using a face recognition example where a network
with many more parameters than training points generalizes better than
smaller networks.
(Also cross-referenced as UMIACS-TR-96-22.)
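A minimal re-creation of the experimental protocol described above (our own sketch; the target function, network sizes, and update budget are illustrative choices, not the paper's): train one-hidden-layer networks of different sizes with plain backpropagation for a fixed budget of updates on noisy data, then compare training and generalization error.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n, noise):
    # Illustrative regression target with additive Gaussian noise
    x = rng.uniform(-1, 1, (n, 1))
    y = np.sin(3 * x) + noise * rng.normal(size=(n, 1))
    return x, y

def train_mlp(hidden, x, y, updates=2000, lr=0.1):
    """One-hidden-layer tanh MLP trained by plain batch backpropagation."""
    w1 = rng.normal(scale=0.5, size=(1, hidden)); b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=(hidden, 1)); b2 = np.zeros(1)
    for _ in range(updates):
        h = np.tanh(x @ w1 + b1)              # forward pass
        err = (h @ w2 + b2) - y               # residuals
        dh = (err @ w2.T) * (1 - h ** 2)      # backprop through tanh
        w2 -= lr * h.T @ err / len(x); b2 -= lr * err.mean(0)
        w1 -= lr * x.T @ dh / len(x); b1 -= lr * dh.mean(0)
    return lambda z: np.tanh(z @ w1 + b1) @ w2 + b2

x_tr, y_tr = make_data(100, noise=0.1)        # noisy training set
x_te, y_te = make_data(1000, noise=0.0)       # noise-free test targets

results = {}
for hidden in (2, 32):                        # "small" vs oversize network
    f = train_mlp(hidden, x_tr, y_tr)
    results[hidden] = (np.mean((f(x_tr) - y_tr) ** 2),
                       np.mean((f(x_te) - y_te) ** 2))
    print(f"hidden={hidden:2d}  train MSE={results[hidden][0]:.4f}  "
          f"test MSE={results[hidden][1]:.4f}")
```

The same scaffold extends directly to the paper's other axes: vary the target's complexity, the noise level, or average a committee of independently initialized networks.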
Automatic solar feature detection using image processing and pattern recognition techniques
The objective of the research in this dissertation is to develop a software system that automatically detects and characterizes solar flares, filaments, and Coronal Mass Ejections (CMEs), the core of so-called solar activity. These tools will assist in predicting space weather caused by violent solar activity. Image processing and pattern recognition techniques are applied throughout the system.
For automatic flare detection, advanced pattern recognition techniques such as the Multi-Layer Perceptron (MLP), Radial Basis Function (RBF) network, and Support Vector Machine (SVM) are used. By tracking the entire flare process, the motion properties of two-ribbon flares are derived automatically. For solar filament detection, the Stabilized Inverse Diffusion Equation (SIDE) is used to enhance and sharpen filaments; a new method for automatic threshold selection is proposed to extract filaments from the background; and an SVM classifier with nine input features is used to differentiate between sunspots and filaments. Once a filament is identified, morphological thinning, pruning, and adaptive edge-linking methods are applied to determine its properties. Furthermore, a filament-matching method is proposed to detect filament disappearance. The automatic detection and characterization of flares and filaments have been successfully applied to Hα full-disk images obtained continuously at Big Bear Solar Observatory (BBSO). For automatic detection and classification of CMEs, image enhancement, segmentation, and pattern recognition techniques are applied to Large Angle and Spectrometric Coronagraph (LASCO) C2 and C3 images.
The processed LASCO and BBSO images are saved to a file archive, and the physical properties of detected solar features, such as intensity and speed, are recorded in our database. Researchers can access this solar feature database and analyze the solar data efficiently and effectively. The detection and characterization system greatly improves the ability to monitor the evolution of solar events and has the potential to be used to predict space weather.
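The automatic threshold selection above is the dissertation's own method; as a generic stand-in for the idea of separating dark filaments from the disk background, the sketch below uses Otsu's classic histogram-based threshold on a synthetic image (the image and all parameters are illustrative only):

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Pick the intensity threshold maximizing between-class variance (Otsu)."""
    hist, edges = np.histogram(image, bins=bins)
    p = hist / hist.sum()                      # intensity probabilities
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                          # class-0 (dark) weight
    w1 = 1 - w0                                # class-1 (bright) weight
    mu0 = np.cumsum(p * centers)               # unnormalized class-0 mean
    mu_total = mu0[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * w0 - mu0) ** 2 / (w0 * w1)
    between[~np.isfinite(between)] = 0
    # threshold at the upper edge of the best split bin
    return edges[np.argmax(between) + 1]

# Synthetic "full-disk" image: bright background with a dark filament-like strip
img = np.full((64, 64), 200.0) + np.random.default_rng(2).normal(0, 5, (64, 64))
img[30:34, 10:54] = 80.0                       # dark feature
t = otsu_threshold(img)
mask = img < t                                 # pixels darker than threshold
print(f"threshold {t:.1f}, {mask.sum()} filament pixels")
```

In the pipeline described above, a binary mask like this would then be cleaned up with the morphological thinning, pruning, and edge-linking steps before filament properties are measured.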
A Model of Inductive Bias Learning
A major problem in machine learning is that of inductive bias: how to choose
a learner's hypothesis space so that it is large enough to contain a solution
to the problem being learnt, yet small enough to ensure reliable generalization
from reasonably-sized training sets. Typically such bias is supplied by hand
through the skill and insights of experts. In this paper a model for
automatically learning bias is investigated. The central assumption of the
model is that the learner is embedded within an environment of related learning
tasks. Within such an environment the learner can sample from multiple tasks,
and hence it can search for a hypothesis space that contains good solutions to
many of the problems in the environment. Under certain restrictions on the set
of all hypothesis spaces available to the learner, we show that a hypothesis
space that performs well on a sufficiently large number of training tasks will
also perform well when learning novel tasks in the same environment. Explicit
bounds are also derived demonstrating that learning multiple tasks within an
environment of related tasks can potentially give much better generalization
than learning a single task.
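A toy rendition of this setup (entirely our own construction, not the paper's model): the candidate hypothesis spaces are feature pairs, every task in the "environment" depends on the same hidden pair, and the learner picks the space with the lowest average training error across sampled tasks, then reuses it on a novel task.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)

D, SHARED = 10, (2, 7)        # true shared feature pair (unknown to learner)

def sample_task(n):
    # Related tasks: all depend only on the SHARED features, with
    # task-specific weights and a little observation noise
    x = rng.normal(size=(n, D))
    w = rng.normal(size=2)
    y = x[:, list(SHARED)] @ w + 0.1 * rng.normal(size=n)
    return x, y

def fit_err(pair, x, y):
    """Least-squares fit restricted to the feature pair; return training MSE."""
    a = x[:, list(pair)]
    w, *_ = np.linalg.lstsq(a, y, rcond=None)
    return np.mean((a @ w - y) ** 2)

# Learn the bias: the hypothesis space that does well on many training tasks
tasks = [sample_task(30) for _ in range(8)]
avg_err = {p: np.mean([fit_err(p, x, y) for x, y in tasks])
           for p in combinations(range(D), 2)}
learned_bias = min(avg_err, key=avg_err.get)
print("learned feature pair:", learned_bias)

# The learned bias transfers: fit a novel task using only the chosen features
x_new, y_new = sample_task(30)
print("novel-task training MSE:", fit_err(learned_bias, x_new, y_new))
```

The point mirrors the abstract: a hypothesis space selected for its average performance over many related tasks also serves well on a fresh task from the same environment, with far fewer candidate hypotheses than the full 10-dimensional model.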