A Comprehensive Approach to Universal Piecewise Nonlinear Regression Based on Trees
In this paper, we investigate adaptive nonlinear
regression and introduce tree-based piecewise linear regression
algorithms that are highly efficient and provide significantly
improved performance with guaranteed upper bounds in an
individual sequence manner. We use a tree notion in order to
partition the space of regressors in a nested structure. The introduced
algorithms adapt not only their regression functions but
also the complete tree structure while achieving the performance
of the “best” linear mixture of a doubly exponential number
of partitions, with a computational complexity only polynomial
in the number of nodes of the tree. While constructing these
algorithms, we also avoid using any artificial “weighting” of
models (with highly data dependent parameters) and, instead,
directly minimize the final regression error, which is the ultimate
performance goal. The introduced methods are generic such that
they can readily incorporate different tree construction methods
such as random trees in their framework and can use different
regressor or partitioning functions as demonstrated in the paper.
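As a minimal illustration of the piecewise-linear idea only (not the authors' adaptive algorithm, which also learns the tree structure and competes with all partitions), the sketch below fits an independent least-squares line on each side of one fixed split; all function names are hypothetical:

```python
import numpy as np

def fit_piecewise_linear(x, y, threshold):
    """Fit an independent least-squares line [slope, intercept] per region
    of a depth-1 split at `threshold`."""
    models = {}
    for name, mask in (("left", x < threshold), ("right", x >= threshold)):
        X = np.column_stack([x[mask], np.ones(mask.sum())])  # design matrix [x, 1]
        coef, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        models[name] = coef
    return models

def predict_piecewise(models, x, threshold):
    """Evaluate the line belonging to each point's region."""
    slope_l, icept_l = models["left"]
    slope_r, icept_r = models["right"]
    return np.where(x < threshold, slope_l * x + icept_l, slope_r * x + icept_r)
```

On data such as y = |x| with a split at 0, this recovers the two exact linear pieces; the paper's contribution is in adapting both the regressors and the partition itself with sequential guarantees.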
Theoretical Interpretations and Applications of Radial Basis Function Networks
In medical applications, Radial Basis Function Networks are usually used simply as Artificial Neural Networks. However, RBFNs are knowledge-based networks that can be interpreted in several ways: as Artificial Neural Networks, Regularization Networks, Support Vector Machines, Wavelet Networks, Fuzzy Controllers, Kernel Estimators, or Instance-Based Learners. A survey of these interpretations and of their corresponding learning algorithms is provided, as well as a brief survey of dynamic learning algorithms. RBFNs' interpretations can suggest applications that are particularly interesting in medical domains.
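One of the interpretations listed above, the regularization-network view, can be sketched as Gaussian basis features with ridge-penalized linear output weights. This is a generic sketch, not the paper's specific learning algorithms; centers and width are assumed fixed here:

```python
import numpy as np

def rbf_features(x, centers, width):
    """Gaussian basis: phi[i, j] = exp(-(x_i - c_j)^2 / (2 * width^2))."""
    return np.exp(-(x[:, None] - centers[None, :])**2 / (2.0 * width**2))

def fit_rbfn(x, y, centers, width, ridge=1e-8):
    """Regularization-network reading: ridge-penalized output weights,
    solved in closed form from the normal equations."""
    Phi = rbf_features(x, centers, width)
    A = Phi.T @ Phi + ridge * np.eye(len(centers))
    return np.linalg.solve(A, Phi.T @ y)

def predict_rbfn(x, centers, width, weights):
    return rbf_features(x, centers, width) @ weights
```

Swapping the penalty, the basis, or the center-selection rule yields the other readings the survey covers (kernel estimators, instance-based learners, etc.).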
Crop Yield Prediction Using Deep Neural Networks
Crop yield is a highly complex trait determined by multiple factors such as
genotype, environment, and their interactions. Accurate yield prediction
requires fundamental understanding of the functional relationship between yield
and these interactive factors, and to reveal such relationship requires both
comprehensive datasets and powerful algorithms. In the 2018 Syngenta Crop
Challenge, Syngenta released several large datasets that recorded the genotype
and yield performances of 2,267 maize hybrids planted in 2,247 locations
between 2008 and 2016 and asked participants to predict the yield performance
in 2017. As one of the winning teams, we designed a deep neural network (DNN)
approach that took advantage of state-of-the-art modeling and solution
techniques. Our model was found to have a superior prediction accuracy, with a
root-mean-square-error (RMSE) being 12% of the average yield and 50% of the
standard deviation for the validation dataset using predicted weather data.
With perfect weather data, the RMSE would be reduced to 11% of the average
yield and 46% of the standard deviation. We also performed feature selection
based on the trained DNN model, which successfully decreased the dimension of
the input space without significant drop in the prediction accuracy. Our
computational results suggested that this model significantly outperformed
other popular methods such as Lasso, shallow neural networks (SNN), and
regression tree (RT). The results also revealed that environmental factors had
a greater effect on the crop yield than genotype.
Comment: 9 pages. Presented at the 2018 INFORMS Conference on Business Analytics and Operations Research (Baltimore, MD, USA). One of the winning solutions to the 2018 Syngenta Crop Challenge.
From patterned response dependency to structured covariate dependency: categorical-pattern-matching
Data generated from a system of interest typically consists of measurements
from an ensemble of subjects across multiple response and covariate features,
and is naturally represented by one response-matrix against one
covariate-matrix. Likely each of these two matrices simultaneously embraces
heterogeneous data types: continuous, discrete and categorical. Here a matrix
is used as a practical platform to ideally keep hidden dependency among/between
subjects and features intact on its lattice. Response and covariate dependency
is individually computed and expressed through multiscale blocks via a newly
developed computing paradigm named Data Mechanics. We propose a categorical
pattern matching approach to establish causal linkages in a form of information
flows from patterned response dependency to structured covariate dependency.
The strength of an information flow is evaluated by applying the combinatorial
information theory. This unified platform for system knowledge discovery is
illustrated through five data sets. In each illustrative case, an information
flow is demonstrated as an organization of discovered knowledge loci via
emergent visible and readable heterogeneity. This unified approach
fundamentally resolves many long standing issues, including statistical
modeling, multiple response, renormalization and feature selections, in data
analysis, but without involving man-made structures and distribution
assumptions. The results reported here enhance the idea that linking patterns
of response dependency to structures of covariate dependency is the true
philosophical foundation underlying data-driven computing and learning in
sciences.
Comment: 32 pages, 10 figures, 3 box pictures.
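The paper scores the strength of an information flow with combinatorial information theory; its exact measure is not given in the abstract, but a standard empirical stand-in is the mutual information between two categorical labelings (e.g., response-block membership versus covariate-block membership):

```python
import numpy as np
from collections import Counter

def mutual_information_bits(a, b):
    """Empirical mutual information (in bits) between two equal-length
    categorical sequences: sum_xy p(x,y) log2[p(x,y) / (p(x) p(y))]."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * np.log2(c * n / (ca[x] * cb[y]))
               for (x, y), c in cab.items())
```

Identical labelings yield the full entropy of the labeling; independent labelings yield zero, so the value reads directly as shared structure between the two dependency patterns.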
Kernel methods in machine learning
We review machine learning methods employing positive definite kernels. These
methods formulate learning and estimation problems in a reproducing kernel
Hilbert space (RKHS) of functions defined on the data domain, expanded in terms
of a kernel. Working in linear spaces of functions has the benefit of
facilitating the construction and analysis of learning algorithms while at the
same time allowing large classes of functions. The latter include nonlinear
functions as well as functions defined on nonvectorial data. We cover a wide
range of methods, ranging from binary classifiers to sophisticated methods for
estimation with structured data.
Comment: Published at http://dx.doi.org/10.1214/009053607000000677 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
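A canonical instance of the RKHS formulation the review describes is kernel ridge regression: by the representer theorem, the minimizer is a kernel expansion over the training points, so fitting reduces to one linear solve. A minimal sketch with a Gaussian kernel (names and defaults are illustrative):

```python
import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    """Positive definite kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def fit_kernel_ridge(X, y, gamma=1.0, lam=1e-6):
    """Representer theorem: the RKHS minimizer is f(.) = sum_i alpha_i k(x_i, .),
    with alpha = (K + lam I)^{-1} y."""
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_kernel_ridge(X_train, alpha, X_new, gamma=1.0):
    return gaussian_kernel(X_new, X_train, gamma) @ alpha
```

Replacing the squared loss with a hinge loss gives the support vector machine; replacing the kernel changes the function class, including kernels on nonvectorial data such as strings or graphs.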