14 research outputs found
Forecasting the Index of Financial Safety (IFS) of South Africa using neural networks
This paper investigates neural network tools, in particular the nonlinear autoregressive model with exogenous input (NARX), to forecast the future conditions of the Index of Financial Safety (IFS) of South Africa. Based on the time series that was used to construct the IFS for South Africa (Matkovskyy, 2012), a NARX model was built to forecast future values of this index, and the results are benchmarked against those of Bayesian Vector-Autoregressive (BVAR) models. The results show that the NARX model applied to the IFS of South Africa and trained with the Levenberg-Marquardt algorithm can deliver a forecast of adequate quality at a lower computational expense than BVAR models with different priors.
VC-dimension of univariate decision trees
PubMed ID: 25594983
In this paper, we give and prove lower bounds on the Vapnik-Chervonenkis (VC) dimension of the univariate decision tree hypothesis class. The VC-dimension of a univariate decision tree depends on the VC-dimension values of its subtrees and on the number of inputs. Via a search algorithm that calculates the VC-dimension of univariate decision trees exhaustively, we show that our VC-dimension bounds are tight for simple trees. To verify that the VC-dimension bounds are useful, we also use them to obtain VC generalization bounds for complexity control using structural risk minimization in decision trees, i.e., pruning. Our simulation results show that structural risk minimization pruning using the VC-dimension bounds finds trees that are more accurate than those pruned using cross-validation.
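The SRM pruning decision described above can be sketched as below. The bound used here is a generic VC-style bound (training error plus a complexity penalty), with placeholder error rates and VC-dimension values; it is not the paper's specific bound for univariate trees.

```python
import math

def vc_generalization_bound(train_err, h, n, delta=0.05):
    """Generic VC-style bound: training error plus a complexity penalty
    that grows with the VC-dimension h and shrinks with sample size n."""
    penalty = math.sqrt((h * (math.log(2 * n / h) + 1) + math.log(4 / delta)) / n)
    return train_err + penalty

# SRM-style pruning decision at one node: keep the subtree only if its bound
# (lower training error but higher VC-dimension) beats the pruned leaf's bound.
n = 1000
leaf_bound = vc_generalization_bound(train_err=0.12, h=1, n=n)
subtree_bound = vc_generalization_bound(train_err=0.08, h=15, n=n)
prune = leaf_bound <= subtree_bound
print(f"leaf={leaf_bound:.3f} subtree={subtree_bound:.3f} prune={prune}")
```

With these illustrative numbers the subtree's extra capacity outweighs its lower training error, so the node would be pruned; with a larger n the penalty shrinks and the decision can flip.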
Resolution of similar patterns in a solvable model of unsupervised deep learning with structured data
Empirical data, on which deep learning relies, has substantial internal
structure, yet prevailing theories often disregard this aspect. Recent research
has led to the definition of structured data ensembles, aimed at equipping
established theoretical frameworks with interpretable structural elements, a
pursuit that aligns with the broader objectives of spin glass theory. We
consider a one-parameter structured ensemble where data consists of correlated
pairs of patterns, and a simplified model of unsupervised learning, whereby the
internal representation of the training set is fixed at each layer. A mean
field solution of the model identifies a set of layer-wise recurrence equations
for the overlaps between the internal representations of an unseen input and of
the training set. The bifurcation diagram of this discrete-time dynamics is
topologically inequivalent to the unstructured one, and displays transitions
between different phases, selected by varying the load (the number of training
pairs divided by the width of the network). The network's ability to resolve
different patterns undergoes a discontinuous transition to a phase where signal
processing along the layers dissipates differential information about an
input's proximity to the different patterns in a pair. A critical value of the
parameter tuning the correlations separates regimes where data structure
improves or hampers the identification of a given pair of patterns.
What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation
One of the most important aspects of any machine learning paradigm is
how it scales according to problem size and complexity. Using a task
with known optimal training error, and a pre-specified maximum number
of training updates, we investigate the convergence of the
backpropagation algorithm with respect to a) the complexity of the
required function approximation, b) the size of the network in
relation to the size required for an optimal solution, and c) the
degree of noise in the training data. In general, for a) the solution
found is worse when the function to be approximated is more complex,
for b) oversize networks can result in lower training and
generalization error, and for c) the use of committee or ensemble
techniques can be more beneficial as the amount of noise in the
training data is increased. For the experiments we performed, we do
not obtain the optimal solution in any case. We further support the
observation that larger networks can produce better training and
generalization error using a face recognition example where a network
with many more parameters than training points generalizes better than
smaller networks.
(Also cross-referenced as UMIACS-TR-96-22)
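The "many more parameters than training points" observation has a minimal linear analogue, shown below. This is an illustration under my own assumptions, not the paper's face-recognition experiment: the minimum-norm interpolating solution of an overparameterized linear model fits the training data exactly yet still generalizes reasonably.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_train, n_test = 50, 20, 200  # more parameters (50) than training points (20)
w_true = rng.normal(size=d) / np.sqrt(d)

X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y_tr, y_te = X_tr @ w_true, X_te @ w_true

# Minimum-norm interpolating solution via the pseudoinverse: training error is
# (numerically) zero despite d > n_train, while test error stays bounded.
w_hat = np.linalg.pinv(X_tr) @ y_tr
print("train MSE:", np.mean((X_tr @ w_hat - y_tr) ** 2))
print("test  MSE:", np.mean((X_te @ w_hat - y_te) ** 2))
```

The pseudoinverse picks the smallest-norm weight vector among all interpolators, which is the implicit-regularization effect often invoked to explain why oversized models need not overfit.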
Characterizing Rational versus Exponential Learning Curves
We consider the standard problem of learning a concept from random examples. Here a learning curve is defined to be the expected error of a learner's hypotheses as a function of training sample size. Haussler, Littlestone, and Warmuth have shown that, in the distribution-free setting, the smallest expected error a learner can achieve in the worst case over a class of concepts C converges rationally to zero error; i.e., Θ(t^{-1}) in the training sample size t. However, Cohn and Tesauro have recently demonstrated that exponential convergence can often be observed in experimental settings (i.e., average error decreasing as e^{-Θ(t)}). By addressing a simple non-uniformity in the original analysis, this paper shows how the dichotomy between rational and exponential worst-case learning curves can be recovered in the distribution-free theory. In particular, our results support the experimental findings of Cohn and Tesauro: for finite concept classes any consistent learner achieves exponential convergence, even in the worst case, whereas for continuous concept classes no learner can exhibit sub-rational convergence for every target concept and domain distribution. We also draw a precise boundary between rational and exponential convergence for simple concept chains: somewhere-dense chains always force rational convergence in the worst case, while exponential convergence can always be achieved for nowhere-dense chains.
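The finite-versus-continuous dichotomy can be illustrated with a Monte-Carlo sketch (my own construction, not the paper's analysis): a consistent learner for threshold concepts on [0, 1]. When the thresholds are restricted to a finite grid, the consistent interval eventually isolates the true threshold and the error collapses quickly; over the continuous class, the error shrinks only like the gap between the closest examples, at rate Θ(1/t).

```python
import numpy as np

rng = np.random.default_rng(2)

def expected_error(t, grid_size=None, trials=2000):
    """Monte-Carlo expected error of a consistent threshold learner on [0, 1].
    grid_size=None -> continuous concept class (rational, ~1/t, rate);
    an integer restricts thresholds to a finite grid (fast collapse)."""
    target = 0.5
    errs = []
    for _ in range(trials):
        x = rng.random(t)
        labels = x >= target                   # label(x) = [x >= target]
        lo = x[~labels].max(initial=0.0)       # largest negative example
        hi = x[labels].min(initial=1.0)        # smallest positive example
        if grid_size:
            cand = np.arange(1, grid_size) / grid_size
            cand = cand[(cand > lo) & (cand <= hi)]  # consistent grid thresholds
            h = cand[0] if len(cand) else 0.5
        else:
            h = lo                             # one consistent continuous choice
        errs.append(abs(h - target))           # error = mass between h and target
    return np.mean(errs)

for t in (10, 40):
    print(t, expected_error(t), expected_error(t, grid_size=10))
```

The comparison shows the finite class's error falling far below the continuous class's ~1/t error at the same sample size, matching the paper's finite-versus-continuous distinction in spirit.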