Forecasting with Machine Learning
For years, people have been forecasting weather patterns, economic and political events, sports outcomes, and more. In this paper we discuss ways of using machine learning, a branch of computer science in which algorithms learn from data, for forecasting. The fundamental problem in machine learning and in time series analysis is the same: to predict new outcomes from previously observed ones. Choosing a suitable machine learning technique depends on how much data you have, how noisy the data is, and what kind of new features can be derived from it. These techniques can improve accuracy and need not be difficult to implement.
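The core recipe this abstract alludes to, casting forecasting as supervised learning on lagged values, can be sketched in a few lines. This is a minimal illustration using ordinary least squares as the "machine learning" model; it is not code from the paper, and any regressor could stand in for the linear fit.

```python
import numpy as np

def make_lag_matrix(series, n_lags):
    """Turn a 1-D series into a supervised problem: each row holds
    n_lags past values, and the target is the value that follows."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = np.asarray(series[n_lags:])
    return X, y

def fit_ar(series, n_lags):
    """Least-squares linear autoregression, a minimal learned forecaster."""
    X, y = make_lag_matrix(series, n_lags)
    X1 = np.column_stack([X, np.ones(len(X))])   # intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_next(series, coef, n_lags):
    window = np.append(np.asarray(series[-n_lags:], dtype=float), 1.0)
    return float(window @ coef)

# Noiseless linear trend: the fitted model recovers the next point.
t = np.arange(50, dtype=float)
coef = fit_ar(t, n_lags=3)
print(round(predict_next(t, coef, 3), 3))   # prints 50.0
```

The same lag-matrix construction works unchanged with any noisier series; only the choice of regressor, and how many lag features to derive, changes.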
A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition
Multi-step ahead forecasting is still an open challenge in time series forecasting. Several approaches that deal with this complex problem have been proposed in the literature, but an extensive comparison on a large number of tasks is still missing. This paper aims to fill this gap by reviewing existing strategies for multi-step ahead forecasting and comparing them in theoretical and practical terms. To attain this objective, we performed a large-scale comparison of these strategies using a large experimental benchmark (namely the 111 series from the NN5 forecasting competition). In addition, we considered the effects of deseasonalization, input variable selection, and forecast combination on these strategies and on multi-step ahead forecasting at large. The following three findings appear to be consistently supported by the experimental results: Multiple-Output strategies are the best-performing approaches, deseasonalization leads to uniformly improved forecast accuracy, and input selection is more effective when performed in conjunction with deseasonalization.
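The strategies compared in the paper differ in how a one-step base learner is reused across the horizon: Recursive (one model, fed its own predictions), Direct (one model per horizon), and Multiple-Output (one model emitting the whole horizon). A minimal sketch of the first two, with ordinary least squares standing in for any regressor (illustrative only, not the paper's code):

```python
import numpy as np

def fit_linear(X, y):
    X1 = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_linear(coef, x):
    return float(np.append(x, 1.0) @ coef)

def recursive_forecast(series, n_lags, horizon):
    """One model trained for t+1; predictions are fed back in to reach t+H."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    coef = fit_linear(X, np.asarray(series[n_lags:]))
    window = list(series[-n_lags:])
    out = []
    for _ in range(horizon):
        yhat = predict_linear(coef, np.array(window))
        out.append(yhat)
        window = window[1:] + [yhat]   # error can accumulate here
    return out

def direct_forecast(series, n_lags, horizon):
    """A separate model per horizon h, each trained to predict t+h directly."""
    out = []
    for h in range(1, horizon + 1):
        X = np.array([series[i:i + n_lags]
                      for i in range(len(series) - n_lags - h + 1)])
        y = np.asarray(series[n_lags + h - 1:])
        coef = fit_linear(X, y)
        out.append(predict_linear(coef, np.asarray(series[-n_lags:])))
    return out

series = np.arange(50.0)              # noiseless trend for illustration
print(recursive_forecast(series, 3, 3))
print(direct_forecast(series, 3, 3))
```

On noisy real series the trade-off the paper studies appears: Recursive accumulates its own prediction errors, Direct avoids that at the cost of training H models, and Multiple-Output strategies predict the whole horizon jointly.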
Learning for informative path planning
Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 104-108). Through the combined use of regression techniques, we learn models of uncertainty propagation efficiently and accurately, replacing computationally intensive Monte Carlo simulations in informative path planning. This enables us to reduce the uncertainty of weather estimates more than current methods do, by allowing many more candidate paths to be evaluated with the same resources. The learning method and the path planning method are validated by numerical experiments using the Lorenz-2003 model [32], an idealized weather model. By Sooho Park. S.M.
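The thesis's central trick, evaluating a few candidates with the expensive simulation, fitting a cheap regression surrogate, and then ranking many more candidates with the surrogate, can be sketched as follows. Everything here is a hypothetical stand-in: `monte_carlo_uncertainty` is a toy scoring function, not the thesis's simulator, and the surrogate is a plain quadratic-feature least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_uncertainty(path):
    """Hypothetical stand-in for the expensive simulation: scores a
    candidate path (here just a feature vector). Purely illustrative."""
    return float(np.sum(path ** 2))

# Run the expensive routine on a small training set of paths only...
train_paths = rng.normal(size=(30, 4))
train_scores = np.array([monte_carlo_uncertainty(p) for p in train_paths])

# ...then fit a cheap regression surrogate (quadratic features + lstsq).
def features(P):
    return np.column_stack([P, P ** 2, np.ones(len(P))])

coef, *_ = np.linalg.lstsq(features(train_paths), train_scores, rcond=None)

# Rank a large candidate pool with the surrogate instead of the simulator.
candidates = rng.normal(size=(1000, 4))
pred = features(candidates) @ coef
best = candidates[np.argmin(pred)]   # chosen without 1000 simulator calls
```

The design point is the same as in the thesis: the surrogate's per-candidate cost is a dot product, so the planner can afford to score orders of magnitude more candidate paths for the same budget.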
A Computational Model for Estimating Dissolved Oxygen in Aquaculture Production Ponds Using Artificial Neural Networks
The world's population is constantly growing, which leads, among other things, to greater food consumption. Aquaculture has thus become the fastest-growing food sector worldwide (Lekang, 2013). Carrying out this activity, however, requires controlling parameters such as oxygen, temperature, salinity, nitrites, and nitrates, keeping them within adequate ranges, or as close as possible to those found in nature, so that farmed organisms do not become stressed, sicken, or die, while reproduction and growth yields are maximized. Dissolved oxygen is the main indicator of water quality, so fish farmers pay special attention to its concentration. To avoid the fluctuations of this gas that are inherent to the natural dynamics of aquaculture systems, and the problems they cause for the cultured organisms, producers generally run artificial aeration at maximum (nominal) power to supplement the oxygen supply throughout the day (Tucker, 2005). However, as Boyd (1998) suggests, maximum aeration aimed at the highest possible production is less profitable than moderate aeration when the goal is to improve water quality and feed conversion efficiency. Conventional (maximum) aeration therefore usually makes inefficient use of dissolved oxygen, significantly increases the energy consumption of the equipment, risks wearing it out through constant switching on and off over long periods, and can stress the organisms.
Current culture systems aim to produce more organisms in less space, so new control techniques and forms of prediction are being developed for integration into commercial automation systems that are low cost, ecologically benign, and easy to use. Dissolved oxygen measurements taken at regular intervals form a time series that oscillates seasonally over a 24-hour period. Because many variables influence it, the series behaves in a complex, nonlinear way, generally with low concentrations in the morning and at night, in contrast to the afternoon, when high levels are usually found. In recent years, artificial neural networks (ANNs) have been used for time series estimation and prediction in many disciplines, but few studies have applied them to water quality (and its related parameters). Using them to predict dissolved oxygen time series makes it possible, among other things, to capture the nonlinear relationships between the input variables (mainly past values of the series itself and values of other variables that influence it) and the output variables (future values of the series). This thesis proposes a computational model for dissolved oxygen estimation using ANNs, which will predict the concentration of this parameter at future times. The design of the ANNs is based on Evolutionary Algorithms (EAs), in particular the algorithm known as Feature Selection of Evolutionary Programming of Artificial Neural Networks (FS-EPNet) (Lopez et al., 2013; Landassuri et al., 2013), which determines the network architecture, with prediction over a given time span as the objective function. Although water quality analysis is affected by several parameters, this work considers only oxygen prediction, using two prediction modes: one-step-ahead prediction and iterated prediction. This lays the groundwork for future research on multi-parameter prediction, water quality state analysis, and predictive water quality control.
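FS-EPNet evolves both the network architecture and the selected input features. A much-simplified sketch of that idea, a (1+1) evolutionary loop that selects lag features for a linear one-step predictor on a toy 24-hour dissolved-oxygen cycle, is shown below. This is illustrative only: the series, the linear model, and the mutation scheme are assumptions, not the FS-EPNet algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_predict_error(series, mask, n_lags):
    """Validation MSE of a linear one-step predictor that uses only
    the lags selected by the binary mask."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    split = int(0.8 * len(y))
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return np.inf                      # a model with no inputs is useless
    Xs = np.column_stack([X[:, cols], np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(Xs[:split], y[:split], rcond=None)
    resid = Xs[split:] @ coef - y[split:]
    return float(np.mean(resid ** 2))

# Toy dissolved-oxygen series: a 24-step daily cycle plus noise.
t = np.arange(400)
series = 6.0 + 2.0 * np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)

n_lags = 30
mask = rng.integers(0, 2, size=n_lags).astype(bool)   # random initial features
best_err = fit_predict_error(series, mask, n_lags)
for _ in range(60):                        # (1+1) evolutionary loop
    child = mask.copy()
    flip = rng.integers(0, n_lags)
    child[flip] = ~child[flip]             # mutate one feature in or out
    err = fit_predict_error(series, child, n_lags)
    if err <= best_err:                    # keep the child if no worse
        mask, best_err = child, err
```

The real algorithm mutates network topology and weights as well as inputs, but the fitness-driven accept/reject loop above is the common skeleton.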
Local Learning for Iterated Time Series Prediction
We introduce and discuss a local method for learning one-step-ahead predictors for iterated time series forecasting. For each one-step-ahead prediction, the method selects among several alternative local model representations on the basis of a local cross-validation procedure. In the literature, local learning is generally used for function estimation tasks that do not take temporal behavior into account. Our technique extends this approach to the problem of long-horizon prediction by proposing a local model selection based on an iterated version of the PRESS leave-one-out statistic. To show the effectiveness of our method, we present results obtained on two time series from the Santa Fe competition and on a time series proposed in a recent international contest.
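The flavor of per-query model selection described above can be sketched with a local constant (nearest-neighbour average) model whose neighbourhood size is chosen by leave-one-out error. Note the simplification: the paper selects among local model representations using the PRESS statistic for local linear fits; this stand-in only selects the neighbourhood size k.

```python
import numpy as np

def local_predict(X, y, x_query, ks=(2, 4, 8, 16)):
    """Predict y at x_query with a local constant model; the number of
    neighbours k is chosen per query by leave-one-out (LOO) error."""
    d = np.linalg.norm(X - x_query, axis=1)
    order = np.argsort(d)
    best_k, best_loo = None, np.inf
    for k in ks:
        yk = y[order[:k]]
        # LOO prediction for neighbour j is the mean of the other k-1 targets.
        loo_pred = (yk.sum() - yk) / (k - 1)
        loo = float(np.mean((loo_pred - yk) ** 2))
        if loo < best_loo:
            best_k, best_loo = k, loo
    return float(np.mean(y[order[:best_k]])), best_k

# Smooth sine data; each query picks its own neighbourhood size.
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X[:, 0])
pred, k = local_predict(X, y, np.array([1.0]))
```

For iterated forecasting, each one-step prediction would repeat this selection around the current query point, which is exactly why a cheap closed-form LOO criterion matters.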
Research of mixture of experts model for time series prediction
xxiv, 237 leaves: ill.; 30 cm. Includes bibliographical references. University of Otago, Department of Information Science, 15 November 2005. For the prediction of chaotic time series, a dichotomy has arisen between local and global approaches. Local approaches have a reputation for simplicity and feasibility, but they generally do not produce a compact description of the underlying system and are computationally intensive. Global approaches require less computation and can yield a global representation of the studied time series; however, due to the complexity of the time series process, it is often not easy to construct a global model that predicts precisely. In addition to these approaches, a combination of the global and local techniques, called mixture of experts (ME), is also possible, in which a small number of models work cooperatively to carry out the prediction.
This thesis reports on research into ME models for chaotic time series prediction. Based on a review of time series prediction techniques, an HMM-based ME model called "Time-line" Hidden Markov Experts (THME) is developed, in which the trajectory of the time series is divided into regimes in the state space and regression models called local experts learn the mapping on each regime separately. The dynamics of the expert combination are governed by an HMM whose transition probabilities are designed to be time-varying and conditional on "real time" information from the time series. For learning the "time-line" HMM, a modified Baum-Welch algorithm is developed and its convergence is proved.
Different versions of the model, based on MLP, RBF, and SVM experts, are constructed and applied to a number of chaotic time series for both one-step-ahead and multi-step-ahead prediction. Experiments show that THME generally achieves better generalization performance than the corresponding single models in one-step-ahead prediction, and performance comparable to some published benchmarks in multi-step-ahead prediction. Various properties of THME are investigated, including feature selection for trajectory dividing, clustering techniques for regime extraction, the "time-line" HMM for expert combination, and the model's performance with different numbers of experts.
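The mixture-of-experts idea, local experts fitted per regime and blended by a gate, can be illustrated in miniature. THME's gate is an HMM with time-varying transitions; the sketch below substitutes a static softmax gate over distance to each regime centre, and uses linear experts on a piecewise-linear target, so it shows only the regime-plus-gate skeleton, not the thesis's model.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_expert(X, y):
    """One local expert: a linear least-squares fit on its regime's data."""
    X1 = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def mixture_predict(x, centers, experts, tau=0.5):
    """Soft gate: experts weighted by a softmax over negative distance
    from x to each regime centre (stand-in for THME's HMM gate)."""
    d = np.array([np.linalg.norm(x - c) for c in centers])
    w = np.exp(-d / tau)
    w /= w.sum()
    preds = np.array([float(np.append(x, 1.0) @ e) for e in experts])
    return float(w @ preds)

# Piecewise-linear target: two regimes with different slopes.
X = rng.uniform(-2, 2, size=(300, 1))
y = np.where(X[:, 0] < 0, -1.5 * X[:, 0], 2.0 * X[:, 0])

# Divide the state space into regimes, one expert per regime.
left, right = X[:, 0] < 0, X[:, 0] >= 0
experts = [fit_expert(X[left], y[left]), fit_expert(X[right], y[right])]
centers = [X[left].mean(axis=0), X[right].mean(axis=0)]

print(round(mixture_predict(np.array([1.0]), centers, experts), 2))
```

A single global linear model cannot fit both slopes at once; the gated pair of local experts can, which is the compromise between local and global approaches that the thesis develops.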
A number of interesting future directions for this work are suggested, including feature selection for regime extraction, model selection for transition probability modelling, extension to distribution prediction, and application to other time series. Unpublished.
Research of mixture of experts model for time series prediction
xxiv, 237 leaves :ill. ; 30 cm. Includes bibliographical references. University of Otago department: Information Science. "15 November 2005".For the prediction of chaotic time series, a dichotomy has arisen between local approaches and global approaches. Local approaches hold the reputation of simplicity and feasibility, but they generally do not produce a compact description of the underlying system and are computationally intensive. Global approaches have the advantage of requiring less computation and are able to yield a global representation of the studied time series. However, due to the complexity of the time series process, it is often not easy to construct a global model to perform the prediction precisely. In addition to these approaches, a combination of the global and local techniques, called mixture of experts (ME), is also possible, where a smaller number of models work cooperatively to implement the prediction.
This thesis reports on research about ME models for chaotic time series prediction. Based on a review of the techniques in time series prediction, a HMM-based ME model called "Timeline" Hidden Markov Experts (THME) is developed, where the trajectory of the time series is divided into some regimes in the state space and regression models called local experts are applied to learn the mapping on the regimes separately. The dynamics for the expert combination is a HMM, however, the transition probabilities are designed to be time-varying and conditional on the "real time" information of the time series. For the learning of the "time-line" HMM, a modified Baum—Welch algorithm is developed and the convergence of the algorithm is proved.
Different versions of the model, based on MLP, RBF and SVM experts, are constructed and applied to a number of chaotic time series on both one-step-ahead and multi-step-ahead predictions. Experiments show that in general THME achieves better generalization performance than the corresponding single models in one-step-ahead prediction and comparable to some published benchmarks in multi-step-ahead prediction. Various properties of THME, such as the feature selection for trajectory dividing, the clustering techniques for regime extraction, the "time-line" HMM for expert combination and the performance of the model when it has different number of experts, are investigated.
A number of interesting directions for future work are suggested, including feature selection for regime extraction, model selection for transition probability modelling, extension to distribution prediction, and application to other time series.