6 research outputs found

    Forecasting with Machine Learning

    For years, people have been forecasting weather patterns, economic and political events, sports outcomes, and more. In this paper we discuss ways of using machine learning, a branch of computer science in which algorithms learn from data, for forecasting. The fundamental problem is the same for machine learning as for time series analysis: predict new outcomes from previously known results. Choosing a suitable machine learning technique depends on how much data you have, how noisy the data is, and what new features can be derived from it. These techniques can improve accuracy and need not be difficult to implement.
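
    The closing point about feature derivation is concrete enough to sketch. A minimal, hypothetical example (the lag choices and column names are mine, not the paper's) of turning a raw daily series into a supervised-learning table with pandas, assuming a DatetimeIndex:

    import pandas as pd

    def derive_features(s: pd.Series) -> pd.DataFrame:
        # Turn a univariate daily series (DatetimeIndex assumed) into a
        # supervised-learning table usable by any regressor.
        df = pd.DataFrame({"y": s})
        df["lag_1"] = s.shift(1)                          # yesterday's value
        df["lag_7"] = s.shift(7)                          # same weekday last week
        df["roll_mean_7"] = s.shift(1).rolling(7).mean()  # smooths out noise
        df["dayofweek"] = s.index.dayofweek               # calendar feature
        return df.dropna()                                # rows with full history only

    Any regressor can then be trained on the table with the y column as target; how much data you have, and how noisy it is, mainly shifts which model class justifies its complexity.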

    A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition

    Multi-step ahead forecasting is still an open challenge in time series forecasting. Several approaches to this complex problem have been proposed in the literature, but an extensive comparison on a large number of tasks is still missing. This paper aims to fill that gap by reviewing existing strategies for multi-step ahead forecasting and comparing them in theoretical and practical terms. To that end, we performed a large-scale comparison of the strategies on a large experimental benchmark (the 111 series from the NN5 forecasting competition). In addition, we considered the effects of deseasonalization, input variable selection, and forecast combination on these strategies and on multi-step ahead forecasting at large. Three findings are consistently supported by the experimental results: Multiple-Output strategies are the best-performing approaches, deseasonalization leads to uniformly improved forecast accuracy, and input selection is more effective when performed in conjunction with deseasonalization.
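
    As a hedged illustration of two of the strategies under comparison (the Ridge regressor and the lag settings are stand-ins of mine, not the paper's experimental setup), the Recursive strategy iterates a single one-step-ahead model while the Direct strategy fits a separate model per horizon:

    import numpy as np
    from sklearn.linear_model import Ridge

    def recursive_forecast(series, n_lags, horizon):
        # Recursive strategy: one one-step-ahead model, iterated; each
        # prediction is fed back into the input window.
        X = np.array([series[t - n_lags:t] for t in range(n_lags, len(series))])
        y = np.array(series[n_lags:])
        model = Ridge().fit(X, y)
        window = list(series[-n_lags:])
        preds = []
        for _ in range(horizon):
            yhat = float(model.predict([window])[0])
            preds.append(yhat)
            window = window[1:] + [yhat]
        return preds

    def direct_forecast(series, n_lags, horizon):
        # Direct strategy: a separate model per horizon h, each trained
        # to map the lag window straight to y[t + h].
        preds = []
        for h in range(horizon):
            X = np.array([series[t - n_lags:t] for t in range(n_lags, len(series) - h)])
            y = np.array(series[n_lags + h:])
            preds.append(float(Ridge().fit(X, y).predict([series[-n_lags:]])[0]))
        return preds

    A Multiple-Output (MIMO) strategy would instead fit a single multi-output model mapping the lag window to the entire horizon vector at once; that is the family the experiments found strongest.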

    Learning for informative path planning

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 104-108). Through the combined use of regression techniques, we learn models of uncertainty propagation efficiently and accurately, replacing computationally intensive Monte Carlo simulations in informative path planning. This enables us to reduce the uncertainty of weather estimates more than current methods do, by allowing many more candidate paths to be evaluated with the same resources. The learning method and the path planning method are validated by numerical experiments using the Lorenz-2003 model [32], an idealized weather model. By Sooho Park.
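
    A hedged sketch of that substitution (everything here is a toy stand-in; the thesis learns uncertainty propagation in the Lorenz-2003 model, not this synthetic scorer): run the expensive Monte Carlo evaluation on a few candidate paths, fit a regressor to the resulting scores, and rank the remaining candidates with the cheap surrogate.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def monte_carlo_score(path, rng, n_samples=2000):
        # Stand-in for the expensive ensemble simulation: estimates the
        # expected uncertainty reduction of a path by sampling.
        noise = rng.normal(size=n_samples)
        return float(np.mean(np.exp(-np.linalg.norm(path)) + 0.01 * noise))

    rng = np.random.default_rng(0)
    candidates = rng.uniform(-1.0, 1.0, size=(500, 6))  # paths as feature vectors
    train = candidates[:50]                             # only these get the full MC run
    scores = [monte_carlo_score(p, rng) for p in train]
    surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(train, scores)
    best = candidates[np.argmax(surrogate.predict(candidates))]  # cheap ranking of all 500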

    Computational Model for Estimating Dissolved Oxygen in Aquaculture Production Ponds Using Artificial Neural Networks

    The world population is constantly growing, which brings, among other things, greater food consumption. Aquaculture has consequently become the fastest-growing food sector worldwide (Lekang, 2013). Carrying out this activity, however, means controlling parameters such as oxygen, temperature, salinity, nitrites and nitrates, keeping them within suitable ranges, or as close as possible to those found in nature, so that production succeeds: organisms are not stressed, do not sicken or die, and reproduction and growth performance are maximized. Dissolved oxygen is the main indicator of water quality, so aquaculturists pay special attention to its concentration. To counter the fluctuations of this gas inherent in the natural dynamics of aquaculture systems, and the problems these cause for cultured organisms, producers generally run artificial aeration at full (nominal) power to supplement the oxygen supply over the course of the day (Tucker, 2005). However, as Boyd (1998) suggests, maximum aeration aimed at the highest possible production is less profitable than moderate aeration when the goal is better water quality and feed-conversion efficiency. Conventional (maximum) aeration therefore usually uses dissolved oxygen inefficiently, significantly increases the equipment's energy consumption, risks damaging the equipment through constant switching on and off over long periods, and can stress the organisms. Current culture systems aim to produce more organisms in less space, so new control techniques and forecasting methods are being developed for integration into commercial automation systems that are low-cost, ecologically benign and easy to use. Dissolved-oxygen measurements taken at regular intervals form a time series that oscillates seasonally and over each 24-hour period. Because many variables influence it, the series behaves in a complex, nonlinear way, generally with low concentrations in the morning and at night, in contrast to the afternoon, when high levels are usually found. In recent years, artificial neural networks (ANNs) have been used for time-series estimation and prediction in many disciplines, yet few works have applied them to water-quality problems (and their related parameters). Applied to dissolved-oxygen time series, they can, among other things, capture the nonlinear relations between the input variables (mainly past values of the series itself and values of other variables that influence it) and the output variables (future values of the series). This thesis proposes a computational model for dissolved-oxygen estimation using ANNs that predict the concentration of this parameter at future points in time.
The design of the ANNs is based on Evolutionary Algorithms (EAs), in particular the Feature Selection of Evolutionary Programming of Artificial Neural Networks (FS-EPNet) algorithm (Lopez et al., 2013; Landassuri et al., 2013), which determines the network architecture, with prediction over a given time horizon as the objective. Although water-quality analysis is affected by several parameters, this work considers only oxygen prediction, in two forecasting modes: one-step-ahead prediction and iterated prediction. This lays the groundwork for future research on multi-parameter prediction, water-quality assessment, and predictive control of water quality.
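
    Since the network design is delegated to FS-EPNet, a toy sketch of the general idea may help. This is a generic evolutionary-programming loop, not the published algorithm; its operators, population size, and fitness function are all illustrative:

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split

    def evolve_network(X, y, generations=10, pop_size=8, seed=0):
        # Each individual pairs an input-feature mask with a hidden-layer
        # size; mutation plus truncation selection minimizes validation
        # error. (Toy loop only; not the published FS-EPNet operators.)
        rng = np.random.default_rng(seed)
        X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=seed)
        n = X.shape[1]
        pop = [(rng.random(n) < 0.7, int(rng.integers(4, 32))) for _ in range(pop_size)]

        def fitness(mask, hidden):
            if not mask.any():
                return np.inf
            net = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=1000, random_state=seed)
            net.fit(X_tr[:, mask], y_tr)
            return float(np.mean((net.predict(X_va[:, mask]) - y_va) ** 2))

        for _ in range(generations):
            ranked = sorted(pop, key=lambda ind: fitness(*ind))
            parents = ranked[: pop_size // 2]           # truncation selection
            children = []
            for mask, hidden in parents:
                m = mask.copy()
                m[rng.integers(0, n)] ^= True           # flip one input feature
                children.append((m, max(2, hidden + int(rng.integers(-4, 5)))))
            pop = parents + children
        return min(pop, key=lambda ind: fitness(*ind))  # best (mask, hidden size)

    The returned feature mask doubles as the input-variable selection, which is what the "FS" in FS-EPNet refers to.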

    Local Learning for Iterated Time Series Prediction

    We introduce and discuss a local method for learning one-step-ahead predictors for iterated time series forecasting. For each one-step-ahead prediction, the method selects among alternative local model representations on the basis of a local cross-validation procedure. In the literature, local learning is generally used for function estimation tasks that do not take temporal behaviour into account. Our technique extends this approach to long-horizon prediction by proposing a local model selection based on an iterated version of the PRESS leave-one-out statistic. To show the effectiveness of the method, we present results obtained on two time series from the Santa Fe competition and on a time series proposed in a recent international contest.
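
    What makes this per-query selection affordable is that, for least squares, the PRESS leave-one-out residuals come in closed form, with no refitting. A minimal sketch under assumptions of my own (the embedding dimension and neighbour counts are illustrative, and only the plain one-step PRESS is shown, not the iterated variant the paper proposes):

    import numpy as np

    def press_loo_errors(X, y):
        # Closed-form leave-one-out residuals of least squares:
        # e_loo_i = e_i / (1 - h_ii), with H = X (X'X)^+ X' the hat matrix.
        H = X @ np.linalg.pinv(X.T @ X) @ X.T
        e = y - H @ y
        return e / np.clip(1.0 - np.diag(H), 1e-9, None)

    def local_one_step(history, embed=3, k_candidates=(8, 16, 32)):
        # Local learning: for the current query window, compare local
        # linear models built on k nearest neighbours and keep the one
        # whose PRESS (mean squared leave-one-out error) is lowest.
        X = np.array([history[t - embed:t] for t in range(embed, len(history))])
        y = np.array(history[embed:])
        q = np.asarray(history[-embed:])
        dist = np.linalg.norm(X - q, axis=1)
        best_pred, best_loss = None, np.inf
        for k in k_candidates:
            idx = np.argsort(dist)[:min(k, len(dist))]
            Xk = np.hstack([np.ones((len(idx), 1)), X[idx]])  # local linear model + bias
            loss = float(np.mean(press_loo_errors(Xk, y[idx]) ** 2))
            if loss < best_loss:
                beta = np.linalg.pinv(Xk) @ y[idx]
                best_pred, best_loss = float(np.r_[1.0, q] @ beta), loss
        return best_pred

    For long horizons, the prediction is appended to the window and the whole selection re-run at the next step; the iterated PRESS variant scores candidate models on that feedback behaviour rather than on single-step errors.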

    Research of mixture of experts model for time series prediction

    xxiv, 237 leaves : ill. ; 30 cm. Includes bibliographical references. University of Otago department: Information Science. "15 November 2005". For the prediction of chaotic time series, a dichotomy has arisen between local and global approaches. Local approaches have a reputation for simplicity and feasibility, but they generally do not produce a compact description of the underlying system and are computationally intensive. Global approaches require less computation and can yield a global representation of the studied time series; however, owing to the complexity of the underlying process, it is often difficult to construct a global model that predicts precisely. A combination of the two, called a mixture of experts (ME), is also possible, in which a small number of models cooperate to produce the prediction. This thesis reports on research into ME models for chaotic time series prediction. Building on a review of time series prediction techniques, an HMM-based ME model called "Timeline" Hidden Markov Experts (THME) is developed, in which the trajectory of the time series is divided into regimes in the state space and regression models called local experts learn the mapping on each regime separately. The experts are combined by an HMM whose transition probabilities are time-varying and conditioned on the "real time" information of the series. For learning this "time-line" HMM, a modified Baum-Welch algorithm is developed and its convergence is proved. Versions of the model based on MLP, RBF and SVM experts are constructed and applied to a number of chaotic time series for both one-step-ahead and multi-step-ahead prediction. Experiments show that THME generally achieves better generalization than the corresponding single models in one-step-ahead prediction and is comparable to some published benchmarks in multi-step-ahead prediction. Various properties of THME are investigated, including feature selection for trajectory division, clustering techniques for regime extraction, the "time-line" HMM for expert combination, and performance with different numbers of experts. Several directions for future work are suggested, including feature selection for regime extraction, model selection for transition probability modelling, extension to distribution prediction, and application to other time series.
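
    A hedged sketch of the combination mechanism (a deliberately simplified stand-in: the gate here is a fixed-transition HMM filter over two toy experts, whereas THME's transition probabilities are time-varying and conditioned on the series):

    import numpy as np

    def regime_belief_update(belief, A, preds, y, sigma=0.1):
        # Bayes filter step: propagate the regime belief through the
        # Markov chain, then reweight each regime by how well its expert
        # predicted the realized value y (Gaussian residual likelihood).
        lik = np.exp(-0.5 * ((y - preds) / sigma) ** 2) + 1e-12
        b = (A.T @ belief) * lik
        return b / b.sum()

    # Toy usage: two "local experts" (persistence vs. linear trend) gated
    # by a sticky two-regime chain; the forecast is the belief-weighted
    # average of the expert predictions.
    rng = np.random.default_rng(1)
    series = np.sin(np.linspace(0, 8 * np.pi, 400)) + 0.05 * rng.normal(size=400)
    experts = [lambda w: w[-1], lambda w: 2 * w[-1] - w[-2]]
    A = np.array([[0.95, 0.05], [0.05, 0.95]])
    belief = np.array([0.5, 0.5])
    forecasts = []
    for t in range(2, len(series)):
        preds = np.array([e(series[:t]) for e in experts])
        forecasts.append(belief @ preds)                # combined one-step forecast
        belief = regime_belief_update(belief, A, preds, series[t])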
