Neural Networks for Complex Data
Artificial neural networks are simple and efficient machine learning tools.
Defined originally in the traditional setting of simple vector data, neural
network models have evolved to address a growing range of difficulties in complex
real-world problems, from time-evolving data to sophisticated data
structures such as graphs and functions. This paper summarizes advances on
those themes from the last decade, with a focus on results obtained by members
of the SAMM team of Université Paris
A Two-stage Architecture for Stock Price Forecasting by Integrating Self-Organizing Map and Support Vector Regression
Stock price prediction has attracted much attention from both practitioners and researchers. However, most studies in this area ignore the non-stationary nature of stock price series: stock price series do not exhibit identical statistical properties at each point in time, so the relationships between stock price series and their predictors are quite dynamic. It is challenging for any single artificial technique to effectively address these problematic characteristics of stock price series. One potential solution is to hybridize different artificial techniques. Towards this end, this study employs a two-stage architecture for better stock price prediction. Specifically, the self-organizing map (SOM) is first used to decompose the whole input space into regions where data points with similar statistical distributions are grouped together, so as to contain and capture the non-stationary property of financial series. After decomposing heterogeneous data points into several homogeneous regions, support vector regression (SVR) is applied to forecast financial indices. The proposed technique is empirically tested using stock price series from seven major financial markets. The results show that the performance of stock price prediction can be significantly enhanced by using the two-stage architecture in comparison with a single SVR model.
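The two-stage pipeline can be sketched roughly as follows. Everything here is an illustrative assumption rather than the paper's setup: the random-walk "price" series is synthetic, the SOM is a minimal 1-D implementation (the paper used full SOM grids on seven market indices), and the lag count, node count, and SVR settings are arbitrary.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

def train_som(data, n_nodes=4, n_iter=2000, lr0=0.5, sigma0=1.5):
    # Minimal 1-D self-organizing map; returns the node weight vectors.
    w = data[rng.choice(len(data), n_nodes, replace=False)].astype(float)
    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        bmu = int(np.argmin(np.linalg.norm(w - x, axis=1)))   # best-matching unit
        decay = np.exp(-t / n_iter)                           # shrink lr and radius
        d = np.arange(n_nodes) - bmu                          # distance on node grid
        h = np.exp(-(d ** 2) / (2 * (sigma0 * decay) ** 2))   # neighborhood kernel
        w += (lr0 * decay) * h[:, None] * (x - w)
    return w

# Toy non-stationary "price" series -> lagged windows and next-step targets
series = np.cumsum(rng.normal(0, 1, 400)) + 100
lag = 5
X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
y = series[lag:]

# Stage 1: SOM partitions the input space into statistically similar regions
nodes = train_som(X)
labels = np.argmin(np.linalg.norm(X[:, None, :] - nodes[None, :, :], axis=2), axis=1)

# Stage 2: fit one SVR per region (skipping regions with too few points)
models = {k: SVR(C=10.0).fit(X[labels == k], y[labels == k])
          for k in np.unique(labels) if np.sum(labels == k) >= 5}

def predict(x):
    ks = np.array(sorted(models))          # only nodes that actually got a model
    k = ks[np.argmin(np.linalg.norm(nodes[ks] - x, axis=1))]
    return float(models[k].predict(x.reshape(1, -1))[0])

pred = predict(X[-1])
```

At prediction time a window is routed to its best-matching node and handled by that region's SVR, which is the essential point of the two-stage design: each regressor only has to model one quasi-stationary regime.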
Decision support with data-analysis methods in a nuclear power plant
Early fault detection is an important issue in the nuclear industry. Methods based on the self-organizing map (SOM) in dynamic systems are discussed and developed to help operators and plant experts in their decision making, and are used together with other methods. Visualization plays an important role in this research. Prototype systems were built to test the basic principles. Five different studies are presented in detail. This report summarizes test case 4 (TC4), "Decision support at a nuclear power plant", in the NoTeS and NoTeS2 projects of the TEKES MASI research program.
An academic review: applications of data mining techniques in finance industry
With the development of Internet techniques, data volumes are doubling every two years, faster than predicted by Moore's Law. Big Data analytics has become particularly important for enterprise business. Modern computational technologies provide effective tools to help understand hugely accumulated data and leverage this information to gain insights into the finance industry. In order to obtain actionable insights into the business, data has become the most valuable asset of financial organisations, as there are no physical products in the finance industry to manufacture. This is where data mining techniques come to the rescue, allowing access to the right information at the right time. These techniques are used by the finance industry in various areas such as fraud detection, intelligent forecasting, credit rating, loan management, customer profiling, money laundering, marketing and prediction of price movements, to name a few. This work surveys the research on data mining techniques applied to the finance industry from 2010 to 2015. The review finds that stock prediction and credit rating have received the most attention from researchers, compared to loan prediction, money laundering and time series prediction. Due to the dynamics, uncertainty and variety of data, nonlinear mapping techniques have been more deeply studied than linear techniques. It has also been shown that hybrid methods are the most accurate in prediction, closely followed by neural network techniques. This survey provides an overview of applications of data mining techniques in the finance industry and a summary of methodologies for researchers in this area. In particular, it offers a good starting point for beginners who want to work in the field of computational finance.
Financial time series analysis with competitive neural networks
The main objective of this Master's thesis is the modelling of non-stationary time series data. While classical statistical models attempt to correct non-stationary data through differencing and de-trending, I attempt to create localized clusters of stationary time series data through the use of the self-organizing map algorithm. While numerous techniques have been developed that model time series using the self-organizing map, I attempt to build a mathematical framework that justifies its use in the forecasting of financial time series.
Additionally, I compare existing forecasting methods using the SOM with those for which a framework has been developed but which have not been applied in a forecasting context. I then compare these methods with the well-known ARIMA method of time series forecasting. The second objective of this thesis is to demonstrate the self-organizing map's ability to cluster data vectors, as it was originally developed as a neural network approach to clustering. Specifically, I will demonstrate its clustering abilities on limit order book data and present various visualization methods of its output.
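As a rough illustration of the ARIMA baseline the thesis compares against, the sketch below hand-rolls an AR(1)-on-first-differences forecaster, which is ARIMA(1,1,0) in spirit; in practice a library such as statsmodels would be used, and the synthetic drifting random walk is an assumption for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic non-stationary series: a random walk with drift (illustrative only)
series = np.cumsum(rng.normal(0.1, 1.0, 300)) + 50

# ARIMA(1,1,0) in spirit: difference once, then fit AR(1) by least squares
d = np.diff(series)                             # differencing removes the trend
xp = d[:-1] - d[:-1].mean()                     # centered lagged differences
xn = d[1:] - d[1:].mean()                       # centered next differences
phi = float(xp @ xn / (xp @ xp))                # AR(1) coefficient
mu = float(d[1:].mean() - phi * d[:-1].mean())  # intercept

# One-step-ahead forecast: predicted next difference added back onto the level
forecast = float(series[-1] + mu + phi * d[-1])
```

The contrast with the SOM approach is the key point: ARIMA imposes one global linear model on the differenced series, whereas the SOM-based methods fit local models per cluster of windows.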
A two-stage architecture for stock price forecasting by integrating self-organizing map and support vector regression
Author name used in the publication: JJ Po-An Hsieh (2008-2009). Academic research: refereed; publication in refereed journal; accepted manuscript published.
Data-Driven Methods for Data Center Operations Support
During the last decade, cloud technologies have been evolving at
an impressive pace, such that we are now living in a cloud-native
era where developers can leverage an unprecedented landscape
of (possibly managed) services for orchestration, compute, storage,
load-balancing, monitoring, etc. The possibility to have on-demand
access to a diverse set of configurable virtualized resources allows
for building more elastic, flexible and highly-resilient distributed
applications. Behind the scenes, cloud providers sustain the heavy
burden of maintaining the underlying infrastructures, consisting of
large-scale distributed systems, partitioned and replicated among
many geographically dislocated data centers to guarantee scalability,
robustness to failures, high availability and low latency. The larger the
scale, the more cloud providers have to deal with complex interactions
among the various components, such that monitoring, diagnosing and
troubleshooting issues become incredibly daunting tasks.
To keep up with these challenges, development and operations
practices have undergone significant transformations, especially in
terms of improving the automations that make releasing new software,
and responding to unforeseen issues, faster and more sustainable at scale.
The resulting paradigm is nowadays referred to as DevOps. However,
while such automations can be very sophisticated, traditional DevOps
practices fundamentally rely on reactive mechanisms that typically
require careful manual tuning and supervision from human experts.
To minimize the risk of outages, and the related costs, it is crucial to
provide DevOps teams with suitable tools that can enable a proactive
approach to data center operations.
This work presents a comprehensive data-driven framework to address
the most relevant problems that can be experienced in large-scale
distributed cloud infrastructures. These environments are indeed characterized
by a very large availability of diverse data, collected at each
level of the stack, such as: time-series (e.g., physical host measurements,
virtual machine or container metrics, networking components
logs, application KPIs); graphs (e.g., network topologies, fault graphs
reporting dependencies among hardware and software components,
performance issues propagation networks); and text (e.g., source code,
system logs, version control system history, code review feedbacks).
Such data are also typically updated with relatively high frequency,
and subject to distribution drifts caused by continuous configuration
changes to the underlying infrastructure. In such a highly dynamic scenario,
traditional model-driven approaches alone may be inadequate
to capture the complexity of the interactions among system components. DevOps teams would certainly benefit from having robust
data-driven methods to support their decisions based on historical
information. For instance, effective anomaly detection capabilities may
also help in conducting more precise and efficient root-cause analysis.
Also, leveraging accurate forecasting and intelligent control
strategies would improve resource management.
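As a tiny illustration of the kind of data-driven anomaly detection described above (not the framework's actual method), a rolling z-score over a host metric flags points that deviate sharply from recent behaviour. The synthetic CPU trace, window size, and threshold are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic CPU-utilisation trace with one injected spike (illustrative)
cpu = 40 + 5 * rng.normal(size=500)
cpu[300] = 95.0                                  # the anomaly to detect

def rolling_zscore_anomalies(x, window=50, threshold=4.0):
    # Flag points whose z-score against the preceding window exceeds the threshold.
    flags = []
    for i in range(window, len(x)):
        past = x[i - window:i]
        z = (x[i] - past.mean()) / (past.std() + 1e-9)
        if abs(z) > threshold:
            flags.append(i)
    return flags

anomalies = rolling_zscore_anomalies(cpu)
```

A detector of this kind is cheap enough to run per metric at scale; the deep-learning methods discussed in this work aim at the harder cases where anomalies only show up in correlations across many such signals.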
Given their ability to deal with high-dimensional, complex data,
Deep Learning-based methods are the most straightforward option for
the realization of the aforementioned support tools. On the other hand,
because of their complexity, these models often require huge
processing power, and suitable hardware, to be operated effectively
at scale. These aspects must be carefully addressed when applying
such methods in the context of data center operations. Automated
operations approaches must be dependable and cost-efficient, so as not to
degrade the services they are built to improve.
Selection of representative feature training sets with self-organized maps for optimized time series modeling and prediction: application to forecasting daily drought conditions with ARIMA and neural network models
While the simulation of stochastic time series is challenging due to their inherently complex nature, this is compounded by the arbitrary yet widely accepted feature data usage methods frequently applied during the model development phase. A pertinent context where these practices appear is the forecasting of drought events. This chapter considers optimization of feature data usage by sampling daily data sets via self-organizing maps to select representative training and testing subsets and, accordingly, improve the performance of effective drought index (EDI) prediction models. The effect is observed through a comparison of artificial neural network (ANN) and autoregressive integrated moving average (ARIMA) models incorporating the SOM approach, by inspection of commonly used performance indices for the city of Brisbane. This study shows that SOM-ANN ensemble models demonstrate predictive performance for EDI values competitive with that of ARIMA models.
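The representative-subset idea can be sketched as follows: train a small SOM over daily feature vectors, then keep the sample nearest each node as the training set. The random feature matrix, the minimal 1-D SOM, and all sizes are illustrative assumptions, not the chapter's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative daily feature vectors (e.g., rainfall, temperature, EDI lags)
data = rng.normal(size=(365, 4))

# Minimal 1-D SOM using Kohonen's online update rule
n_nodes, n_iter = 8, 3000
w = data[rng.choice(len(data), n_nodes, replace=False)].copy()
for t in range(n_iter):
    x = data[rng.integers(len(data))]
    bmu = int(np.argmin(np.linalg.norm(w - x, axis=1)))     # best-matching unit
    decay = np.exp(-t / n_iter)
    h = np.exp(-((np.arange(n_nodes) - bmu) ** 2) / (2 * (1.5 * decay) ** 2))
    w += 0.5 * decay * h[:, None] * (x - w)

# Representative training subset: the sample closest to each SOM node
dists = np.linalg.norm(data[:, None, :] - w[None, :, :], axis=2)  # (365, 8)
representatives = np.unique(np.argmin(dists, axis=0))             # sample indices
train_subset = data[representatives]
```

Selecting node-nearest samples (or, more generally, stratified samples per node) yields a training set that spans the feature space instead of over-representing common conditions, which is the motivation behind the chapter's SOM-based sampling.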