
    Mining temporal reservoir data using sliding window technique

    Decisions on reservoir water release are crucial during both intense and less intense rainfall seasons. Although water release is guided by standard procedures, decisions are usually made on the basis of past experience. Past experience is recorded hourly, daily, or weekly in the reservoir operation log book. Within a few years this log book becomes a knowledge-rich repository, but one that is difficult and time-consuming to consult, and the temporal relationships between the data cannot be easily identified. In this study a sliding window technique is applied to extract information from the reservoir operational database, a digital version of the reservoir operation log book. Several data sets were constructed based on different sliding window sizes, and an artificial neural network was used as the modelling tool. The findings indicate that eight days is the significant time lag between upstream rainfall and reservoir water level, and the best artificial neural network model is 24-15-3.
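    As a rough illustration of the data preparation described here, the sketch below builds supervised examples from a daily series with a sliding window and fits a small neural network. The synthetic series, the window sizes tried, and the single-output 15-unit hidden layer (echoing the 24-15-3 architecture) are illustrative assumptions, not the thesis's actual data or model.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_windows(series, window, horizon=1):
    """Slide a fixed-size window over a 1-D series to build
    (inputs, target) pairs for supervised learning."""
    X, y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append(series[i:i + window])
        y.append(series[i + window + horizon - 1])
    return np.array(X), np.array(y)

# Synthetic stand-in for daily reservoir water levels.
rng = np.random.default_rng(0)
levels = np.cumsum(rng.normal(0, 0.1, 1000)) + 30.0

# Try several window sizes, as in the study, and compare the fit.
for window in (4, 8, 12):
    X, y = make_windows(levels, window)
    model = MLPRegressor(hidden_layer_sizes=(15,), max_iter=2000, random_state=0)
    model.fit(X[:800], y[:800])
    print(window, round(model.score(X[800:], y[800:]), 3))
```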

    New Methods for Mining Sequential and Time Series Data

    Data mining is the process of extracting knowledge from large amounts of data. It covers a variety of techniques aimed at discovering diverse types of patterns on the basis of the requirements of the domain, including association rule mining, classification, cluster analysis and outlier detection. The availability of applications that produce massive amounts of spatial, spatio-temporal (ST) and time series data (TSD) is the rationale for developing specialized techniques to mine such data.

    In spatial data mining, the spatial co-location rule problem differs from the association rule problem, since there is no natural notion of transactions in spatial datasets embedded in continuous geographic space. We have therefore proposed an efficient algorithm (GridClique) to mine interesting spatial co-location patterns (maximal cliques). These patterns are used as the raw transactions for an association rule mining technique to discover complex co-location rules. Our proposal captures certain types of complex relationships, especially negative relationships, in the patterns; these relationships can be obtained only from the maximal clique patterns, which have not been exploited before. Our approach is applied to a well-known astronomy dataset obtained from the Sloan Digital Sky Survey (SDSS).

    ST data is continuously collected and made accessible in the public domain. We present an approach to mining and querying large ST data with the aim of finding interesting patterns and understanding the underlying process of data generation. An important class of queries is based on the flock pattern: a flock is a large subset of objects moving along paths close to each other for a predefined time. One approach to processing a "flock query" is to map ST data into high-dimensional space and reduce the query to a sequence of standard range queries that can be answered using a spatial indexing structure; however, the performance of spatial indexing structures deteriorates rapidly in high-dimensional space. This thesis sets out a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We use probabilistic arguments to prove the accuracy of the projection and present experimental results showing that the curse of dimensionality can be managed in an ST setting by combining random projections with traditional data structures.

    In time series data mining, we devised a new space-efficient algorithm (SparseDTW) to compute the dynamic time warping (DTW) distance between two time series that always yields the optimal result. This is in contrast to other approaches, which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit the similarity and/or correlation between the time series: the greater the similarity between the time series, the less space is required to compute the DTW between them. Other techniques for speeding up DTW impose a priori constraints and do not exploit similarity characteristics that may be present in the data. Our experiments demonstrate that SparseDTW outperforms these approaches. Applying SparseDTW, we discover an interesting pattern, "pairs trading", in a large stock-market dataset of daily index prices from the Australian Stock Exchange (ASX) from 1980 to 2002.
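    For orientation, the sketch below shows the classic dynamic-programming DTW that SparseDTW also solves; SparseDTW reaches the same optimal value while materialising only the cells near similar regions of the two series. This is the quadratic baseline, not the thesis's algorithm.

```python
import numpy as np

def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic-programming DTW distance.
    SparseDTW computes the same optimal value but avoids storing
    the full cost matrix by exploiting similarity between a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw([0, 1, 2, 3, 2, 0], [0, 0, 1, 2, 3, 2, 1, 0]))
```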

    Reporting flock patterns

    Data representing moving objects is rapidly becoming more available, especially in the area of wildlife GPS tracking. It is a central belief that information is hidden in large data sets in the form of interesting patterns, and one of the most commonly sought spatio-temporal patterns is the flock: a large enough subset of objects moving along paths close to each other for a certain predefined time. We give a new definition that we argue is more realistic than previous ones and, using techniques from computational geometry, we present fast algorithms to detect and report flocks. The algorithms are analysed both theoretically and experimentally.
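    To make the pattern concrete, here is a brute-force sketch of flock reporting under one common simplification (the disc is centred on a member object). The fast computational-geometry algorithms of the paper avoid this exhaustive enumeration; the data and parameters below are toy assumptions.

```python
import itertools
import math

def flocks(trajs, m, r, k):
    """Report groups of m objects that stay within a disc of radius r
    (centred on the first member, a simplifying assumption) for k
    consecutive timesteps. trajs: dict id -> list of (x, y) per step."""
    ids = list(trajs)
    T = len(next(iter(trajs.values())))
    found = []
    for subset in itertools.combinations(ids, m):
        run = 0
        for t in range(T):
            pts = [trajs[i][t] for i in subset]
            cx, cy = pts[0]  # disc centre: first member's position
            if all(math.hypot(x - cx, y - cy) <= r for x, y in pts):
                run += 1
                if run >= k:
                    found.append((subset, t - k + 1))  # (members, start)
                    break
            else:
                run = 0
    return found

trajs = {
    'a': [(0, 0), (1, 0), (2, 0), (3, 0)],
    'b': [(0, 1), (1, 1), (2, 1), (3, 1)],
    'c': [(9, 9), (8, 8), (2, 2), (3, 2)],
}
print(flocks(trajs, m=2, r=1.5, k=3))
```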

    A fresh engineering approach for the forecast of financial index volatility and hedging strategies

    This thesis attempts to shed new light on a problem of importance in financial engineering. Volatility is a commonly accepted measure of risk in the investment field, and daily volatility is the determining factor in evaluating option prices and in conducting different hedging strategies. Volatility estimation and forecasting are still far from complete enough for industry acceptance, judging by their generally lower-than-50% forecasting accuracy. By judiciously coordinating current engineering theory and analytical techniques, such as the wavelet transform and evolutionary algorithms in a Time Series Data Mining framework, together with Markov chain based discrete stochastic optimization methods, this work formulates a systematic strategy to characterize and forecast crucial financial time series. Typical forecast features have been extracted from index volatility data sets that exhibit abrupt drops, jumps and other embedded nonlinear characteristics, so that forecasting accuracy can be markedly improved in comparison with the methods currently prevalent in the industry.

    The key aspect of the presented approach is "transformation and sequential deployment": i) transform the data from non-observable to observable, i.e. from variance into integrated volatility; ii) conduct the wavelet transform to determine the optimal forecasting horizon; iii) transform the wavelet coefficients into 4-lag recursive data sets, or, viewed differently, a Markov chain; iv) apply genetic algorithms to extract a group of rules that characterize patterns embedded or hidden in the data and forecast the direction/range of the one-step-ahead event; and v) apply genetic programming to forecast the value of the one-step-ahead event. By following such a step-by-step approach, complicated time series forecasting problems become less complex and readily resolvable for industry application.

    To implement this approach, one-year, two-year and five-year S&P 100 historical data are used as training sets to derive a group of 100 rules that best describe their respective signal characteristics. These rules are then used to forecast the subsequent out-of-sample time series data. This set of tests produces an average correct forecasting rate of over 75%, surpassing other publicly available forecast results on financial indices. Genetic programming was then applied to the out-of-sample data set to forecast the actual value of the one-step-ahead event; the forecasting accuracy reaches an average of 70%, a marked improvement over other current forecasts. To validate the proposed approach, S&P 500 as well as S&P 100 data are tested with the discrete stochastic optimization method, which is based on Markov chain theory and involves genetic algorithms, and the results are further validated by bootstrapping. All these trials showed good reliability of the proposed methodology. Finally, the established methodology has been shown to have broad applications in option pricing, hedging, risk management, VaR determination, etc.
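    A minimal sketch of steps ii) and iii) above, using the PyWavelets library: decompose a volatility series with a multilevel wavelet transform, then recast a coefficient band as 4-lag supervised examples. The synthetic series, wavelet family ('db4') and decomposition depth are illustrative assumptions, not choices taken from the thesis.

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(1)
# Synthetic stand-in for a daily integrated-volatility series.
vol = np.abs(np.cumsum(rng.normal(0, 0.02, 512))) + 0.1

# Step ii): multilevel wavelet decomposition; family and depth
# here are illustrative, not the thesis's tuned choices.
coeffs = pywt.wavedec(vol, 'db4', level=3)
approx = coeffs[0]  # smooth trend component at the coarsest level

# Step iii): recast the band as 4-lag examples, i.e. predict x[t]
# from (x[t-4], ..., x[t-1]) -- a Markov-chain-style view of the data.
lags = 4
X = np.array([approx[i:i + lags] for i in range(len(approx) - lags)])
y = approx[lags:]
print(X.shape, y.shape)
```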

    Approximation Algorithms for Geometric Networks

    The main contribution of this thesis is approximation algorithms for several computational geometry problems. The underlying structure for most of the problems studied is a geometric network: in its abstract form, a set of vertices, pairwise connected by edges, such that the weight of each connecting edge is the Euclidean distance between the pair of points it connects. Such a network may represent a multitude of real-life structures, for example a set of cities connected by roads.

    For the case that a specific network is given, we study three separate problems. In the first, we consider interconnected 'islands' of well-connected networks, in which shortest paths are computed. In the second, the input network is a triangulation, which we efficiently simplify using edge contractions. Finally, we consider individual movement trajectories, representing for example wild animals, in which we identify leading individuals.

    Next, we consider the case that only a set of vertices is given and the aim is to construct a network. We study two such problems. In the first, we compute a partition of the vertices into several subsets where, considering the minimum spanning tree (MST) for each subset, we aim to minimize the largest MST. The other problem is to construct a t-spanner of low weight quickly and simply, which we do by first extending the so-called gap theorem.

    In addition to the above geometric network problems, we also study a problem in which we aim to place a set of different-sized rectangles such that the area of their bounding box is minimized and a grid may be placed over the rectangles; the grid should not intersect any rectangle, and each cell of the grid should contain at most one rectangle. None of the studied problems easily allows computation of optimal solutions in feasible time. Instead we consider approximation algorithms, which produce near-optimal solutions in polynomial time.
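    For context on the t-spanner problem, below is the classic greedy construction: scan point pairs by increasing distance and add an edge only when the current spanner detour exceeds t times the Euclidean distance. This is the textbook baseline, not the thesis's faster gap-theorem-based algorithm.

```python
import heapq
import itertools
import math

def greedy_spanner(points, t):
    """Classic greedy t-spanner: every pair ends up connected by a path
    of length at most t times its Euclidean distance."""
    n = len(points)
    adj = {i: [] for i in range(n)}

    def dist(i, j):
        return math.dist(points[i], points[j])

    def spanner_dist(src, dst):
        # Dijkstra over the edges added so far.
        best = {src: 0.0}
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == dst:
                return d
            if d > best.get(u, math.inf):
                continue
            for v, w in adj[u]:
                nd = d + w
                if nd < best.get(v, math.inf):
                    best[v] = nd
                    heapq.heappush(heap, (nd, v))
        return math.inf

    edges = []
    for i, j in sorted(itertools.combinations(range(n), 2), key=lambda e: dist(*e)):
        d = dist(i, j)
        if spanner_dist(i, j) > t * d:  # detour too long: add the edge
            adj[i].append((j, d))
            adj[j].append((i, d))
            edges.append((i, j))
    return edges

pts = [(0, 0), (1, 0), (2, 1), (0, 2), (3, 3)]
print(greedy_spanner(pts, t=1.5))
```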

    Temporal, Spatial, and Spatio-temporal Data Mining, First International Workshop, TSDM 2000, Lyon, France, September 12, 2000: Revised Papers

    This volume contains updated versions of the ten papers presented at the First International Workshop on Temporal, Spatial and Spatio-Temporal Data Mining (TSDM 2000), held in conjunction with the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2000) in Lyon, France, in September 2000. The aim of the workshop was to bring together experts in the analysis of temporal and spatial data mining and knowledge discovery in temporal, spatial or spatio-temporal database systems, as well as knowledge engineers and domain experts from allied disciplines. The workshop focused on research and practice of knowledge discovery from datasets containing explicit or implicit temporal, spatial or spatio-temporal information. The ten original papers in this volume were accepted by peer review following an international call for papers; all submissions were refereed by an international team of data mining researchers, whom we thank for their expert and useful help with this process. Following the workshop, authors were invited to amend their papers so that feedback received at the workshop could be incorporated into the final versions appearing in this volume. A workshop report, which also discusses the panel session that was held, was compiled by Kathleen Hornsby.

    Extraction of spatio-temporal relationships from environmental and health data

    Thanks to new technologies (smartphones, sensors, etc.), large amounts of spatio-temporal data are now available. The associated databases can be called spatio-temporal databases because each row is described by spatial information (e.g. a city, a neighbourhood, a river) and temporal information (e.g. the date of an event). These data are often complex and heterogeneous and generate new needs that knowledge extraction methods must address (e.g. following phenomena in time and space). Many phenomena with complex dynamics are thus associated with spatio-temporal data. For instance, the dynamics of an infectious disease can be described by the interactions between humans and the transmission vector, as well as by certain spatio-temporal mechanisms involved in its development; modifying one of these components can trigger changes in the interactions between the components and ultimately alter the overall behaviour of the system.

    To face these new challenges, new processes and methods must be developed to exploit all the available data. In this context, spatio-temporal data mining is defined as the set of techniques and methods used to obtain useful knowledge from large volumes of spatio-temporal data. This thesis falls within the general framework of spatio-temporal data mining and sequential pattern mining. More specifically, two generic pattern mining methods are proposed. The first extracts sequential patterns that include the spatial characteristics of the data. The second introduces a new type of pattern, called spatio-sequential patterns, used to study the evolution of a set of events describing an area and its near environment. Both approaches were tested on real datasets associated with two spatio-temporal phenomena: river pollution in France and the epidemiological monitoring of dengue in New Caledonia. In addition, two quality measures and a pattern visualization prototype are proposed to assist experts in selecting interesting patterns.
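    For readers unfamiliar with the underlying notion, here is a naive generate-and-count sketch of sequential pattern mining: a pattern is frequent if its items appear in order in at least min_support sequences. Real miners (e.g. PrefixSpan or SPADE) avoid this exhaustive enumeration; the toy event sequences below are illustrative, not data from the thesis.

```python
from itertools import permutations

def is_subsequence(pattern, sequence):
    """True if pattern's items appear in sequence in order,
    not necessarily contiguously -- the usual sequential notion."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def frequent_sequences(db, min_support, max_len=3):
    """Naive miner over a small alphabet: enumerate every ordered
    candidate up to max_len and count its support in db."""
    items = sorted({e for seq in db for e in seq})
    result = {}
    for k in range(1, max_len + 1):
        for pat in permutations(items, k):
            sup = sum(is_subsequence(pat, seq) for seq in db)
            if sup >= min_support:
                result[pat] = sup
    return result

# Toy event sequences, e.g. weekly observations for one river zone.
db = [('rain', 'runoff', 'pollution'),
      ('rain', 'pollution'),
      ('rain', 'runoff', 'algae', 'pollution')]
print(frequent_sequences(db, min_support=2))
```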

    Mining climate data for shire level wheat yield predictions in Western Australia

    Climate change and the reduction of available agricultural land are two of the most important factors affecting global food production, especially wheat stores, and an ever-increasing world population places a huge demand on these resources. Consequently, there is a dire need to optimise food production. Estimations of crop yield for the South West agricultural region of Western Australia have usually been based on statistical analyses by the Department of Agriculture and Food in Western Australia, whose estimations involve a system of crop planting recommendations and yield prediction tools based on crop variety trials. However, many crop failures have arisen even where farmers adhered to these recommendations, contrary to the reported estimations. Consequently, the Department has sought new avenues of analysis to improve its estimations and recommendations.

    This thesis explores a new approach to the way such analyses are carried out, introducing new methods of analysis such as data mining and online analytical processing into the strategy. Additionally, this research attempts to provide a better understanding of the effects on wheat yields of both gradual variation parameters, such as soil type, and continuous variation parameters, such as rainfall and temperature. The ultimate aim of the research is to enhance the prediction efficiency of wheat yields. The task was formidable due to the complex and dichotomous mixture of gradual and continuous variability data, which required successive information transformations: the progressive moulding of the data into useful information, practical knowledge and effective industry practices. Ultimately, this new direction aims to improve crop predictions and thereby reduce crop failures.

    The research journey involved data exploration, grappling with the complexity of Geographic Information Systems (GIS), discovering and learning compatible software tools, and forging an effective processing method through an iterative cycle of action-research experimentation. A series of trials was conducted to determine the combined effects of rainfall and temperature variations on wheat crop yields, focusing on wheat-producing shires within the South West agricultural region. The investigations combined macro-analysis techniques for visual data mining with micro-analysis using data mining classification techniques. The research revealed that wheat yield was most dependent upon rainfall and temperature, and that rainfall cyclically affected temperature and soil type through the moisture retention of crop-growing locations. Results from the regression analyses showed that statistical prediction of wheat yields from historical data may be enhanced by data mining techniques, including classification.

    The main contribution to knowledge of this research is the provision of an alternative, supplementary method of wheat crop prediction within the study area. Another contribution is the division of the study area into a GIS surface grid of 100-hectare cells onto which the interpolated data was projected. Furthermore, the framework proposed in this thesis offers other researchers with similarly structured complex data the benefits of a general processing pathway to navigate their own investigations through variegated analytical exploration spaces, and it offers insights and suggestions for future directions in other contextual research explorations.
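    As a hedged illustration of the classification step described above, the sketch below trains a classifier to predict a yield class from rainfall, temperature and soil features. Everything here is synthetic: the feature names, scales, the toy yield rule and the choice of a random forest are assumptions for demonstration, not the thesis's data or method.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 600  # one row per grid cell, mirroring the thesis's 100-ha GIS grid

# Synthetic stand-in features: seasonal rainfall (mm), mean growing-season
# temperature (deg C) and an encoded soil class; all values illustrative.
rain = rng.uniform(150, 600, n)
temp = rng.uniform(14, 26, n)
soil = rng.integers(0, 4, n)

# Toy rule standing in for the real yield response: wetter, cooler,
# heavier soils lean towards the 'high' yield class.
score = 0.004 * rain - 0.05 * temp + 0.2 * soil + rng.normal(0, 0.3, n)
yield_class = np.digitize(score, [0.8, 1.6])  # 0=low, 1=medium, 2=high

X = np.column_stack([rain, temp, soil])
X_tr, X_te, y_tr, y_te = train_test_split(X, yield_class, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print('held-out accuracy:', round(clf.score(X_te, y_te), 3))
```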