
    An Investigation in Efficient Spatial Patterns Mining

    Technical progress in computerized spatial data acquisition and storage has led to the growth of vast spatial databases. Faced with large and increasing amounts of spatial data, an end user has difficulty understanding them without helpful knowledge extracted from the spatial databases. Thus, spatial data mining has been brought under the umbrella of data mining and is attracting more attention. Spatial data mining presents challenges. Unlike ordinary data, spatial data includes not only positional data and attribute data, but also spatial relationships among spatial events. Further, the instances of spatial events are embedded in a continuous space and share a variety of spatial relationships, so the mining of spatial patterns demands new techniques. In this thesis, several contributions were made. New techniques were proposed: fuzzy co-location mining, the CPI-tree (Co-location Pattern Instance Tree), maximal co-location pattern mining, AOI-ags (Attribute-Oriented Induction based on Attributes’ Generalization Sequences), and fuzzy association prediction. Three algorithms were put forward for co-location pattern mining: the fuzzy co-location mining algorithm, the CPI-tree-based co-location mining algorithm (CPI-tree algorithm), and the order-clique-based maximal prevalence co-location mining algorithm (order-clique-based algorithm). An attribute-oriented induction algorithm based on attributes’ generalization sequences (AOI-ags algorithm) was also given, which unifies the attribute thresholds and the tuple thresholds. A fuzzy association prediction algorithm was designed on two real-world databases with time-series data. A cell-based spatial object fusion algorithm was also proposed. Two fuzzy clustering methods using domain knowledge were proposed, the Natural Method and the Graph-Based Method, both of which are controlled by a threshold. The threshold was confirmed by polynomial regression.
    Finally, a prototype system for spatial co-location pattern mining was developed; it shows the relative efficiencies of the proposed co-location techniques. The techniques presented in the thesis focus on improving the feasibility, usefulness, effectiveness, and scalability of the related algorithms. In the design of the fuzzy co-location mining algorithm, a new data structure, the binary partition tree, was proposed to improve the process of fuzzy equivalence partitioning. A prefix-based approach was also presented that partitions the prevalent event set search space into subsets, where each sub-problem can be solved in main memory. The scalability of the CPI-tree algorithm is guaranteed since it does not require expensive spatial joins or instance joins for identifying co-location table instances. In the order-clique-based algorithm, the co-location table instances do not need to be stored after computing the Pi value of the corresponding co-location, which dramatically reduces the execution time and space needed for mining maximal co-locations. Several techniques, for example partitions, equivalence partition trees, pruning optimization strategies, and interestingness, were used to improve the efficiency of the AOI-ags algorithm. To implement the fuzzy association prediction algorithm, the “growing window” and proximity computation pruning were introduced to reduce both I/O and CPU costs in computing the fuzzy semantic proximity between time series. For the new techniques and algorithms, theoretical analysis and experimental results on synthetic and real-world data sets were presented and discussed in the thesis.
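The abstract refers to the "Pi value" of a co-location without defining it. The sketch below assumes it is the participation index commonly used in the co-location mining literature: for each feature in the pattern, take the fraction of that feature's instances that appear in at least one table (row) instance of the pattern, then take the minimum over features. The function name, the input layout, and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def participation_index(pattern, row_instances, feature_counts):
    """Participation index of a co-location pattern (assumed definition).

    pattern        -- tuple of feature names, e.g. ("A", "B")
    row_instances  -- list of tuples of instance ids, aligned with `pattern`
    feature_counts -- dict mapping each feature to its total instance count
    """
    # Collect, per feature, the distinct instances that take part in any row.
    participating = defaultdict(set)
    for row in row_instances:
        for feature, instance_id in zip(pattern, row):
            participating[feature].add(instance_id)

    # Participation ratio per feature, then the minimum over the pattern.
    ratios = [len(participating[f]) / feature_counts[f] for f in pattern]
    return min(ratios)


# Toy data: feature A has 4 instances, feature B has 2.
pattern = ("A", "B")
rows = [(1, 1), (1, 2), (3, 2)]  # (A-instance id, B-instance id)
counts = {"A": 4, "B": 2}
pi_value = participation_index(pattern, rows, counts)
print(pi_value)
```

Only the set of participating instance ids per feature is needed to compute the index here, which is consistent with the abstract's remark that table instances need not be kept once the value has been computed.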

    Time-Series Data Mining: A Review

    Data mining refers to the extraction of knowledge by analyzing data from different perspectives and consolidating the findings into useful information that can help decision makers take appropriate decisions. Classification and clustering have been the two broad areas in data mining. Whereas classification is a supervised learning approach, clustering is an unsupervised learning approach and hence can be performed without the supervision of domain experts. The basic concept is to group the objects in such a way that similar objects are closer to each other. Time-series data are observations of data over a period of time. Parameter estimation, outlier detection, and data transformation are some of the basic issues in handling time-series data. An approach is given for clustering the data based on the membership values assigned to each data point, reducing the effect of outliers or noise present in the data. Possibilistic Fuzzy C-Means (PFCM) with Error Prediction (EP) is used for clustering and noise identification in the time-series data.
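To illustrate what "membership values assigned to each data point" means, here is a minimal sketch of the membership update from standard fuzzy C-means. This is the plain FCM rule, not the PFCM-with-Error-Prediction method the abstract describes; the function name, the fuzzifier default `m=2.0`, and the sample points are assumptions made for illustration.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy C-means membership of one point in each cluster.

    point   -- sequence of coordinates (for example, a short time series)
    centers -- list of cluster centers with the same dimensionality
    m       -- fuzzifier, must be greater than 1 (2.0 is a common choice)
    """
    dists = [math.dist(point, c) for c in centers]

    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]

    # u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1))
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


memberships = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
print(memberships)
```

The memberships of a point always sum to 1 under this rule, which is why a distant outlier still receives substantial membership in some cluster; possibilistic variants such as PFCM relax that constraint.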

    Fuzzy Decision Support Applied to Machine Maintenance

    This research focuses on optimal algorithms for decision making and forecasting, with the aim of achieving better prediction. Decision-making techniques and forecasting methods are investigated because of the poor accuracy of forecasting in comparison with real-world data. The uncertainty of real-world data leads to the use of type-1 fuzzy sets, type-2 fuzzy sets, fuzzy decision trees, and fuzzy time series for fuzzy data mining; these are applied in the look-ahead based interval-valued fuzzy decision tree with optimal perimeter of the neighbourhood (LAIVFDT-OPN) model and the high-order type-2 fuzzy time series (HO-T2FTS) model. In the experiment with a real-world business, a ‘computerised maintenance integration management system’ (CMIMS) is constructed as a simulation model for a case study. The CMIMS model consists of the LAIVFDT-OPN and HO-T2FTS models. It is also applied to a set of real-world data from a factory in Thailand. Due to the significant uncertainty involved in machine maintenance, most tasks in machine diagnosis are still carried out manually by technicians. In this research, a prototype of CMIMS employing fuzzy data mining to diagnose machine maintenance is constructed. Considering the special features of machine maintenance data, fuzzy decision trees and fuzzy time series are adopted in the proposed method. To represent the uncertain fuzzy memberships, interval-valued fuzzy decision trees are proposed and an optimal neighbourhood perimeter is defined for look-ahead fuzzy decision trees. Based on the existing first-order type-2 fuzzy time series and high-order type-1 fuzzy time series, an improved high-order type-2 fuzzy time series method is put forward. In this case study, the CMIMS model can be used to analyse and evaluate uncertain data. It can also be employed to facilitate decision making on machine equipment status and to forecast future machine maintenance plans in place of technicians.
    Our results demonstrated that the proposed method is effective in fuzzy decision support for machine maintenance.

    Fuzzy support vector machine for classification of time series data: A simulation study

    The support vector machine (SVM) has become one of the most developed methods for classification, focusing on cross-sectional analysis. However, classification of time series data is an important issue in statistics and data mining. Classifying time series data with SVMs designed for cross-sectional data leads to improper classification, and hence the SVM needs to be extended to handle time series datasets. As with cross-sectional data, the problem of imbalanced data is also common in time series data. The fuzzy method has been proven to be capable of handling imbalanced data. In this paper, we developed a Fuzzy Support Vector Machine (FSVM) model to classify time series data with imbalanced classes. The proposed method puts the fuzzy membership function on the constraint function. Through simulation studies, this research aims to assess the performance of the developed FSVM in classifying time series data. Based on the classification accuracy criteria, we show that the proposed FSVM method outperforms the standard SVM method for the classification of multiclass time series data.

    An investigation in efficient spatial patterns mining

    Technical progress in computerized spatial data acquisition and storage has led to the growth of vast spatial databases. Faced with large and increasing amounts of spatial data, an end user has difficulty understanding them without helpful knowledge extracted from the spatial databases. Thus, spatial data mining has been brought under the umbrella of data mining and is attracting more attention. Spatial data mining presents challenges. Unlike ordinary data, spatial data includes not only positional data and attribute data, but also spatial relationships among spatial events. Further, the instances of spatial events are embedded in a continuous space and share a variety of spatial relationships, so the mining of spatial patterns demands new techniques. In this thesis, several contributions were made. New techniques were proposed: fuzzy co-location mining, the CPI-tree (Co-location Pattern Instance Tree), maximal co-location pattern mining, AOI-ags (Attribute-Oriented Induction based on Attributes’ Generalization Sequences), and fuzzy association prediction. Three algorithms were put forward for co-location pattern mining: the fuzzy co-location mining algorithm, the CPI-tree-based co-location mining algorithm (CPI-tree algorithm), and the order-clique-based maximal prevalence co-location mining algorithm (order-clique-based algorithm). An attribute-oriented induction algorithm based on attributes’ generalization sequences (AOI-ags algorithm) was also given, which unifies the attribute thresholds and the tuple thresholds. A fuzzy association prediction algorithm was designed on two real-world databases with time-series data. A cell-based spatial object fusion algorithm was also proposed. Two fuzzy clustering methods using domain knowledge were proposed, the Natural Method and the Graph-Based Method, both of which are controlled by a threshold. The threshold was confirmed by polynomial regression.
    Finally, a prototype system for spatial co-location pattern mining was developed; it shows the relative efficiencies of the proposed co-location techniques. The techniques presented in the thesis focus on improving the feasibility, usefulness, effectiveness, and scalability of the related algorithms. In the design of the fuzzy co-location mining algorithm, a new data structure, the binary partition tree, was proposed to improve the process of fuzzy equivalence partitioning. A prefix-based approach was also presented that partitions the prevalent event set search space into subsets, where each sub-problem can be solved in main memory. The scalability of the CPI-tree algorithm is guaranteed since it does not require expensive spatial joins or instance joins for identifying co-location table instances. In the order-clique-based algorithm, the co-location table instances do not need to be stored after computing the Pi value of the corresponding co-location, which dramatically reduces the execution time and space needed for mining maximal co-locations. Several techniques, for example partitions, equivalence partition trees, pruning optimization strategies, and interestingness, were used to improve the efficiency of the AOI-ags algorithm. To implement the fuzzy association prediction algorithm, the “growing window” and proximity computation pruning were introduced to reduce both I/O and CPU costs in computing the fuzzy semantic proximity between time series. For the new techniques and algorithms, theoretical analysis and experimental results on synthetic and real-world data sets were presented and discussed in the thesis.
    EThOS - Electronic Theses Online Service, GB, United Kingdom

    Fuzzy clustering of time series data: A particle swarm optimization approach

    With rapid development in information-gathering technologies and access to large amounts of data, we need methods for analyzing data and extracting useful information from large raw datasets, and data mining is an important method for solving this problem. Clustering analysis, as the most commonly used function of data mining, has attracted many researchers in computer science. Because of its different applications, the problem of clustering time series data has become highly popular, and many algorithms have been proposed in this field. Recently, Swarm Intelligence (SI), a family of nature-inspired algorithms, has gained huge popularity in the field of pattern recognition and clustering. In this paper, a technique for clustering time series data using a particle swarm optimization (PSO) approach is proposed, and the Pearson correlation coefficient, one of the most commonly used distance measures for time series, is considered. The proposed technique is able to find (near) optimal cluster centers during the clustering process. To reduce the dimensionality of the search space and improve the performance of the proposed method, a singular value decomposition (SVD) representation of cluster centers is considered. Experimental results over three popular data sets indicate the superiority of the proposed technique in comparison with fuzzy C-means and fuzzy K-medoids clustering techniques.
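As a small illustration of using the Pearson correlation coefficient as a distance between time series, the sketch below computes the coefficient r and converts it to a dissimilarity with 1 - r. The abstract does not say which transform the paper uses, so 1 - r is an assumption, as are the function names and sample series. The PSO search and SVD representation are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_x * norm_y)


def correlation_distance(x, y):
    """Assumed dissimilarity: 0 for perfectly correlated series, 2 for
    perfectly anti-correlated ones."""
    return 1.0 - pearson(x, y)


rising = [1.0, 2.0, 3.0, 4.0]
scaled = [10.0, 20.0, 30.0, 40.0]   # same shape, different scale
falling = [4.0, 3.0, 2.0, 1.0]      # mirrored shape
print(correlation_distance(rising, scaled))
print(correlation_distance(rising, falling))
```

A correlation-based distance compares the shape of two series rather than their magnitude, so the rescaled series is treated as nearly identical to the original.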

    Forest fire prediction using fuzzy prototypical knowledge discovery

    An application of Zadeh’s prototype theory in the Knowledge Acquisition process is presented here and, as a practical example, used to define a method for predicting the evolution of the forest fire occurrence-danger rate in INCEND-IA, a KBS for prediction and decision support in fighting forest fires. This method then allows us to interpret any real cyclical situation using a previously discovered paradigm and to define the current period. FPKD (Fuzzy Prototypical Knowledge Discovery) is presented as a mechanism for generating Prototypes of Data (a new set of data sufficiently representative to summarize or assimilate the behavior of any of the remaining data); the concept of prototype is itself a fuzzy concept, and Zadeh’s theory provides an appropriate framework for its application. Data Mining techniques have been used (decision trees, time series, clustering...). Thus, it is possible to calculate the grade of compatibility of a real situation with the prototypes and to define the current period using these affinity values, with the objective of predicting the evolution over the following days.
    I Workshop de Agentes y Sistemas Inteligentes (WASI), Red de Universidades con Carreras en Informática (RedUNCI)

    Analysis of FMRI Exams Through Unsupervised Learning and Evaluation Index

    In the last few years, the clustering of time series has seen significant growth and has proven effective in providing useful information in various domains of use. This growing interest in time series clustering is the result of the effort made by the scientific community in the context of temporal data mining. For these reasons, the first phase of the thesis focused on the study of data obtained from fMRI exams carried out in task-based and resting-state mode, using and comparing different clustering algorithms: the Self-Organizing Map (SOM), Growing Neural Gas (GNG), and Neural Gas (NG), which are crisp algorithms; a fuzzy algorithm, Fuzzy C-Means (FCM), was also used. The results obtained with the clustering algorithms were evaluated using the Davies-Bouldin evaluation index (DBI or DB index). Clustering evaluation is the second topic of this thesis. Specific techniques exist for evaluating the validity of a clustering, but none of them is yet consolidated for the study of fMRI exams. Furthermore, the evaluation of evaluation techniques is still an open research field. Eight clustering validation indexes (CVIs) applied to fMRI data clustering will be analysed. The validation indices used are the Pakhira-Bandyopadhyay-Maulik Index (crisp and fuzzy), Fukuyama-Sugeno Index, Rezaee-Lelieveldt-Reiber Index, Wang-Sun-Jiang Index, Xie-Beni Index, Davies-Bouldin Index, and Soft Davies-Bouldin Index. Furthermore, an evaluation of the evaluation indices will be carried out, which will take into account the sub-optimal performance obtained by the indices, through the introduction of new metrics. Finally, a new methodology for the evaluation of CVIs will be introduced, which will use an ANFIS model.
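For readers unfamiliar with the Davies-Bouldin index mentioned above, the following is a minimal sketch of its usual crisp definition: each cluster's scatter is the mean distance of its members to the centroid, and the index averages, over clusters, the worst ratio of summed scatters to centroid separation. Lower values indicate more compact, better-separated clusters. The function names and toy points are hypothetical; the thesis's fMRI pipeline and the soft variant are not reproduced.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a crisp partition.

    clusters -- list of clusters, each a non-empty list of points;
                needs at least two clusters with distinct centroids.
    """
    centers = [centroid(c) for c in clusters]
    # Scatter: mean Euclidean distance from members to their centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
close = [[(0.0, 0.0), (0.0, 2.0)], [(2.0, 0.0), (2.0, 2.0)]]
print(davies_bouldin(tight))
print(davies_bouldin(close))
```

In this toy data, the two partitions have identical within-cluster scatter, so the difference in the index comes entirely from how far apart the centroids are.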