75,026 research outputs found
Cluster analysis for physical oceanographic data and oceanographic surveys in Turkish seas
Cluster analysis is a useful data mining method for obtaining detailed information on the physical state of the ocean. The primary objective of this study is the development of a new spatio-temporal density-based algorithm for clustering physical oceanographic data. This study extends regular spatial cluster analysis to deal with spatial data at different epochs. It also presents a sensitivity analysis of the new algorithm, whose purpose is to identify the response of the algorithm to variations in input parameter values and boundary conditions. To demonstrate the usage of the new algorithm, this paper presents two oceanographic applications that cluster sea-surface temperature (SST) and sea-surface height residual (SSH) data, which record satellite observations of the Turkish seas. It also evaluates and justifies the clustering results by using a cluster validation technique.
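As a rough illustration of what "spatial data at different epochs" can mean for a density-based method, the sketch below defines a neighbourhood that requires closeness in both space and time. The two-radius rule, the `Observation` record, and the names `eps_space` and `eps_time` are assumptions made for this example; the abstract does not state how the paper defines its neighbourhood.

```python
import math
from typing import NamedTuple


class Observation(NamedTuple):
    """Hypothetical record: position, epoch, and a measured value such as SST."""
    x: float
    y: float
    t: float
    value: float


def st_neighbors(obs, i, eps_space, eps_time):
    """Indices of observations close to obs[i] in both space and time.

    Assumption: a point is a neighbour only if it lies within eps_space
    (Euclidean distance) AND within eps_time (absolute epoch difference).
    """
    p = obs[i]
    return [
        j
        for j, q in enumerate(obs)
        if j != i
        and math.hypot(p.x - q.x, p.y - q.y) <= eps_space
        and abs(p.t - q.t) <= eps_time
    ]


# Made-up observations: two are close in space and time, one is close in
# space but from a much later epoch, and one is far away in space.
observations = [
    Observation(0.0, 0.0, 0.0, 15.0),
    Observation(0.5, 0.0, 1.0, 15.2),
    Observation(0.5, 0.0, 10.0, 15.1),
    Observation(5.0, 5.0, 0.0, 14.0),
]
print(st_neighbors(observations, 0, eps_space=1.0, eps_time=2.0))
```

A density-based algorithm could then count these neighbours in place of purely spatial ones when deciding whether a point is a core point.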
Artificial Intelligence in geospatial analysis: applications of self-organizing maps in the context of geographic information science.
A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Geographic Information Systems. The size and dimensionality of available geospatial repositories increase every day, placing additional pressure on existing analysis tools, as they are expected to extract more knowledge from these databases. Most of these tools were created in a data-poor environment and thus rarely address concerns of efficiency, dimensionality and automatic exploration. In addition, traditional statistical techniques make several assumptions that are not realistic in the geospatial data domain. An example of this is the statistical independence between observations required by most classical statistical methods, which conflicts with the well-known spatial dependence that exists in geospatial data.
Artificial intelligence and data mining methods constitute a less assumption-dependent alternative for exploring and extracting knowledge from geospatial data. In this thesis, we study the possible adaptation of existing general-purpose data mining tools to geospatial data analysis. The characteristics of geospatial datasets seem to be similar in many ways to those of other aspatial datasets for which several data mining tools have been used with success in the detection of patterns and relations. It seems, however, that GIS-minded analysis and objectives require more than the results provided by these general tools, and adaptations are needed to meet the geographical information scientist's requirements. Thus, we propose several geospatial applications based on a well-known data mining method, the self-organizing map (SOM), and analyse the adaptations required in each application to fulfil those objectives and needs. Three main fields of GIScience are covered in this thesis: cartographic representation; spatial clustering and knowledge discovery; and location optimization.(...
Spatial Coordinate Trial : Converting Non-Spatial Data Dimension for DBSCAN
In big data, noise is an unavoidable part of data mining. Its presence depends on both the data and the algorithm, but that does not mean the algorithm causes the noise. Although the advantages of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm on spatial (two-dimensional) data have been widely discussed, it has not been convincing on non-spatial data. Since an algorithm should perform well on any data in order to optimize data mining, this research proposes a trial that converts the dimensions of non-spatial data into two dimensions for execution with the DBSCAN algorithm, and that varies the input value of epsilon to find the minimum value at which noise begins to arise. The analysis treats the attributes of non-spatial data as variables that represent coordinate points, rather than cardinality. Technically, the two-dimensional coordinate axes are assumed to serve as the plotting space for coordinates with three or more dimensions, following the development of the Cartesian coordinate system and first taking into account the relationships among the variables (attributes). This approach is called Spatial Coordinate. The input values tested range from the non-zero minimum distance to the fourth of epsilon, where epsilon is an integer. Testing the resulting clusters with the Silhouette Coefficient indicates that they are well formed, strong, and of sufficient quality. This research therefore offers a new way to preprocess non-spatial data for the DBSCAN algorithm.
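The epsilon sweep mentioned in this abstract can be sketched as follows. This is a minimal textbook-style DBSCAN written with the standard library only, run on made-up two-dimensional points; it assumes the data have already been reduced to 2-D and does not reproduce the authors' Spatial Coordinate conversion. The point set, the `min_pts` value, and the helper names are illustrative assumptions.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks noise points."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def noise_count(points, eps, min_pts):
    """Number of points DBSCAN labels as noise for the given parameters."""
    return dbscan(points, eps, min_pts).count(NOISE)


# Made-up data: two tight groups of four points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]

for eps in (0.5, 1.0, 1.5):
    print(eps, noise_count(points, eps, min_pts=3))
```

Sweeping epsilon upward from a small value and recording the noise count at each step shows how the noise depends on the parameter for a given data set.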
An Investigation in Efficient Spatial Patterns Mining
The technical progress in computerized spatial data acquisition and storage results
in the growth of vast spatial databases. Faced with large amounts of increasing spatial
data, a terminal user has more difficulty in understanding them without the helpful
knowledge from spatial databases. Thus, spatial data mining has been brought under
the umbrella of data mining and is attracting more attention.
Spatial data mining presents challenges. Differing from usual data, spatial data includes
not only positional data and attribute data, but also spatial relationships among
spatial events. Further, the instances of spatial events are embedded in a continuous
space and share a variety of spatial relationships, so the mining of spatial patterns demands
new techniques.
In this thesis, several contributions were made. Some new techniques were proposed,
i.e., fuzzy co-location mining, CPI-tree (Co-location Pattern Instance Tree),
maximal co-location patterns mining, AOI-ags (Attribute-Oriented Induction based on Attributes'
Generalization Sequences), and fuzzy association prediction. Three algorithms
were put forward on co-location patterns mining: the fuzzy co-location mining algorithm,
the CPI-tree based co-location mining algorithm (CPI-tree algorithm) and the order-clique-based
maximal prevalence co-location mining algorithm (order-clique-based algorithm).
An attribute-oriented induction algorithm based on attributes' generalization sequences
(AOI-ags algorithm) is further given, which unified the attribute thresholds and
the tuple thresholds. On the two real-world databases with time-series data, a fuzzy association
prediction algorithm is designed. Also a cell-based spatial object fusion algorithm
is proposed. Two fuzzy clustering methods using domain knowledge were proposed:
Natural Method and Graph-Based Method, both of which were controlled by a
threshold. The threshold was confirmed by polynomial regression. Finally, a prototype
system on spatial co-location patterns' mining was developed, and shows the relative
efficiencies of the co-location techniques proposed.
The techniques presented in the thesis focus on improving the feasibility, usefulness,
effectiveness, and scalability of the related algorithms. In the design of the fuzzy co-location
mining algorithm, a new data structure, the binary partition tree, used to improve the
process of fuzzy equivalence partitioning, was proposed. A prefix-based approach to
partition the prevalent event set search space into subsets, where each sub-problem can
be solved in main-memory, was also presented. The scalability of CPI-tree algorithm is
guaranteed since it does not require expensive spatial joins or instance joins for identifying
co-location table instances. In the order-clique-based algorithm, the co-location table
instances do not need to be stored after computing the Pi value of the corresponding co-location,
which dramatically reduces the execution time and space of mining maximal co-locations.
Some technologies, for example, partitions, equivalence partition trees, prune
optimization strategies and interestingness, were used to improve the efficiency of the
AOI-ags algorithm. To implement the fuzzy association prediction algorithm, the "growing
window" and the proximity computation pruning were introduced to reduce both I/O and
CPU costs in computing the fuzzy semantic proximity between time-series.
For new techniques and algorithms, theoretical analysis and experimental results
on synthetic data sets and real-world datasets were presented and discussed in the thesis.
Clustering driver's destinations - using internal evaluation to adaptively set parameters
With advanced navigation systems becoming ubiquitous in modern cars, the availability of detailed GPS data opens up new research areas in the fields of pattern analysis and data mining. By capturing the end-of-trip GPS point of each trip made by a driver, that driver's meaningful destinations can be identified. Knowledge of these destinations can be used for route prediction, which in turn can be used to optimize motor control and decrease emissions. It can also be used for developing functions for autonomous vehicles. In this thesis, a way of extracting these meaningful destinations from GPS data using clustering algorithms has been developed and evaluated. The result is a clustering procedure consisting of two steps. First, a pre-clustering step divides the data into subsets corresponding to smaller spatial areas. Then, a refining clustering step is applied, for which the parameter of the algorithm is adapted to each subset. The parameter is set adaptively for each subset by testing a set of candidate values, evaluating the results internally with the Silhouette coefficient, and choosing the value that gives the best evaluation score. The best performing configuration of our procedure, according to our external evaluation method, is on par with the performance of DBSCAN with a supervised choice of parameter setting. Further evaluation on data sets from different areas of the world is needed to draw strong conclusions about the developed procedure's performance.
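The adaptive step described above (try several candidate parameters, score each clustering with the Silhouette coefficient, keep the best) can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not name the refining algorithm, so `threshold_clusters` is a hypothetical stand-in that groups points into connected components, and the points and candidate values are made up.

```python
import math


def silhouette(points, labels):
    """Mean Silhouette coefficient over points with a non-negative label.

    Assumptions of this sketch: fewer than two clusters scores -1.0 (the
    coefficient is undefined there), and singleton clusters score 0.0.
    """
    clusters = {}
    for p, label in zip(points, labels):
        if label >= 0:
            clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for label, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for other_label, other in clusters.items()
                if other_label != label
            )
            denom = max(a, b)
            scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


def threshold_clusters(points, threshold):
    """Stand-in clustering: connected components of points within threshold."""
    labels = [-1] * len(points)
    current = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            j = stack.pop()
            for k, q in enumerate(points):
                if labels[k] == -1 and math.dist(points[j], q) <= threshold:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels


def choose_parameter(points, candidates, cluster_fn):
    """Return the candidate whose clustering has the highest Silhouette score."""
    return max(candidates, key=lambda c: silhouette(points, cluster_fn(points, c)))


# Made-up end-of-trip points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
best = choose_parameter(points, [0.5, 2.0, 20.0], threshold_clusters)
print(best)
```

In a two-step procedure like the one described, `choose_parameter` would be called once per pre-clustered subset, so each spatial area gets its own parameter value.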
ADBSCAN: Adaptive Density-Based Spatial Clustering of Applications with Noise for Identifying Clusters with Varying Densities
Density-based spatial clustering of applications with noise (DBSCAN) is a
data clustering algorithm that performs well on datasets whose
clusters have a constant density of data points. One of the significant
attributes of this algorithm is noise cancellation. However, DBSCAN
demonstrates reduced performance for clusters with different densities.
Therefore, in this paper, an adaptive DBSCAN is proposed which can work
significantly well for identifying clusters with varying densities.
Comment: To be published in the 4th IEEE International Conference on
Electrical Engineering and Information & Communication Technology (iCEEiCT
2018)
Data Management and Mining in Astrophysical Databases
We analyse the issues involved in the management and mining of astrophysical
data. The traditional approach to data management in the astrophysical field is
not able to keep up with the increasing size of the data gathered by modern
detectors. An essential role in the astrophysical research will be assumed by
automatic tools for information extraction from large datasets, i.e. data
mining techniques, such as clustering and classification algorithms. This asks
for an approach to data management based on data warehousing, emphasizing the
efficiency and simplicity of data access; efficiency is obtained using
multidimensional access methods and simplicity is achieved by properly handling
metadata. Clustering and classification techniques, on large datasets, pose
additional requirements: computational and memory scalability with respect to
the data size, interpretability and objectivity of clustering or classification
results. In this study we address some possible solutions. Comment: 10 pages, Late
Data mining as a tool for environmental scientists
Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and, more recently, Bayesian Decision Networks have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.