Decision Tree Classification of Spatial Data Streams Using Peano Trees
Many organizations hold large quantities of spatial data collected in application areas such as remote sensing, geographical information systems (GIS), astronomy, computer cartography, and environmental assessment and planning. These data collections are growing rapidly and can therefore be considered spatial data streams. For data stream classification, time is a major issue, yet these spatial data sets are too large to be classified in a reasonable amount of time using existing methods. In this paper, we develop a new method for decision tree classification on spatial data streams using a data structure called the Peano Count Tree (P-tree). The Peano Count Tree is a spatial data organization that provides a lossless compressed representation of a spatial data set and facilitates efficient classification and other data mining techniques. Using the P-tree structure, measurements such as information gain can be calculated quickly. We compare P-tree based decision tree induction with a classical decision tree induction method with respect to the speed at which the classifier can be built (and rebuilt when substantial amounts of new data arrive). Experimental results show that the P-tree method is significantly faster than existing classification methods, making it the preferred method for mining spatial data streams.
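The idea of a quadrant count tree and of computing information gain from counts can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the quadrant order (upper-left, upper-right, lower-left, lower-right), and the assumption that the input is a square bit plane whose side is a power of two are all choices made here for the example.

```python
from math import log2

def build_ptree(bits, row=0, col=0, size=None):
    """Return (count_of_ones, children) for the square quadrant at (row, col).

    `bits` is a square list of lists of 0/1 whose side is a power of two
    (an assumption of this sketch). Pure quadrants, all 0s or all 1s,
    are stored as leaves with children=None.
    """
    if size is None:
        size = len(bits)
    if size == 1:
        return (bits[row][col], None)
    half = size // 2
    # Assumed quadrant order: upper-left, upper-right, lower-left, lower-right.
    children = [
        build_ptree(bits, row + dr, col + dc, half)
        for dr, dc in ((0, 0), (0, half), (half, 0), (half, half))
    ]
    count = sum(child[0] for child in children)
    if count == 0 or count == size * size:
        return (count, None)  # pure quadrant: the subtree carries no extra information
    return (count, children)

def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def information_gain(class_counts_by_value):
    """class_counts_by_value[v][c] = number of pixels with attribute value v and class c."""
    class_totals = [sum(column) for column in zip(*class_counts_by_value)]
    total = sum(class_totals)
    remainder = sum(
        sum(row) / total * entropy(row)
        for row in class_counts_by_value
        if sum(row)
    )
    return entropy(class_totals) - remainder
```

For example, `build_ptree([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])` yields a root count of 6 with three pure quadrants stored as leaves and only the mixed lower-right quadrant expanded. The count table passed to `information_gain` is assumed to be available already; how such counts are derived from P-trees is not shown here.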
The Peano software---parallel, automaton-based, dynamically adaptive grid traversals
We discuss the design decisions, design alternatives, and rationale behind the third generation of Peano, a framework for dynamically adaptive Cartesian meshes derived from spacetrees. Peano ties the mesh traversal to the mesh storage and supports only one element-wise traversal order resulting from space-filling curves. The user is not free to choose a traversal order herself. The traversal can exploit regular grid subregions and shared memory as well as distributed memory systems with almost no modifications to a serial application code. We formalize the software design by means of two interacting automata: one automaton for the multiscale grid traversal and one for the application-specific algorithmic steps. This yields a callback-based programming paradigm. We further sketch the supported application types and the two data storage schemes realized before we detail high-performance computing aspects and lessons learned. Special emphasis is put on observations regarding the programming idioms and algorithmic concepts used. This transforms our report from a “one way to implement things” code description into a generic discussion and summary of some alternatives, rationale, and design decisions to be made for any tree-based adaptive mesh refinement software.
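The callback-based paradigm described above, in which the framework owns the traversal order and the application only supplies per-cell actions, can be illustrated with a small sketch. It is not Peano's API: the function and parameter names are hypothetical, the tree here splits each cell into four children, and the children are visited in a simple fixed order standing in for the space-filling curve order used by the real software.

```python
def traverse(x, y, size, level, refine, enter_cell, leave_cell):
    """Depth-first traversal of an adaptive 2D tree with user callbacks.

    The traversal order is fixed by this function; the caller only decides
    where to refine and what to do on entering or leaving a cell.
    All names are hypothetical and chosen for this illustration.
    """
    enter_cell(level, x, y, size)
    if refine(level, x, y, size):
        half = size / 2
        # Fixed child order (row by row); a stand-in for a space-filling curve.
        for dy in (0, 1):
            for dx in (0, 1):
                traverse(x + dx * half, y + dy * half, half, level + 1,
                         refine, enter_cell, leave_cell)
    leave_cell(level, x, y, size)


def collect_leaves(max_level):
    """Usage example: record the leaf cells of a uniformly refined unit square."""
    leaves = []

    def refine(level, x, y, size):
        return level < max_level

    def enter_cell(level, x, y, size):
        if level == max_level:
            leaves.append((x, y, size))

    def leave_cell(level, x, y, size):
        pass  # nothing to do when leaving a cell in this example

    traverse(0.0, 0.0, 1.0, 0, refine, enter_cell, leave_cell)
    return leaves
```

Making `refine` depend on the cell position, or on application state updated inside the callbacks, gives a locally and dynamically refined mesh without the application ever touching the traversal loop.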
Decision support system for failure management in industrial equipment, based on ensemble methods
Failures in industrial equipment are critical events for any organization. Classifying and characterizing them is an important factor in supporting decision making for maintenance activities. Data mining has played a significant role in evaluating and classifying the failures that occur. Algorithms based on Bayesian networks and decision trees have been used, individually and in combination, to build hybrid classification models for evaluating and characterizing failures. This work proposes the development of hybrid models using the ensemble methods Grading and Vote, combining Bayesian network techniques (BayesNet and Naive BayesUpdateable) and decision trees (RandomTree). The accuracy of the ensemble methods with the different algorithms is determined through experiments on the same partitioned data set. Sociedad Argentina de Informática e Investigación Operativ
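As a rough illustration of how a voting ensemble combines several classifiers, the sketch below applies plain majority voting to label predictions. Majority voting is only one possible combination rule and the abstract does not state which rule the authors used; the function name and the example labels are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one label per instance.

    `predictions` holds one list of predicted labels per model, all of the
    same length. For each instance the most frequent label wins.
    """
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns the (label, count) pair with the highest count.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined
```

With three hypothetical models predicting `["fault", "ok", "ok"]`, `["fault", "fault", "ok"]` and `["ok", "fault", "ok"]` for three instances, the combined prediction is `["fault", "fault", "ok"]`.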
Adaptive two-phase spatial association rules mining method
Since huge amounts of spatial data can easily be collected from applications ranging from remote sensing to geographical information systems, the extraction and comprehension of spatial knowledge is an increasingly important task. Many excellent studies on Remote Sensed Images (RSI) have investigated potential relationships with crop yield. However, most of them suffer from poor performance because their techniques for mining association rules are based on the Apriori algorithm. In this paper, two efficient algorithms, two-phase spatial association rules mining and adaptive two-phase spatial association rules mining, are proposed to address this problem. Both methods work in two phases: they first create Histogram Generators to quickly generate coarse-grained spatial association rules, and then mine the fine-grained spatial association rules with respect to the coarse-grained frequent patterns obtained in the first phase. The adaptive two-phase method additionally partitions the image to efficiently filter out non-frequent patterns and thus speed up the subsequent two-phase process. Such two-phase approaches save much computation, as the experimental results in the paper show. Facultad de Informátic
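The coarse-to-fine idea can be sketched on single pixel values instead of full association rules. This is an illustrative simplification under stated assumptions, not the authors' algorithm: the function name, the fixed bin width, and the use of relative support are all choices made for the example.

```python
from collections import Counter

def two_phase_frequent_values(pixels, bin_width, min_support):
    """Find frequent pixel values by searching only inside frequent coarse bins.

    Phase 1 builds a coarse histogram; phase 2 counts exact values, but only
    for pixels whose coarse bin met the support threshold. A value cannot be
    frequent if its bin is not, so the pruning loses no frequent value.
    """
    n = len(pixels)
    # Phase 1: coarse-grained histogram over bins of width `bin_width`.
    coarse = Counter(value // bin_width for value in pixels)
    frequent_bins = {b for b, count in coarse.items() if count / n >= min_support}
    # Phase 2: fine-grained counts restricted to the frequent bins.
    fine = Counter(v for v in pixels if v // bin_width in frequent_bins)
    return {v: count for v, count in fine.items() if count / n >= min_support}
```

For the hypothetical input `[10, 10, 10, 12, 40, 41, 42, 90]` with `bin_width=16` and `min_support=0.25`, the bin containing 90 is discarded in the first phase, and only the value 10 (3 of 8 pixels) survives the second.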
The application of data mining techniques to interrogate Western Australian water catchment data sets
Current environmental challenges such as increasing dry land salinity, waterlogging, eutrophication and high nutrient runoff in south western regions of Western Australia may have both cultural and environmental implications in the near future. Advances in computer science, specifically in data mining techniques and geographic information services, provide the means to conduct longitudinal climate studies to predict changes in the water catchment areas of Western Australia.
The research proposes to utilise existing spatial data mining techniques in conjunction with modern open-source geospatial tools to interpret trends in Western Australian water catchment land use. This will be achieved through the development of an innovative data mining interrogation tool that measures and validates the effectiveness of data mining methods on a sample water catchment data set from the Peel Harvey region of WA. In doing so, current and future statistical evaluations of potential dry land salinity trends can be elucidated. The interrogation tool will incorporate different modern geospatial data mining techniques to discover meaningful and useful patterns specific to the current agricultural problem domain of dry land salinity.
Large GIS data sets of the water catchments in the Peel-Harvey region have been collected by the state government Shared Land Information Platform in conjunction with the LandGate agency. The proposed tool will provide an interface for analysing water catchment data sets by benchmarking measures using the chosen data mining techniques, such as classical statistical methods, cluster analysis and principal component analysis. The outcome of the research will be an innovative data mining tool for interrogating salinity issues in water catchments in Western Australia, with a user-friendly interface for use by government agencies, such as Department of Agriculture and Food of Western Australia researchers, and other agricultural industry stakeholders.