12,566 research outputs found
Impact of Clustering Parameters on the Efficiency of the Knowledge Mining Process in Rule-based Knowledge Bases
In this work, the application of clustering as a method of extracting
knowledge from real-world data is discussed. The authors analyze
the influence of different clustering parameters on the quality of the created
structure of rule clusters and on the efficiency of the knowledge mining process for
rules and rule clusters. The goal of the experiments was to measure the impact of
clustering parameters on the efficiency of the knowledge mining process in rule-based
knowledge bases, expressed by the size of the created clusters or the size
of their representatives. Some parameters are guaranteed to produce shorter or longer
representatives of the created rule clusters, as well as smaller or larger
clusters.
Enhancing the Efficiency of a Decision Support System through the Clustering of Complex Rule-Based Knowledge Bases and Modification of the Inference Algorithm
Decision support systems founded on rule-based knowledge representation should be equipped with rule management
mechanisms. Effective exploration of new knowledge in every domain of human life requires new algorithms of knowledge
organization and a thorough search of the created data structures. In this work, the author introduces an optimization of both
the knowledge base structure and the inference algorithm. Hence, a new, hierarchically organized knowledge base structure is
proposed as it draws on the cluster analysis method and a new forward-chaining inference algorithm which searches only the
so-called representatives of rule clusters. Making use of the similarity approach, the algorithm tries to discover new facts (new
knowledge) from rules and facts already known. The author defines and analyses four different representative generation
methods for rule clusters. The experiments analyse the impact of the proposed methods on the efficiency of a
decision support system with such a knowledge representation. To this end, the four representative generation methods and
various clustering parameters (similarity measure, clustering method, etc.) were examined. According to the author, the
proposed modification of both the knowledge base structure and the inference algorithm yielded satisfactory results.
A hierarchical Mamdani-type fuzzy modelling approach with new training data selection and multi-objective optimisation mechanisms: A special application for the prediction of mechanical properties of alloy steels
In this paper, a systematic data-driven fuzzy modelling methodology is proposed, which allows Mamdani fuzzy models to be constructed while considering both the accuracy (precision) and the transparency (interpretability) of fuzzy systems. The new methodology employs a fast hierarchical clustering algorithm to generate an initial fuzzy model efficiently; a training data selection mechanism is developed to identify appropriate and efficient data as learning samples; a high-performance Particle Swarm Optimisation (PSO) based multi-objective optimisation mechanism is developed to further improve the fuzzy model in terms of both structure and parameters; and a new tolerance analysis method is proposed to derive the confidence bands relating to the final elicited models. The proposed modelling approach is evaluated on two benchmark problems and is shown to outperform other modelling approaches. Furthermore, the approach is successfully applied to complex high-dimensional modelling problems in the manufacturing of alloy steels, using "real" industrial data. These problems concern the prediction of the mechanical properties of alloy steels by correlating them with the heat treatment process conditions as well as the weight percentages of the chemical compositions.
Finding groups in data: Cluster analysis with ants
We present in this paper a modification of Lumer and Faieta's algorithm for data clustering. This approach
mimics the clustering behavior observed in real ant colonies. The algorithm automatically discovers
clusters in numerical data without prior knowledge of the possible number of clusters. In this paper we focus
on ant-based clustering algorithms, a particular kind of swarm intelligence system, and on how the
final clustering is affected by the dissimilarity metric used during classification: the Euclidean, Cosine,
and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more
conventional clustering methods such as k-means. Among the many bio-inspired techniques, ant
clustering algorithms have received special attention, especially because they still require much
investigation to improve performance, stability and other key features that would make such algorithms
mature tools for data mining.
As a case study, this paper focuses on the behavior of clustering procedures in these new approaches.
The proposed algorithm and its modifications are evaluated on a number of well-known benchmark
datasets. Empirical results show that ant-based clustering algorithms perform well when
compared to other techniques.
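The three dissimilarity measures named in this abstract can be sketched as follows. This is a minimal illustration for purely numeric data, not the authors' implementation: the function names are hypothetical, and the Gower measure is shown only in its numeric form (the mean of range-normalised absolute differences), which is an assumption about how it would be applied here.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_dissimilarity(x, y):
    """One minus the cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine dissimilarity is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)

def feature_ranges(data):
    """Per-feature range (max - min) over the whole dataset."""
    return [max(col) - min(col) for col in zip(*data)]

def gower_numeric(x, y, ranges):
    """Numeric-only Gower dissimilarity (assumed form).

    Each feature contributes |x_i - y_i| / range_i; constant features
    (range 0) are skipped because they cannot separate any two objects.
    """
    terms = [abs(a - b) / r for a, b, r in zip(x, y, ranges) if r > 0]
    return sum(terms) / len(terms) if terms else 0.0

# Hypothetical toy dataset, used only to show the calls.
data = [(0.0, 10.0), (2.0, 20.0), (4.0, 30.0)]
ranges = feature_ranges(data)
d_euclid = euclidean(data[0], data[2])
d_cosine = cosine_dissimilarity(data[0], data[2])
d_gower = gower_numeric(data[0], data[2], ranges)
```

Unlike the Euclidean distance, the Gower form is bounded in [0, 1] regardless of feature scale, which is one reason the choice of metric can change the final clustering.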
An initial state of design and development of intelligent knowledge discovery system for stock exchange database
Data mining has been a challenging research topic for the last few years, and researchers use a variety of data mining techniques. This paper discusses the initial state of the design and development of an intelligent knowledge discovery system for stock exchange (SE) databases. We divide the problem into two modules. In the first module, we define a fuzzy rule-based system to determine vague information in stock exchange databases. After normalizing the massive amount of data, we will apply our proposed approach, Mining Frequent Patterns with Neural Networks. Factors that affect future prediction (e.g., political conditions, corporate factors, macroeconomic factors, and psychological factors of investors) play an important role in the stock exchange, so our prediction model is intended to predict results more precisely. In the second module, we will develop a clustering algorithm. In general, our clustering algorithm consists of two steps: a training step and a running step. The training step generates the neural network knowledge based on clustering. In the running step, the neural network knowledge base is used to support the module in generating learned complete data, transformed data and interesting clusters, which in turn help to generate interesting rules.
Efficient classification using parallel and scalable compressed model and Its application on intrusion detection
To achieve highly efficient classification in intrusion detection,
this paper proposes a compressed model that combines horizontal
compression with vertical compression. OneR is used as horizontal
compression for attribute reduction, and affinity propagation is employed as
vertical compression to select a small set of representative exemplars from large
training data. To compress larger volumes of
training data scalably, a MapReduce-based parallelization approach is
then implemented and evaluated for each step of the model compression process
described above, on whose output common but efficient classification methods can be
used directly. An experimental study on two publicly available
intrusion detection datasets, KDD99 and CMDC2012, demonstrates that
classification using the proposed compressed model can speed up the
detection procedure by up to 184 times, at the cost of an
average accuracy difference of less than 1%.
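The horizontal compression step can be illustrated with a small OneR sketch. The abstract does not say how OneR is turned into an attribute reduction, so the scheme below is an assumption: each attribute is scored by the training error of its one-attribute rule, and only the best-scoring attributes are kept. All function names and the toy records are hypothetical.

```python
from collections import Counter, defaultdict

def one_r_error(rows, labels, attr):
    """Training error rate of the OneR rule built on column `attr`.

    The rule maps every value of the attribute to the majority class
    observed with that value; all other records count as errors.
    """
    counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        counts[row[attr]][label] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return 1.0 - correct / len(rows)

def rank_attributes(rows, labels):
    """Attribute indices ordered from lowest to highest OneR error."""
    errors = [(one_r_error(rows, labels, a), a) for a in range(len(rows[0]))]
    return [a for _, a in sorted(errors)]

def reduce_attributes(rows, labels, keep):
    """Keep the `keep` best attributes (assumed reduction scheme)."""
    kept = sorted(rank_attributes(rows, labels)[:keep])
    return kept, [tuple(row[a] for a in kept) for row in rows]

# Hypothetical toy records: (protocol, traffic volume).
rows = [("tcp", "low"), ("tcp", "high"), ("udp", "low"), ("udp", "high")]
labels = ["normal", "normal", "attack", "attack"]
kept, reduced = reduce_attributes(rows, labels, keep=1)
```

In this toy data the protocol column predicts the label perfectly, so it is the one retained; the vertical step (affinity propagation over the reduced records) is not shown.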
Data mining as a tool for environmental scientists
Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and, more recently, Bayesian Decision Networks have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.
Outliers in rules - the comparison of LOF, COF and KMEANS algorithms
bases. The subject of outlier mining is very important nowadays. Outliers in rules are unusual rules which are rare in comparison to others and should be explored further by a domain expert. In this research the authors use outlier detection methods to find a given proportion (1%, 5%, 10%) of outliers among the rules. They then analyze which of seven quality indices, computed for all rules and again after removing the selected outliers, show an improvement in the quality of the rule clusters. In the experimental stage the authors used six different knowledge bases. The best results were achieved by the COF outlier detection algorithm, for which, across all analyzed quality indices, the cluster quality improved most frequently.
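Of the three detectors compared, LOF (Local Outlier Factor) is the easiest to sketch. The code below is a plain-Python illustration of the standard LOF definition on numeric points, followed by selecting a fixed fraction of the highest-scoring items. How the authors encode rules as vectors is not stated in the abstract, so the numeric points, the function names and the choice of `k` are all assumptions.

```python
import math

def lof_scores(points, k):
    """Local Outlier Factor for each point (assumes all points are distinct)."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]          # distance to the k-th nearest point
        k_dist.append(kd)
        # The neighbourhood includes every point tied at the k-distance.
        neighbours.append([j for j in order if dist[i][j] <= kd])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]

def top_outliers(scores, fraction):
    """Indices of the highest-scoring `fraction` of items (at least one)."""
    count = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:count])

# Hypothetical toy data: four points on a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = lof_scores(points, k=2)
flagged = top_outliers(scores, 0.2)
```

Scores near 1 indicate a point whose local density matches that of its neighbours; the distant point scores far above 1 and is the one flagged.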