
    Semantic Association Rule Mining in Text using Domain Ontology

    Online news websites are now valuable archives of both current and old news on various issues, particularly those that relate to the political and historical contexts of a country. These news platforms have become an important medium for all forms of political activity, such as branding, campaigns, and communication. Online newspapers make available large volumes of textual data that are rich in political and historical inferences and can be leveraged for national development. In this paper we report a procedure for ontology-based association rule mining for knowledge extraction from text. Association rule mining algorithms ordinarily suffer from three limitations: they generate many uninteresting rules, they discover a huge number of rules, and they perform poorly. This research demonstrates a procedure for improving the performance of association rule mining in text mining by using a domain ontology. To do this, a study context of Nigerian politics, based on information extracted from a Nigerian online newspaper, was selected, and a methodology was applied that combined natural language processing methods, ontology-based keyword extraction, and the modified Generating Association Rules based on Weighting scheme (GARW) algorithm. The results show that, compared to non-ontology-based association rule mining approaches, our procedure significantly reduces the number of generated rules and produces rules that are more semantically related to the problem context. The study validates the capability of a domain ontology to improve the performance of association rule mining algorithms, particularly when dealing with unstructured textual data.
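
As a rough illustration of how restricting keywords to a domain vocabulary can shrink the rule set, here is a minimal Python sketch. It is not the GARW algorithm from the abstract: it mines only single-item rules using plain support and confidence, and the function name `mine_pair_rules`, the toy documents, and the `political_terms` set (a stand-in for concepts drawn from a domain ontology) are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(docs, min_support=0.5, min_confidence=0.7, vocabulary=None):
    """Return sorted (antecedent, consequent, support, confidence) tuples.

    Only rules with one item on each side are considered. If `vocabulary`
    is given, keywords outside it are discarded before mining (an assumed
    stand-in for ontology-based keyword extraction).
    """
    if vocabulary is not None:
        docs = [set(d) & set(vocabulary) for d in docs]
    else:
        docs = [set(d) for d in docs]
    n = len(docs)
    # Count how many documents contain each item and each unordered pair.
    item_counts = Counter(item for d in docs for item in d)
    pair_counts = Counter(
        pair for d in docs for pair in combinations(sorted(d), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Toy keyword sets, one per document (invented for illustration).
docs = [
    {"senate", "election", "weather"},
    {"senate", "election", "football"},
    {"election", "governor", "weather"},
    {"senate", "election", "governor"},
]
political_terms = {"senate", "election", "governor"}

all_rules = mine_pair_rules(docs, min_support=0.5, min_confidence=0.6)
filtered_rules = mine_pair_rules(
    docs, min_support=0.5, min_confidence=0.6, vocabulary=political_terms
)
print(len(all_rules), len(filtered_rules))
```

On this toy input the vocabulary filter removes the off-topic rule `weather -> election`, so the filtered run returns fewer rules than the unfiltered one. A real ontology would also supply relations between concepts, which this sketch ignores.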

    A framework for an Integrated Mining of Heterogeneous data in decision support systems

    The volume of information available on the Internet and corporate intranets continues to increase, along with the structured and unstructured data stored by many organizations. Over the past years, data mining techniques have been used to explore large volumes of structured data in order to discover knowledge, often in the form of a decision support system. For effective decision making, there is a need to discover knowledge from both structured and unstructured data for completeness and comprehensiveness. The aim of this paper is to present a framework for discovering this kind of knowledge and to report on work in progress in ongoing research. The proposed framework is composed of three basic phases: extraction and integration, data mining, and finally the relevance of such a system to the business decision support system. In the first phase, the structured and unstructured data are combined to form an XML database, the combined data warehouse (CDW). Efficiency is enhanced by clustering the unstructured data (documents) with the SOM (Self-Organizing Map) clustering algorithm and by extracting keyphrases, based on training and TF/IDF (Term Frequency/Inverse Document Frequency), with the KEA (Keyphrase Extraction Algorithm) toolkit. In the second phase, an association rule mining technique is applied to discover knowledge from the combined data warehouse. The final phase reflects the changes that such a system will bring about to the marketing decision support system. The paper also describes a developed system that evaluates the association rules mined from structured data, which forms the first phase of the research work. The proposed system is expected to improve the quality of decisions, and this will be evaluated using standard metrics for the interestingness of association rules, which are based on statistical independence and correlation analysis.
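
The abstract does not name the interestingness metrics it uses. As a hedged sketch, two common measures that fit the description are lift (which compares a rule's joint frequency to what statistical independence would predict) and the phi coefficient (a correlation measure for two binary events). The function names and counts below are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def lift(n_total, n_a, n_b, n_ab):
    """Lift of the rule A -> B computed from raw transaction counts.

    lift = P(A and B) / (P(A) * P(B)); a value of 1.0 means A and B
    co-occur exactly as often as independence predicts.
    Assumes n_a > 0 and n_b > 0.
    """
    return (n_ab * n_total) / (n_a * n_b)


def phi(n_total, n_a, n_b, n_ab):
    """Phi correlation coefficient between the binary events A and B.

    Assumes 0 < n_a < n_total and 0 < n_b < n_total so that the
    denominator is non-zero.
    """
    denom = sqrt(n_a * n_b * (n_total - n_a) * (n_total - n_b))
    return (n_total * n_ab - n_a * n_b) / denom


# 100 transactions; A appears in 40 and B in 50 (hypothetical counts).
independent = lift(100, 40, 50, 20)  # joint count equals 0.4 * 0.5 * 100
associated = lift(100, 40, 50, 30)   # joint count above the independent level
print(independent, associated, phi(100, 40, 50, 30))
```

A lift above 1 or a positive phi suggests the antecedent and consequent occur together more often than chance, which is one way a system could rank mined rules.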

    Rough-Set-and-Genetic-Algorithm based data mining and Rule Quality Measure to hypothesize distance protective relay operation characteristics from relay event report

    Protective relay performance analysis is feasible only after a hypothesis of expected relay operations has been formulated. Traditionally, the process of discovering the relay operation characteristics has been bogged down by the differing knowledge of protection experts, the painstaking manual interpretation of complex relay event reports, and the need for supplementary data from diverse intelligent electronic devices. This paper investigates the implementation of a novel data mining approach, integrated-Rough-Set-and-Genetic-Algorithm based rule discovery combined with a Rule Quality Measure, to hypothesize expected relay behavior in the form of an association rule from a digital protective relay’s resident event report. First, the integrated-Rough-Set-and-Genetic-Algorithm data mining approach is used to discover the relay CD-decision algorithm. Subsequently, the Rule Quality Measure, combined with rule interestingness and importance judgment, reduces the relay CD-decision algorithm to the desired relay CD-association rule. The relay CD-association rule in its single form essentially describes the logical pattern linking the descriptions of conditions (i.e., attribute set C for various multifunctional protection elements) and the decision class (i.e., attribute D for trip assertion status). Using area under the ROC curve measurements, the CD-decision algorithm has been verified to be able to predict as well as discriminate future unknown-trip-state relay events in unsupervised learning. This evaluation is necessary to allow the eventual deduction of the single relay CD-association rule to take place. The discovered CD-association rule, and thus the desired hypothesis, has been shown to be an exact manifestation of the relay operation characteristics hidden in the event report.

    Application of data mining techniques to protein-protein interaction prediction

    Protein-protein interactions are key to understanding biological processes and disease mechanisms in organisms. There is a vast amount of data on proteins waiting to be explored. In this paper, we describe the application of data mining techniques, namely association rule mining and ID3 classification, to the problem of predicting protein-protein interactions. We have combined available interaction data and protein domain decomposition data to infer new interactions. Preliminary results show that our approach helps us find plausible rules for understanding biological processes. © Springer-Verlag Berlin Heidelberg 2003

    Safety Aware Reinforcement Learning by Identifying Comprehensible Constraints in Expert Demonstrations

    When used in real-world environments, agents must meet high safety requirements because errors have direct consequences. Besides safety, the explainability of such systems is of particular importance. Therefore, not only should errors be avoided during the learning process, but the decision process should also be made transparent. Existing approaches are limited to solving a single problem. For real-world use, however, several criteria must be fulfilled at the same time. In this paper we derive comprehensible rules from expert demonstrations which can be used to monitor the agent. The developed approach uses state-of-the-art classification and regression trees for deriving safety rules, combined with concepts from the field of association rule mining. The result is a compact and comprehensible rule set that explains the expert’s behavior and ensures safety. We evaluate our framework in common OpenAI environments. Results show that the approach is able to identify safety-relevant rules and imitate expert behavior, especially in edge cases. Evaluations on higher-dimensional observation spaces and continuous action spaces highlight the transferability of the approach to new tasks while maintaining the compactness and comprehensibility of the rule set.

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Due to recent advances in technology, large volumes of medical data are being collected. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various data mining methodologies is highlighted. Generally, association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for the diagnosis of erythemato-squamous diseases. Methods applied to this topic include Neural Networks, Genetic Algorithms, and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find structure in the given data by finding similarities between data items according to their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data.