    Toward Optimal Feature Selection in Naive Bayes for Text Categorization

    Automated feature selection is important for text categorization because it reduces the feature size and speeds up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, Kullback-Leibler divergence and Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties relating to type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, termed maximum discrimination ($MD$) and $MD-\chi^2$ methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches. Comment: This paper has been submitted to the IEEE Trans. Knowledge and Data Engineering. 14 pages, 5 figures
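
    As a rough illustration of the two binary-hypothesis measures named in this abstract, the sketch below computes the Kullback-Leibler divergence and the Jeffreys divergence (the symmetric sum of the two directed KL divergences) for discrete distributions. The function names and the example distributions are assumptions made for illustration; this is not the paper's JMH-divergence or its feature selection method, neither of which is defined in the abstract.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions of equal length.

    Hypothetical helper for illustration; assumes p and q each sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # a zero-probability term contributes nothing
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def jeffreys_divergence(p, q):
    """Jeffreys divergence: KL(p || q) + KL(q || p), symmetric in p and q."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Assumed toy example: how often a term appears (or not) in two classes.
term_given_class_a = [0.9, 0.1]
term_given_class_b = [0.1, 0.9]
score = jeffreys_divergence(term_given_class_a, term_given_class_b)
```

    A larger value means the two class-conditional distributions are easier to tell apart, which is the intuition behind ranking features by a divergence measure.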

    Methodological issues in using sequential representations in the teaching of writing

    This study looks at a specific application of Ainsworth’s conceptual framework for learning with multiple representations in the context of using multiple sequential, student-generated graphic organizers for a process-writing task. Process writing refers to writing that consists of multiple drafts. It may be a process of re-writing without feedback, or of re-writing based on feedback, where the teacher or peers provide feedback on the original draft and the students then revise their writing accordingly. The objective was to explore how knowledge of students’ cognitive processes when using multiple organizers can inform the teaching of writing. The literature review analyzes the interaction of the design, function and task components of the framework, culminating in instructional approaches for using multiple organizers in classes with students of different writing abilities. Extended implications for designers of concept mapping tools based on these approaches are provided.

    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions based on KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved. The role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems is discussed.

    Reusable Knowledge-based Components for Building Software Applications: A Knowledge Modelling Approach

    In computer science, different types of reusable components for building software applications have been proposed as a direct consequence of the emergence of new software programming paradigms. The success of these components for building applications depends on factors such as the flexibility with which they can be combined or the ease of selecting them in centralised or distributed environments such as the internet. In this article, we propose a general type of reusable component, called primitive of representation, inspired by a knowledge-based approach that can promote reusability. The proposal can be understood as a generalisation of existing partial solutions that is applicable to both software and knowledge engineering for the development of hybrid applications that integrate conventional and knowledge-based techniques. The article presents the structure and use of the component and describes our recent experience in the development of real-world applications based on this approach.

    Advances in knowledge-based software engineering

    The underlying hypothesis of this work is that a rigorous and comprehensive software reuse methodology can bring about a more effective and efficient utilization of constrained resources in the development of large-scale software systems by both government and industry. It is also believed that correct use of this type of software engineering methodology can significantly contribute to the higher levels of reliability that will be required of future operational systems. An overview and discussion of current research in the development and application of two systems that support a rigorous reuse paradigm are presented: the Knowledge-Based Software Engineering Environment (KBSEE) and the Knowledge Acquisition for the Preservation of Tradeoffs and Underlying Rationales (KAPTUR) systems. Emphasis is on a presentation of operational scenarios which highlight the major functional capabilities of the two systems.