346 research outputs found

    A multistrategy approach for digital text

    The goal of the research described here is to develop a multistrategy classifier system that can be used for document categorization. The system automatically discovers classification patterns by applying several empirical learning methods to different representations of preclassified documents. The learners work in parallel: each learner carries out its own feature selection based on evolutionary techniques and then obtains a classification model. When classifying documents, the system combines the predictions of the learners, again by applying evolutionary techniques. The system relies on a modular, flexible architecture that makes no assumptions about the design of the learners or the number of learners available, and that guarantees independence from the thematic domain.

    A Machine Learning Based Analytical Framework for Semantic Annotation Requirements

    The Semantic Web is an extension of the current web in which information is given well-defined meaning. Its aim is to improve the quality and intelligence of the current web by converting its contents into machine-understandable form. Semantic-level information is therefore one of the cornerstones of the Semantic Web. The process of adding semantic metadata to web resources is called Semantic Annotation. Semantic Annotation faces many obstacles, such as multilinguality, scalability, and issues related to the diversity and inconsistency of content across web pages. Because Semantic Annotation systems must operate over a wide range of domains and in dynamic environments, automating the annotation process is one of the significant challenges in this domain. To overcome this problem, different machine learning approaches have been used, including supervised learning, unsupervised learning, and more recent ones such as semi-supervised learning and active learning. In this paper we present an inclusive layered classification of Semantic Annotation challenges and discuss the most important issues in this field. We also review and analyze machine learning applications for solving semantic annotation problems. To this end, the article closely studies and categorizes related research, both for better understanding and to arrive at a framework that maps machine learning techniques onto Semantic Annotation challenges and requirements.

    Meta learning of bounds on the Bayes classifier error

    Meta learning uses information from base learners (e.g. classifiers or estimators) as well as information about the learning problem to improve upon the performance of a single base learner. For example, the Bayes error rate of a given feature space, if known, can be used to aid in choosing a classifier, as well as in feature selection and model selection for the base classifiers and the meta classifier. Recent work in the field of f-divergence functional estimation has led to the development of simple and rapidly converging estimators that can be used to estimate various bounds on the Bayes error. We estimate multiple bounds on the Bayes error using an estimator that applies meta learning to slowly converging plug-in estimators to obtain the parametric convergence rate. We compare the estimated bounds empirically on simulated data and then estimate the tighter bounds on features extracted from an image patch analysis of sunspot continuum and magnetogram images. Comment: 6 pages, 3 figures, to appear in proceedings of 2015 IEEE Signal Processing and SP Education Workshop.

    A semantic partition based text mining model for document classification.


    Automated Construction of Relational Attributes ACORA: A Progress Report

    Data mining research has not only developed a large number of algorithms, but also enhanced our knowledge and understanding of their applicability and performance. However, the application of data mining technology in business environments is still not very common, even though organizations have access to large amounts of data and make decisions on a daily basis that could profit from data mining. One of the reasons is the mismatch between the data representation used for storage and the one required for analysis. Data are most commonly stored in multi-table relational databases, whereas data mining methods require that the data be represented as a simple feature vector. This work presents a general framework for feature construction from multiple relational tables for data mining applications. The second part describes our prototype implementation ACORA (Automated Construction of Relational Features). Information Systems Working Papers Series.

    Rerepresenting and Restructuring Domain Theories: A Constructive Induction Approach

    Theory revision integrates inductive learning and background knowledge by combining training examples with a coarse domain theory to produce a more accurate theory. There are two challenges that theory revision and other theory-guided systems face. First, a representation language appropriate for the initial theory may be inappropriate for an improved theory. While the original representation may concisely express the initial theory, a more accurate theory forced to use that same representation may be bulky, cumbersome, and difficult to reach. Second, a theory structure suitable for a coarse domain theory may be insufficient for a fine-tuned theory. Systems that produce only small, local changes to a theory have limited value for accomplishing complex structural alterations that may be required. Consequently, advanced theory-guided learning systems require flexible representation and flexible structure. An analysis of various theory revision systems and theory-guided learning systems reveals specific strengths and weaknesses in terms of these two desired properties. Designed to capture the underlying qualities of each system, a new system uses theory-guided constructive induction. Experiments in three domains show improvement over previous theory-guided systems. This leads to a study of the behavior, limitations, and potential of theory-guided constructive induction. Comment: See http://www.jair.org/ for an online appendix and other files accompanying this article.