38,457 research outputs found

    Text Classification Using Association Rules, Dependency Pruning and Hyperonymization

    Full text link
    We present new methods for pruning and enhancing item- sets for text classification via association rule mining. Pruning methods are based on dependency syntax and enhancing methods are based on replacing words by their hyperonyms of various orders. We discuss the impact of these methods, compared to pruning based on tfidf rank of words.Comment: 16 pages, 2 figures, presented at DMNLP 201

    A Survey of Cellular Automata: Types, Dynamics, Non-uniformity and Applications

    Full text link
    Cellular automata (CAs) are dynamical systems which exhibit complex global behavior from simple local interaction and computation. Since the inception of cellular automaton (CA) by von Neumann in 1950s, it has attracted the attention of several researchers over various backgrounds and fields for modelling different physical, natural as well as real-life phenomena. Classically, CAs are uniform. However, non-uniformity has also been introduced in update pattern, lattice structure, neighborhood dependency and local rule. In this survey, we tour to the various types of CAs introduced till date, the different characterization tools, the global behaviors of CAs, like universality, reversibility, dynamics etc. Special attention is given to non-uniformity in CAs and especially to non-uniform elementary CAs, which have been very useful in solving several real-life problems.Comment: 43 pages; Under review in Natural Computin

    Indeterministic Handling of Uncertain Decisions in Duplicate Detection

    Get PDF
    In current research, duplicate detection is usually considered as a deterministic approach in which tuples are either declared as duplicates or not. However, most often it is not completely clear whether two tuples represent the same real-world entity or not. In deterministic approaches, however, this uncertainty is ignored, which in turn can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all these worlds can be modeled in the resulting data. This approach minimizes the negative impacts of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic and human effort can be reduced to a large extent. Unfortunately, a full-indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministic handled decisions in a meaningful way
    • 

    corecore