Text Classification Using Association Rules, Dependency Pruning and Hyperonymization
We present new methods for pruning and enhancing itemsets for text
classification via association rule mining. The pruning methods are based on
dependency syntax, and the enhancing methods are based on replacing words with
their hyperonyms of various orders. We discuss the impact of these methods
compared to pruning based on the tf-idf rank of words.
Comment: 16 pages, 2 figures, presented at DMNLP 201
Proceedings of QG2010: The Third Workshop on Question Generation
These are the peer-reviewed proceedings of "QG2010, The Third Workshop on Question Generation". The workshop included a special track for "QGSTEC2010: The First Question Generation Shared Task and Evaluation Challenge".
QG2010 was held as part of The Tenth International Conference on Intelligent Tutoring Systems (ITS2010).
A Survey of Cellular Automata: Types, Dynamics, Non-uniformity and Applications
Cellular automata (CAs) are dynamical systems that exhibit complex global
behavior arising from simple local interaction and computation. Since von
Neumann introduced the cellular automaton (CA) in the 1950s, it has attracted
the attention of researchers from various backgrounds and fields for modelling
physical, natural, and real-life phenomena. Classically, CAs are uniform.
However, non-uniformity has also been introduced in the update pattern,
lattice structure, neighborhood dependency, and local rule. In this survey, we
tour the various types of CAs introduced to date, the different
characterization tools, and the global behaviors of CAs, such as universality,
reversibility, and dynamics. Special attention is given to non-uniformity in
CAs, and especially to non-uniform elementary CAs, which have been very useful
in solving several real-life problems.
Comment: 43 pages; Under review in Natural Computin
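The difference between a uniform and a non-uniform elementary CA can be illustrated with a minimal sketch. It assumes a one-dimensional binary lattice with periodic boundaries and Wolfram rule numbering; the function names `step` and `run`, and the choice of rules 90 and 150, are illustrative assumptions and are not taken from the survey.

```python
def step(cells, rules):
    """Apply one synchronous update to a 1-D binary CA with periodic boundary.

    rules[i] is the Wolfram rule number (0-255) used by cell i, so a
    uniform CA passes the same number for every cell (assumed convention).
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        # The 3-cell neighbourhood selects one bit of the rule number.
        idx = (left << 2) | (centre << 1) | right
        nxt.append((rules[i] >> idx) & 1)
    return nxt


def run(cells, rules, steps):
    """Return the list of configurations visited, including the start."""
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rules)
        history.append(cells)
    return history


width = 11
start = [0] * width
start[width // 2] = 1  # single live cell in the middle

uniform = [90] * width  # every cell uses rule 90
mixed = [90 if i % 2 == 0 else 150 for i in range(width)]  # non-uniform

for name, rules in (("uniform", uniform), ("non-uniform", mixed)):
    print(name)
    for row in run(start, rules, 4):
        print("".join("#" if c else "." for c in row))
```

Only the `rules` list differs between the two runs: the local update mechanism is identical, and the non-uniformity lies entirely in which rule each cell applies.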
Indeterministic Handling of Uncertain Decisions in Duplicate Detection
In current research, duplicate detection is usually treated as a deterministic process in which tuples are either declared duplicates or not. However, it is often not completely clear whether two tuples represent the same real-world entity. Deterministic approaches ignore this uncertainty, which in turn can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all these worlds can be modeled in the resulting data. This approach minimizes the negative impact of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic, and human effort can be reduced to a large extent. Unfortunately, a fully indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministically handled decisions in a meaningful way.
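One way to picture a semi-indeterministic decision is the rough sketch below. It is not the method of the paper: the two thresholds, the use of the similarity score as a duplicate probability, and the names `Decision`, `decide`, and `possible_worlds` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    pair: tuple         # identifiers of the two compared tuples
    kind: str           # "duplicate", "distinct", or "uncertain"
    p_duplicate: float  # assumed probability that the pair is a duplicate


def decide(pair, similarity, lower=0.4, upper=0.8):
    """Decide deterministically only when the score is clear-cut.

    The thresholds are hypothetical; scores between them stay uncertain
    and are carried into the result instead of being forced to yes/no.
    """
    if similarity >= upper:
        return Decision(pair, "duplicate", 1.0)
    if similarity <= lower:
        return Decision(pair, "distinct", 0.0)
    # Assumption: the similarity score doubles as a probability.
    return Decision(pair, "uncertain", similarity)


def possible_worlds(decision):
    """List the alternative outcomes together with their probabilities."""
    a, b = decision.pair
    p = decision.p_duplicate
    worlds = []
    if p > 0:
        worlds.append((p, [f"{a}+{b}"]))  # world where the tuples are merged
    if p < 1:
        worlds.append((1 - p, [a, b]))    # world where both tuples are kept
    return worlds


for score in (0.9, 0.5, 0.2):
    d = decide(("t1", "t2"), score)
    print(score, d.kind, possible_worlds(d))
```

Narrowing the gap between `lower` and `upper` shrinks the set of decisions handled indeterministically, which is the kind of trade-off between cost and decision quality the abstract describes.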