Evolving text classification rules with genetic programming
We describe a novel method for using genetic programming to create compact classification rules using combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text mining applications.
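As a rough illustration of how a candidate rule could be scored by precision and recall, here is a minimal sketch. It is not the authors' function or terminal set: the rule form (a list of character N-grams combined with OR), the use of F1 to fold precision and recall into one number, and the names `rule_matches` and `f1_fitness` are all assumptions made for the example.

```python
def rule_matches(rule, text):
    """Hypothetical rule form: the rule fires if ANY of its N-grams occurs in the text."""
    return any(gram in text for gram in rule)


def f1_fitness(rule, labelled_docs):
    """Score a rule against (text, is_in_category) pairs.

    Assumption: fitness is the F1 score (harmonic mean of precision and recall).
    The abstract only says fitness is based on precision and recall.
    """
    tp = fp = fn = 0
    for text, is_in_category in labelled_docs:
        predicted = rule_matches(rule, text.lower())
        if predicted and is_in_category:
            tp += 1
        elif predicted and not is_in_category:
            fp += 1
        elif not predicted and is_in_category:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy training set (invented for illustration, not taken from Reuters 21578).
DOCS = [
    ("Wheat prices rise", True),
    ("Corn and wheat exports", True),
    ("Bank raises interest rates", False),
    ("Oil prices fall", False),
]

good_rule = ["whea"]   # matches both positive documents and no negatives
weak_rule = ["pric"]   # matches one positive and one negative document
```

In a genetic programming loop, a score like `f1_fitness(rule, DOCS)` would rank candidate rules so that fitter ones are more likely to be selected for crossover and mutation.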
GPERF : a perfect hash function generator
gperf is a widely available perfect hash function generator written in C++. It automates a common system software operation: keyword recognition. gperf translates an n-element user-specified keyword list keyfile into source code containing a k-element lookup table and a pair of functions, phash and in_word_set. phash uniquely maps keywords in keyfile onto the range 0 .. k - 1, where k ≥ n. If k = n, then phash is considered a minimal perfect hash function. in_word_set uses phash to determine whether a particular string of characters str occurs in the keyfile, using at most one string comparison. This paper describes the user-interface, options, features, algorithm design and implementation strategies incorporated in gperf. It also presents the results from an empirical comparison between gperf-generated recognizers and other popular techniques for reserved word lookup.
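The relationship between phash, the k-element table and in_word_set can be sketched as follows. This is not gperf's algorithm or its generated code: it is a brute-force search over salts for a salted SHA-256 hash, chosen only because it is short and deterministic. The names `build_table` and `max_salt` are hypothetical.

```python
import hashlib


def phash(word, salt, k):
    """Map a string to a slot in 0 .. k - 1 using a salted SHA-256 digest."""
    digest = hashlib.sha256(f"{salt}:{word}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % k


def build_table(keywords, k=None, max_salt=10000):
    """Search for a salt under which every keyword lands in its own slot.

    With k == len(keywords) the resulting hash is minimal and perfect.
    Assumption: brute force is acceptable because the keyword list is small.
    """
    k = len(keywords) if k is None else k
    for salt in range(max_salt):
        table = [None] * k
        for word in keywords:
            slot = phash(word, salt, k)
            if table[slot] is not None:
                break  # collision: try the next salt
            table[slot] = word
        else:
            return salt, table
    raise ValueError("no collision-free salt found; try a larger k")


def in_word_set(s, salt, table):
    """Membership test that performs exactly one string comparison."""
    return table[phash(s, salt, len(table))] == s


KEYWORDS = ["if", "else", "while", "for", "return"]
SALT, TABLE = build_table(KEYWORDS)
```

Because every keyword owns exactly one slot, `in_word_set("while", SALT, TABLE)` needs a single hash and a single comparison, and a non-keyword such as `"goto"` is rejected by that same comparison.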
Lightweight Call-Graph Construction for Multilingual Software Analysis
Analysis of multilingual codebases is a topic of increasing importance. In
prior work, we have proposed the MLSA (MultiLingual Software Analysis)
architecture, an approach to the lightweight analysis of multilingual
codebases, and have shown how it can be used to address the challenge of
constructing a single call graph from multilingual software with mutual calls.
This paper addresses the challenge of constructing monolingual call graphs in a
lightweight manner (consistent with the objective of MLSA) which nonetheless
yields sufficient information for resolving language interoperability calls. A
novel approach is proposed which leverages information from a
compiler-generated AST to provide the quality of call graph necessary, while
the program itself is written using an Island Grammar that parses the AST
providing the lightweight aspect necessary. Performance results are presented
for a C/C++ implementation of the approach, PAIGE (Parsing AST using Island
Grammar Call Graph Emitter) showing that despite its lightweight nature, it
outperforms Doxygen, is robust to changes in the (Clang) AST, and is not
restricted to C/C++.
Comment: 10 page
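The island-grammar idea of matching only the AST constructs of interest and skipping everything else can be sketched as below. This is not PAIGE and not its grammar. The AST text is a hand-written, simplified imitation of a Clang AST dump, the two regular expressions are assumptions about that simplified format, and treating every function-typed DeclRefExpr as a call is a deliberate simplification.

```python
import re

# Hand-written, simplified stand-in for a Clang AST dump (illustrative only).
SAMPLE_AST = """\
|-FunctionDecl 0x1 <t.c:1:1, col:24> col:5 used bar 'int ()'
| `-CompoundStmt 0x2 <col:11, col:24>
|   `-ReturnStmt 0x3 <col:13, col:20>
|     `-IntegerLiteral 0x4 <col:20> 'int' 1
`-FunctionDecl 0x5 <line:2:1, col:30> col:5 main 'int ()'
  `-CompoundStmt 0x6 <col:12, col:30>
    `-ReturnStmt 0x7 <col:14, col:25>
      `-CallExpr 0x8 <col:21, col:25> 'int'
        `-ImplicitCastExpr 0x9 <col:21> 'int (*)()' <FunctionToPointerDecay>
          `-DeclRefExpr 0xa <col:21> 'int ()' Function 0x1 'bar' 'int ()'
"""

# "Islands": the only two node kinds this sketch recognises.
FUNC_DECL = re.compile(r"FunctionDecl .* (\w+) '[^']*'")
CALL_REF = re.compile(r"DeclRefExpr .* Function 0x[0-9a-f]+ '(\w+)'")


def build_call_graph(ast_text):
    """Return {caller: [callees]} by scanning the dump line by line.

    Every line that matches neither pattern is "water" and is skipped,
    so unknown or changed node kinds do not break the scan.
    """
    graph = {}
    current = None
    for line in ast_text.splitlines():
        decl = FUNC_DECL.search(line)
        if decl:
            current = decl.group(1)
            graph.setdefault(current, [])
            continue
        ref = CALL_REF.search(line)
        if ref and current is not None:
            graph[current].append(ref.group(1))
    return graph


CALL_GRAPH = build_call_graph(SAMPLE_AST)
```

On the sample above, `CALL_GRAPH` records that `main` references `bar` and that `bar` references nothing. Skipping unrecognised lines is what keeps a scanner of this kind tolerant of AST nodes it was never written to handle.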