48,651 research outputs found

    Evolving text classification rules with genetic programming

    Get PDF
    We describe a novel method for using genetic programming to create compact classification rules using combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text mining applications

    Lightweight Call-Graph Construction for Multilingual Software Analysis

    Full text link
    Analysis of multilingual codebases is a topic of increasing importance. In prior work, we have proposed the MLSA (MultiLingual Software Analysis) architecture, an approach to the lightweight analysis of multilingual codebases, and have shown how it can be used to address the challenge of constructing a single call graph from multilingual software with mutual calls. This paper addresses the challenge of constructing monolingual call graphs in a lightweight manner (consistent with the objective of MLSA) which nonetheless yields sufficient information for resolving language interoperability calls. A novel approach is proposed which leverages information from a compiler-generated AST to provide the quality of call graph necessary, while the program itself is written using an Island Grammar that parses the AST providing the lightweight aspect necessary. Performance results are presented for a C/C++ implementation of the approach, PAIGE (Parsing AST using Island Grammar Call Graph Emitter) showing that despite its lightweight nature, it outperforms Doxgen, is robust to changes in the (Clang) AST, and is not restricted to C/C++.Comment: 10 page
    corecore