50 research outputs found

    Evolving rules for document classification

    Get PDF
    We describe a novel method for using Genetic Programming to create compact classification rules based on combinations of N-Grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that because the induced rules are meaningful to a human analyst they may have a number of other uses beyond classification and provide a basis for text mining applications

    Using IR techniques to improve Automated Text Classification

    Get PDF
    This paper performs a study on the pre-processing phase of the automated text classification problem. We use the linear Support Vector Machine paradigm applied to datasets written in the English and the European Portuguese languages – the Reuters and the Portuguese Attorney General’s Office datasets, respectively. The study can be seen as a search, for the best document representa- tion, in three different axes: the feature reduction (using linguistic in- formation), the feature selection (using word frequencies) and the term weighting (using information retrieval measures)

    (Q)SAR Modelling of Nanomaterial Toxicity - A Critical Review

    Get PDF
    There is an increasing recognition that nanomaterials pose a risk to human health, and that the novel engineered nanomaterials (ENMs) in the nanotechnology industry and their increasing industrial usage poses the most immediate problem for hazard assessment, as many of them remain untested. The large number of materials and their variants (different sizes and coatings for instance) that require testing and ethical pressure towards non-animal testing means that expensive animal bioassay is precluded, and the use of (quantitative) structure activity relationships ((Q)SAR) models as an alternative source of hazard information should be explored. (Q)SAR modelling can be applied to fill the critical knowledge gaps by making the best use of existing data, prioritize physicochemical parameters driving toxicity, and provide practical solutions to the risk assessment problems caused by the diversity of ENMs. This paper covers the core components required for successful application of (Q)SAR technologies to ENMs toxicity prediction, and summarizes the published nano-(Q)SAR studies and outlines the challenges ahead for nano-(Q)SAR modelling. It provides a critical review of (1) the present status of the availability of ENMs characterization/toxicity data, (2) the characterization of nanostructures that meets the need of (Q)SAR analysis, (3) the summary of published nano-(Q)SAR studies and their limitations, (4) the in silico tools for (Q)SAR screening of nanotoxicity and (5) the prospective directions for the development of nano-(Q)SAR models

    An Effective Document Classification System Based on Concept Probability Vector

    No full text

    XML Documents Within a Legal Domain: Standards and Tools for the Italian Legislative Environment

    No full text
    corecore