6 research outputs found

    Using skipgrams and POS-based feature selection for patent classiïŹcation

    Get PDF
    Contains fulltext : 116289.pdf (publisher's version ) (Open Access)19 p

    A framework of automatic subject term assignment for text categorization: An indexing conception-based approach

    Full text link
    The purpose of this study is to examine whether the understandings of subject-indexing processes con-ducted by human indexers have a positive impact on the effectiveness of automatic subject term assign-ment through text categorization (TC). More specifically, human indexers ’ subject-indexing approaches, or con-ceptions, in conjunction with semantic sources were explored in the context of a typical scientific journal arti-cle dataset. Based on the premise that subject indexing approaches or conceptions with semantic sources are important for automatic subject term assignment through TC, this study proposed an indexing conception-based framework. For the purpose of this study, two research questions were explored: To what extent are semantic sources effective? To what extent are indexing concep

    Particle Swarm Optimization

    Get PDF
    Particle swarm optimization (PSO) is a population based stochastic optimization technique influenced by the social behavior of bird flocking or fish schooling.PSO shares many similarities with evolutionary computation techniques such as Genetic Algorithms (GA). The system is initialized with a population of random solutions and searches for optima by updating generations. However, unlike GA, PSO has no evolution operators such as crossover and mutation. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. This book represents the contributions of the top researchers in this field and will serve as a valuable tool for professionals in this interdisciplinary field

    Automatisches Klassifizieren : Verfahren zur Erschliessung elektronischer Dokumente

    Get PDF
    Automatic classification of text documents refers to the computerized allocation of class numbers from existing classification schemes to natural language texts by means of suitable algorithms. Based upon a comprehensive literature review, this thesis establishes an informed and up-to-date view of the applicability of automatic classification for the subject approach to electronic documents, particularly to Web resources. Both methodological aspects and the experiences drawn from relevant projects and applications are covered. Concerning methodology, the present state-of-the-art comprises a number of statistical approaches that rely on machine learning; these methods use pre-classified example documents for establishing a model - the "classifier" - which is then used for classifying new documents. However, the four large-scale projects conducted in the 1990s by the Universities of Lund, Wolverhampton and Oldenburg, and by OCLC (Dublin, OH), still used rather simple and more traditional methodological approaches. These projects are described and analyzed in detail. As they made use of traditional library classifications their results are significant for LIS, even if no permanent quality services have resulted from these endeavours. The analysis of other relevant applications and projects reveals a number of attempts to use automatic classification for document processing in the fields of patent and media documentation. Here, semi-automatic solutions that support human classifiers are preferred, due to the yet unsatisfactory classification results obtained by fully automated systems. Other interesting implementations include Web portals, search engines and (commercial) information services, whereas only little interest has been shown in the automatic classification of books and bibliographic records. In the concluding part of the study the author discusses the most significant applications and projects, and also addresses several problems and issues in the context of automatic classification

    Multi-classification of Patent Applications with Winnow

    No full text
    The Winnow family of learning algorithms can cope well with large numbers of features and is tolerant to variations in document length, which makes it suitable for classifying large collections of large documents, like patent applications
    corecore