15 research outputs found

    Incorporating Prior Knowledge into Task Decomposition for Large-Scale Patent Classification

    Abstract. With the adoption of min-max modular support vector machines (SVMs) to solve large-scale patent classification problems, a novel, simple method for incorporating prior knowledge into task decomposition is proposed and investigated. Two kinds of prior knowledge described in patent texts are considered: time information and hierarchical structure information. Through experiments on the NTCIR-5 Japanese patent database, patents are found to have time-varying features that considerably affect classification. The experimental results demonstrate that applying min-max modular SVMs with the proposed method gives performance superior to that of conventional SVMs in terms of training time, generalization accuracy, and scalability.
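    The combination step of a min-max modular classifier can be sketched as follows. This is a minimal illustration of the standard min-max combination rule (not the paper's specific decomposition by time or hierarchy): each base classifier is trained on one (positive subset, negative subset) pair, its outputs are first minimized over the negative subsets and then maximized over the positive subsets. The function name and the toy score matrix are hypothetical.

    ```python
    def m3_combine(scores):
        """Min-max modular combination rule.

        scores[i][j] is the output of the base classifier trained on
        positive subset i vs. negative subset j. The module outputs are
        combined by taking the MIN over negative subsets (j), then the
        MAX over positive subsets (i).
        """
        return max(min(row) for row in scores)

    # Toy example with 2 positive and 2 negative subsets:
    decision = m3_combine([[0.9, 0.2],
                           [0.8, 0.7]])  # min per row: 0.2, 0.7 -> max: 0.7
    ```

    Because each base classifier sees only a small subset pair, the modules can be trained independently (and in parallel), which is the source of the training-time and scalability gains the abstract reports.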

    Expansion Finding for Given Acronyms Using Conditional Random Fields

    No full text

    Jefferson : nordisk tidskrift för Blues

    In languages with high word inflection such as Arabic, stemming improves text retrieval performance by reducing word variants. We propose a modification of the corpus-based stemming approach proposed by Xu and Croft for English and Spanish in order to stem Arabic words. In the first stage, we generate the conflation classes by clustering 3-gram representations of the words found in only 10% of the data. In the second stage, these clusters are refined using different similarity measures and thresholds. We conducted retrieval experiments using raw data, the Light-10 stemmer, and 8 different variations of the similarity measures and thresholds, and compared the results. The experiments show that 3-gram stemming using the Dice distance for clustering and the EM similarity measure for refinement performs better than using no stemming, but slightly worse than the Light-10 stemmer. Our method could potentially outperform the Light-10 stemmer if more text were sampled in the first stage.
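    The 3-gram comparison underlying the clustering stage can be sketched with the Dice coefficient over character 3-gram sets. This is an illustrative sketch of that similarity measure, not the paper's full clustering pipeline; the function names are hypothetical.

    ```python
    def char_ngrams(word, n=3):
        """Return the set of character n-grams of a word."""
        return {word[i:i + n] for i in range(len(word) - n + 1)}

    def dice_similarity(a, b, n=3):
        """Dice coefficient between the character n-gram sets of two words:
        2 * |A ∩ B| / (|A| + |B|). Ranges from 0 (disjoint) to 1 (identical)."""
        ga, gb = char_ngrams(a, n), char_ngrams(b, n)
        if not ga or not gb:
            return 0.0
        return 2 * len(ga & gb) / (len(ga) + len(gb))

    # Morphological variants share many 3-grams, so they score high and
    # end up in the same conflation class:
    score = dice_similarity("running", "runner")  # shared grams: run, unn
    ```

    Clustering words by this score groups surface variants into conflation classes without any language-specific rules, which is what makes the corpus-based approach portable to Arabic.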