    Dimensionality Reduction with Multilingual Resource

    Query and document representation is a key problem in information retrieval and filtering. The vector space model (VSM) has been widely used in this domain, but it suffers from high dimensionality: the vectors built from documents are high-dimensional and contain considerable noise. In this paper, we present a novel method that reduces dimensionality using multilingual resources. We introduce a new metric, called TC, to measure term consistency constraints. We derive a TC matrix from a multilingual corpus and then use this matrix, together with the term-by-document matrix, to perform Latent Semantic Indexing (LSI). By adopting different TC thresholds, we can truncate the TC matrix to a smaller size and thus lower the computational cost of LSI. The experimental results show that this dimensionality reduction method significantly improves retrieval performance.
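The abstract does not define the TC metric or say exactly how the TC matrix is combined with the term-by-document matrix, so this is only a minimal sketch of the standard LSI step the method builds on, not the authors' method: a rank-k truncated SVD of a toy term-by-document matrix, with a query folded into the reduced space and documents ranked by cosine similarity. The function names (`jacobi_eigen`, `lsi_fit`, `fold_in`, `cosine`), the toy matrix and the choice k = 2 are hypothetical. The SVD is obtained from a Jacobi eigendecomposition of A^T A purely to keep the example dependency-free; a real system would use a sparse SVD routine.

```python
import math

# Hypothetical illustration: standard LSI only; the paper's TC matrix is NOT modeled here.

def jacobi_eigen(S, max_sweeps=50, tol=1e-20):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.
    Returns (eigenvalues, V); column i of V is the eigenvector for eigenvalue i."""
    n = len(S)
    A = [list(row) for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V

def lsi_fit(A, k):
    """Rank-k LSI of an m x n term-by-document matrix A (list of term rows).
    Returns singular values, left singular vectors, and document coordinates (rows of V_k)."""
    m, n = len(A), len(A[0])
    # Gram matrix A^T A: its eigenvalues are the squared singular values of A.
    gram = [[sum(A[t][i] * A[t][j] for t in range(m)) for j in range(n)] for i in range(n)]
    vals, V = jacobi_eigen(gram)
    top = sorted(range(n), key=lambda i: vals[i], reverse=True)[:k]
    sigmas = [math.sqrt(max(vals[i], 0.0)) for i in top]
    # u_i = A v_i / sigma_i (assumes the top-k singular values are nonzero).
    U = [[sum(A[t][j] * V[j][i] for j in range(n)) / sig for t in range(m)]
         for i, sig in zip(top, sigmas)]
    docs = [[V[j][i] for i in top] for j in range(n)]
    return sigmas, U, docs

def fold_in(query, sigmas, U):
    """Project a term-space query into the k-dim space: Sigma_k^{-1} U_k^T q."""
    return [sum(u[t] * query[t] for t in range(len(query))) / sig
            for u, sig in zip(U, sigmas)]

def cosine(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 0.0 if nx == 0 or ny == 0 else sum(a * b for a, b in zip(x, y)) / (nx * ny)

if __name__ == "__main__":
    # Toy term-by-document matrix: rows are terms, columns are documents d0..d3.
    A = [
        [1, 0, 0, 0],  # car
        [0, 1, 0, 0],  # automobile
        [1, 1, 0, 0],  # engine
        [0, 0, 1, 1],  # flower
        [0, 0, 1, 0],  # petal
    ]
    sigmas, U, docs = lsi_fit(A, k=2)
    q = [1, 0, 0, 0, 0]  # query: "car"
    q_hat = fold_in(q, sigmas, U)
    print("LSI scores:", [round(cosine(q_hat, d), 3) for d in docs])
    print("raw VSM score for d1:", cosine(q, [row[1] for row in A]))
```

On this toy data the query "car" shares no term with d1 ("automobile engine"), so its raw VSM cosine with d1 is 0. In the two-dimensional LSI space, however, d0 and d1 both score close to 1 and the two flower documents score close to 0, because "car" and "automobile" co-occur with "engine".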