11,763 research outputs found

    Automatic multi-label subject indexing in a multilingual environment

    Get PDF
    This paper presents an approach to automatically subject index fulltext documents with multiple labels based on binary support vector machines(SVM). The aim was to test the applicability of SVMs with a real world dataset. We have also explored the feasibility of incorporating multilingual background knowledge, as represented in thesauri or ontologies, into our text document representation for indexing purposes. The test set for our evaluations has been compiled from an extensive document base maintained by the Food and Agriculture Organization (FAO) of the United Nations (UN). Empirical results show that SVMs are a good method for automatic multi- label classification of documents in multiple languages

    A Word Sense-Oriented User Interface for Interactive Multilingual Text Retrieval

    Get PDF
    In this paper we present an interface for supporting a user in an interactive cross-language search process using semantic classes. In order to enable users to access multilingual information, different problems have to be solved: disambiguating and translating the query words, as well as categorizing and presenting the results appropriately. Therefore, we first give a brief introduction to word sense disambiguation, cross-language text retrieval and document categorization and finally describe recent achievements of our research towards an interactive multilingual retrieval system. We focus especially on the problem of browsing and navigation of the different word senses in one source and possibly several target languages. In the last part of the paper, we discuss the developed user interface and its functionalities in more detail

    Natural language processing

    Get PDF
    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems

    Evaluating Multilingual Gisting of Web Pages

    Get PDF
    We describe a prototype system for multilingual gisting of Web pages, and present an evaluation methodology based on the notion of gisting as decision support. This evaluation paradigm is straightforward, rigorous, permits fair comparison of alternative approaches, and should easily generalize to evaluation in other situations where the user is faced with decision-making on the basis of information in restricted or alternative form.Comment: 7 pages, uses psfig and aaai style

    Multilingual interactive experiments with Flickr

    Get PDF
    This paper presents a proposal for iCLEF 2006, the interactive track of the CLEF cross-language evaluation campaign. In the past, iCLEF has addressed applications such as information retrieval and question answering. However, for 2006 the focus has turned to text-based image retrieval from Flickr. We describe Flickr, the challenges this kind of collection presents to cross-language researchers, and suggest initial iCLEF tasks

    Searching and organizing images across languages

    Get PDF
    With the continual growth of users on the Web from a wide range of countries, supporting such users in their search of cultural heritage collections will grow in importance. In the next few years, the growth areas of Internet users will come from the Indian sub-continent and China. Consequently, if holders of cultural heritage collections wish their content to be viewable by the full range of users coming to the Internet, the range of languages that they need to support will have to grow. This paper will present recent work conducted at the University of Sheffield (and now being implemented in BRICKS) on how to use automatic translation to provide search and organisation facilities for a historical image search engine. The system allows users to search for images in seven different languages, providing means for the user to examine translated image captions and browse retrieved images organised by categories written in their native language

    Classification of Under-Resourced Language Documents Using English Ontology

    Get PDF
    Automatic documents classification is an important task due to the rapid growth of the number of electronic documents, which aims automatically assign the document to a predefined category based on its contents. The use of automatic document classification has been plays an important role in information extraction, summarization, text retrieval, question answering, e-mail spam detection, web page content filtering, automatic message routing , etc.Most existing methods and techniques in the field of document classification are keyword based, but due to lack of semantic consideration of this technique, it incurs low performance. In contrast, documents also be classified by taking their semantics using ontology as a knowledge base for classification; however, it is very challenging of building ontology with under-resourced language. Hence, this approach is only limited to resourced language (i.e. English) support. As a result, under-resourced language written documents are not benefited such ontology based classification approach. This paper describes the design of automatic document classification of under-resourced language written documents. In this work, we propose an approach that performs classification of under-resourced language written documents on top of English ontology. We used a bilingual dictionary with Part of Speech feature for word-by-word text translation to enable the classification of document without any language barrier. The design has a concept-mapping component, which uses lexical and semantic features to map the translated sense along the ontology concepts. Beside this, the design also has a categorization component, which determines a category of a given document based on weight of mapped concept. To evaluate the performance of the proposed approach 20-test documents for Amharic and Tigrinya and 15-test document for Afaan Oromo in each news category used. In order to observe the effect of incorporated features (i.e. lemma based index term selection, pre-processing strategies during concept mapping, lexical and semantics based concept mapping) five experimental techniques conducted. The experimental result indicated that the proposed approach with incorporation of all features and components achieved an average F-measure of 92.37%, 86.07% and 88.12% for Amharic, Afaan Oromo and Tigrinya documents respectively. Keywords: under-resourced language, Multilingual, Documents or text Classification, knowledge base, Ontology based text categorization, multilingual text classification, Ontology. DOI: 10.7176/CEIS/10-6-02 Publication date:July 31st 201
    • …
    corecore