Automatic multi-label subject indexing in a multilingual environment
This paper presents an approach to automatically subject index full-text documents with multiple labels based on binary support vector machines (SVMs). The aim was to test the applicability of SVMs with a real-world dataset. We have also explored the feasibility of incorporating multilingual background knowledge, as represented in thesauri or ontologies, into our text document representation for indexing purposes. The test set for our evaluations has been compiled from an extensive document base maintained by the Food and Agriculture Organization (FAO) of the United Nations (UN). Empirical results show that SVMs are a good method for automatic multi-label classification of documents in multiple languages.
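Multi-label indexing with binary SVMs is commonly set up as one classifier per label (one-vs-rest). The sketch below illustrates that general setup, not the authors' implementation: the vocabulary, documents, labels, and hyperparameters are all hypothetical toy values, and the linear SVM is trained with plain batch subgradient descent on the regularised hinge loss instead of a dedicated solver.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=2000):
    """Batch subgradient descent on the L2-regularised hinge loss.

    X: list of feature vectors, y: list of +1/-1 targets.
    lam, lr and epochs are illustrative defaults, not tuned values.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # margin violated: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_one_vs_rest(X, label_sets, labels):
    """Train one binary SVM per label: +1 if the document carries it, else -1."""
    return {
        label: train_linear_svm(X, [1 if label in s else -1 for s in label_sets])
        for label in labels
    }


def predict_labels(models, x):
    """Return every label whose binary SVM scores the document above zero."""
    return {
        label
        for label, (w, b) in models.items()
        if sum(wj * xj for wj, xj in zip(w, x)) + b > 0
    }


# Hypothetical toy data: term counts over the vocabulary [rice, crop, fish, net].
X = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
label_sets = [
    {"agriculture"},
    {"fisheries"},
    {"agriculture", "fisheries"},
    {"agriculture"},
    {"fisheries"},
    {"agriculture", "fisheries"},
]
models = train_one_vs_rest(X, label_sets, ["agriculture", "fisheries"])
print(predict_labels(models, [1, 2, 0, 0]))  # a document about rice and crops
print(predict_labels(models, [0, 0, 1, 2]))  # a document about fish and nets
```

Because each label has its own independent classifier, a document can receive zero, one, or several labels, which is what distinguishes this setup from ordinary multi-class classification.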
A Word Sense-Oriented User Interface for Interactive Multilingual Text Retrieval
In this paper we present an interface for supporting a user in an interactive cross-language search process using semantic classes. In order to enable users to access multilingual information, different problems have to be solved: disambiguating and translating the query words, as well as categorizing and presenting the results appropriately. Therefore, we first give a brief introduction to word sense disambiguation, cross-language text retrieval and document categorization, and then describe recent achievements of our research towards an interactive multilingual retrieval system. We focus especially on the problem of browsing and navigating the different word senses in one source language and possibly several target languages. In the last part of the paper, we discuss the developed user interface and its functionalities in more detail.
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Evaluating Multilingual Gisting of Web Pages
We describe a prototype system for multilingual gisting of Web pages, and
present an evaluation methodology based on the notion of gisting as decision
support. This evaluation paradigm is straightforward, rigorous, permits fair
comparison of alternative approaches, and should easily generalize to
evaluation in other situations where the user is faced with decision-making on
the basis of information in restricted or alternative form.
Comment: 7 pages, uses psfig and aaai style
Multilingual interactive experiments with Flickr
This paper presents a proposal for iCLEF 2006, the interactive track
of the CLEF cross-language evaluation campaign. In the past, iCLEF has
addressed applications such as information retrieval and question answering. However, for 2006 the focus
has turned to text-based image retrieval from Flickr. We describe
Flickr, the challenges this kind of collection presents to
cross-language researchers, and suggest initial iCLEF tasks.
Searching and organizing images across languages
With the continual growth of users on the Web
from a wide range of countries, supporting
such users in their search of cultural heritage
collections will grow in importance. In the
next few years, most growth in Internet
users will come from the Indian sub-continent
and China. Consequently, if holders of cultural
heritage collections wish their content to be
viewable by the full range of users coming to
the Internet, the range of languages that they
need to support will have to grow. This paper
will present recent work conducted at the
University of Sheffield (and now being
implemented in BRICKS) on how to use
automatic translation to provide search and
organisation facilities for a historical image
search engine. The system allows users to
search for images in seven different languages,
providing means for the user to examine
translated image captions and browse retrieved
images organised by categories written in their
native language.
Classification of Under-Resourced Language Documents Using English Ontology
Automatic document classification, which assigns a document to a predefined category based on its contents, is an important task given the rapid growth in the number of electronic documents. It plays an important role in information extraction, summarization, text retrieval, question answering, e-mail spam detection, web page content filtering, automatic message routing, etc. Most existing methods and techniques in the field of document classification are keyword based, and because they do not take semantics into account, they perform poorly. Alternatively, documents can be classified by their semantics, using an ontology as a knowledge base for classification; however, building an ontology for an under-resourced language is very challenging. Hence, this approach is limited to well-resourced languages (i.e. English). As a result, documents written in under-resourced languages do not benefit from such ontology-based classification. This paper describes the design of an automatic document classifier for documents written in under-resourced languages. We propose an approach that classifies such documents on top of an English ontology. We used a bilingual dictionary with a part-of-speech feature for word-by-word text translation, so that documents can be classified regardless of their language. The design has a concept-mapping component, which uses lexical and semantic features to map the translated senses onto the ontology concepts. In addition, the design has a categorization component, which determines the category of a given document based on the weights of the mapped concepts. To evaluate the performance of the proposed approach, 20 test documents for Amharic and Tigrinya and 15 test documents for Afaan Oromo were used in each news category. To observe the effect of the incorporated features (i.e. lemma-based index term selection, pre-processing strategies during concept mapping, and lexical and semantics-based concept mapping), five experimental techniques were conducted. The experimental results indicated that the proposed approach, with all features and components incorporated, achieved an average F-measure of 92.37%, 86.07% and 88.12% for Amharic, Afaan Oromo and Tigrinya documents respectively.
Keywords: under-resourced language, Multilingual, Documents or text Classification, knowledge base, Ontology based text categorization, multilingual text classification, Ontology.
DOI: 10.7176/CEIS/10-6-02
Publication date: July 31st 201
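The three stages this abstract describes (word-by-word dictionary translation, concept mapping, weight-based categorization) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's design: the dictionary entries, the ontology slice, the concept weights, and all function names are hypothetical, the source-language tokens are placeholders and not real vocabulary, and concept mapping is reduced to exact lexical matching without the semantic features the abstract mentions.

```python
from collections import Counter

# Hypothetical bilingual dictionary keyed by (source word, part of speech).
# The source tokens are placeholders, not real Amharic, Tigrinya or Afaan Oromo.
BILINGUAL_DICT = {
    ("src_goal", "NOUN"): "goal",
    ("src_match", "NOUN"): "match",
    ("src_player", "NOUN"): "player",
    ("src_bank", "NOUN"): "bank",
    ("src_market", "NOUN"): "market",
}

# Hypothetical slice of an English ontology: concept -> (category, weight).
ONTOLOGY = {
    "goal": ("sport", 1.0),
    "match": ("sport", 1.0),
    "player": ("sport", 0.5),
    "bank": ("business", 1.0),
    "market": ("business", 1.0),
}


def translate(tagged_tokens):
    """Word-by-word translation; tokens missing from the dictionary are dropped."""
    return [BILINGUAL_DICT[t] for t in tagged_tokens if t in BILINGUAL_DICT]


def map_concepts(english_words):
    """Keep only the translated words that match an ontology concept exactly."""
    return [w for w in english_words if w in ONTOLOGY]


def categorize(tagged_tokens):
    """Sum concept weights per category and return the heaviest category."""
    scores = Counter()
    for concept in map_concepts(translate(tagged_tokens)):
        category, weight = ONTOLOGY[concept]
        scores[category] += weight
    return max(scores, key=scores.get) if scores else None


document = [
    ("src_goal", "NOUN"),
    ("src_player", "NOUN"),
    ("src_market", "NOUN"),
    ("src_unknown", "VERB"),  # not in the dictionary, so it is skipped
]
print(categorize(document))
```

Keying the dictionary on the part-of-speech tag as well as the word lets the same surface form translate differently as a noun and as a verb, which is one plausible reason for including that feature.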