Search CORE

322 research outputs found

Stemmer for Serbian language

Author: Milošević Nikola
Publication date: 01/03/2012

In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form. This work presents a suffix-stripping stemmer for Serbian, a highly inflectional language.

Comment: 16 pages, 8 figures, code included

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository
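The Serbian entry above describes suffix stripping. As a rough illustration of that general technique only, here is a minimal sketch of longest-match suffix removal. The suffix list and the minimum stem length are assumptions chosen for the example; they are not the rule set of the stemmer in the paper.

```python
# Hypothetical suffix list for illustration only; it is NOT the rule set
# of the Serbian stemmer described above.
SUFFIXES = ["ovima", "ama", "ima", "om", "a", "e", "i", "u"]
MIN_STEM_LENGTH = 2  # assumed lower bound on the length of a stem


def strip_suffix(word: str) -> str:
    """Remove the longest listed suffix that leaves at least MIN_STEM_LENGTH characters."""
    word = word.lower()
    # Try longer suffixes first so that "gradovima" loses "ovima", not just "a".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word  # no suffix applies: the word is its own stem


forms = ["grad", "grada", "gradu", "gradom", "gradovima"]
print([strip_suffix(w) for w in forms])  # ['grad', 'grad', 'grad', 'grad', 'grad']
```

Trying the longest suffix first is the usual choice in suffix-stripping stemmers, because a shorter suffix (here "a") is often the tail of a longer one.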

A stemming algorithm for Latvian

Author: Karlis Kreslins (7173986)
Publication date: 01/01/1996

The thesis covers the construction, application and evaluation of a stemming algorithm for advanced information searching and retrieval in Latvian databases. It examines two questions: Can a suffix removal algorithm originally designed for English be applied to Latvian? Can stemming in Latvian produce the same or better information retrieval results than manual truncation? To answer them, the thesis characterises the role and importance of automatic word conflation for both document indexing and information retrieval. A literature review, which analyses and evaluates different types of stemming techniques and the historical development of stemming algorithms, justifies applying this advanced IR method to Latvian as well. A comparative analysis of the morphological structure of English and Latvian led to the selection of Porter's suffix removal algorithm as the basis for the Latvian stemmer. An extensive list of Latvian stopwords, including conjunctions, particles and adverbs, was compiled and added to the initial stemmer to eliminate insignificant words from further processing. A number of Latvian-specific modifications were made to the structure and rules of the original stemming algorithm. An analysis of word stemming based on a Latvian electronic dictionary and Latvian text fragments confirmed that the suffix removal technique can be successfully applied to Latvian. An evaluation study of user search statements showed that the stemming algorithm can, to a certain extent, improve the effectiveness of information retrieval.

Loughborough University Institutional Repository
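The Latvian thesis combines a stopword list with a suffix remover based on Porter's algorithm. The sketch below shows how those two pieces might fit together, borrowing Porter's "measure" condition (strip a suffix only if the remaining stem has m > 0). The vowel set, stopwords and suffixes are small illustrative assumptions, not the lists or modified rules from the thesis.

```python
import re

# Illustrative assumptions, NOT the thesis's actual lists or rules.
VOWELS = set("aāeēiīouū")                      # Latvian vowel letters
STOPWORDS = {"un", "bet", "vai", "ka", "arī"}  # a few conjunctions/particles
SUFFIXES = ["iem", "ām", "os", "as", "us", "is", "s", "a", "i", "u"]


def measure(stem: str) -> int:
    """Porter's measure m: the number of vowel-consonant (VC) sequences in the stem."""
    pattern = "".join("V" if ch in VOWELS else "C" for ch in stem)
    collapsed = re.sub(r"(.)\1+", r"\1", pattern)  # merge runs, e.g. "CCVC" -> "CVC"
    return collapsed.count("VC")


def stem(word: str) -> str:
    """Strip the longest listed suffix whose removal leaves a stem with m > 0."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and measure(word[: -len(suffix)]) > 0:
            return word[: -len(suffix)]
    return word


def index_terms(text: str) -> list[str]:
    """Drop stopwords first, then stem the remaining tokens."""
    return [stem(t) for t in text.lower().split() if t not in STOPWORDS]


print(index_terms("Mežs un meži"))  # ['mež', 'mež']
```

Removing stopwords before stemming keeps short function words such as "un" out of the index entirely, and the measure condition stops the stemmer from reducing very short words to nothing.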

A Performance Evaluation of Classifiers Employ Language Dependent Tools for Indonesian Text

Author: Arifin Agus Zainal
Hariadi Mochamad
Purnomo Mauridhi Hery
Sumpeno Surya
Publication date: 01/01/2010

This paper evaluates the performance of Maximum Entropy (MaxEnt), Support Vector Machine (SVM) and Naïve Bayes (NB) techniques for Indonesian text classification. The performance of MaxEnt and SVM is compared against the baseline NB technique. We also investigate the effect that language-dependent tools such as Indonesian stemming and stop-word removal can have on the text classification performance of these techniques. To date, there has been no experimental report on the effect of an Indonesian stemmer on text classification accuracy. From our experiments, we conclude that maximum entropy generally performs better than the other classifiers. Language-dependent tools such as stemming and stop-word removal have only a small effect on classification accuracy. However, the stemmed approach scored the highest average accuracy and, because it reduces the dimensionality of the feature vectors used in classification, it is a viable step in the pre-processing stage.

ITS Repository
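The Indonesian entry credits stemming with reducing the dimension of the feature vectors. The sketch below makes that concrete: with a bag-of-words representation, the vector length equals the vocabulary size, so conflating word forms shrinks it. The prefix and suffix lists and the toy documents are assumptions for illustration. Real Indonesian stemmers handle far more affixes and morphophonemic changes.

```python
# Toy affix lists (assumptions for illustration); real Indonesian stemmers
# handle many more affixes and morphophonemic changes.
PREFIXES = ("mem", "di")
SUFFIXES = ("kan", "an")
MIN_ROOT = 3  # assumed minimum root length


def toy_stem(word: str) -> str:
    """Strip at most one listed prefix, then at most one listed suffix."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_ROOT:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_ROOT:
            word = word[: -len(s)]
            break
    return word


def vocabulary(docs, stem=None):
    """Distinct terms across docs; its size is the bag-of-words feature vector length."""
    return {stem(t) if stem else t for d in docs for t in d.lower().split()}


docs = ["Dia membaca buku", "Buku itu dibaca", "Bacaan anak"]
print(len(vocabulary(docs)), len(vocabulary(docs, toy_stem)))  # 7 5
```

Here "membaca", "dibaca" and "bacaan" all collapse to "baca", so three feature dimensions become one.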

Viewing morphology as an inference process

Author: Krovetz Robert
Publication venue: Published by Elsevier B.V.
Publication date: 01/01/2000

Morphology is the area of linguistics concerned with the internal structure of words. Information retrieval has generally not paid much attention to word structure, other than to account for some of the variability in word forms via the use of stemmers. We report on our experiments to determine the importance of morphology, and the effect that it has on performance. We found that grouping morphological variants makes a significant improvement in retrieval performance. Improvements are seen by grouping inflectional as well as derivational variants. We also found that performance was enhanced by recognizing lexical phrases. We describe the interaction between morphology and lexical ambiguity, and how resolving that ambiguity will lead to further improvements in performance.

CiteSeerX

Elsevier - Publisher Connector
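Krovetz reports that grouping morphological variants improves retrieval. A simple way to picture "grouping" is to build conflation classes (words sharing a stem) and expand each query term to its whole class. This sketch uses a deliberately crude, hypothetical suffix stripper. It is not Krovetz's dictionary-based approach, and it ignores the lexical ambiguity the paper discusses.

```python
from collections import defaultdict

# Hypothetical suffix list for a crude English stemmer (not Krovetz's method).
SUFFIXES = ("ing", "al", "ed", "es", "s", "e")


def crude_stem(word: str) -> str:
    """Strip the first listed suffix that leaves at least three characters."""
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            return word[: -len(s)]
    return word


def conflation_classes(vocab):
    """Group every vocabulary word with the other words that share its stem."""
    classes = defaultdict(set)
    for w in vocab:
        classes[crude_stem(w)].add(w)
    return dict(classes)


def expand_query(term, classes):
    """Replace a query term with all morphological variants in its class."""
    return classes.get(crude_stem(term), {term})


vocab = ["retrieve", "retrieval", "retrieved", "retrieving", "index", "indexes"]
classes = conflation_classes(vocab)
print(sorted(expand_query("retrieval", classes)))
# ['retrieval', 'retrieve', 'retrieved', 'retrieving']
```

Note that the derivational variant "retrieval" lands in the same class as the inflected forms, which mirrors the paper's observation that both kinds of grouping help.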

Arabic stemmers and their effectiveness on the information retrieval system

Author: Elkhoury Rania Fawzi
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/2004

Arabic is a Semitic language with a complex morphology. Therefore, using a stemming algorithm in an information retrieval system is almost always beneficial. An Arabic stemmer has been implemented and included in the information retrieval system developed at the Information Science Research Institute at the University of Nevada, Las Vegas. The Arabic stemmer is written in Ruby; it removes affixes and then matches the remaining word against patterns of the same length. The retrieval experiment uses the TREC collection, which consists of over a million documents. We will test the effectiveness of the Arabic stemmer using recall/precision measurements and compare the results with those of other stemmers.

University of Nevada, Las Vegas Repository
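The UNLV entry describes a two-step stemmer: remove affixes, then match the remaining word against patterns of the same length. The sketch below illustrates that idea in Python (the original was written in Ruby). Patterns use the letters ف, ع and ل as slots for the root consonants; every other pattern letter must match exactly. The affix and pattern lists are small assumptions for the example, not those of the UNLV stemmer.

```python
# Toy lists, assumed for illustration; not those of the UNLV stemmer.
PREFIXES = ["وال", "ال", "و"]                # "and the", "the", "and" (longest first)
SUFFIXES = ["ات", "ة"]                       # two common feminine endings
PATTERNS = ["فاعل", "مفعول", "مفعل", "فعل"]  # ف, ع, ل mark root-letter slots
ROOT_SLOTS = {"ف", "ع", "ل"}
MIN_CORE = 3  # never strip an affix if fewer than three letters would remain


def remove_affixes(word: str) -> str:
    """Strip at most one listed prefix and at most one listed suffix."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_CORE:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_CORE:
            word = word[: -len(s)]
            break
    return word


def match_pattern(core: str):
    """Return the root from the first same-length pattern that fits, else None."""
    for pattern in PATTERNS:
        if len(pattern) != len(core):
            continue  # only patterns of the same length are considered
        root = []
        for p_ch, w_ch in zip(pattern, core):
            if p_ch in ROOT_SLOTS:
                root.append(w_ch)  # slot letter: capture the root consonant
            elif p_ch != w_ch:
                break  # a fixed pattern letter does not match
        else:
            return "".join(root)
    return None


def stem(word: str) -> str:
    core = remove_affixes(word)
    return match_pattern(core) or core


for w in ["الكاتب", "والمكتوب", "مكتبات"]:
    print(stem(w))  # each prints the root كتب
```

All three words ("the writer", "and the written", "libraries") reduce to the same root, which is the kind of conflation that benefits retrieval in a morphologically rich language.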

A light weight stemmer for Bengali and its use in spelling checker

Author: Islam Md. Zahurul
Khan Mumit
Uddin Md. Nizam
Publication venue: BRAC University

Includes bibliographical references (page 6).

Stemming is an operation that splits a word into its constituent root and affix without performing a complete morphological analysis. It is used to improve the performance of spelling checkers and information retrieval applications where full morphological analysis would be too computationally expensive. For spelling checkers specifically, stemming can drastically reduce the dictionary size, which is often a bottleneck on mobile and embedded devices. This paper presents a computationally inexpensive stemming algorithm for Bengali that handles suffix removal in a domain-independent way. The evaluation of the proposed algorithm in a Bengali spelling checker indicates that it can be used effectively in information retrieval applications in general.

BRAC University Institutional Repository
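The Bengali entry notes that stemming lets a spelling checker store stems instead of every inflected form. A minimal sketch of that lookup, assuming a tiny stem-only dictionary and a few common noun suffixes chosen for illustration (not the paper's lists or algorithm), might look like this:

```python
# Illustrative assumptions: a tiny stem-only dictionary and a few Bengali
# noun suffixes; these are not the paper's lists.
STEM_DICTIONARY = {"ছেলে", "বই"}  # "boy", "book"
SUFFIXES = ["দের", "রা", "কে"]


def candidate_stems(word: str):
    """Yield the word itself, then the word minus each listed suffix it ends with."""
    yield word
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            yield word[: -len(suffix)]


def is_correct(word: str) -> bool:
    """Accept a word if it, or some suffix-stripped form of it, is a dictionary stem."""
    return any(stem in STEM_DICTIONARY for stem in candidate_stems(word))


for w in ["ছেলে" + "রা", "ছেলে" + "দের", "ছলে"]:
    print(w, is_correct(w))  # True, True, False
```

Two stored stems cover up to eight surface forms here, which is where the dictionary-size saving comes from. The trade-off is that a stem-based check can also accept some stem-plus-suffix combinations that are not actual words.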

Analysis of translated query in Quranic Malay and English translation documents with stemmer

Author: Nie
Popovic
Porter
Yunus
Publication venue: EDP Sciences
Publication date: 01/01/2017

Crossref

How effective is stemming and decompounding for German text retrieval?

Author: Braschler Martin
Ripplinger Bärbel
Publication venue: Springer
Publication date: 01/01/2004

Acquired under the Swiss National Licences (http://www.nationallizenzen.ch)

ZHAW digitalcollection