Search CORE

Implemented Stemming Algorithms for Information Retrieval Applications

Author: Demilie Wubetu Barud
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 30/04/2020

Text documents are increasingly distributed over the internet, e-mail and web pages. As internet use grows exponentially, the need for massive data storage keeps increasing. Many documents contain morphological variants of words, so stemming, a preprocessing technique, maps the different morphological variants of a word to a base form called the stem. Information retrieval applications use stemming to improve retrieval performance, on the assumption that terms with the same stem usually have similar meanings. Stemming bulky documents normally requires more computation time and power in order to cope with the need to search for a particular word in the data. This paper analyzes various stemming algorithms, along with the benefits and limitations of recent stemming methods and approaches. Keywords: Natural Language Processing Applications, Information Retrieval, Information Retrieval Applications (IRAs), Stemming Approaches. DOI: 10.7176/IKM/10-3-01 Publication date: April 30th 202
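The mapping described in this abstract can be illustrated with a minimal Python sketch. The suffix list and the `toy_stem` and `group_by_stem` helpers are assumptions made for illustration only; they are not one of the algorithms analyzed in the paper.

```python
# Toy suffix stripper: an illustrative assumption, not an algorithm from the paper.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Map each stem to the list of surface forms that reduce to it."""
    groups = {}
    for w in words:
        groups.setdefault(toy_stem(w), []).append(w)
    return groups


groups = group_by_stem(["connect", "connected", "connecting", "connects"])
print(groups)
```

All four variants land under the single stem `connect`, which is the behaviour a retrieval system relies on when it assumes that terms sharing a stem have similar meanings.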

International Institute for Science, Technology and Education (IISTE): E-Journals

Using the Web 1T 5-Gram Database for Attribute Selection in Formal Concept Analysis to Correct Overstemmed Clusters

Author: Hall Guymon
Publication venue: Digital Scholarship@UNLV
Publication date: 01/05/2014

Information retrieval is the process of finding information from an unstructured collection of data. The process of information retrieval involves building an index, commonly called an inverted file. As part of the inverted file, information retrieval algorithms often stem words to a common root. Stemming involves reducing a document term to its root. There are many ways to stem a word: affix removal and successor variety are two common categories of stemmers. The Porter Stemming Algorithm is a suffix removal stemmer that operates as a rule-based process on English words. We can think of stemming as a way to cluster related words together according to one common stem. However, sometimes Porter includes words in a cluster that are unrelated. This experiment attempts to correct these stemming errors through the use of Formal Concept Analysis (FCA). FCA is the process of formulating formal concepts from a given formal context. A formal context consists of a set of objects, G, a set of attributes, M, and a binary relation I that indicates the attributes possessed by each object. A formal concept is formed by computing the closure of a subset of objects and attributes. Attribute selection is of critical importance in FCA; using the Cranfield document collection, this experiment attempted to view attributes as a function of word-relatedness and crafted a comparison measure between each word in the stemmed cluster using the Google Web 1T 5-gram data set. Using FCA to correct the clusters, the results showed a varying level of success for precision and recall values, dependent upon the error threshold allowed.
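The closure step described in this abstract can be sketched in Python as follows. The context below (three words that a stemmer might place in one cluster, with invented attribute labels) is a hypothetical example, and the function names are assumptions; the thesis derives its attributes from the Web 1T 5-gram data, which is not reproduced here.

```python
def common_attributes(objects, context, all_attributes):
    """Attributes in M shared by every object in `objects`."""
    shared = set(all_attributes)
    for g in objects:
        shared &= context[g]
    return shared


def common_objects(attributes, context):
    """Objects in G that possess every attribute in `attributes`."""
    return {g for g, attrs in context.items() if attributes <= attrs}


def concept_from_objects(objects, context, all_attributes):
    """Close a set of objects into a formal concept (extent, intent)."""
    intent = common_attributes(objects, context, all_attributes)
    extent = common_objects(intent, context)
    return extent, intent


# Hypothetical formal context: the relation I is stored as object -> attribute set.
context = {
    "generate": {"produce", "power"},
    "generation": {"produce", "power", "age"},
    "generous": {"gift"},
}
all_attributes = {"produce", "power", "age", "gift"}

extent, intent = concept_from_objects({"generate"}, context, all_attributes)
print(sorted(extent), sorted(intent))
```

In this made-up context the concept grown from `generate` also picks up `generation` but leaves `generous` out, which is the kind of split that could separate an unrelated word from a stemmed cluster.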

University of Nevada, Las Vegas Repository

Development of a stemmer for the isiXhosa language

Author
Publication venue: Faculty of Science & Agriculture
Publication date: 01/01/2016

IsiXhosa is one of the eleven official languages of South Africa and the second most widely spoken language in the country. However, in terms of computational linguistics, the language has received little attention, and natural language related work is almost non-existent. Document retrieval using unstructured queries requires some kind of language processing, and efficient retrieval of documents can be achieved with a technique called stemming. The area that involves document storage and retrieval is called Information Retrieval (IR). IR systems use a Stemmer to index document representations and also the terms in users’ queries in order to retrieve matching documents. In this dissertation, we present a Stemmer that can be used for both purposes. The Stemmer is for use in IR systems, such as Google, to retrieve documents written in isiXhosa. In the Eastern Cape Province of South Africa many public schools teach isiXhosa as a subject, and a number of universities in South Africa also teach isiXhosa. Therefore, for a language as important as this one, valuable information that is available online should be made accessible to users through IR systems. In our efforts to develop a Stemmer for the isiXhosa language, we investigated how others have developed Stemmers for other languages. From the investigation we found that the Porter stemming algorithm in particular was the main algorithm that many other Stemmers use as a reference. We found that Porter’s algorithm could not be used in its entirety in the development of the isiXhosa Stemmer because of the morphological complexity of the language. We developed an affix removal Stemmer with embedded rules that determine the order in which affixes are stripped.
The rule is that the word under consideration is checked against the exceptions; if it is not in the exceptions list, stripping continues in the following order: prefix removal, then suffix removal, and finally the result is saved as the stem. The Stemmer was successfully developed, then tested and evaluated on sample data randomly collected from isiXhosa textbooks and an isiXhosa dictionary. From the results obtained we concluded that the Stemmer can be used in IR systems, as it showed 91 percent accuracy. The errors were 9 percent; these results are within the accepted range, so the Stemmer can be used to help in the retrieval of isiXhosa documents. This is only a noun Stemmer, and in the future it can be extended to stem verbs as well. The Stemmer can also be used in the development of spell-checkers for isiXhosa.
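The stripping order described in this abstract (exceptions check, then prefix removal, then suffix removal) can be sketched in Python as follows. The exception list, the affix lists, and the minimum-length guard are small illustrative assumptions, not the rule set from the dissertation.

```python
# Hypothetical rule data for illustration; not the dissertation's actual lists.
EXCEPTIONS = {"umama"}  # words returned unchanged
PREFIXES = ("aba", "ama", "isi", "izi", "imi", "um")
SUFFIXES = ("ana",)
MIN_STEM = 3  # assumed guard so very short words are not over-stripped


def strip_prefix(word):
    """Remove the first listed prefix that leaves at least MIN_STEM characters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            return word[len(prefix):]
    return word


def strip_suffix(word):
    """Remove the first listed suffix that leaves at least MIN_STEM characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def stem(word):
    """Exceptions check first, then prefix removal, then suffix removal."""
    word = word.lower()
    if word in EXCEPTIONS:
        return word
    return strip_suffix(strip_prefix(word))


for noun in ("umntu", "abantu", "isikolo", "izikolo", "umama"):
    print(noun, "->", stem(noun))
```

Checking the exceptions first keeps a word such as `umama` from being cut by the `um` rule, while singular and plural pairs such as `umntu` and `abantu` reduce to the same stem.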

South East Academic Libraries System (SEALS)

Introduction to Recent Developments in Economic Methodology

Author: Davis John B.
Publication venue: e-Publications@Marquette
Publication date: 01/01/2006

epublications@Marquette

Recent Developments in Cultural Heritage Image Databases: Directions for User-Centered Design

Author: Stephenson Christie
Publication venue: Graduate School of Library and Information Science. University of Illinois at Urbana-Champaign
Publication date: 01/01/1999

Published or submitted for publication

Illinois Digital Environment for Access to Learning and Scholarship Repository

Bilingually motivated segmentation and generation of word translations using relatively small translation data sets

Author: Gomes Luis; Lopes Jose Gabriel P.; Mahesh Kavitha Karimbi
Publication venue
Publication date: 01/01/2015

Waseda University Repository

Ending Poverty in Our Generation: Save the Children's Vision for a Post-2015 Framework

Author
Publication venue: Save the Children
Publication date: 01/01/2013

The Millennium Development Goals -- one of the most resonant and unifying agreements in political history -- reach a turning point in 2015, the deadline for their realisation. We must do everything in our power to achieve them. But we also need to find an agreed way forward on work that will remain to be accomplished. This report sets out Save the Children's vision for a new development framework -- consisting of ten goals, plus targets and indicators -- that will support the creation of a world where all people everywhere realise their human rights within a generation. Recognising that the global consultation is ongoing, and many voices are still to be heard, we do not present this as a final position. Rather, it is an indicator of our priorities and -- we hope -- a contribution to the process of crystallising the eventual solution.

IssueLab