Search CORE

6,481 research outputs found

Acronyms as an integral part of multi-word term recognition - A token of appreciation

Author: Spasic Irena
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Term conflation is the process of linking together different variants of the same term. In automatic term recognition approaches, all term variants should be aggregated into a single normalized term representative, which is associated with a single domain-specific concept as a latent variable. In a previous study, we described FlexiTerm, an unsupervised method for recognition of multi-word terms from a domain-specific corpus. It uses a range of methods to normalize three types of term variation: orthographic, morphological and syntactic. Acronyms, which represent a highly productive type of term variation, were not supported. In this study, we describe how the functionality of FlexiTerm has been extended to recognize acronyms and incorporate them into the term conflation process. The main contribution of this study is not acronym recognition per se, but rather its integration with other types of term variation into the term conflation process. We evaluated the effects of term conflation in the context of information retrieval as one of its most prominent applications. On average, relative recall increased by 32 percentage points, whereas the index compression factor increased by 7 percentage points. The evidence therefore suggests that integrating acronyms provides a non-trivial improvement in term conflation.
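
As a rough illustration of the two ideas in this abstract, the sketch below groups term variants (including an acronym) under one normalized representative and then computes an index compression factor. It is not FlexiTerm: the `ACRONYMS` table and the `normalize` function are hypothetical stand-ins, and the formula (N - S) / N is the commonly used definition of index compression, which may differ from the exact measure used in the study.

```python
from collections import defaultdict

# Hypothetical acronym table; a real system would learn these from the corpus.
ACRONYMS = {"ir": "information retrieval"}


def normalize(term):
    """Toy normalizer: lowercase, unify hyphens/spaces, expand known acronyms."""
    t = " ".join(term.lower().replace("-", " ").split())
    return ACRONYMS.get(t, t)


def conflate(variants):
    """Group term variants under their shared normalized representative."""
    groups = defaultdict(set)
    for variant in variants:
        groups[normalize(variant)].add(variant)
    return dict(groups)


def index_compression_factor(n_before, n_after):
    """Fraction by which the number of distinct index terms shrinks."""
    return (n_before - n_after) / n_before


variants = [
    "information retrieval",
    "Information Retrieval",
    "information-retrieval",
    "IR",
    "term conflation",
]
groups = conflate(variants)
icf = index_compression_factor(len(set(variants)), len(groups))
```

Here five distinct surface forms collapse into two representatives, so fewer index entries are needed and a query for any one variant can match documents that use another.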

Crossref

Online Research @ Cardiff

DCU@FIRE2010: term conflation, blind relevance feedback, and cross-language IR with manual and automatic query translation

Author: Ganguly Debasis
Jones Gareth J.F.
Leveling Johannes
Publication date: 01/02/2010

For the first participation of Dublin City University (DCU) in the FIRE 2010 evaluation campaign, information retrieval (IR) experiments on English, Bengali, Hindi, and Marathi documents were performed to investigate term conflation (different stemming approaches and indexing word prefixes), blind relevance feedback, and manual and automatic query translation. The experiments are based on BM25 and on language modeling (LM) for IR. Results show that term conflation always improves mean average precision (MAP) compared to indexing unprocessed word forms, but different approaches seem to work best for different languages. For example, in monolingual Marathi experiments indexing 5-prefixes outperforms our corpus-based stemmer; in Hindi, the corpus-based stemmer achieves a higher MAP. For Bengali, the LM retrieval model achieves a much higher MAP than BM25 (0.4944 vs. 0.4526). In all experiments using BM25, blind relevance feedback yields considerably higher MAP in comparison to experiments without it. Bilingual IR experiments (English→Bengali and English→Hindi) are based on query translations obtained from native speakers and the Google Translate web service. For the automatically translated queries, MAP is slightly (but not significantly) lower compared to experiments with manual query translations. The bilingual English→Bengali (English→Hindi) experiments achieve 81.7%-83.3% (78.0%-80.6%) of the best corresponding monolingual experiments.
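
The "indexing 5-prefixes" form of term conflation mentioned above can be sketched in a few lines. This is only an illustration under simple assumptions, not the DCU system: the function names are hypothetical, tokenization is plain whitespace splitting, and the prefix is counted in Unicode code points, which for Indic scripts may not match what a reader perceives as five characters.

```python
def prefix_term(word, n=5):
    """Conflate a word to its first n characters (code points)."""
    return word[:n]


def index_terms(text, n=5):
    """Lowercase, split on whitespace, and truncate every token to an n-prefix."""
    return [prefix_term(token, n) for token in text.lower().split()]


terms = index_terms("Retrieval retrieving retrieved cat")
```

All three inflected forms of "retrieve" map to the same index term, while words shorter than the prefix length are indexed unchanged.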

Irish Universities

DCU Online Research Access Service

Lovins Stemmer dan Porter Stemmer Algorithm pada Text Retrieval dalam Bahasa Inggris [Lovins Stemmer and Porter Stemmer Algorithms for Text Retrieval in English]

Author: Wulandari Maulidya
Publication venue: Universitas Telkom
Publication date: 01/01/2010

ABSTRACT: An information retrieval system is considered ideal if it finds all relevant documents and no irrelevant ones. However, morphological variants such as accelerate, accelerated, and acceleration are not automatically treated as a single form (accelerate). Stemming therefore needs to be applied in an information retrieval system to reduce the morphological variants of a term to one word form that the system treats as equivalent. Stemming can also reduce the size of the retrieval index file and improve retrieval accuracy (effectiveness). In this final project, the author implements the affix-removal stemming algorithms of Lovins and Porter, together with a modification that combines them (Lovins-Porter, or Lovpot): when the stem produced by the Porter algorithm is identical to the original term, the term is then stemmed with the Lovins algorithm (a serial combination). The project also analyzes the effect of applying stemming. The results show that stemming reduces the number of terms produced and therefore the size of the index file, although the resulting stems do not always correspond to the root form of each word. In terms of system performance, the modified Lovins-Porter algorithm performs better than the Porter and Lovins algorithms alone, with average precision showing an increase.
Keywords: Stemming, Information Retrieval, Precision, Recall, Lovins Algorithm, Porter Algorithm, Modified Lovins-Porter Algorithm
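
The serial combination described in this abstract (apply Porter first, and fall back to Lovins only when Porter leaves the term unchanged) can be sketched as below. This is a minimal sketch, not the thesis implementation: `serial_stem` is a hypothetical name, and `toy_primary` and `toy_fallback` are deliberately tiny stand-ins for real Porter and Lovins stemmers, used only so the example runs on its own.

```python
def serial_stem(term, primary, fallback):
    """Apply the primary stemmer; if it changes nothing, try the fallback."""
    stem = primary(term)
    if stem == term:
        return fallback(term)
    return stem


def strip_suffix(term, suffixes, min_stem=3):
    """Remove the first matching suffix, keeping at least min_stem characters."""
    for suffix in suffixes:
        if term.endswith(suffix) and len(term) - len(suffix) >= min_stem:
            return term[: -len(suffix)]
    return term


def toy_primary(term):
    """Stand-in for a Porter stemmer (NOT the real algorithm)."""
    return strip_suffix(term, ("ing", "ed"))


def toy_fallback(term):
    """Stand-in for a Lovins stemmer (NOT the real algorithm)."""
    return strip_suffix(term, ("ation", "ion"))


stems = [serial_stem(t, toy_primary, toy_fallback)
         for t in ("accelerated", "acceleration", "cat")]
```

Passing the stemmers in as functions keeps the combination logic separate from the stemming rules, so real Porter and Lovins implementations could be substituted without changing `serial_stem`.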

Open Library

Experiences in Automatic Keywording of Particle Physics Literature

Author: Dallman David
Montejo Ráez Arturo
Publication venue: Union of Concerned Scientists
Publication date: 01/01/2001

Attributing keywords can assist in the classification and retrieval of documents in the particle physics literature. As information services face a future with less available manpower and ever more documents being written, the possibility of keyword attribution being assisted by automatic classification software is explored. A project being carried out at CERN (the European Laboratory for Particle Physics) for the development and integration of automatic keywording is described.

E-LIS


What can co-speech gestures in aphasia tell us about the relationship between language and gesture?: A single case study of a participant with Conduction Aphasia

Author: Cocks N.
Dipper L.
Morgan G.
Rowe M.
Publication venue: 'John Benjamins Publishing Company'
Publication date: 01/01/2011

Cross-linguistic evidence suggests that language typology influences how people gesture when using ‘manner-of-motion’ verbs (Kita 2000; Kita & Özyürek 2003) and that this is due to ‘online’ lexical and syntactic choices made at the time of speaking (Kita, Özyürek, Allen, Brown, Furman & Ishizuka, 2007). This paper attempts to relate these findings to the co-speech iconic gesture used by an English speaker with conduction aphasia (LT) and five controls describing a Sylvester and Tweety cartoon. LT produced co-speech gesture which showed distinct patterns which we relate to different aspects of her language impairment, and the lexical and syntactic choices she made during her narrative.

City Research Online

Crossref

espace@Curtin


Minimally supervised induction of morphology through bitexts

Author: Moon Taesun, Ph. D.
Publication date: 01/12/2008

A knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment and generate new forms. The most powerful and accurate of these have been manually encoded, such endeavors being without exception expensive and time-consuming. Consequently, there have been many attempts to reduce the cost of developing morphological systems through unsupervised or minimally supervised algorithms and learning methods for the acquisition of morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems. Here, I present a strategy for dealing with morphological clustering and segmentation in a minimally supervised manner but one that will be more linguistically informed than previous unsupervised approaches. That is, this study will attempt to induce clusters of words from an unannotated text that are inflectional variants of each other. Then a set of inflectional suffixes by part-of-speech will be induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names, an approach that uses aligned bitexts to transfer linguistic resources developed for one language (the source language) to another language (the target). This approach has a further advantage in that it allows a reduction in the amount of training data without a significant degradation in performance, making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target for ease of evaluation and for certain typological properties of German. The two main tasks, clustering and segmentation, are approached sequentially, with the clustering informing the segmentation to allow for greater accuracy in morphological analysis. While the performance of these methods does not exceed the current roster of unsupervised or minimally supervised approaches to morphology acquisition, it attempts to integrate more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, which is a crucial distinction in linguistics.

Texas ScholarWorks

Spoken query processing for interactive information retrieval

Author: Fabio Crestani
Publication venue: 'Elsevier BV'
Publication date: 01/01/2002

It has long been recognised that interactivity improves the effectiveness of information retrieval systems. Speech is the most natural and interactive medium of communication and recent progress in speech recognition is making it possible to build systems that interact with the user via speech. However, given the typical length of queries submitted to information retrieval systems, it is easy to imagine that word recognition errors in spoken queries would severely damage the system's effectiveness. The experimental work reported in this paper shows that the use of classical information retrieval techniques for spoken query processing is robust to considerably high levels of word recognition errors, in particular for long queries. Moreover, in the case of short queries, both standard relevance feedback and pseudo relevance feedback can be effectively employed to improve the effectiveness of spoken query processing.

Crossref

University of Strathclyde Institutional Repository