7 research outputs found

    Implemented Stemming Algorithms for Information Retrieval Applications

    Nowadays, text documents are proliferating across the internet, e-mail, and web pages. As internet use grows exponentially, the need for massive data storage keeps increasing. Many documents contain morphological variants, so stemming, a preprocessing technique, maps the different morphological variants of a word to a base form called the stem. Stemming is used in information retrieval applications to improve retrieval performance, on the assumption that terms with the same stem usually have similar meanings. Stemming bulky documents normally requires more computation time and power to support searching for a particular word in the data. In this paper, various stemming algorithms are analyzed, along with the benefits and limitations of recent stemming methods and approaches. Keywords: Natural Language Processing Applications, Information Retrieval, Information Retrieval Applications (IRAs), Stemming Approaches. DOI: 10.7176/IKM/10-3-01. Publication date: April 30th 202
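
    As a rough illustration of the mapping described in this abstract, the sketch below strips a few suffixes and indexes documents by the resulting stem. It is a minimal sketch, not one of the algorithms analyzed in the paper: the suffix list, the minimum stem length, and the names `naive_stem`, `build_index`, and `search` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical suffix list (longer suffixes first); not taken from the paper.
SUFFIXES = ("ions", "ion", "ings", "ing", "ed", "es", "s")
MIN_STEM_LEN = 3  # assumed threshold to avoid stripping very short words


def naive_stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix, if any."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each stem to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[naive_stem(token.strip(".,;:!?"))].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents sharing a stem with the query word."""
    return index.get(naive_stem(query), set())


docs = {
    1: "The server connected quickly.",
    2: "Connecting takes time",
    3: "Unrelated text",
}
index = build_index(docs)
print(search(index, "connections"))
```

    Because "connected", "connecting", and "connections" all reduce to the same stem here, a query for one variant retrieves documents containing the others, which is the retrieval benefit the abstract assumes. A crude stripper like this also over- and under-stems many words.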

    Implemented Stemming Algorithms for Information Retrieval Applications

    Nowadays, text documents are proliferating across the internet, e-mail, and web pages. As internet use grows exponentially, the need for massive data storage keeps increasing. Many documents contain morphological variants, so stemming, a preprocessing technique, maps the different morphological variants of a word to a base form called the stem. Stemming is used in information retrieval applications to improve retrieval performance, on the assumption that terms with the same stem usually have similar meanings. Stemming bulky documents normally requires more computation time and power to support searching for a particular word in the data. In this paper, various stemming algorithms are analyzed, along with the benefits and limitations of recent stemming methods and approaches. Keywords: Natural Language Processing Applications, Information Retrieval, Information Retrieval Applications (IRAs), Stemming Approaches. DOI: 10.7176/JIEA/10-3-01. Publication date: April 30th 202

    Hindi language text search: a literature review

    The literature review focuses on the major problems of Hindi text searching over the web. The review reveals the availability of a number of techniques and search engines that have been developed to facilitate Hindi text searching. Among many problems, a dominant one is when a text formed by combinatorial characters or words is searched

    Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

    Peer reviewed

    Wiktionary: The Metalexicographic and the Natural Language Processing Perspective

    Dictionaries are the main reference works for our understanding of language. They are used by humans and likewise by computational methods. So far, the compilation of dictionaries has almost exclusively been the profession of expert lexicographers. The ease of collaboration on the Web and the rising initiatives of collecting open-licensed knowledge, such as in Wikipedia, caused a new type of dictionary that is voluntarily created by large communities of Web users. This collaborative construction approach presents a new paradigm for lexicography that poses new research questions to dictionary research on the one hand and provides a very valuable knowledge source for natural language processing applications on the other hand. The subject of our research is Wiktionary, which is currently the largest collaboratively constructed dictionary project. In the first part of this thesis, we study Wiktionary from the metalexicographic perspective. Metalexicography is the scientific study of lexicography including the analysis and criticism of dictionaries and lexicographic processes. To this end, we discuss three contributions related to this area of research: (i) We first provide a detailed analysis of Wiktionary and its various language editions and dictionary structures. (ii) We then analyze the collaborative construction process of Wiktionary. Our results show that the traditional phases of the lexicographic process do not apply well to Wiktionary, which is why we propose a novel process description that is based on the frequent and continual revision and discussion of the dictionary articles and the lexicographic instructions. (iii) We perform a large-scale quantitative comparison of Wiktionary and a number of other dictionaries regarding the covered languages, lexical entries, word senses, pragmatic labels, lexical relations, and translations. 
We conclude the metalexicographic perspective by finding that the collaborative Wiktionary is not an appropriate replacement for expert-built dictionaries due to its inconsistencies, quality flaws, one-fits-all-approach, and strong dependence on expert-built dictionaries. However, Wiktionary's rapid and continual growth, its high coverage of languages, newly coined words, domain-specific vocabulary and non-standard language varieties, as well as the kind of evidence based on the authors' intuition provide promising opportunities for both lexicography and natural language processing. In particular, we find that Wiktionary and expert-built wordnets and thesauri contain largely complementary entries. In the second part of the thesis, we study Wiktionary from the natural language processing perspective with the aim of making available its linguistic knowledge for computational applications. Such applications require vast amounts of structured data with high quality. Expert-built resources have been found to suffer from insufficient coverage and high construction and maintenance cost, whereas fully automatic extraction from corpora or the Web often yields resources of limited quality. Collaboratively built encyclopedias present a viable solution, but do not cover well linguistically oriented knowledge as it is found in dictionaries. That is why we propose extracting linguistic knowledge from Wiktionary, which we achieve by the following three main contributions: (i) We propose the novel multilingual ontology OntoWiktionary that is created by extracting and harmonizing the weakly structured dictionary articles in Wiktionary. A particular challenge in this process is the ambiguity of semantic relations and translations, which we resolve by automatic word sense disambiguation methods. (ii) We automatically align Wiktionary with WordNet 3.0 at the word sense level. 
The largely complementary information from the two dictionaries yields an aligned resource with higher coverage and an enriched representation of word senses. (iii) We represent Wiktionary according to the ISO standard Lexical Markup Framework, which we adapt to the peculiarities of collaborative dictionaries. This standardized representation is of great importance for fostering the interoperability of resources and hence the dissemination of Wiktionary-based research. To this end, our work presents a foundational step towards the large-scale integrated resource UBY, which facilitates a unified access to a number of standardized dictionaries by means of a shared web interface for human users and an application programming interface for natural language processing applications. A user can, in particular, switch between and combine information from Wiktionary and other dictionaries without completely changing the software. Our final resource and the accompanying datasets and software are publicly available and can be employed for multiple different natural language processing applications. It particularly fills the gap between the small expert-built wordnets and the large amount of encyclopedic knowledge from Wikipedia. We provide a survey of previous works utilizing Wiktionary, and we exemplify the usefulness of our work in two case studies on measuring verb similarity and detecting cross-lingual marketing blunders, which make use of our Wiktionary-based resource and the results of our metalexicographic study. We conclude the thesis by emphasizing the usefulness of collaborative dictionaries when being combined with expert-built resources, which bears much unused potential

    Everything Flows

    This collection of essays explores the metaphysical thesis that the living world is not ontologically made up of substantial particles or things, as has often been assumed, but is rather constituted by processes. The biological domain is organized as an interdependent hierarchy of processes, which are stabilized and actively maintained at different timescales. Even entities that intuitively appear to be paradigms of things, such as organisms, are actually better understood as processes. Unlike previous attempts to articulate processual views of biology, which have tended to use Alfred North Whitehead’s panpsychist metaphysics as a foundation, this book takes a naturalistic approach to metaphysics. It submits that the main motivations for replacing an ontology of substances with one of processes are to be looked for in the empirical findings of science. Biology provides compelling reasons for thinking that the living realm is fundamentally dynamic and that the existence of things is always conditional on the existence of processes. The phenomenon of life cries out for theories that prioritize processes over things, and it suggests that the central explanandum of biology is not change but rather stability—or, more precisely, stability attained through constant change. This multicontributor volume brings together philosophers of science and metaphysicians interested in exploring the consequences of a processual philosophy of biology. The contributors draw on an extremely wide range of biological case studies and employ a process perspective to cast new light on a number of traditional philosophical problems such as identity, persistence, and individuality

    African Studies Abstracts Online: number 25, 2009

    ASA Online provides a quarterly overview of journal articles and edited works on Africa in the field of the social sciences and the humanities available in the ASC library. Issue 25 (2009). African Studies Centre, Leiden. ASC – Publicaties niet-programma gebonde