198 research outputs found

    Conceptual Refactoring for Creative Information Retrieval

    Information retrieval (IR) is an effective mechanism for text management that has seen widespread adoption. But it is not a particularly creative mechanism, in the sense of creating new conceptual structures, or refactoring existing ones, to pull in documents that describe a user's information needs in novel and inventive ways. Because language is a dynamic and highly creative medium of expression, the concepts a user seeks represent a moving target for IR systems. We argue that only by thinking creatively, and by viewing concepts as fluid meaning structures capable of dynamic reorganization, can an IR system effectively retrieve documents that express themselves creatively.
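    As a toy illustration of treating a query concept as fluid rather than fixed, the sketch below expands a query term into the alternative surface forms found in its WordNet synsets. This is a generic query-expansion technique, not the conceptual-refactoring method the paper proposes, and it assumes NLTK with the WordNet data installed.

```python
# Generic query expansion via WordNet, shown only as a baseline notion
# of "fluid" concepts; requires: import nltk; nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def expand_query(term):
    """Gather alternative lemmas from every WordNet synset of `term`,
    treating the query concept as a set of related surface forms
    rather than a single fixed string."""
    expansions = set()
    for synset in wn.synsets(term):
        for lemma in synset.lemmas():
            expansions.add(lemma.name().replace("_", " "))
    return sorted(expansions)

print(expand_query("create"))  # e.g., ['create', 'make', 'produce', ...]
```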

    On link predictions in complex networks with an application to ontologies and semantics

    It is assumed that ontologies can be represented and treated as networks, and that these networks show properties of so-called complex networks. Just like ontologies, “our current pictures of many networks are substantially incomplete” (Clauset et al., 2008, p. 3ff.). For this reason, such networks have been analyzed and methods for identifying missing edges have been proposed. The goal of this thesis is to show how treating and understanding an ontology as a network can be used to extend and improve existing ontologies, and how measures from graph theory and techniques developed in recent years for social network analysis and other complex networks can be applied to semantic networks in the form of ontologies. Given a large enough amount of data organized according to an ontology, and the relations defined in that ontology, the goal is to find patterns that help reveal information given only implicitly. Unlike reasoning and methods of inference, the approach does not rely on predefined patterns of relations; instead, it identifies patterns of relations, or other structural information taken from the ontology graph, to calculate probabilities of yet unknown relations between entities. The methods adopted from network theory and the social sciences presented in this thesis are expected to considerably reduce the work and time necessary to build an ontology by automating parts of the process. They are believed to be applicable to any ontology and can be used in either supervised or unsupervised fashion to automatically identify missing relations, add new information, and thereby enlarge the data set and increase the information explicitly available in an ontology.

    As seen in the IBM Watson example, different knowledge bases are applied in NLP tasks. An ontology like WordNet contains lexical and semantic knowledge about lexemes, while general knowledge ontologies like Freebase and DBpedia contain information on entities of the non-linguistic world. In this thesis, examples from both kinds of ontologies are used: WordNet and DBpedia. WordNet is a manually crafted resource that establishes a network of representations of word senses, connected to the word forms used to express them, and connects these senses and forms with lexical and semantic relations in a machine-readable form. As will be shown, although a lot of work has been put into WordNet, it can still be improved. While it already contains many lexical and semantic relations, it is not possible to distinguish between polysemous and homonymous words. As will be explained later, this distinction can be useful for NLP problems concerning word sense disambiguation, and hence for QA. Using graph- and network-based centrality and path measures, the goal is to train a machine learning model that is able to identify new, missing relations in the ontology and assign them across the whole data set (i.e., WordNet). The approach presented here is based on a deep analysis of the ontology and the network structure it exposes. Using different measures from graph theory as features, together with a set of manually created examples (a training set), a supervised machine learning approach is presented and evaluated, showing the benefit of interpreting an ontology as a network compared to approaches that do not take the network structure into account.

    DBpedia is an ontology derived from Wikipedia: the structured information given in Wikipedia infoboxes is parsed, and relations are extracted according to an underlying ontology. Unlike Wikipedia, it contains only the small amount of structured information (e.g., the infoboxes of each page) and not the large amount of unstructured information (i.e., the free text) of Wikipedia pages. Hence DBpedia is missing a large number of possible relations that are described in Wikipedia. Compared to Freebase, an ontology used and maintained by Google, DBpedia is also quite incomplete. This, and the fact that Wikipedia can be used to check possible results, makes DBpedia a good subject of investigation. The approach to extending DBpedia presented in this thesis is based on a thorough analysis of the network structure and the assumed evolution of the network, which points to the locations in the network where information is most likely to be missing. Since the structure of the ontology and the resulting network is assumed to reveal patterns connected to certain relations defined in the ontology, these patterns can be used to identify what kind of relation is missing between two entities. This is done using unsupervised methods from the fields of data mining and machine learning.
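    A minimal sketch of the supervised variant described above, with assumed details the abstract leaves open (the exact features, the classifier, and the toy data are illustrative choices): graph-theoretic measures computed for a candidate node pair serve as features for a standard classifier that scores whether a relation is likely missing.

```python
# Link prediction from network-structure features; the feature set,
# classifier, and data are assumptions, not the thesis's exact setup.
import networkx as nx
from sklearn.ensemble import RandomForestClassifier

def pair_features(G, u, v):
    """Structural features for a candidate edge (u, v)."""
    common = len(list(nx.common_neighbors(G, u, v)))
    jaccard = next(nx.jaccard_coefficient(G, [(u, v)]))[2]
    adamic_adar = next(nx.adamic_adar_index(G, [(u, v)]))[2]
    try:
        dist = nx.shortest_path_length(G, u, v)
    except nx.NetworkXNoPath:
        dist = -1  # the pair lies in disconnected components
    return [common, jaccard, adamic_adar, dist]

# Toy undirected graph standing in for an ontology network.
G = nx.karate_club_graph()

# Hypothetical manually labeled pairs (1 = relation exists, 0 = it does not).
train = [(0, 1, 1), (2, 3, 1), (0, 33, 0), (5, 25, 0)]
X = [pair_features(G, u, v) for u, v, _ in train]
y = [label for _, _, label in train]

clf = RandomForestClassifier(random_state=0).fit(X, y)

# Score an unseen candidate pair for a possibly missing relation.
print(clf.predict_proba([pair_features(G, 4, 10)]))
```

    The same pair features can also be clustered rather than classified, which corresponds to the unsupervised setting the abstract mentions for DBpedia.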

    An investigation into lemmatization in Southern Sotho

    Lemmatization refers to the process whereby a lexicographer assigns a specific place in a dictionary to the word form regarded as the most basic among other related forms. The fact that in Bantu languages formative elements can be added to one another in a seemingly interminable series, until quite long words are produced, raises interesting questions for lemmatization. Given the productive nature of Southern Sotho, it is interesting to observe how lexicographers handle the morphological complexities they normally face when arranging lexical items. This study shows that some difficulties are encountered in adhering to the traditional method of alphabetization. It does not aim to propose solutions, but it does point out some considerations that should be borne in mind in the process of lemmatization.

    Dimensions of meaning: The analysis of lexical ambiguity in “Funny Tweets” @JokesMemesFacts on Twitter X

    This research aimed to understand humor as a linguistically creative use of language, particularly within semantics and the dimensions of meaning: homonymy and polysemy. A wave of new language and terms among Twitter X users has emerged recently in the wake of the COVID-19 pandemic, which has led people to spend their time on social media for social criticism, for expressing sadness, or simply for entertainment. The data were analyzed using Murphy's (2010) theory of lexical ambiguity, covering homographs, homophones, absolute homonymy, and polysemy, together with Leech's (1981) theory of semantic meaning, covering conceptual, connotative, social, affective, reflected, collocative, and thematic meaning. The researcher used a descriptive qualitative approach. The results show that homonymy is the type of lexical ambiguity that occurs more often than polysemy in humorous language; absolute homonymy is the most common type of homonymy, followed by homophones and then homographs, which are the rarest. The results also reveal that quite a lot of funny tweets on Twitter X contain lexical ambiguity, which can leave readers confused or cause them to misinterpret the true meaning, intent, and motive.

    Cultural Factors in Semantic Extension: A Typological Perspective on Chinese Polysemy

    This article offers a typological approach to Chinese polysemy. Through an analysis of multiple instances in Chinese and English, cultural factors are shown to have profound effects in propelling the mechanism of semantic extension. Chinese polysemy reveals abundant individual characteristics; with respect to semantic extension, however, it also relates to many universals that are comprehensible to a large extent.

    Computational approaches to semantic change

    Semantic change, the way the meanings of words change over time, has preoccupied scholars since well before modern linguistics emerged in the late 19th and early 20th centuries and ushered in a new methodological turn in the study of language change. Ever since that turn, the study of semantic change has progressed steadily, accumulating more than a century's worth of knowledge encompassing many languages and language families; compared to changes in sound and grammar, however, semantic change remains the least understood. Historical linguists also realized early on the potential of computers as research tools, presenting papers at the very first international conferences on computational linguistics in the 1960s. Such computational studies still tended to be small-scale, method-oriented, and qualitative. Recent years, however, have witnessed a sea change in this regard. Big-data empirical quantitative investigations are now coming to the forefront, enabled by enormous advances in storage capacity and processing power. Diachronic corpora have grown beyond imagination, defying exploration by traditional manual qualitative methods, and language technology has become increasingly data-driven and semantics-oriented. These developments present a golden opportunity for the empirical study of semantic change over both long and short time spans. A major challenge at present is to integrate the hard-earned knowledge and expertise of traditional historical linguistics with the cutting-edge methodology explored primarily in computational linguistics. The present volume grew out of a concrete response to this challenge: the 1st International Workshop on Computational Approaches to Historical Language Change (LChange'19), at ACL 2019, which brought together scholars from both fields. The volume offers a survey of this exciting new direction in the study of semantic change, a discussion of the many remaining challenges in pursuing it, and considerably updated and extended versions of a selection of the contributions to the LChange'19 workshop, addressing both theoretical problems (e.g., the discovery of "laws of semantic change") and practical applications, such as information retrieval in longitudinal text archives.
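    To make the kind of quantitative investigation described above concrete, here is a minimal sketch of one widely used technique: training word embeddings separately on two time slices of a diachronic corpus and comparing a target word's nearest neighbors as a rough signal of semantic change. The corpora below are invented toy data and gensim is an assumed dependency; real studies use large per-period corpora and more careful alignment of the embedding spaces.

```python
# Neighbor-overlap probe for semantic change; toy data, assumed setup.
from gensim.models import Word2Vec

# Invented two-period corpora for the classic example "broadcast",
# which shifted from sowing seed to transmitting radio programs.
early = [["farmers", "broadcast", "the", "seed", "across", "the", "field"],
         ["they", "broadcast", "grain", "by", "hand"]] * 50
late = [["stations", "broadcast", "the", "news", "on", "the", "radio"],
        ["they", "broadcast", "programs", "every", "evening"]] * 50

def neighbors(corpus, word, k=3):
    """Nearest embedding neighbors of `word` in one time slice."""
    model = Word2Vec(corpus, vector_size=50, min_count=1, seed=0, workers=1)
    return {w for w, _ in model.wv.most_similar(word, topn=k)}

n_early = neighbors(early, "broadcast")
n_late = neighbors(late, "broadcast")

# Low overlap between the two neighbor sets suggests that the word's
# dominant usage differs between the two periods.
overlap = len(n_early & n_late) / len(n_early | n_late)
print(n_early, n_late, overlap)
```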
