47 research outputs found

    A Semantic Relatedness Measure Based on Combined Encyclopedic, Ontological and Collocational Knowledge

    We describe a new semantic relatedness measure combining the Wikipedia-based Explicit Semantic Analysis (ESA) measure, the WordNet path measure and the mixed collocation index. Our measure achieves the currently highest results on the WS-353 test: a Spearman rho coefficient of 0.79 (vs. 0.75 in (Gabrilovich and Markovitch, 2007)) when applying the measure directly, and a value of 0.87 (vs. 0.78 in (Agirre et al., 2009)) when using the prediction of a polynomial SVM classifier trained on our measure. In the appendix we discuss the adaptation of ESA to 2011 Wikipedia data, as well as various unsuccessful attempts to enhance ESA by filtering at word, sentence, and section level. Comment: 6 pages, 6 figures, accepted for publication at IJCNLP2011 Conference
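    The Spearman rho figures quoted above are rank correlations between a measure's scores and human relatedness judgments. A minimal sketch of that evaluation step follows; it is not the authors' code, the function names are hypothetical, and the sample numbers are invented purely for illustration (they are not WS-353 data).

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(scores, human_ratings):
    """Spearman rho: the Pearson correlation of the two rank vectors."""
    if len(scores) != len(human_ratings) or len(scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(scores), average_ranks(human_ratings))


# Invented scores for five word pairs (illustration only).
measure_scores = [1, 2, 3, 4, 5]
human_ratings = [2, 1, 4, 3, 5]
rho = spearman_rho(measure_scores, human_ratings)
```

    Ranking first makes the statistic depend only on the ordering of the word pairs, so a measure is not penalized for using a different numeric scale from the human annotators.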

    Semantic Sort: A Supervised Approach to Personalized Semantic Relatedness

    We propose and study a novel supervised approach to learning statistical semantic relatedness models from subjectively annotated training examples. The proposed semantic model consists of parameterized co-occurrence statistics associated with textual units of a large background knowledge corpus. We present an efficient algorithm for learning such semantic models from a training sample of relatedness preferences. Our method is corpus independent and can essentially rely on any sufficiently large (unstructured) collection of coherent texts. Moreover, the approach facilitates the fitting of semantic models for specific users or groups of users. We present the results of an extensive range of experiments, from small to large scale, indicating that the proposed method is effective and competitive with the state of the art. Comment: 37 pages, 8 figures. A short version of this paper was already published at ECML/PKDD 201

    Gradient Metaphoricity of the Preposition in: A Corpus-based Approach to Chinese Academic Writing in English

    In Cognitive Linguistics, a conceptual metaphor is a systematic set of correspondences between two domains of experience (Kövecses 2020: 2). To support a fuller understanding of metaphors, research on metaphoricity (Müller and Tag 2010; Dunn 2011; Jensen and Cuffari 2014; Nacey and Jensen 2017) has emphasized one property of metaphors in language use: gradience (Hanks 2006; Dunn 2011, 2014), which indicates that metaphorical expressions can be measured. Despite many noteworthy contributions, studies of metaphoricity are often accused of subjectivity (Müller 2008; Jensen and Cuffari 2014; Jensen 2017), which is why this study uses a large corpus as its database. The main aim of this dissertation is therefore to measure the gradient senses of the preposition in in an objective way, thus mapping its highly systematic semantic extension. Based on these gradient senses, the semantic and syntactic features of the preposition in as produced by advanced Chinese English-major learners are investigated, combining quantitative and qualitative research methods. First, a quantitative analysis is made of the literal sense and the ten metaphorical senses of the preposition in. Taking into account the five factors that influence the image schema of each sense (“scale of Landmark”, “visibility”, “path”, “inclusion” and “boundary”), a formula for measuring the degree of metaphoricity is derived: Metaphoricity = ([#Visibility] + [#Path] + [#Inclusion] + [#Boundary]) * [#Scale of Landmark]. The primary sense has the highest value, 12, and the extended senses have values ranging down to zero. The more features a sense shares with the proto-scene, the higher its value and the less metaphorical it is.
EVENT and PERSON are “least metaphoric” (value = 9-11); SITUATION, NUMBER, CONTENT and FIELD are “weak metaphoric” (value = 6-8); SEGMENTATION, TIME and MANNER are “strong metaphoric” (value = 3-5); PURPOSE shares the fewest features with the proto-scene and has the lowest value, so it is “most metaphoric” (value = 0-2). A corpus-based approach is then employed, which offers a model for corpus-based work in Cognitive Linguistics. It compares two compiled sub-corpora: the Chinese Master Academic Writing Corpus and the Chinese Doctorate Academic Writing Corpus. The findings show that, on the semantic level, Chinese English-major students overuse in with a low level of metaphoricity; even advanced learners rarely use the most metaphorical in. In terms of syntactic behaviour, the most frequent nouns in the [in+noun] construction are weakly metaphoric, whilst the nouns in the construction [in the noun of] carry the EVENT sense, which is least metaphorical. Moreover, action verbs tend to be used in the constructions [verb+in] and [in doing sth.] in both the master and doctorate groups. In the qualitative study, the divergent usages of the preposition in are explored. The preposition in is often substituted with other prepositions, such as on and at. The fundamental reason for the Chinese learners’ weakness is negative transfer from their mother tongue (Wang 2001; Gong 2007; Zhang 2010). Although in and its Chinese equivalent zai...li (在...里) share the same proto-scene, there are discrepancies: the metaphorical senses of the preposition in are TIME, PURPOSE, NUMBER, CONTENT, FIELD, EVENT, SITUATION, SEGMENTATION, MANNER and PERSON, while zai...li (在...里) has only five: TIME, CONTENT, EVENT, SITUATION and PERSON. Thus the image schemata of each sense cannot be mapped directly onto each other across the two languages.
This study also provides evidence for the universality and variation of spatial metaphors on the basis of cultural models. Philosophically, it supports the standpoint of Embodiment philosophy that abstract concepts are constructed on the basis of spatial metaphors that are grounded in physical and cultural experience.
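    The scoring formula and value bands in the abstract above can be sketched in a few lines. The abstract does not say what values each factor may take, so this sketch assumes the four image-schema features are binary (0 or 1) and the scale of Landmark runs from 0 to 3; those ranges are chosen only because they reproduce the stated maximum of 12. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenseFeatures:
    # ASSUMPTION: binary features; the abstract does not give their ranges.
    visibility: int
    path: int
    inclusion: int
    boundary: int
    # ASSUMPTION: 0..3, so that (1 + 1 + 1 + 1) * 3 == 12, the stated maximum.
    scale_of_landmark: int


def metaphoricity(f: SenseFeatures) -> int:
    """Metaphoricity = (Visibility + Path + Inclusion + Boundary) * Scale of Landmark."""
    return (f.visibility + f.path + f.inclusion + f.boundary) * f.scale_of_landmark


def band(value: int) -> str:
    """Map a score to the bands listed in the abstract (lower score = more metaphorical)."""
    if not 0 <= value <= 12:
        raise ValueError("score must lie between 0 and 12")
    if value == 12:
        return "primary sense"
    if value >= 9:
        return "least metaphoric"
    if value >= 6:
        return "weak metaphoric"
    if value >= 3:
        return "strong metaphoric"
    return "most metaphoric"


# Hypothetical feature vectors, not taken from the dissertation.
full_overlap = SenseFeatures(1, 1, 1, 1, 3)
little_overlap = SenseFeatures(1, 0, 1, 0, 1)
print(metaphoricity(full_overlap), band(metaphoricity(full_overlap)))
print(metaphoricity(little_overlap), band(metaphoricity(little_overlap)))
```

    Because the scale of Landmark multiplies the sum, a sense with a zero-scale Landmark scores 0 regardless of its other features, which places it in the “most metaphoric” band under these assumptions.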

    Anaphora resolution for Arabic machine translation: a case study of nafs

    PhD Thesis. In the age of the internet, email, and social media, there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria. Egyptian Government

    Ontology Localization

    Our main goal in this thesis is to propose a solution for building a multilingual ontology through the automatic localization of an ontology. The notion of localization comes from the field of Software Development, where it refers to the adaptation of a software product to a non-native environment. In Ontology Engineering, ontology localization could be considered a subtype of software localization in which the product is a shared model of a particular domain, for example an ontology, to be used by a given application. Specifically, our work introduces a new approach to the problem of multilinguality, describing the methods, techniques and tools for localizing ontological resources and how multilinguality can be represented in ontologies. It is not the aim of this work to advocate a single approach to ontology localization, but rather to show the variety of methods and techniques that can be adapted from other areas of knowledge to reduce the cost and effort involved in enriching an ontology with multilingual information. We are convinced that there is no single method for ontology localization. Nevertheless, we concentrate on automatic solutions for localizing these resources. The approach presented in this thesis provides global coverage of the localization activity for ontology practitioners. In particular, this work offers a formal account of our general localization process, defining its inputs, its outputs and the main steps identified. In addition, the approach considers several dimensions along which an ontology can be localized. These dimensions allow us to establish a classification of translation techniques based on methods taken from the discipline of machine translation.
    To facilitate the analysis of these translation techniques, we introduce an evaluation framework that covers their main aspects. Finally, we offer an intuitive view of the whole ontology localization life cycle and outline our approach to defining a system architecture that supports this activity. The proposed model comprises the system components, the visible properties of those components and the relationships among them, and also provides a basis from which ontology localization systems can be developed. The main contributions of this work are summarized as follows:
    - A characterization and definition of the problems of ontology localization, based on problems found in related areas. The proposed characterization takes into account three different localization problems: translation, information management, and the representation of multilingual information.
    - A prescriptive methodology to support the ontology localization activity, based on the localization methodologies used in Software Engineering and Knowledge Engineering, and as general as possible so that it can cover a wide range of scenarios.
    - A classification of ontology localization techniques, which can serve to compare (analytically) different ontology localization systems as well as to design new systems, taking advantage of state-of-the-art solutions.
    - An integrated method for building ontology localization systems in a distributed and collaborative environment, which takes into account the most appropriate methods and techniques depending on: i) the domain of the ontology to be localized, and ii) the amount of linguistic information required for the final ontology.
    - A modular component to support the storage of the multilingual information associated with each ontology term. Our approach follows the current trend in integrating multilingual information into ontologies, which suggests that the knowledge of the ontology and the (multilingual) linguistic information be kept separate and independent.
    - A model based on collaborative workflows for representing the process normally followed in different organizations to coordinate the localization activity in different natural languages.
    - An integrated infrastructure implemented within the NeOn Toolkit by means of a set of plug-ins and extensions that support the collaborative ontology localization process.

    Towards a textual theory of metonymy: a semiotic approach to the nature and role of metonymy in text

    This thesis argues that the scope of metonymy has, throughout history, been severely reduced to a process of word substitution, and that the signifying potential of the trope has been limited to lexical representation. The study therefore proposes a semiotic approach to take the trope beyond this limitation and to develop a textual theory of the trope. A background study of how metonymy has been treated in previous studies is therefore necessary. This review of the literature covers a long period, starting from ancient Greece and going up to the present day. Chapters one and two of this thesis, which give this general background, show that the hypothesis is to a large extent valid. The thesis then examines a related hypothesis: that metonymy is semiotic in nature and that a semiotic approach to metonymy will solve the problem of reductionism in the treatment of this trope. Chapter three is devoted to an examination of this hypothesis. It shows that a semiotic approach to metonymy is not only possible but also crucial. The semiotic approach basically concerns the treatment of metonymy as a sign which cuts across three domains of representation: the domain of words, the domain of concepts and the domain of things or objects. The last domain is itself treated from a semiotic perspective to stand for the domain of context at large. On the basis of this semiotic approach, a textual model of metonymic relations in text is constructed. This model is put to the test in chapter four. Here the metonymic relations of form for form, form for concept, form for thing, thing for form and concept for form are brought to bear on the formal and semantic connectedness of text. In chapter five, the metonymic relations of concept for concept, concept for thing, thing for thing and thing for concept are used to explain how these relations interact to provide a linkage between language, cognition and context.

    Compounding in Namagowab and English: (exploring meaning creation in compounds)

    This essay investigates compounding in Namagowab and English, which belong to two widely divergent groups of languages, the Khoesan and Indo-European, respectively. The first motive is to investigate how and why new words are created from existing ones. The reading and data interpretation seek an understanding of word formation and an overview of semantic compositionality, structure and productivity, within the broad context of the cognitive, lexicalist and distributed morphology paradigms. This, coupled with historical reading about the languages and their speakers, is used to speculate about why compounds feature in lexical creation. Compounding is prevalent in both languages, and their distance in terms of phylogenetic relationships should allow limited generalization about these processes of formation. Word lists taken from dictionaries in both languages were analyzed by entering the words in Excel spreadsheets, so that various attributes of these words, such as word type, compound class (Noun, Verb, Preposition, Adjective and Adverb) and constituent class, could be counted and described with formulae, and compound and constituent meaning analyzed. The conclusion was that, in addition to typology, socio-historical factors such as language contact, and aspects of cognition such as memory and transparency, account for compounding in a language.