
    Distributional Measures of Semantic Distance: A Survey

    The ability to mimic human notions of semantic distance has widespread applications. Some measures rely only on raw text (distributional measures) and some rely on knowledge sources such as WordNet. Although extensive studies have been performed to compare WordNet-based measures with human judgment, the use of distributional measures as proxies to estimate semantic distance has received little attention. Even though they have traditionally performed poorly when compared to WordNet-based measures, they lay claim to certain uniquely attractive features, such as their applicability in resource-poor languages and their ability to mimic both semantic similarity and semantic relatedness. Therefore, this paper presents a detailed study of distributional measures. Particular attention is paid to fleshing out the strengths and limitations of both WordNet-based and distributional measures, and to how distributional measures of distance can be brought more in line with human notions of semantic distance. We conclude with a brief discussion of recent work on hybrid measures.
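    As a concrete illustration of the distributional idea sketched above (not a method taken from the survey itself), the snippet below builds word co-occurrence vectors from raw text and compares them with cosine similarity; the toy corpus and window size are assumptions.

```python
# Minimal sketch of a distributional measure: co-occurrence vectors from raw
# text, compared with cosine similarity. Corpus and window size are toy values.
from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(tokens, window=2):
    """Map each word to a Counter of words seen within +/- window positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors

def cosine_similarity(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

tokens = "the doctor treated the patient while the nurse helped the patient".split()
vecs = cooccurrence_vectors(tokens)
print(cosine_similarity(vecs["doctor"], vecs["nurse"]))  # distributional relatedness
```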

    Specializing distributional vectors of all words for lexical entailment

    Semantic specialization methods fine-tune distributional word vectors using lexical knowledge from external resources (e.g., WordNet) to accentuate a particular relation between words. However, such post-processing methods suffer from limited coverage as they affect only vectors of words seen in the external resources. We present the first post-processing method that specializes vectors of all vocabulary words, including those unseen in the resources, for the asymmetric relation of lexical entailment (LE) (i.e., the hyponymy-hypernymy relation). Leveraging a partially LE-specialized distributional space, our POSTLE (i.e., post-specialization for LE) model learns an explicit global specialization function, allowing for specialization of vectors of unseen words, as well as word vectors from other languages via cross-lingual transfer. We capture the function as a deep feedforward neural network: its objective re-scales vector norms to reflect the concept hierarchy while simultaneously attracting hyponymy-hypernymy pairs to better reflect semantic similarity. An extended model variant augments the basic architecture with an adversarial discriminator. We demonstrate the usefulness and versatility of POSTLE models with different input distributional spaces in different scenarios (monolingual LE and zero-shot cross-lingual LE transfer) and tasks (binary and graded LE). We report consistent gains over state-of-the-art LE-specialization methods, and successfully LE-specialize word vectors for languages without any external lexical knowledge.
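    A hedged sketch of the general post-specialization idea follows (not the authors' POSTLE code): a feed-forward network is trained to map generic vectors of seen words onto their specialized counterparts, so the learned global function can then be applied to words unseen in the lexical resource. The network size, toy data, and plain MSE objective are assumptions; the paper's actual objective additionally re-scales vector norms and has an adversarial variant.

```python
# Illustrative sketch only: learn a global feed-forward mapping from generic
# vectors to LE-specialized vectors, supervised by words already specialized
# by a resource-driven method, so that unseen words can be mapped too.
import torch
import torch.nn as nn

dim = 300
specialization_net = nn.Sequential(        # the "explicit global function"
    nn.Linear(dim, dim), nn.ReLU(),
    nn.Linear(dim, dim), nn.ReLU(),
    nn.Linear(dim, dim),
)
optimizer = torch.optim.Adam(specialization_net.parameters(), lr=1e-3)

# Toy stand-ins: seen words have both a generic and an LE-specialized vector.
generic_seen = torch.randn(1000, dim)      # distributional vectors of seen words
specialized_seen = torch.randn(1000, dim)  # their LE-specialized counterparts

for step in range(200):
    optimizer.zero_grad()
    predicted = specialization_net(generic_seen)
    loss = nn.functional.mse_loss(predicted, specialized_seen)  # simplified objective
    loss.backward()
    optimizer.step()

# Once trained, the same function maps vectors of words unseen in the resource.
generic_unseen = torch.randn(5, dim)
specialized_unseen = specialization_net(generic_unseen)
```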

    Mapping Persian Words to WordNet Synsets

    Lexical ontologies are one of the main resources for developing natural language processing and semantic web applications. Mapping lexical ontologies of different languages is very important for inter-lingual tasks. On the other hand, mapping approaches can be applied to build lexical ontologies for a new language based on pre-existing resources of other languages. In this paper we propose a semantic approach for mapping Persian words to Princeton WordNet synsets. As there is no lexical ontology for Persian, our approach helps not only in building one for this language but also enables semantic web applications on Persian documents. To do the mapping, we calculate the similarity of Persian words and English synsets using features such as super-classes, sub-classes, domain, and related words. Our approach improves on an existing one by applying it in a new domain, which increases recall noticeably.
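    The sketch below illustrates one plausible way to score a word against a synset by overlap of such feature sets; the feature names, weights, and toy data are hypothetical and not taken from the paper, and it assumes the Persian word's features have already been translated into English.

```python
# Hedged sketch: weighted Jaccard overlap between the feature sets of a word
# and a candidate synset (hypernyms, hyponyms, domain, related words).
def feature_overlap(word_features, synset_features, weights=None):
    """Weighted Jaccard overlap across shared feature types (illustrative weights)."""
    weights = weights or {"hypernyms": 0.4, "hyponyms": 0.3, "domain": 0.15, "related": 0.15}
    score = 0.0
    for feature_type, weight in weights.items():
        a = set(word_features.get(feature_type, []))
        b = set(synset_features.get(feature_type, []))
        if a or b:
            score += weight * len(a & b) / len(a | b)
    return score

# Toy example: features of a Persian word (translated) vs. a WordNet synset.
persian_word = {"hypernyms": {"feline", "animal"}, "related": {"pet", "kitten"}}
candidate_synset = {"hypernyms": {"feline", "carnivore"}, "related": {"kitten", "whisker"}}
print(feature_overlap(persian_word, candidate_synset))
```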

    Identifying Semantic Divergences Across Languages

    Cross-lingual resources such as parallel corpora and bilingual dictionaries are cornerstones of multilingual natural language processing (NLP). They have been used to study the nature of translation, train automatic machine translation systems, as well as to transfer models across languages for an array of NLP tasks. However, the majority of work in cross-lingual and multilingual NLP assumes that translations recorded in these resources are semantically equivalent. This is often not the case: words and sentences that are considered to be translations of each other frequently diverge in meaning, often in systematic ways. In this thesis, we focus on such mismatches in meaning in text that we expect to be aligned across languages. We term such mismatches cross-lingual semantic divergences. The core claim of this thesis is that translation is not always meaning-preserving, which leads to cross-lingual semantic divergences that affect multilingual NLP tasks. Detecting such divergences requires ways of directly characterizing differences in meaning across languages through novel cross-lingual tasks, as well as models that account for translation ambiguity and do not rely on expensive, task-specific supervision. We support this claim through three main contributions. First, we show that a large fraction of data in multilingual resources (such as parallel corpora and bilingual dictionaries) is identified as semantically divergent by human annotators. Second, we introduce cross-lingual tasks that characterize differences in word meaning across languages by identifying the semantic relation between two words. We also develop methods to predict such semantic relations, as well as a model to predict whether sentences in different languages have the same meaning. Finally, we demonstrate the impact of divergences by applying the methods developed in the previous sections to two downstream tasks. We first show that our model for identifying semantic relations between words helps in separating equivalent word translations from divergent translations in the context of bilingual dictionary induction, even when the two words are close in meaning. We also show that identifying and filtering semantic divergences in parallel data helps in training a neural machine translation system twice as fast without sacrificing quality.
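    The filtering step mentioned at the end can be pictured as below; this is a hedged sketch with a hypothetical, purely illustrative scorer (a length ratio), not the thesis's divergence model.

```python
# Hedged sketch: keep only parallel sentence pairs judged semantically
# equivalent by some scorer before NMT training. The toy scorer here is a
# crude length ratio, standing in for a learned divergence detector.
def filter_parallel_corpus(sentence_pairs, equivalence_score, threshold=0.6):
    """Keep pairs whose estimated probability of equivalence is high enough."""
    return [(src, tgt) for src, tgt in sentence_pairs
            if equivalence_score(src, tgt) >= threshold]

def toy_equivalence_score(src, tgt):
    src_len, tgt_len = len(src.split()), len(tgt.split())
    return min(src_len, tgt_len) / max(src_len, tgt_len)

pairs = [("the cat sleeps", "el gato duerme"),
         ("the cat sleeps", "el gato duerme en la casa de mi abuela todos los días")]
print(filter_parallel_corpus(pairs, toy_equivalence_score))  # second pair is dropped
```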

    Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

    This paper explores the use of bilingual parallel corpora as a source of lexical knowledge for cross-lingual textual entailment. We claim that, in spite of the inherent difficulties of the task, phrase tables extracted from parallel data make it possible to capture both lexical relations between single words and contextual information useful for inference. We experiment with a phrasal matching method in order to: i) build a system portable across languages, and ii) evaluate the contribution of lexical knowledge in isolation, without interaction with other inference mechanisms. Results achieved on an English-Spanish corpus obtained from the RTE3 dataset support our claim, with an overall accuracy above the average scores reported by RTE participants on monolingual data. Finally, we show that using parallel corpora to extract paraphrase tables reveals their potential also in the monolingual setting, improving the results achieved with other sources of lexical knowledge.
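    A hedged sketch of phrase-table-based phrasal matching follows; the phrase table, coverage measure, and threshold are illustrative assumptions, not the paper's system.

```python
# Hedged sketch: the hypothesis is judged entailed when enough of its tokens
# are covered by phrase-table translations of phrases from the text.
def covered_fraction(text_tokens, hypothesis_tokens, phrase_table, max_len=3):
    """Fraction of hypothesis positions covered by translations of text phrases."""
    translations = set()
    for i in range(len(text_tokens)):
        for j in range(i + 1, min(len(text_tokens), i + max_len) + 1):
            translations.update(phrase_table.get(" ".join(text_tokens[i:j]), []))
    covered = [False] * len(hypothesis_tokens)
    for i in range(len(hypothesis_tokens)):
        for j in range(i + 1, min(len(hypothesis_tokens), i + max_len) + 1):
            if " ".join(hypothesis_tokens[i:j]) in translations:
                for k in range(i, j):
                    covered[k] = True
    return sum(covered) / len(covered)

# Toy phrase table extracted from parallel data (illustrative entries only).
phrase_table = {"the cat": ["el gato"], "sleeps": ["duerme", "está dormido"]}
score = covered_fraction("the cat sleeps".split(), "el gato duerme".split(), phrase_table)
print(score >= 0.7)  # simple threshold for the entailment decision
```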

    Cross-Dictionary Linking at Sense Level with a Double-Layer Classifier

    We present a system for linking dictionaries at the sense level, which is part of a wider programme aiming to extend current lexical resources and to create new ones by automatic means. One of the main challenges of the sense-linking task is the existence of non-one-to-one mappings among senses. Our system handles this issue by addressing the task as a binary classification problem using standard machine learning methods, where each sense pair is classified independently from the others. In addition, it implements a second, statistically based classification layer to also model the dependence among sense pairs, namely, the fact that a sense in one dictionary that is already linked to a sense in the other dictionary has a lower probability of being linked to a further sense. The resulting double-layer classifier achieves global precision and recall scores of 0.91 and 0.80, respectively.
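    The two-layer intuition can be sketched as below; the greedy decoding, scores, penalty, and threshold are simplifications invented for illustration, not the paper's implementation.

```python
# Hedged sketch of the double-layer idea: a first classifier scores each sense
# pair independently; a second layer then discounts candidate links whose
# senses already participate in a link, before the final decision threshold.
def link_senses(pair_scores, penalty=0.25, threshold=0.5):
    """pair_scores: {(sense_a, sense_b): first-layer probability of a link}."""
    linked_a, linked_b, links = set(), set(), []
    # Greedily consider the most confident pairs first.
    for (a, b), score in sorted(pair_scores.items(), key=lambda kv: -kv[1]):
        adjusted = score
        if a in linked_a:
            adjusted -= penalty      # sense a already has a link elsewhere
        if b in linked_b:
            adjusted -= penalty      # sense b already has a link elsewhere
        if adjusted >= threshold:
            links.append((a, b))
            linked_a.add(a)
            linked_b.add(b)
    return links

scores = {("bank_1", "banco_1"): 0.92, ("bank_1", "banco_3"): 0.60, ("bank_2", "banco_2"): 0.81}
print(link_senses(scores))  # ("bank_1", "banco_3") is dropped: 0.60 - 0.25 < 0.5
```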

    Crosslinguistic influence between Chinese and English in object realization

    Object omission is a defining property of Chinese, whereas English and Spanish have stricter rules in this respect. This work, based on a study with Chinese-English bilingual, Spanish-English bilingual, and English monolingual participants, analyses object omission in English. The aim is to assess to what extent the null-object mechanism of Chinese influences the development of English in Chinese-English bilingual children. In this study I compare the illicit null objects produced by the participants from quantitative and qualitative points of view. The results show an appreciable difference between the performance of the Chinese-English bilingual participants and that of the other groups with respect to the phenomenon studied, which supports the conclusion that, although the null-object mechanism is a property of developing grammars, in the case of the English development of Chinese-English bilingual children it results in negative transfer. (Filología Inglesa; Máster en Estudios Ingleses Avanzados: Lenguas y Culturas en Contact)

    Vocabulary Teaching and Learning in the First-year Students of Electromechanics and Nursing of the Escuela Naval de Suboficiales ARC Barranquilla (ENSB): A case study

    The current case study has the objective of investigating how vocabulary is taught and learned in two English classes at the Escuela Naval de Suboficiales A.R.C Barranquilla (ENSB), since students at this institution have not been progressing as expected in their English level. The participants were 42 first-year military students and two English teachers in the second and third modules of studies in the academic year of 2016. Using a mixed-methods approach, classroom observations, student questionnaires, and a pre- and post-test for each group were used to collect the data for this study. The findings reveal that students progress with regard to their vocabulary word knowledge. It was also seen that both teachers and students use some vocabulary teaching-learning strategies; however, the ones used are limited, repeated, and focus mainly on improving students’ vocabulary breadth. Students’ vocabulary depth progress is very limited. Other factors, such as an insufficient use of vocabulary for communication in English, seem to affect student progress negatively. The findings of this investigation guided the researcher to offer some recommendations, including using a variety of vocabulary strategies that focus on both depth and breadth of vocabulary knowledge and developing both receptive and productive vocabulary. Also, students should be directly taught how to use a variety of vocabulary strategies in order to enhance their development.

    Cognates in English and Spanish: an applied comparative study in lexicography

    This work is an applied, comparative study in lexicography for cognate lexical items in Porteño Spanish and British English. The dictionary is intended for learners and teachers of both languages. The thesis sustained is that, for a dictionary of this type to fulfil its aims, its compilation must take into account linguistic principles based on modern linguistic theory, i.e. it must include, as well as cultural differences, all relevant information from the three components of a linguistic theory of description, viz. the phonological, the syntactic and the semantic component. But, because drastic changes in lexicography are undesirable in a dictionary with a practical aim, whose would-be users are for the most part laymen with respect to linguistic theory, this type of lexicography is seen as a compromise between traditional lexicography and modern linguistic theory.

    Part I consists of an Introduction and eight chapters. The first chapter is devoted to the definition of the terms contained in the initial statement of this summary, and in it a preliminary definition of "cognates" as "dictionary entries in English and Spanish historically derived from the same root" is given. In Chapter II, after an analysis of the "components" of the vocabulary of Porteño Spanish, with special reference to borrowings, and after an analysis of borrowings in English, the definition of "cognates" is enlarged to include them. Thus, cognates are "dictionary entries in English and Spanish historically derived from the same root and borrowed lexical items from English into Spanish and vice versa." Chapter III is devoted to an investigation of which cognates (from the "complete list" as found in the vocabularies of English and Spanish) the students of English (or Spanish) will be exposed to during the process of learning a language. Thus, according to the definition of cognates given in Chapter II, two frequency lists, M. West's A General Service List of English Words and Juilland and Chang-Rodríguez's A Frequency Dictionary of Spanish Words, were analysed. The percentage of cognates which the lists yielded was 48% for English and 74% for Spanish. But in the course of the analysis several facts emerged about factors other than etymological cognateness which play an important part in the recognition and understanding of cognates, the consideration of which is taken up in Chapter IV. To prove certain of the points discussed, a series of textbooks of English as a foreign language, viz. Hornby and Mackin's Oxford Progressive English Alternative Course, Books A-D, in use in Argentina, was analysed from the point of view of cognates. The main conclusion derived from the consideration of factors other than etymological cognateness is that similarity of graphic substance is a crucial criterion for the definition of cognates, and so the "final" definition (for the purpose of this paper), viz. "dictionary entries in English and Spanish, similar in graphic substance, derived from the same root, and borrowed lexical items from English into Spanish or vice versa", was reached.

    In Chapter V, after an introductory discussion of the relationship between theoretical and applied linguistics and lexicography, the basic principles of transformational grammar to which we adhere are outlined, and the theoretical framework for semantic analysis developed by Katz and Fodor is discussed, with a view to its application in a dictionary of cognates. For reasons stated in this chapter we have deviated from the said framework in the sense of adopting conventional definitions of "readings" of lexical items instead of adhering to Katz's system of decomposing a "reading" of an item into Semantic Markers and Distinguishers, and in the sense that the systematization of cultural differences (part of what Katz calls "knowledge of the world") via Cultural Semantic Markers is considered all-important in this type of lexicography. The principle of the Semantic Marker is also used as a cross-reference between conventional and conceptual (or ideological) dictionaries via what we have called Conceptual Field Semantic Markers.

    The information from the phonological and syntactic components which is relevant for the dictionary is discussed in Chapter VI. For this purpose, in order to be able to generalize, and due to the fact that the majority of cognates found in the research are marked with the category features noun, adjective, verb or adverb, cognates are divided into four main groups. After considering the phonological, syntactic (and sometimes semantic) questions involved, the information from the phonological and the syntactic components which entries for each of these "word-classes" should contain is specified. Ways and means for extracting and specifying cultural features via the comparison of the structure of the lexical "fields" in English and Spanish, i.e. taking into account syntagmatic and paradigmatic relations between words (sense relations), are outlined in Chapter VII. A corollary of the need for the inclusion of cultural features in readings of cognate lexical items, i.e. the necessity to break with the word-for-word translation equivalence tradition in bilingual dictionaries, is also discussed. In Chapter VIII, after a comparison of the aims of bilingual dictionaries and a dictionary of cognates, proposals are listed for a) what the Introduction to a dictionary of cognates should contain, b) whether the dictionary ought to be monolingual or bilingual, c) the format of the dictionary, d) whether the dictionary ought to be inclusive or restrictive, e) the question of head-entries, f) the inclusion of "compound words" and "idioms", g) what an entry in the dictionary should contain, h) the ordering of "readings" of entries, and i) the inclusion of labels used in conventional lexicography.

    In the introductory section of Part II, background information for the comparison of the "fields" of education in Britain and Argentina is outlined according to relevant dimensions. From this preliminary work three lexical "subsets" emerged, and from these the list of cognates for the practical analysis was compiled. The lexicographical procedures to be applied are discussed prior to the actual analysis of the cognates. The general conclusions arrived at in this work are listed at the end of the Practical Part. A bibliography of articles, books, reference books, etc. which were read and/or consulted for this work is included at this point. Appendixes I-III consist of the lists of cognates found in our research. Appendix IV contains Mr. Mackin's letter in reply to questions put to him about the series Oxford Progressive English Alternative Course.
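    The "similarity of graphic substance" criterion can be pictured with the sketch below; the character-similarity measure, threshold, and word lists are illustrative assumptions, not the thesis's actual procedure.

```python
# Hedged sketch: candidate cognate pairs are those whose spellings are
# sufficiently close across an English and a Spanish word list.
from difflib import SequenceMatcher

def graphic_similarity(english_word, spanish_word):
    """Character-level similarity ratio between the two spellings."""
    return SequenceMatcher(None, english_word.lower(), spanish_word.lower()).ratio()

def candidate_cognates(english_words, spanish_words, threshold=0.75):
    pairs = []
    for en in english_words:
        for es in spanish_words:
            if graphic_similarity(en, es) >= threshold:
                pairs.append((en, es))
    return pairs

print(candidate_cognates(["education", "letter", "hospital"],
                         ["educación", "carta", "hospital"]))
```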

    Coding the semantic relations for basic nouns and verbs
