120 research outputs found

    LEXICALL: Lexicon Construction for Foreign Language Tutoring

    We focus on the problem of building large repositories of lexical conceptual structure (LCS) representations for verbs in multiple languages. One of the main results of this work is the definition of a relation between broad semantic classes and LCS meaning components. Our acquisition program, LEXICALL, takes as input the result of previous work on verb classification and thematic grid tagging, and outputs LCS representations for different languages. These representations have been ported into English, Arabic and Spanish lexicons, each containing approximately 9000 verbs. We are currently using these lexicons in operational foreign language tutoring and machine translation. (Also cross-referenced as UMIACS-TR-97-09)

    Proceedings of the COLING 2004 Post Conference Workshop on Multilingual Linguistic Ressources MLR2004

    In an ever expanding information society, most information systems are now facing the "multilingual challenge". Multilingual language resources play an essential role in modern information systems. Such resources need to provide information on many languages in a common framework and should be (re)usable in many applications (for automatic or human use). Many centres have been involved in national and international projects dedicated to building harmonised language resources and creating expertise in the maintenance and further development of standardised linguistic data. These resources include dictionaries, lexicons, thesauri, word-nets, and annotated corpora developed along the lines of best practices and recommendations. However, since the late 90's, most efforts in scaling up these resources have remained the responsibility of the local authorities, usually with very low funding (if any) and few opportunities for academic recognition of this work. Hence, it is not surprising that many of the resource holders and developers have become reluctant to give free access to the latest versions of their resources, and their actual status is therefore currently rather unclear. The goal of this workshop is to study problems involved in the development, management and reuse of lexical resources in a multilingual context. Moreover, this workshop provides a forum for reviewing the present state of language resources. The workshop is meant to bring to the international community qualitative and quantitative information about the most recent developments in the area of linguistic resources and their use in applications. The impressive number of submissions (38) to this workshop, and to other workshops and conferences dedicated to similar topics, shows that dealing with multilingual linguistic resources has become a very active topic in the Natural Language Processing community.
To cope with the number of submissions, the workshop organising committee decided to accept 16 papers from 10 countries based on the reviewers' recommendations. Six of these papers will be presented in a poster session. The papers constitute a representative selection of current trends in research on Multilingual Language Resources, such as multilingual aligned corpora, bilingual and multilingual lexicons, and multilingual speech resources. The papers also represent a characteristic set of approaches to the development of multilingual language resources, such as automatic extraction of information from corpora, combination and re-use of existing resources, online collaborative development of multilingual lexicons, and use of the Web as a multilingual language resource. The development and management of multilingual language resources is a long-term activity in which collaboration among researchers is essential. We hope that this workshop will gather many researchers involved in such developments and will give them the opportunity to discuss, exchange, compare their approaches and strengthen their collaborations in the field. The organisation of this workshop would have been impossible without the hard work of the program committee who managed to provide accurate reviews on time, on a rather tight schedule. We would also like to thank the Coling 2004 organising committee that made this workshop possible. Finally, we hope that this workshop will yield fruitful results for all participants

    Exploiting Lexical Conceptual Structure for paraphrase generation

    Abstract. Lexical Conceptual Structure (LCS) represents verbs as semantic structures with a limited number of semantic predicates. This paper explores how LCS can be used to explain the regularities underlying lexical and syntactic paraphrases, such as verb alternation, compound word decomposition, and lexical derivation. We propose a paraphrase generation model which transforms LCSs of verbs, and then conduct an empirical experiment taking the paraphrasing of Japanese light-verb constructions as an example. Experimental results indicate that syntactic and semantic properties of verbs encoded in LCS are useful for semantically constraining the syntactic transformation in paraphrase generation.

    Italian VerbNet: A Construction based Approach to Italian Verb Classification

    This paper proposes a new method for Italian verb classification, and a preliminary example of resulting classes, inspired by Levin (1993) and VerbNet (Kipper-Schuler, 2005), yet partially independent from these resources; we achieved such a result by integrating Levin and VerbNet’s models of classification with other theoretic frameworks and resources. The classification is rooted in the constructionist framework (Goldberg, 1995; 2006) and is distribution-based. It is also semantically characterized by a link to FrameNet’s semantic frames to represent the event expressed by a class. However, the new Italian classes maintain the hierarchic “tree” structure and monotonic nature of VerbNet’s classes, and, where possible, the original names (e.g.: Verbs of Killing, Verbs of Putting, etc.). We therefore propose here a taxonomy compatible with VerbNet but at the same time adapted to Italian syntax and semantics. It also addresses a number of problems intrinsic to the original classifications, such as the role of argument alternations, here regarded simply as epiphenomena, consistently with the constructionist approach.

    Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language

    This article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach. The article presents algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguity, along with experimental results demonstrating their effectiveness
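
    The idea of similarity as shared information content can be illustrated with a minimal sketch. It assumes that the information content of a concept c is log(1/p(c)), where p(c) is the probability of encountering c or anything below it in the taxonomy, and that the similarity of two concepts is the largest information content among the concepts subsuming both. The toy taxonomy, the counts, and all function names below are invented for illustration and are not taken from the article.

```python
import math

# Toy IS-A taxonomy (child -> parent). Hypothetical, for illustration only.
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "animal": None,
}

# Hypothetical corpus counts of direct mentions of each concept.
COUNTS = {"dog": 40, "cat": 30, "mammal": 10, "sparrow": 15, "bird": 5, "animal": 0}


def ancestors(concept):
    """Return the concept followed by all of its IS-A ancestors, nearest first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def concept_probabilities():
    """p(c): share of all observations that fall under c or its descendants."""
    totals = dict.fromkeys(PARENT, 0)
    for concept, n in COUNTS.items():
        for anc in ancestors(concept):
            totals[anc] += n
    grand_total = sum(COUNTS.values())
    return {c: t / grand_total for c, t in totals.items()}


def similarity(c1, c2, probs):
    """Information content of the most informative concept subsuming both."""
    shared = set(ancestors(c1)) & set(ancestors(c2))
    return max(math.log(1.0 / probs[c]) for c in shared)


PROBS = concept_probabilities()
print(similarity("dog", "cat", PROBS))      # shared subsumers: mammal, animal
print(similarity("dog", "sparrow", PROBS))  # only the root is shared
```

    In this toy setup, dog and cat share the fairly specific concept mammal, while dog and sparrow share only the root, which carries no information because every observation falls under it. An edge-counting measure would instead depend on how many links separate the two concepts, regardless of how informative the shared ancestor is.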

    Current trends in computer linguistics and problem of the machine translation of Arabic

    The aim of this paper is to present some problems concerning the machine translation of Arabic in the context of selected NLP theories and their evolution. The first attempts at electronic machine translation in Europe began only a little more than fifty years ago. Is that enough time to perceive some aspects of the evolution? Although many of the concepts are still valid, the situation in 2012 is quite different from that of even twelve years earlier. The older works of N. Chomsky and D. Cohen remain useful, but CFG now seems to be supplemented by some newer theories, which have disadvantages of their own. Some interesting problems occur in the process of automatic translation when the number of grammatical cases is smaller in the source language than in the output language.

    Opportunities and Challenges of English Academic Writing Education in Japanese Universities (日本の大学における英語アカデミックライティング教育の可能性と課題)

    Today, whether or not English's dominance as a global lingua franca benefits higher education, more and more universities around the world have made efforts to integrate English academic writing education into their institutional policies and strategies. This trend has been observed particularly against a background in which, with the increased internationalization of higher education, the imperative for universities globally to focus on maintaining or improving their international reputation and rankings has grown significantly. Indeed, such prestige tends to be assessed largely in terms of publications in English. With this in mind, we are concerned with how higher education institutions address these efforts toward promoting English academic writing in a specific non-English L1 context, namely Japan. English academic writing in university contexts where English is an additional language exists where the fields of language education, higher education administration, research methodology, and cultural socialization converge. Therefore, this volume brings together scholarship that aims to examine the different ways in which academic writing education shapes and is shaped by students, faculty and other stakeholders in Japanese universities. This volume’s eight chapters, by authors with diverse backgrounds, ranging from administrators to researchers, and from humanities and social sciences to medical studies, explore the opportunities and challenges of English academic writing education in Japanese universities by looking at related topics, including writing centers, faculty members, genre-specific education, and technology development.
Together, the discussions in the individual chapters can contribute profoundly to theory, policy, and practice in the domains of curriculum, research, and administration in university contexts.

Introduction (Norifumi Miyokawa)
Part I: A writing center in Japan: Hiroshima University
Chapter One: Development of the Hiroshima University Writing Center -From an administrative perspective- (Hiroko Araki & Norifumi Miyokawa)
Chapter Two: Perceptions of academic writing support -A needs analysis of the Hiroshima University Writing Center- (Roehl Sybing & Norifumi Miyokawa)
Part II: Faculty development for academic writing
Chapter Three: Potential roles of writing centers for writing related Faculty Development (Machi Sato & Shinichi Cho)
Chapter Four: Academic writing support for faculty members -Writing Groups and Writing Retreats- (Adina Staicov)
Part III: Genre-specific education: Cases in the medical field
Chapter Five: How to write the Introduction of biomedical research articles -Move analysis of the first and last sentences- (Takeshi Kawamoto & Tatsuya Ishii)
Chapter Six: Error analysis of overt lexicogrammatical errors in the prepublication English-language manuscripts of Japanese biomedical researchers -With implications for the teaching of writing for biomedical research- (Flaminia Miyamasu)
Part IV: Theoretical and practical approaches to academic writing
Chapter Seven: Language socialization and writing centers (Akiko Katayama)
Chapter Eight: Socialization into integrity -Using plagiarism software to teach L2 writing- (Gavin Furukawa)
Acknowledgements (Norifumi Miyokawa)

    SEMANTIC CLASSIFICATION OF VERBS IN THE CROATIAN VERB VALENCY DATABASE

    In this paper the justification for the use of semantic classification as a starting point in the syntactic description of verbs is discussed. Some authors use syntactic criteria as the basis of their classification of verbs into semantic groups (e.g. Levin 1993, Dorr 1997, Korhonen and Briscoe 2004, Mikelić Preradović 2010), emphasizing the strong relationship between the verbs’ semantic features and their syntactic structure.
They determine whether the verbs belong to the same semantic group by using various syntactic alternations. However, some linguists consider that the application of exclusively semantic criteria and the analysis of semantic relationships between verbs belonging to the same semantic field is a more justified approach (e.g. Fellbaum 1998, Šojat 2012). In the Croatian Verb Valency Database (Birtić and Nahod 2015; Birtić and Runjaić 2015), verb descriptions were created by classifying the verbs into semantic groups based on the aforementioned correlation between the two levels and the assumption that the non-native speakers of Croatian can predict syntactic patterns for most verbs within the semantic group if they know the syntactic patterns of the prototypical verbs within the same semantic group. On the basis of the criteria of frequency and occurrence of verbs in handbooks relevant for learning Croatian as a foreign language, 900 verbs that are necessary to master the Croatian language at B1 level have been selected. These verbs were divided into semantic groups according to the first meaning cited in the dictionaries, but for some verbs the criterion according to which verbs are put into a certain semantic group based on the first meaning has been brought into question. More precisely, it is not clear whether the lexicographers selected the oldest or the most common meaning as the first meaning listed for a particular verb. However, the first meaning was used only for the basic division into 34 semantic groups, because most of the verbs are polysemous and belong to other groups as well. Consequently, the introduction of new semantic groups is inevitable. The long-term goal of this research is to define prototypical syntactic patterns for each semantic group, i.e. to find common syntactic patterns within a semantic group