85 research outputs found

    Acquisition and enrichment of morphological and morphosemantic knowledge from the French Wiktionary

    We present two approaches to automatically acquire morphologically related words from Wiktionary. Starting with related words explicitly mentioned in the dictionary, we propose a method based on orthographic similarity to detect new derived words from the entries' definitions with an overall accuracy of 93.5%. Using word pairs from the initial lexicon as patterns of formal analogies to filter new derived words enables us to raise the accuracy to 99%, while extending the lexicon's size by 56%. In a final experiment, we show that it is possible to semantically type the morphological definitions, focusing on the detection of process nominals.

    A domain-independent semantic tagger for the study of meaning associations in English text

    A comparison of semantic tagging with syntactic part-of-speech tagging leads us to propose that a domain-independent semantic tagger for English corpora should not aim to annotate each word with an atomic 'sem-tag'; instead, semantic tagging should attach to each word a set of primitive semantic attributes or features. These features should include: the lemma or root, grouping together inflected and derived forms of the same lexical item; broad subject categories where applicable; selectional restrictions; and a meaning definition, stated in terms of a restricted Defining Vocabulary and processed to remove stoplist words and repetitions. A semantic tagger meeting this description can be derived from the Longman Dictionary of Contemporary English, if combined with a robust lemmatiser, allowing automated semantic tagging of large English corpora such as LOB and BNC.

    A Hybrid Environment for Syntax-Semantic Tagging

    The thesis describes the application of the relaxation labelling algorithm to NLP disambiguation. Language is modelled through context constraints inspired by Constraint Grammars. The constraints enable the use of a real value stating "compatibility". The technique is applied to POS tagging, shallow parsing, and word sense disambiguation. Experiments and results are reported. The proposed approach enables the use of multi-feature constraint models, the simultaneous resolution of several NL disambiguation tasks, and the collaboration of linguistic and statistical models. Comment: PhD thesis, 120 pages.

    A new approach for extracting inter-word semantic relationship from a contemporary Chinese thesaurus.

    by Lam Sze-sing. Thesis (M.Phil.), Chinese University of Hong Kong, 1995. Includes bibliographical references (leaves 119-123).
    Contents:
    Chapter 1. Introduction
        1.1 Introduction
        1.2 Statement of Thesis
        1.3 Organization of this Thesis
    Chapter 2. Related Work
        2.1 Overview
        2.2 Corpus-Based Knowledge Acquisition
        2.3 Linguistic-Based Knowledge Acquisition
        2.3.1 Knowledge Acquisition from Standard Dictionaries
        2.3.2 Knowledge Acquisition from Standard Thesauri
        2.4 Remarks
    Chapter 3. A Method to Extract the Inter-word Semantic Relationship from《同義詞詞林》
        3.1 Background
        3.1.1 Structure of《同義詞詞林》
        3.1.2 Knowledge Representation of a Machine Tractable Thesaurus
        3.1.3 Extracting the Semantic Knowledge by Simple Co-occurrence
        3.2 Association Network
        3.3 Semantic Association Model
        3.3.1 Problems with the Simple Co-occurrence Method
        3.3.2 Methodology of Semantic Association Model
        3.4 Inter-word Semantic Function
    Chapter 4. Noun-Verb-Noun Compound Word Detection: An Experiment
        4.1 Overview
        4.2 N-V-N Compound Word Detection Model
        4.3 Experimental Results of N-V-N Compound Word Detection
    Chapter 5. Word Sense Disambiguation: An Application
        5.1 Overview
        5.2 Word-Sense Disambiguation Model
        5.2.1 Linguistic Resource
        5.2.2 The LSD-C Algorithm
        5.2.3 LSD-C in Action
        5.3 Experimental Results of Word Sense Disambiguation
    Chapter 6. Conclusions & Further Research
        6.1 Conclusions
        6.2 Further Research
        6.2.1 Enriching the Knowledge
        6.2.2 Enhancing the N-V-N Compound Word Detection Model
        6.2.3 Enhancing the LSD-C Algorithm
    Appendices
        Appendix A - Dependency Grammar
        Appendix B - Sample Articles from a Local Chinese Newspaper
        Appendix C - Ambiguous Words with the Senses Given by《現代漢語詞典》
        Appendix D - List of Stop Words for the Testing Samples
    References

    Análisis contrastivo inglés-ruso de resúmenes de artículos de investigación del ámbito de geociencias (English-Russian contrastive analysis of research article abstracts in geoscience)

    Mastering the genre of the research article abstract is crucially important to meet the expectations of a discourse community in a particular scientific field. To date, research has shed light on how abstracts are written in various disciplines. However, few if any attempts have been made to analyse the abstract in geoscience. Furthermore, several studies have investigated the genre of the abstract by drawing on native/non-native and expert/apprentice dichotomies. Even so, there has not been sufficient investigation into abstracts written by Russian native speakers. This study therefore aims to carry out a linguistic comparison of abstracts written in English by Russian novice researchers and by native English-speaking experts in geoscience. For this purpose, a monolingual English corpus of research articles in geoscience was created. The results of Biber's multidimensional analysis generally confirm previous findings about abstracts in hard sciences, though they allow for hypotheses on some distinctive features of abstracts written by Russian geoscientists.

    Diagnosing Reading strategies: Paraphrase Recognition

    Paraphrase recognition is a form of natural language processing used in tutoring, question answering, and information retrieval systems. The context of the present work is an automated reading strategy trainer called iSTART (Interactive Strategy Trainer for Active Reading and Thinking). The ability to recognize the use of paraphrase (complete, partial, or inaccurate; with or without extra information) in the student's input is essential if the trainer is to give appropriate feedback. I analyzed the most common patterns of paraphrase and developed a means of representing the semantic structure of sentences. Paraphrases are recognized by transforming sentences into this representation and comparing them. To construct a precise semantic representation, it is important to understand the meaning of prepositions. Adding preposition disambiguation to the original system improved its accuracy by 20%. The preposition sense disambiguation module itself achieves about 80% accuracy for the top 10 most frequently used prepositions. The main contributions of this work to the research community are the preposition classification and generalized preposition disambiguation processes, which are integrated into the paraphrase recognition system and are shown to be quite effective. The recognition model also forms a significant part of this contribution. The present effort includes the modeling of the paraphrase recognition process, featuring the Syntactic-Semantic Graph as a sentence representation; the implementation of a significant portion of this design, demonstrating its effectiveness; the modeling of an effective preposition classification based on prepositional usage; the design of the generalized preposition disambiguation module; and the integration of the preposition disambiguation module into the paraphrase recognition system so as to gain significant improvement.

    Inquiries into the lexicon-syntax relations in Basque

    Index:
    - Foreword. B. Oyharçabal.
    - Morphosyntactic disambiguation and shallow parsing in computational processing in Basque. I. Aduriz, A. Díaz de Ilarraza.
    - The transitivity of borrowed verbs in Basque: an outline. X. Alberdi.
    - Patrixa: a unification-based parser for Basque and its application to the automatic analysis of verbs. I. Aldezabal, M. J. Aranzabe, A. Atutxa, K. Gojenola, K. Sarasola.
    - Learning argument/adjunct distinction for Basque. I. Aldezabal, M. J. Aranzabe, K. Gojenola, K. Sarasola, A. Atutxa.
    - Analyzing verbal subcategorization aimed at its computational application. I. Aldezabal, P. Goenaga.
    - Automatic extraction of verb patterns from “hauta-lanerako euskal hiztegia”. J. M. Arriola, X. Artola, A. Soroa.
    - The case of an enlightening, provoking and admirable Basque derivational suffix with implications for the theory of argument structure. X. Artiagoitia.
    - Verb-deriving processes in Basque. J. C. Odriozola.
    - Lexical causatives and causative alternation in Basque. B. Oyharçabal.
    - Causation and semantic control; diagnosis of incorrect use in minorized languages. I. Zabala.
    - Subject index.
    - Contributions.