76 research outputs found

    Multilingual Lexicon Extraction under Resource-Poor Language Pairs

    Get PDF
    In general, bilingual and multilingual lexicons are important resources in many natural language processing fields such as information retrieval and machine translation. Such lexicons are usually extracted from bilingual (e.g., parallel or comparable) corpora with external seed dictionaries. However, few such corpora and bilingual seed dictionaries are publicly available for many language pairs such as Korean–French. It is important that such resources for these language pairs be publicly available or easily accessible when a monolingual resource is considered. This thesis presents efficient approaches for extracting bilingual single-/multi-word lexicons for resource-poor language pairs such as Korean–French and Korean–Spanish. The goal of this thesis is to present several efficient methods of extracting translated single-/multi-words from bilingual corpora based on a statistical method. Three approaches for single words and one approach for multi-words are proposed. The first approach is the pivot context-based approach (PCA). The PCA uses a pivot language to connect source and target languages. It builds context vectors from two parallel corpora sharing one pivot language and calculates their similarity scores to choose the best translation equivalents. The approach can reduce the effort required when using a seed dictionary for translation by using parallel corpora rather than comparable corpora. The second approach is the extended pivot context-based approach (EPCA). This approach gathers similar context vectors for each source word to augment its context. The approach assumes that similar vectors can enrich contexts. For example, young and youth can augment the context of baby. In the investigation described here, such similar vectors were collected by similarity measures such as cosine similarity. The third approach for single words uses a competitive neural network algorithm (i.e., self-organizing mapsSOM). The SOM-based approach (SA) uses synonym vectors rather than context vectors to train two different SOMs (i.e., source and target SOMs) in different ways. A source SOM is trained in an unsupervised way, while a target SOM is trained in a supervised way. The fourth approach is the constituent-based approach (CTA), which deals with multi-word expressions (MWEs). This approach reinforces the PCA for multi-words (PCAM). It extracts bilingual MWEs taking all constituents of the source MWEs into consideration. The PCAM 2 identifies MWE candidates by pointwise mutual information first and then adds them to input data as single units in order to use the PCA directly. The experimental results show that the proposed approaches generally perform well for resource-poor language pairs, particularly Korean and French–Spanish. The PCA and SA have demonstrated good performance for such language pairs. The EPCA would not have shown a stronger performance than expected. The CTA performs well even when word contexts are insufficient. Overall, the experimental results show that the CTA significantly outperforms the PCAM. In the future, homonyms (i.e., homographs such as lead or tear) should be considered. In particular, the domains of bilingual corpora should be identified. In addition, more parts of speech such as verbs, adjectives, or adverbs could be tested. In this thesis, only nouns are discussed for simplicity. Finally, thorough error analysis should also be conducted.Abstract List of Abbreviations List of Tables List of Figures Acknowledgement Chapter 1 Introduction 1.1 Multilingual Lexicon Extraction 1.2 Motivations and Goals 1.3 Organization Chapter 2 Background and Literature Review 2.1 Extraction of Bilingual Translations of Single-words 2.1.1 Context-based approach 2.1.2 Extended approach 2.1.3 Pivot-based approach 2.2 Extractiong of Bilingual Translations of Multi-Word Expressions 2.2.1 MWE identification 2.2.2 MWE alignment 2.3 Self-Organizing Maps 2.4 Evaluation Measures Chapter 3 Pivot Context-Based Approach 3.1 Concept of Pivot-Based Approach 3.2 Experiments 3.2.1 Resources 3.2.2 Results 3.3 Summary Chapter 4 Extended Pivot Context-Based Approach 4.1 Concept of Extended Pivot Context-Based Approach 4.2 Experiments 4.2.1 Resources 4.2.2 Results 4.3 Summary Chapter 5 SOM-Based Approach 5.1 Concept of SOM-Based Approach 5.2 Experiments 5.2.1 Resources 5.2.2 Results 5.3 Summary Chapter 6 Constituent-Based Approach 6.1 Concept of Constituent-Based Approach 6.2 Experiments 6.2.1 Resources 6.2.2 Results 6.3 Summary Chapter 7 Conclusions and Future Work 7.1 Conclusions 7.2 Future Work Reference

    Natural language processing

    Get PDF
    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems

    A Stalnakerian Analysis of Metafictive Statements

    Get PDF

    Proceedings of the 19th Amsterdam Colloquium

    Get PDF

    A Stalnakerian Analysis of Metafictive Statements

    Get PDF
    Because Stalnaker’s common ground framework is focussed on cooperative information exchange, it is challenging to model fictional discourse. To this end, I develop an extension of Stalnaker’s analysis of assertion that adds a temporary workspace to the common ground. I argue that my framework models metafictive discourse better than competing approaches that are based on adding unofficial common grounds

    Definiteness across languages

    Get PDF
    Definiteness has been a central topic in theoretical semantics since its modern foundation. However, despite its significance, there has been surprisingly scarce research on its cross-linguistic expression. With the purpose of contributing to filling this gap, the present volume gathers thirteen studies exploiting insights from formal semantics and syntax, typological and language specific studies, and, crucially, semantic fieldwork and cross-linguistic semantics, in order to address the expression and interpretation of definiteness in a diverse group of languages, most of them understudied. The papers presented in this volume aim to establish a dialogue between theory and data in order to answer the following questions: What formal strategies do natural languages employ to encode definiteness? What are the possible meanings associated to this notion across languages? Are there different types of definite reference? Which other functions (besides marking definite reference) are associated with definite descriptions? Each of the papers contained in this volume addresses at least one of these questions and, in doing so, they aim to enrich our understanding of definiteness

    Reflexive constructions in the world's languages

    Get PDF
    Synopsis: This landmark publication brings together 28 papers on reflexive constructions in languages from all continents, representing very diverse language types. While reflexive constructions have been discussed in the past from a variety of angles, this is the first edited volume of its kind. All the chapters are based on original data, and they are broadly comparable through a common terminological framework. The volume opens with two introductory chapters by the editors that set the stage and lay out the main comparative concepts, and it concludes with a chapter presenting generalizations on the basis of the studies of individual languages
    corecore