13 research outputs found

    Advances in Weakly Supervised Learning of Morphology

    Morphological analysis provides a decomposition of words into smaller constituents. It is an important problem in natural language processing (NLP), particularly for morphologically rich languages whose large vocabularies make statistical modeling difficult. Morphological analysis has traditionally been approached with rule-based methods that yield accurate results, but are expensive to produce. More recently, unsupervised machine learning methods have been shown to perform sufficiently well to benefit applications such as speech recognition and machine translation. Unsupervised methods, however, do not typically model allomorphy, that is, non-concatenative structure, for example pretty/prettier. Moreover, the accuracy of unsupervised methods remains far behind rule-based methods, with the best unsupervised methods yielding between 50-66% F-score in Morpho Challenge 2010. We examine these problems with two approaches that have not previously attracted much attention in the field. First, we propose a novel extension to the popular unsupervised morphological segmentation method Morfessor Baseline to model allomorphy via the use of string transformations. Second, we examine the effect of weak supervision on accuracy by training on a small annotated data set in addition to a large unannotated data set. We propose two novel semi-supervised morphological segmentation methods, namely a semi-supervised extension of Morfessor Baseline and morphological segmentation with conditional random fields (CRF). The methods are evaluated on several languages with different morphological characteristics, including English, Estonian, Finnish, German and Turkish. The proposed methods are compared empirically to recently proposed weakly supervised methods. For the non-concatenative extension, we find that, while the string transformations identified by the model have high precision, their recall is low. In the overall evaluation, the non-concatenative extension improves accuracy on English, but not on other languages. For the weak supervision, we find that the semi-supervised extension of Morfessor Baseline improves the accuracy of segmentation markedly over the unsupervised baseline. We find, however, that the discriminatively trained CRFs perform even better. In the empirical comparison, the CRF approach outperforms all other approaches on all included languages. Error analysis reveals that the CRF excels especially on affix accuracy.

    Evaluating the effect of word frequencies in a probabilistic generative model of morphology

    Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), 230-237. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

    In search for volta : statistical analysis of word patterns in Shakespeare's sonnets

    Helsinki University of Technology; 951-22-7734-4; Peer reviewed

    The interplay between cognitive, conative, and affective constructs along the entrepreneurial learning process

    Purpose: Although the role of reflections in entrepreneurship education is undeniable, the research has focused mainly on their advantages and consequences for the learning process, whereas their dynamics and interrelations with other mental processes remain unexplored. The purpose of this paper is to better understand how the personality and intelligence constructs of cognition, conation, and affection evolve and change along the learning process during entrepreneurship education. Design/methodology/approach: To better understand reflective processes in entrepreneurial learning, this paper adopts the tripartite constructs of personality and intelligence. By employing a longitudinal explorative research approach and the self-organizing map (SOM) algorithm, the authors follow students’ reflections during their two-year learning processes. First, the authors try to identify how the interplay between the cognitive, conative, and affective aspects emerges in students’ reflections. Then, the authors investigate how this interplay evolves during the individual learning process and finally, by looking for similarities in these learning pathways, the authors aim to identify patterns of students’ reflective learning process. Findings: All constructs are present during the learning process and all are prone to change. The individual constructs alone shed no light on the interplay between different constructs; the interplay between sub-constructs should be taken into consideration as well. This seems to be particularly true for cognition, as procedural and declarative knowledge have very different profiles. Procedural knowledge emerges together with emotions, motivation, and volition, whereas the profile of declarative knowledge is individual. The unique profile of declarative knowledge in students’ reflections is an important finding, as declarative knowledge is regarded as the center of current pedagogic practices. Research limitations/implications: The study broadens the understanding of reflective practices in the entrepreneurial learning process and the interplay between affective, cognitive, and conative sub-constructs and reflective practices in entrepreneurship education. The findings clearly indicate the need for further research on the interplay between sub-constructs and students’ reflection profiles. The authors see the study as an attempt to apply an exploratory statistical method to the problem in question. Practical implications: The results can inform pedagogy. Practical implications concern the need to develop reflective practices in entrepreneurial learning interventions to enhance all three meta-competencies, even though there are so far no irrefutable findings to indicate that some types of reflection may be better than others. Originality/value: The results of the analysis indicate that it is possible to study the complex and dynamic interplay between sub-constructs of cognitive, conative and affective constructs. Moreover, the research succeeded in identifying both individual variations and general reflection patterns and changes in these during the learning process. This was made possible by adopting a longitudinal explorative research approach with SOM analyses. Peer reviewed

    A Constructionist Approach to Grammar Inference

    Models based on Chomskian grammar are not as pervasive in certain NLP applications as might be expected from their status in linguistics, compared to, e.g., statistical n-gram models or vector space models. In this paper, we look at grammar inference from the viewpoint of constructionist theories of language, which contrast with the Chomskian tradition and avoid the separation of morphology, syntax, semantics and pragmatics. We consider the utility of constructionist theories in NLP applications, present a computational framework for learning constructions, and discuss related experimental work.

    A comparative study of minimally supervised morphological segmentation

    VK: Kaski, S. This article presents a comparative study of a subfield of morphology learning referred to as minimally supervised morphological segmentation. In morphological segmentation, word forms are segmented into morphs, the surface forms of morphemes. In the minimally supervised data-driven learning setting, segmentation models are learned from a small number of manually annotated word forms and a large set of unannotated word forms. In addition to providing a literature survey on published methods, we present an in-depth empirical comparison on three diverse model families, including a detailed error analysis. Based on the literature survey, we conclude that the existing methodology contains substantial work on generative morph lexicon-based approaches and methods based on discriminative boundary detection. As for which approach has been more successful, both the previous work and the empirical evaluation presented here strongly imply that the current state of the art is yielded by the discriminative boundary detection methodology. Peer reviewed