    Inducing Baseform Models from a Swedish Vocabulary Pool

    Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit. University of Tartu, Tartu, 2007. ISBN 978-9985-4-0513-0 (online) ISBN 978-9985-4-0514-7 (CD-ROM) pp. 51-58

    Extending the View: Explorations in Bootstrapping a Swedish PoS Tagger

    Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), 34-40. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9206

    Swedish CLARIN activities

    Proceedings of the NODALIDA 2009 workshop Nordic Perspectives on the CLARIN Infrastructure of Language Resources. Editors: Rickard Domeij, Kimmo Koskenniemi, Steven Krauwer, Bente Maegaard, Eiríkur Rögnvaldsson and Koenraad de Smedt. NEALT Proceedings Series, Vol. 5 (2009), 1-5. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9207

    Training a Super Model Look-Alike: Featuring Edit Distance, N-Gram Occurrence, and One Reference Translation

    Two string comparison measures, edit distance and n-gram co-occurrence, are tested for automatic evaluation of translation quality, where the quality is compared to one or several reference translations. The measures are tested in combination for diagnostic evaluation on segments. Both measures have been used for evaluation of translation quality before, but for another evaluation purpose (performance) and with another granularity (system). Preliminary experiments showed that the measures are not portable without redefinitions, so two new measures are defined, WAFT and NEVA. The new measures could be applied for both purposes and granularities.

    Feature Combination for Genre Classification

    In this paper, we describe an experiment on genre classification of Swedish texts, using as predictors the frequency of the top 50 most frequent words in the text collection Stockholm-Umeå Corpus (SUC). The purpose of this particular experiment was to find out if the combination of features in a fully-connected feedforward multi-layer perceptron (MLP) gives better classification than single features in a decision tree. The 1,040 text samples in SUC, classified into 9 major genres, were divided into 10 sets, and used for 10-fold cross-validation training of 10 MLPs (50-7-9), where the hidden layer is supposed to correspond to the 7 stylistic dimensions of Biber (1995). The result was better than for a previous experiment using a decision tree (48.6 vs. 58.8 % misclassification). Given the simplicity of the predictors, the sparse data and skewed distribution of genres in the text collection, the result is rather promising. In order to explain the knowledge learnt by the MLPs, we also extracted decision trees from the input and output of the MLPs. Extra input was generated by sampling from the feature space of the original training data. The resulting trees used finer distinctions (more branches) than the tree from the previous experiment, about the same features but with additional split points, and a few more features.

    Revision of Part-of-Speech Tagging in Stockholm Umeå Corpus 2.0

    Many parsers use a part-of-speech tagger as a first step in parsing. The accuracy of the tagger naturally affects the performance of the parser. In this experiment, we revise 1500+ proposed errors in SUC 2.0 that were mainly found during work with schema parsing, and evaluate tagger instances trained on the revised corpus. The revisions turned out to be beneficial for the taggers as well. Collaboration with Eva Forsbom, Uppsala University.