
    Handling unknown words in statistical latent-variable parsing models for Arabic, English and French

    This paper presents a study of the impact of using simple and complex morphological clues to improve the classification of rare and unknown words for parsing. We compare this approach to a language-independent technique often used in parsers which is based solely on word frequencies. This study is applied to three languages that exhibit different levels of morphological expressiveness: Arabic, French and English. We integrate information about Arabic affixes and morphotactics into a PCFG-LA parser and obtain state-of-the-art accuracy. We also show that these morphological clues can be learnt automatically from an annotated corpus.

    Methods for Amharic part-of-speech tagging

    The paper describes a set of experiments involving the application of three state-of-the-art part-of-speech taggers to Ethiopian Amharic, using three different tagsets. The taggers showed worse performance than previously reported results for English, in particular having problems with unknown words. The best results were obtained using a Maximum Entropy approach, while HMM-based and SVM-based taggers got comparable results.

    Comparing a statistical and a rule-based tagger for German

    In this paper we present the results of comparing a statistical tagger for German based on decision trees and a rule-based Brill-Tagger for German. We used the same training corpus (and therefore the same tag-set) to train both taggers. We then applied the taggers to the same test corpus and compared their respective behavior and in particular their error rates. Both taggers perform similarly with an error rate of around 5%. The detailed error analysis shows that the rule-based tagger has more problems with unknown words than the statistical tagger, while the opposite holds for tokens that are ambiguous in many ways. If the unknown words are fed into the taggers with the help of an external lexicon (such as the Gertwol system), the error rate of the rule-based tagger drops to 4.7%, and that of the statistical tagger drops to around 3.7%. Combining the taggers by using the output of one tagger to help the other did not lead to any further improvement. Comment: 8 pages

    Story Cloze Ending Selection Baselines and Data Examination

    This paper describes two supervised baseline systems for the Story Cloze Test Shared Task (Mostafazadeh et al., 2016a). We first build a classifier using features based on word embeddings and semantic similarity computation. We further implement a neural LSTM system with different encoding strategies that try to model the relation between the story and the provided endings. Our experiments show that a model using representation features based on average word embedding vectors over the words of the given story and of the candidate ending sentences, combined with similarity features between the story and candidate ending representations, performed better than the neural models. Our best model achieves an accuracy of 72.42, ranking 3rd in the official evaluation. Comment: Submission for the LSDSem 2017 (Linking Models of Lexical, Sentential and Discourse-level Semantics) Shared Task

    MATREX: DCU machine translation system for IWSLT 2006

    In this paper, we give a description of the machine translation system developed at DCU that was used for our first participation in the evaluation campaign of the International Workshop on Spoken Language Translation (2006). This system combines two types of approaches. First, we use an EBMT approach to collect aligned chunks based on two steps: deterministic chunking of both sides and chunk alignment. We use several chunking and alignment strategies. We also extract SMT-style aligned phrases, and the two types of resources are combined. We participated in the Open Data Track for the following translation directions: Arabic-English and Italian-English, for which we translated both the single-best ASR hypotheses and the text input. We report the results of the system for the provided evaluation sets.

    ‘No és la meva competència intercultural, soc jo.’ La identitat intercultural del professorat de llengües estrangeres en formació

    This article examines the intercultural identity of pre-service foreign language teachers to determine whether they are aware of their intercultural stance and that of others, and whether they portray an identifiable emergent professional persona in relation to interculturality. The ultimate goal of this study is to identify common traits on which to focus future teacher training. The results show that these prospective teachers display an incipient intercultural identity characterised by a tendency to avoid agency and a certain shortage of intercultural knowledge, yet they are notably concerned about their professional image and their responsibility in work environments. This work was supported by the Spanish Ministry of Economy and Competition under Grant FFI2016-77540

    Automatisation of intonation modelling and its linguistic anchoring

    This paper presents a fully machine-driven approach for intonation description and its linguistic interpretation. For this purpose, a new intonation model for bottom-up F0 contour analysis and synthesis is introduced: the CoPaSul model, which is designed in the tradition of parametric, contour-based, and superpositional approaches. Intonation is represented by a superposition of global and local contour classes that are derived from F0 parameterisation. These classes were linguistically anchored with respect to information status by aligning them with a text which had been coarsely analysed for this purpose by means of NLP techniques. To test the adequacy of this data-driven interpretation, a perception experiment was carried out, which confirmed 80% of the findings.