Handling unknown words in statistical latent-variable parsing models for Arabic, English and French
This paper presents a study of the impact of using simple and complex morphological clues to improve the classification of rare and unknown words for parsing. We compare this approach to a language-independent technique, often used in parsers, that is based solely on word frequencies. This study is applied to three languages that exhibit different levels of morphological expressiveness: Arabic, French and English. We integrate information about Arabic affixes and morphotactics into a PCFG-LA parser and obtain state-of-the-art accuracy. We also show that these morphological clues can be learnt automatically from an annotated corpus.
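The contrast above can be made concrete with a small Python sketch of two ways of handling rare training words: a frequency-only baseline that maps every rare word to a single `UNK` token, and a signature-based variant that keeps a few simple clues in the replacement class. The suffix list, the threshold and the names `signature` and `replace_rare` are hypothetical illustrations using English suffixes; this is not the paper's Arabic affix and morphotactics model.

```python
from collections import Counter

# Hypothetical English suffix list; the paper uses Arabic affixes and morphotactics instead.
SUFFIXES = ("ing", "ed", "ly", "s")


def signature(word):
    """Map a rare word to a coarse class built from simple orthographic and suffix clues."""
    parts = ["UNK"]
    if any(ch.isdigit() for ch in word):
        parts.append("NUM")
    if word[:1].isupper():
        parts.append("CAP")
    if "-" in word:
        parts.append("DASH")
    for suffix in SUFFIXES:
        if word.lower().endswith(suffix):
            parts.append("SUF_" + suffix)
            break  # keep only the first matching suffix
    return "-".join(parts)


def replace_rare(sentences, threshold=1, use_signatures=True):
    """Replace words seen at most `threshold` times in the training sentences.

    use_signatures=False is the purely frequency-based baseline (every rare word -> "UNK");
    use_signatures=True keeps some morphological information in the replacement class.
    """
    counts = Counter(word for sentence in sentences for word in sentence)

    def map_word(word):
        if counts[word] > threshold:
            return word
        return signature(word) if use_signatures else "UNK"

    return [[map_word(word) for word in sentence] for sentence in sentences]
```

For example, `replace_rare([["the", "dog", "barks"], ["the", "cat", "jumped"]])` keeps `the` and maps `barks` to `UNK-SUF_s` and `jumped` to `UNK-SUF_ed`, whereas the frequency-only setting (`use_signatures=False`) maps every rare word to `UNK`.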
Methods for Amharic part-of-speech tagging
The paper describes a set of experiments involving the application of three state-of-the-art part-of-speech taggers to Ethiopian Amharic, using three different tagsets. The taggers performed worse than previously reported results for English, in particular struggling with unknown words. The best results were obtained using a Maximum Entropy approach, while the HMM-based and SVM-based taggers achieved comparable results.
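Problems with unknown words of the kind reported above are typically exposed by scoring known and unknown tokens separately. A minimal sketch, assuming aligned lists of gold and predicted `(word, tag)` pairs; the function name `accuracy_by_known` is hypothetical and not part of any of the taggers evaluated:

```python
def accuracy_by_known(train_words, gold, predicted):
    """Tagging accuracy split into known tokens (seen in training) and unknown tokens.

    gold and predicted are aligned lists of (word, tag) pairs for the same test tokens.
    Returns {"known": accuracy, "unknown": accuracy}, with None for an empty group.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    vocabulary = set(train_words)
    stats = {"known": [0, 0], "unknown": [0, 0]}  # group -> [correct, total]
    for (word, gold_tag), (pred_word, pred_tag) in zip(gold, predicted):
        if word != pred_word:
            raise ValueError(f"misaligned tokens: {word!r} vs {pred_word!r}")
        group = "known" if word in vocabulary else "unknown"
        stats[group][1] += 1
        if gold_tag == pred_tag:
            stats[group][0] += 1
    return {group: (correct / total if total else None)
            for group, (correct, total) in stats.items()}
```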
Comparing a statistical and a rule-based tagger for German
In this paper we present the results of comparing a statistical tagger for German based on decision trees with a rule-based Brill tagger for German. We used the same training corpus (and therefore the same tagset) to train both taggers. We then applied the taggers to the same test corpus and compared their respective behavior, in particular their error rates. Both taggers perform similarly, with an error rate of around 5%. The detailed error analysis shows that the rule-based tagger has more problems with unknown words than the statistical tagger, whereas the reverse holds for tokens that are many-ways ambiguous. If the unknown words are fed into the taggers with the help of an external lexicon (such as the Gertwol system), the error rate of the rule-based tagger drops to 4.7%, and that of the statistical tagger drops to around 3.7%. Combining the taggers by using the output of one tagger to help the other did not lead to any further improvement.
Comment: 8 pages
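The lexicon back-off described above can be pictured as restricting the tags a tagger may choose for a word. The following is a minimal sketch under stated assumptions: both lexicons are plain dictionaries (the external one merely stands in for a morphological analyser such as Gertwol, whose real interface is not modelled), the STTS-style tag names are illustrative, and the highest-prior decision rule is a toy replacement for the actual decision-tree and Brill taggers.

```python
# Hypothetical STTS-style open-class tags used as a last-resort candidate set.
OPEN_CLASS_TAGS = frozenset({"NN", "NE", "ADJA", "VVFIN"})


def candidate_tags(word, train_lexicon, external_lexicon):
    """Allowed tags for a token: training lexicon first, then the external lexicon,
    then every open-class tag if the word is unknown to both."""
    if word in train_lexicon:
        return frozenset(train_lexicon[word])
    if word in external_lexicon:
        return frozenset(external_lexicon[word])
    return OPEN_CLASS_TAGS


def choose_tag(word, train_lexicon, external_lexicon, tag_prior):
    """Toy decision rule: the candidate with the highest prior (ties broken alphabetically)."""
    candidates = sorted(candidate_tags(word, train_lexicon, external_lexicon))
    return max(candidates, key=lambda tag: tag_prior.get(tag, 0.0))
```

With an external entry mapping Haustür to `NN`, the toy rule returns `NN`; without that entry it falls back to whichever open-class tag has the highest prior.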
Story Cloze Ending Selection Baselines and Data Examination
This paper describes two supervised baseline systems for the Story Cloze Test Shared Task (Mostafazadeh et al., 2016a). We first build a classifier using features based on word embeddings and semantic similarity computation. We further implement a neural LSTM system with different encoding strategies that try to model the relation between the story and the provided endings. Our experiments show that a model using representation features based on average word embedding vectors over the story words and the words of the candidate ending sentences, combined with similarity features between the story and candidate ending representations, performed better than the neural models. Our best model achieves an accuracy of 72.42, ranking 3rd in the official evaluation.
Comment: Submission for the LSDSem 2017 - Linking Models of Lexical, Sentential and Discourse-level Semantics - Shared Task
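The feature set described above (averaged word embeddings for the story and for each candidate ending, plus a similarity between them) can be sketched as follows. The toy embedding dictionary, the function names and the similarity-only ending picker are assumptions for illustration; the paper trains a supervised classifier on such features, which is not reproduced here.

```python
import math


def average_vector(words, embeddings, dim):
    """Mean of the embeddings of the in-vocabulary words (zero vector if none are known)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ending_features(story, ending, embeddings, dim):
    """Feature vector: averaged story vector, averaged ending vector, and their cosine."""
    story_vec = average_vector(story, embeddings, dim)
    ending_vec = average_vector(ending, embeddings, dim)
    return story_vec + ending_vec + [cosine(story_vec, ending_vec)]


def pick_by_similarity(story, endings, embeddings, dim):
    """Unsupervised stand-in for the trained classifier: index of the most similar ending."""
    scores = [ending_features(story, e, embeddings, dim)[-1] for e in endings]
    return max(range(len(endings)), key=scores.__getitem__)
```

In a supervised setup, the vectors returned by `ending_features` would be the classifier's input rather than being ranked directly by their last component.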
MATREX: DCU machine translation system for IWSLT 2006
In this paper, we describe the machine translation system developed at DCU for our first participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2006). This system combines two types of approaches. First, we use an EBMT approach to collect aligned chunks in two steps: deterministic chunking of both sides and chunk alignment. We use several chunking and alignment strategies. We also extract SMT-style aligned phrases, and the two types of resources are combined.
We participated in the Open Data Track for the following translation directions: Arabic-English and Italian-English, for which we translated both the single-best ASR hypotheses and the text input. We report the results of the system on the provided evaluation sets.
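One simple way to combine EBMT chunk pairs with SMT-style phrase pairs, assumed here for illustration and not taken from the MATREX description, is to pool their counts and re-estimate translation probabilities by relative frequency. The function name and the Italian-English toy pairs are hypothetical:

```python
from collections import Counter, defaultdict


def merge_translation_pairs(chunk_pairs, phrase_pairs):
    """Pool (source, target) pairs from EBMT chunk alignment and SMT phrase extraction,
    then re-estimate p(target | source) by relative frequency over the pooled counts."""
    counts = Counter(chunk_pairs) + Counter(phrase_pairs)
    source_totals = Counter()
    for (source, _target), count in counts.items():
        source_totals[source] += count
    table = defaultdict(dict)
    for (source, target), count in counts.items():
        table[source][target] = count / source_totals[source]
    return dict(table)
```

Pooling two chunk occurrences of ("la casa", "the house") with one phrase occurrence of ("la casa", "the home") yields probabilities of 2/3 and 1/3 for the two translations.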
‘It’s not my intercultural competence, it’s me.’ The intercultural identity of pre-service foreign language teachers
This article examines the intercultural identity of pre-service foreign language teachers to
determine whether they are aware of their intercultural stance and that of others, and
whether they portray an identifiable emergent professional persona in relation to interculturality.
The ultimate goal of this study is to identify common traits on which to focus
future teacher training. The results show that these prospective teachers display an incipient
intercultural identity characterised by a tendency to avoid agency and a certain shortage
of intercultural knowledge, yet they are notably concerned about their professional
image and their responsibility in work environments.
This work was supported by the Spanish Ministry of Economy and Competitiveness under Grant FFI2016-77540.
Automatisation of intonation modelling and its linguistic anchoring
This paper presents a fully machine-driven approach to intonation description and its linguistic interpretation. For this purpose, a new intonation model for bottom-up F0 contour analysis and synthesis is introduced: the CoPaSul model, which is designed in the tradition of parametric, contour-based, and superpositional approaches. Intonation is represented by a superposition of global and local contour classes that are derived from F0 parameterisation. These classes were linguistically anchored with respect to information status by aligning them with a text that had been coarsely analysed for this purpose by means of NLP techniques. To test the adequacy of this data-driven interpretation, a perception experiment was carried out, which confirmed 80% of the findings.
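The superpositional idea (a global contour with local contours added on top, each described by a few parameters) can be sketched in plain Python. This is a simplified illustration, not the CoPaSul implementation: a straight least-squares line stands in for the global contour, a cubic polynomial over time normalised to [-1, 1] stands in for each local contour, the segment boundaries are supplied by hand, and the clustering of parameters into contour classes is omitted. Each segment needs more samples than the polynomial degree.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; coefficients lowest order first."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back-substitution on the upper-triangular system.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def polyval(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def normalize_time(ts):
    """Map a segment's time stamps linearly onto [-1, 1]."""
    start, end = ts[0], ts[-1]
    return [2 * (t - start) / (end - start) - 1 for t in ts]


def superpose(times, f0, segments, local_degree=3):
    """Analysis: split an F0 contour into a global line plus one polynomial per local segment.

    segments: hypothetical (start, end) index pairs, end exclusive (e.g. accent groups).
    """
    global_coeffs = polyfit(times, f0, 1)
    residual = [y - polyval(global_coeffs, t) for t, y in zip(times, f0)]
    local = []
    for start, end in segments:
        x = normalize_time(times[start:end])
        local.append(polyfit(x, residual[start:end], local_degree))
    return global_coeffs, local


def reconstruct(times, global_coeffs, local, segments):
    """Synthesis: add each local contour back onto the global line (superposition)."""
    out = [polyval(global_coeffs, t) for t in times]
    for coeffs, (start, end) in zip(local, segments):
        x = normalize_time(times[start:end])
        for i, xi in zip(range(start, end), x):
            out[i] += polyval(coeffs, xi)
    return out
```

The parameter vectors returned by `superpose` (two global coefficients, four per local segment) are the kind of compact description that a data-driven step could then cluster into contour classes.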
- …