7 research outputs found

Sequence Tagging for Fast Dependency Parsing

Authors: Gómez-Rodríguez Carlos; Strzyz Michalina; Vilares David
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

[Abstract] Dependency parsing has been built upon the idea of using parsing methods based on shift-reduce or graph-based algorithms in order to identify binary dependency relations between the words in a sentence. In this study we adopt a radically different approach and cast full dependency parsing as a pure sequence tagging task. In particular, we apply a linearization function to the tree that results in an output label for each token that conveys information about the word’s dependency relations. We then follow a supervised strategy and train a bidirectional long short-term memory network to learn to predict such linearized trees. Contrary to previous studies attempting this, the results show that this approach leads to dependency parsing that is not only accurate but also fast. Furthermore, we obtain even faster and more accurate parsers by recasting the problem as multitask learning, with a twofold objective: to reduce the output vocabulary and also to exploit hidden patterns coming from a second parsing paradigm (constituent grammars) when used as an auxiliary task.
Funding: Ministerio de Economía y Competitividad; TIN2017-85160-C2-1-R. Xunta de Galicia; ED431B 2017/0
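The abstract does not say which linearization function is used, so the following is only a minimal sketch of one possible choice: each token's label stores the signed offset to its head together with the relation name. The function names `encode` and `decode`, the `offset@relation` label format, and the `ROOT` marker are assumptions made for illustration, not the authors' encoding.

```python
from typing import List, Tuple


def encode(heads: List[int], rels: List[str]) -> List[str]:
    """Turn a dependency tree into one label per token.

    heads[i] is the 1-based position of the head of token i+1 (0 = root).
    Each label is "<signed offset to head>@<relation>" (hypothetical format).
    """
    labels = []
    for position, (head, rel) in enumerate(zip(heads, rels), start=1):
        if head == 0:
            labels.append(f"ROOT@{rel}")
        else:
            labels.append(f"{head - position:+d}@{rel}")
    return labels


def decode(labels: List[str]) -> Tuple[List[int], List[str]]:
    """Invert encode(): recover head positions and relations from labels."""
    heads, rels = [], []
    for position, label in enumerate(labels, start=1):
        offset, rel = label.split("@", 1)
        heads.append(0 if offset == "ROOT" else position + int(offset))
        rels.append(rel)
    return heads, rels


# Toy sentence "She reads books": "reads" is the root and heads the other two.
example_heads = [2, 0, 2]
example_rels = ["nsubj", "root", "obj"]
example_labels = encode(example_heads, example_rels)
print(example_labels)
print(decode(example_labels))
```

Because every token receives exactly one label, any sequence tagger can be trained to predict the labels, and `decode` rebuilds the tree without a separate parsing algorithm. A real system would also have to handle predicted labels that do not form a valid tree, which this sketch ignores.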

Repositorio da Universidade da Coruña

Crossref

Parsing as Pretraining

Authors: Gómez-Rodríguez Carlos; Strzyz Michalina; Søgaard Anders; Vilares David
Publication date: 01/01/2020

[Abstract] Recent analyses suggest that encoders pretrained for language modeling capture certain morpho-syntactic structure. However, probing frameworks for word vectors still do not report results on standard setups such as constituent and dependency parsing. This paper addresses this problem and does full parsing (on English) relying only on pretraining architectures, with no decoding. We first cast constituent and dependency parsing as sequence tagging. We then use a single feed-forward layer to directly map word vectors to labels that encode a linearized tree. This is used to: (i) see how far we can reach on syntax modelling with just pretrained encoders, and (ii) shed some light on the syntax-sensitivity of different word vectors (by freezing the weights of the pretraining network during training). For evaluation, we use bracketing F1-score and LAS, and analyze in depth the differences across representations for span lengths and dependency displacements. The overall results surpass existing sequence tagging parsers on the PTB (93.5%) and end-to-end EN-EWT UD (78.8%).
Acknowledgements: We thank Mark Anderson and Daniel Hershcovich for their comments. DV, MS and CGR are funded by the ERC under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant No 714150), by the ANSWER-ASAP project (TIN2017-85160-C2-1-R) from MINECO, and by Xunta de Galicia (ED431B 2017/01). AS is funded by a Google Focused Research Award.
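The LAS metric mentioned in the abstract is the labeled attachment score: the share of tokens whose predicted head and relation label both match the gold tree. The sketch below shows the calculation under simplifying assumptions. The function name `attachment_scores` and the `(head, relation)` tuple layout are hypothetical, and the sketch does not apply the token filtering or relation subtyping rules that official evaluation scripts may use.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head position, relation label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one sentence or a concatenation of sentences.

    UAS counts tokens with the correct head; LAS additionally requires
    the correct relation label.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses differ in length")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    correct_arcs = sum(g == p for g, p in zip(gold, pred))
    return correct_heads / len(gold), correct_arcs / len(gold)


# Toy example: all heads are right, but one relation label is wrong.
gold_arcs = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred_arcs = [(2, "nsubj"), (0, "root"), (2, "nmod")]
uas, las = attachment_scores(gold_arcs, pred_arcs)
print(f"UAS={uas:.3f} LAS={las:.3f}")
```

In a tagging setup like the one the abstract describes, the predicted arcs would come from decoding the per-token labels back into heads and relations before scoring.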

arXiv.org e-Print Archive

Repositorio da Universidade da Coruña

Copenhagen University Research Information System

Association for the Advancement of Artificial Intelligence: AAAI Publications

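"Jeg forstår (ikke) norsken din!". En sosiolingvistisk studie i forståelse av norske dialekter blant polske studenter i Oslo. [English: "I (don't) understand your Norwegian!". A sociolinguistic study of the comprehension of Norwegian dialects among Polish students in Oslo.]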

Author: Strzyz Michalina Maria
Publication date: 01/01/2013

This thesis examines the ability of Polish learners of Norwegian to understand and geographically locate five Norwegian dialects. The topic is investigated using qualitative and quantitative methods. Ten Polish informants answered a questionnaire and took a dialect test. A control group of five Norwegian informants also took the dialect test, which made it possible to compare the results of the target group with those of the control group. The informants were tested on the dialects of Oslo, Bergen, Tromsø, Stavanger and Trondheim. The analysis of the results was carried out in three parts. The first part shows the extent to which the Polish and Norwegian informants correctly answered questions testing general comprehension of each dialect. The second part shows the extent to which the control and target groups correctly located the five dialects geographically. The third part presents possible factors that may have influenced the results of the Polish informants. The analysis showed that the Polish informants scored lower on the dialect test than the Norwegian informants. The hypothesis that the Polish informants understand the Oslo dialect better than the other four Norwegian dialects was confirmed. In addition, there was considerable variation in the results of the Polish informants, both in general comprehension and in geographical localization of the Norwegian dialects. The study further shows that the Polish informants had little knowledge of dialectal words. They also had difficulty recognizing words they did know, which may indicate that they have stored only one phonological representation of a word and that this representation does not always cover the pronunciation of the same word in another dialect. Simple regression analysis shows that the Polish informants who had lived longest in Norway scored best on average in general comprehension of the selected dialects, but scored worse on geographical localization. The study suggests that a "familiarity effect" (Norwegian: "fortrolighetseffekten") between Norwegian dialects appears to have taken place among the Polish informants. It also suggests that second-language learners may benefit from more instruction in Norwegian dialects, for example more practice in listening to Norwegian dialects and instruction in dialect features, dialectal words and the geography of the dialects in courses for second-language learners of Norwegian.

NORA - Norwegian Open Research Archives