One model, two languages: training bilingual parsers with harmonized treebanks
We introduce an approach to train lexicalized parsers using bilingual corpora
obtained by merging harmonized treebanks of different languages, producing
parsers that can analyze sentences in either of the learned languages, or even
sentences that mix both. We test the approach on the Universal Dependency
Treebanks, training with MaltParser and MaltOptimizer. The results show that
these bilingual parsers are more than competitive: most combinations preserve
accuracy, and some even achieve significant improvements over the
corresponding monolingual parsers. Preliminary experiments also show the
approach to be promising on texts with code-switching and when more languages
are added.
Comment: 7 pages, 4 tables, 1 figure
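The merging step described in this abstract can be illustrated with a minimal sketch. It assumes the two harmonized treebanks are available as CoNLL-style text (one token per line, blank line between sentences) and simply concatenates their sentences into one training corpus. The function names `read_sentences` and `merge_treebanks` are hypothetical and are not taken from the paper or from MaltParser; the authors' actual preprocessing may differ.

```python
def read_sentences(conll_text):
    """Split CoNLL-style text into sentences, each a list of token lines."""
    sentences, current = [], []
    for line in conll_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the sentence collected so far.
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def merge_treebanks(text_a, text_b):
    """Concatenate two treebanks into one bilingual training corpus.

    Assumption: both inputs already share one annotation scheme
    (harmonized), so no label mapping is attempted here.
    """
    merged = read_sentences(text_a) + read_sentences(text_b)
    # Re-serialize with exactly one blank line after every sentence.
    return "".join("\n".join(sentence) + "\n\n" for sentence in merged)
```

The merged text could then be written to a single file and passed to a parser trainer as if it were a monolingual treebank.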
Using the Variationist Comparative Method to Examine the Role of Language Contact in Synthetic and Periphrastic Verbs in Spanish
Language contact and linguistic change are thought to go hand in hand (e.g. Silva-Corvalán 1994); however, methodological obstacles, such as collecting data at different points in time or the availability of monolingual data for comparison, make claims about language change tenuous. The present study draws on two corpora of spoken Spanish (bilingual New Mexican Spanish and monolingual Ecuadorian Spanish) to quantitatively assess the convergence hypothesis, under which contact with English has produced a change to the Spanish verbal system, reflected in an extension of the Present and Past Progressive forms at the expense of the synthetic Simple Present and Imperfect forms. The data do not show that the Spanish spoken by the bilinguals is changing to more closely resemble the analogous English progressive constructions; instead, they suggest a potential weakening of the linguistic constraints that condition the variation between periphrastic and synthetic forms.