100 research outputs found
On Multilingual Training of Neural Dependency Parsers
We show that a recently proposed neural dependency parser can be improved by
joint training on multiple languages from the same family. The parser is
implemented as a deep neural network whose only input is orthographic
representations of words. In order to successfully parse, the network has to
discover how linguistically relevant concepts can be inferred from word
spellings. We analyze the representations of characters and words that are
learned by the network to establish which properties of languages were
accounted for. In particular we show that the parser has approximately learned
to associate Latin characters with their Cyrillic counterparts and that it can
group Polish and Russian words that have a similar grammatical function.
Finally, we evaluate the parser on selected languages from the Universal
Dependencies dataset and show that it is competitive with other recently
proposed state-of-the-art methods, while having a simple structure. Comment: preprint accepted into the TSD201
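The character-level analysis described above can be illustrated with a small sketch: given learned character embeddings, the claimed association between Latin letters and their Cyrillic counterparts shows up as nearest neighbors in embedding space. The vectors below are invented for illustration, not the parser's actual learned representations.

```python
import math

# Hypothetical 4-d character embeddings; in the paper these would be
# learned by the parser network -- the values here are illustrative only.
embeddings = {
    "a": [0.90, 0.10, 0.00, 0.20],   # Latin a
    "а": [0.88, 0.12, 0.05, 0.18],   # Cyrillic а
    "b": [0.10, 0.90, 0.30, 0.00],   # Latin b
    "б": [0.12, 0.85, 0.28, 0.05],   # Cyrillic б
    "z": [0.00, 0.20, 0.90, 0.70],
}

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

def nearest(ch):
    """Character whose embedding is most similar to ch's (excluding ch)."""
    return max((c for c in embeddings if c != ch),
               key=lambda c: cosine(embeddings[ch], embeddings[c]))

print(nearest("a"))  # with these toy vectors, the Cyrillic а is closest
```

The same nearest-neighbor probe, applied to word embeddings, would surface the reported grouping of Polish and Russian words with similar grammatical function.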
A derivational model of discontinuous parsing
The notion of latent-variable probabilistic context-free derivation of syntactic structures is enhanced to allow heads and unrestricted discontinuities. The chosen formalization covers both constituent parsing and dependency parsing. The derivational model is accompanied by an equivalent probabilistic automaton model. With the new framework, one obtains a probability distribution over the space of all discontinuous parses. This lends itself to intrinsic evaluation in terms of perplexity, as shown in experiments.
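The intrinsic evaluation mentioned above rests on a standard computation: since the model defines a probability distribution over parses, one can measure perplexity as the exponential of the average negative log-probability it assigns to held-out parses. A minimal sketch, with made-up probabilities standing in for the model's actual outputs:

```python
import math

# Toy probabilities that a derivational model might assign to the gold
# parses of a small test set -- illustrative values, not real results.
parse_probs = [0.25, 0.125, 0.5, 0.125]

def perplexity(probs):
    """Perplexity: 2 raised to the average negative log2-probability."""
    avg_nll = -sum(math.log2(p) for p in probs) / len(probs)
    return 2 ** avg_nll

print(perplexity(parse_probs))  # lower means the model fits the data better
```

A model that assigned probability 1 to every gold parse would reach the minimum perplexity of 1; spreading mass over wrong parses drives the value up.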
Taking SPARQL 1.1 extensions into account in the SWIP system
The SWIP system aims at hiding the complexity of expressing a query in a graph query language such as SPARQL. We propose a mechanism by which a query expressed in natural language is translated into a SPARQL query. Our system analyses the sentence in order to exhibit concepts, instances and relations. Then it generates a query in an internal format called the pivot language. Finally, it selects pre-written query patterns and instantiates them with regard to the keywords of the initial query. These queries are presented by means of explicative natural language sentences among which the user can select the query he/she is actually interested in. We are currently focusing on new kinds of queries which are handled by the new version of our system, which is now based on version 1.1 of SPARQL.
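The last step of the pipeline described above can be sketched as follows. The pivot representation and the pattern instantiation below are hypothetical simplifications for illustration; they are not SWIP's actual pivot syntax or pattern format.

```python
# Minimal sketch of the pivot-to-SPARQL step: keyword triples extracted
# from the sentence are instantiated into a query pattern. The triple
# format and the vocabulary prefix ':' are assumptions for this example.
def pivot_to_sparql(pivot):
    """pivot: list of (subject, relation, object) keyword triples."""
    lines = ["SELECT DISTINCT * WHERE {"]
    for s, rel, o in pivot:
        lines.append(f"  ?{s} :{rel} ?{o} .")
    lines.append("}")
    return "\n".join(lines)

# e.g. from a question like "Which person directed this film?"
query = pivot_to_sparql([("film", "hasDirector", "person")])
print(query)
```

In the real system, several candidate patterns would be instantiated and presented back to the user as explicative natural-language sentences before one query is executed.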
Integrating isotopes and documentary evidence : dietary patterns in a late medieval and early modern mining community, Sweden
We would like to thank the Archaeological Research Laboratory, Stockholm University, Sweden and the Tandem Laboratory (Ångström Laboratory), Uppsala University, Sweden, for undertaking the analyses of stable nitrogen and carbon isotopes in both human and animal collagen samples. Also, thanks to Elin Ahlin Sundman for providing the δ13C and δ15N values for animal references from Västerås. This research (Bäckström's PhD employment at Lund University, Sweden) was supported by the Berit Wallenberg Foundation (BWS 2010.0176) and Jakob and Johan Söderberg's foundation. The "Sala project" (excavations and analyses) has been funded by Riksens Clenodium, Jernkontoret, Birgit and Gad Rausing's Foundation, SAU's Research Foundation, the Royal Physiographic Society of Lund, Berit Wallenbergs Foundation, Åke Wibergs Foundation, Lars Hiertas Memory, Helge Ax:son Johnson's Foundation and The Royal Swedish Academy of Sciences. Peer reviewed
Splitting Arabic Texts into Elementary Discourse Units
In this article, we propose the first work that investigates the feasibility of Arabic discourse segmentation into elementary discourse units within the segmented discourse representation theory framework. We first describe our annotation scheme that defines a set of principles to guide the segmentation process. Two corpora have been annotated according to this scheme: elementary school textbooks and newspaper documents extracted from the syntactically annotated Arabic Treebank. Then, we propose a multiclass supervised learning approach that predicts nested units. Our approach uses a combination of punctuation, morphological, lexical, and shallow syntactic features. We investigate how each feature contributes to the learning process. We show that an extensive morphological analysis is crucial to achieve good results in both corpora. In addition, we show that adding chunks does not boost the performance of our system.
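The feature combination described above can be sketched as a per-token feature extractor feeding a multiclass classifier. The feature names and the English connective list below are stand-ins for illustration (the paper works on Arabic with morphological features from the Treebank), not the authors' actual feature set.

```python
# Illustrative per-token features for discourse segmentation, in the
# spirit of the punctuation/lexical features described above. The
# connective list and feature names are assumptions for this sketch.
PUNCT = {".", ",", "!", "?", ";"}
CONNECTIVES = {"because", "but", "then"}  # stand-ins for Arabic connectives

def token_features(tokens, i):
    """Feature dict for token i; a classifier would map this to a
    segment-boundary label (e.g. begin-unit / inside-unit)."""
    tok = tokens[i]
    return {
        "is_punct": tok in PUNCT,
        "is_connective": tok.lower() in CONNECTIVES,
        "relative_position": i / len(tokens),
        "prev_is_punct": i > 0 and tokens[i - 1] in PUNCT,
    }

tokens = "He left early because he was tired .".split()
feats = token_features(tokens, 3)
print(feats["is_connective"])  # 'because' would likely open a new unit
```

In the full approach these vectors would be extended with morphological and shallow syntactic (chunk) features and fed to a supervised multiclass learner that also predicts nested units.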
- …