28,512 research outputs found

    Marrying Universal Dependencies and Universal Morphology

    Full text link
    The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language. Each project also provides corpora of annotated text in many languages - UD at the token level and UniMorph at the type level. As each corpus is built by different annotators, language-specific decisions hinder the goal of universal schemata. With compatibility of tags, each project's annotations could be used to validate the other's. Additionally, the availability of both type- and token-level resources would be a boon to tasks such as parsing and homograph disambiguation. To ease this interoperability, we present a deterministic mapping from Universal Dependencies v2 features into the UniMorph schema. We validate our approach by lookup in the UniMorph corpora and find a macro-average of 64.13% recall. We also note incompatibilities due to paucity of data on either side. Finally, we present a critical evaluation of the foundations, strengths, and weaknesses of the two annotation projects.Comment: UDW1

    On past participle agreement in transitive clauses in French

    Get PDF
    This paper provides a Minimalist analysis of past participle agreement in French in transitive clauses. Our account posits that the head v of vP in such structures carries an (accusativeassigning) structural case feature which may apply (with or without concomitant agreement) to case-mark a clause-mate object, the subject of a defective complement clause, or an intermediate copy of a preposed subject in spec-CP. In structures where a goal is extracted from vP (e.g. via wh-movement) v also carries an edge feature, and may also carry a specificity feature and a set of (number and gender) agreement features. We show how these assumptions account for agreement of a participle with a preposed specific clausemate object or defective-clause subject, and for the absence of agreement with an embedded object, with the complement of an impersonal verb, and with the subject of an embedded (finite or nonfinite) CP complement. We also argue that the absence of agreement marking (in expected contexts) on the participles faitmade and laissélet in infinitive structures is essentially viral in nature. Finally, we claim that obligatory participle agreement with reflexive and reciprocal objects arises because the derivation of reflexives involves A-movement and concomitant agreement

    Presupposition, perceptional relativity and translation theory

    Get PDF
    The intertwining of assertions and presuppositions in utterances affects the way a text is perceived in the source language (SL) and the target language (TL). Presuppositions can be thought of as shared assumptions that form the background of the asserted meaning. To translate presuppositions as assertions, or vice versa, can distort the thematic meaning of the SL text and produce a text with a different information structure. Since a good translation is not simply concerned with transferring the propositional content of the SL text, but also its other semantic and pragmatic components, including thematic meaning, a special attention should be accorded to the translation of presupposition. This article examines the intrinsic relation between presupposition and thematic meaning, why the concept is relevant to translation theory, and how presupposition can affect the structure and understanding of discourse. Unshared presuppositions are major obstacles in translation, as cultural concepts may be conveyed through expressions that yield presuppositions. To attain an optimal proximity to the SL text, presupposition needs to be singled out as a distinct aspect of meaning, and distinctions need to be made between definite and indefinite meaning, topic and comment, topic and focus, presupposition and entailment, and presupposition and implicature

    "Political Aspects of European Monetary Integration After WW II"

    Get PDF
    It is nearly impossible to fully understand the developments and problems of European economic and monetary integration without examining its historical and political background. European monetary integration has at times been propelled, at times been hindered by political motives and developments. Political motivations were already of great importance for the first stage of European economic and monetary integration in the framework of the OEEC and the European Payments Union in the forties and fifties. Even more for the creation of the European Communities (ECSC, EURATOM, EEC) in the fifties. However, considerations of national sovereignty and diverse if not conflicting concepts of economic and monetary policy priorities resulted in the virtual exclusion of economic and monetary policies from any serious attempts of coordination and harmonization. The completion of the EEC customs union and the monetary crises in the late sixties led to a major effort in economic and monetary policy integration culminating in the Werner Plan of 1970. The only result of these efforts was the "snake" which, while in the end shrinking to not much more than a DM zone, provided some framework for monetary cooperation. and The economic difficulties of the seventies on a European and global scale but also the desire to boost the political momentum behind European integration led to the EMS. Political considerations played an important role not only in reviving the process toward Economic and Monetary Union which led to the Delors Report but also in the specific design of the provisions of the Maastricht Treaty and in the premature notion of the EMS as a de facto monetary union. Contrary to many expectations, the EMS crises in 1992 and 1993 did not lead to an abandonment of the objectives of the Maastricht Treaty but resulted in efforts - for political as well as economic reasons - to salvage the EMS and to continue with efforts aimed at economic convergence for the core of the EMS countries. Strong adherence to national sovereignty, insufficient economic convergence, and difficulties in agreeing on meaningful elements of political union undermined until now attempts to create EMU. The substitution of national currencies - powerful symbols of national sovereignty and achievements - by an untested European currency requires progress in the mentioned areas

    Parsing and Evaluation. Improving Dependency Grammars Accuracy. Anàlisi Sintàctica Automàtica i Avaluació. Millora de qualitat per a Gramàtiques de Dependències

    Get PDF
    Because parsers are still limited in analysing specific ambiguous constructions, the research presented in this thesis mainly aims to contribute to the improvement of parsing performance when it has knowledge integrated in order to deal with ambiguous linguistic phenomena. More precisely, this thesis intends to provide empirical solutions to the disambiguation of prepositional phrase attachment and argument recognition in order to assist parsers in generating a more accurate syntactic analysis. The disambiguation of these two highly ambiguous linguistic phenomena by the integration of knowledge about the language necessarily relies on linguistic and statistical strategies for knowledge acquisition. The starting point of this research proposal is the development of a rule-based grammar for Spanish and for Catalan following the theoretical basis of Dependency Grammar (Tesnière, 1959; Mel’čuk, 1988) in order to carry out two experiments about the integration of automatically- acquired knowledge. In order to build two robust grammars that understand a sentence, the FreeLing pipeline (Padró et al., 2010) has been used as a framework. On the other hand, an eclectic repertoire of criteria about the nature of syntactic heads is proposed by reviewing the postulates of Generative Grammar (Chomsky, 1981; Bonet and Solà, 1986; Haegeman, 1991) and Dependency Grammar (Tesnière, 1959; Mel’čuk, 1988). Furthermore, a set of dependency relations is provided and mapped to Universal Dependencies (Mcdonald et al., 2013). Furthermore, an empirical evaluation method has been designed in order to carry out both a quantitative and a qualitative analysis. In particular, the dependency parsed trees generated by the grammars are compared to real linguistic data. The quantitative evaluation is based on the Spanish Tibidabo Treebank (Marimon et al., 2014), which is large enough to carry out a real analysis of the grammars performance and which has been annotated with the same formalism as the grammars, syntactic dependencies. Since the criteria between both resources are differ- ent, a process of harmonization has been applied developing a set of rules that automatically adapt the criteria of the corpus to the grammar criteria. With regard to qualitative evaluation, there are no available resources to evaluate Spanish and Catalan dependency grammars quali- tatively. For this reason, a test suite of syntactic phenomena about structure and word order has been built. In order to create a representative repertoire of the languages observed, descriptive grammars (Bosque and Demonte, 1999; Solà et al., 2002) and the SenSem Corpus (Vázquez and Fernández-Montraveta, 2015) have been used for capturing relevant structures and word order patterns, respectively. Thanks to these two tools, two experiments have been carried out in order to prove that knowl- edge integration improves the parsing accuracy. On the one hand, the automatic learning of lan- guage models has been explored by means of statistical methods in order to disambiguate PP- attachment. More precisely, a model has been learned with a supervised classifier using Weka (Witten and Frank, 2005). Furthermore, an unsupervised model based on word embeddings has been applied (Mikolov et al., 2013a,b). The results of the experiment show that the supervised method is limited in predicting solutions for unseen data, which is resolved by the unsupervised method since provides a solution for any case. However, the unsupervised method is limited if it Parsing and Evaluation Improving Dependency Grammars Accuracy only learns from lexical data. For this reason, training data needs to be enriched with the lexical value of the preposition, as well as semantic and syntactic features. In addition, the number of patterns used to learn language models has to be extended in order to have an impact on the grammars. On the other hand, another experiment is carried out in order to improve the argument recog- nition in the grammars by the acquisition of linguistic knowledge. In this experiment, knowledge is acquired automatically from the extraction of verb subcategorization frames from the SenSem Corpus (Vázquez and Fernández-Montraveta, 2015) which contains the verb predicate and its arguments annotated syntactically. As a result of the information extracted, subcategorization frames have been classified into subcategorization classes regarding the patterns observed in the corpus. The results of the subcategorization classes integration in the grammars prove that this information increases the accuracy of the argument recognition in the grammars. The results of the research of this thesis show that grammars’ rules on their own are not ex- pressive enough to resolve complex ambiguities. However, the integration of knowledge about these ambiguities in the grammars may be decisive in the disambiguation. On the one hand, sta- tistical knowledge about PP-attachment can improve the grammars accuracy, but syntactic and semantic information, and new patterns of PP-attachment need to be included in the language models in order to contribute to disambiguate this phenomenon. On the other hand, linguistic knowledge about verb subcategorization acquired from annotated linguistic resources show a positive influence positively on grammars’ accuracy.Aquesta tesi vol tractar les limitacions amb què es troben els analitzadors sintàctics automàtics actualment. Tot i els progressos que s’han fet en l’àrea del Processament del Llenguatge Nat- ural en els darrers anys, les tecnologies del llenguatge i, en particular, els analitzadors sintàc- tics automàtics no han pogut traspassar el llindar de certes ambiguïtats estructurals com ara l’agrupació del sintagma preposicional i el reconeixement d’arguments. És per aquest motiu que la recerca duta a terme en aquesta tesi té com a objectiu aportar millores signiflcatives de quali- tat a l’anàlisi sintàctica automàtica per mitjà de la integració de coneixement lingüístic i estadístic per desambiguar construccions sintàctiques ambigües. El punt de partida de la recerca ha estat el desenvolupament de d’una gramàtica en espanyol i una altra en català basades en regles que segueixen els postulats de la Gramàtica de Dependèn- dencies (Tesnière, 1959; Mel’čuk, 1988) per tal de dur a terme els experiments sobre l’adquisició de coneixement automàtic. Per tal de crear dues gramàtiques robustes que analitzin i entenguin l’oració en profunditat, ens hem basat en l’arquitectura de FreeLing (Padró et al., 2010), una lli- breria de Processament de Llenguatge Natural que proveeix una anàlisi lingüística automàtica de l’oració. Per una altra banda, s’ha elaborat una proposta eclèctica de criteris lingüístics per determinar la formació dels sintagmes i les clàusules a la gramàtica per mitjà de la revisió de les propostes teòriques de la Gramàtica Generativa (Chomsky, 1981; Bonet and Solà, 1986; Haege- man, 1991) i de la Gramàtica de Dependències (Tesnière, 1959; Mel’čuk, 1988). Aquesta proposta s’acompanya d’un llistat de les etiquetes de relació de dependència que fan servir les regles de les gramàtques. A més a més de l’elaboració d’aquest llistat, s’han establert les correspondències amb l’estàndard d’anotació de les Dependències Universals (Mcdonald et al., 2013). Alhora, s’ha dissenyat un sistema d’avaluació empíric que té en compte l’anàlisi quantitativa i qualitativa per tal de fer una valoració completa dels resultats dels experiments. Precisament, es tracta una tasca empírica pel fet que es comparen les anàlisis generades per les gramàtiques amb dades reals de la llengua. Per tal de dur a terme l’avaluació des d’una perspectiva quan- titativa, s’ha fet servir el corpus Tibidabo en espanyol (Marimon et al., 2014) disponible només en espanyol que és prou extens per construir una anàlisi real de les gramàtiques i que ha estat anotat amb el mateix formalisme que les gramàtiques. En concret, per tal com els criteris de les gramàtiques i del corpus no són coincidents, s’ha dut a terme un procés d’harmonització de cri- teris per mitjà d’unes regles creades manualment que adapten automàticament l’estructura i la relació de dependència del corpus al criteri de les gramàtiques. Pel que fa a l’avaluació qualitativa, pel fet que no hi ha recursos disponibles en espanyol i català, hem dissenyat un reprertori de test de fenòmens sintàctics estructurals i relacionats amb l’ordre de l’oració. Amb l’objectiu de crear un repertori representatiu de les llengües estudiades, s’han fet servir gramàtiques descriptives per fornir el repertori d’estructures sintàctiques (Bosque and Demonte, 1999; Solà et al., 2002) i el Corpus SenSem (Vázquez and Fernández-Montraveta, 2015) per capturar automàticament l’ordre oracional. Gràcies a aquestes dues eines, s’han pogut dur a terme dos experiments per provar que la integració de coneixement en l’anàlisi sintàctica automàtica en millora la qualitat. D’una banda, Parsing and Evaluation Improving Dependency Grammars Accuracy s’ha explorat l’aprenentatge de models de llenguatge per mitjà de models estadístics per tal de proposar solucions a l’agrupació del sintagma preposicional. Més concretament, s’ha desen- volupat un model de llenguatge per mitjà d’un classiflcador d’aprenentatge supervisat de Weka (Witten and Frank, 2005). A més a més, s’ha après un model de llenguatge per mitjà d’un mètode no supervisat basat en l’aproximació distribucional anomenat word embeddings (Mikolov et al., 2013a,b). Els resultats de l’experiment posen de manifest que el mètode supervisat té greus lim- itacions per fer donar una resposta en dades que no ha vist prèviament, cosa que és superada pel mètode no supervisat pel fet que és capaç de classiflcar qualsevol cas. De tota manera, el mètode no supervisat que s’ha estudiat és limitat si aprèn a partir de dades lèxiques. Per aquesta raó, és necessari que les dades utilitzades per entrenar el model continguin el valor de la preposi- ció, trets sintàctics i semàntics. A més a més, cal ampliar el número de patrons apresos per tal d’ampliar la cobertura dels models i tenir un impacte en els resultats de les gramàtiques. D’una altra banda, s’ha proposat una manera de millorar el reconeixement d’arguments a les gramàtiques per mitjà de l’adquisició de coneixement lingüístic. En aquest experiment, s’ha op- tat per extreure automàticament el coneixement en forma de classes de subcategorització verbal d’el Corpus SenSem (Vázquez and Fernández-Montraveta, 2015), que conté anotats sintàctica- ment el predicat verbal i els seus arguments. A partir de la informació extreta, s’ha classiflcat les diverses diàtesis verbals en classes de subcategorització verbal en funció dels patrons observats en el corpus. Els resultats de la integració de les classes de subcategorització a les gramàtiques mostren que aquesta informació determina positivament el reconeixement dels arguments. Els resultats de la recerca duta a terme en aquesta tesi doctoral posen de manifest que les regles de les gramàtiques no són prou expressives per elles mateixes per resoldre ambigüitats complexes del llenguatge. No obstant això, la integració de coneixement sobre aquestes am- bigüitats pot ser decisiu a l’hora de proposar una solució. D’una banda, el coneixement estadístic sobre l’agrupació del sintagma preposicional pot millorar la qualitat de les gramàtiques, però per aflrmar-ho cal incloure informació sintàctica i semàntica en els models d’aprenentatge automàtic i capturar més patrons per contribuir en la desambiguació de fenòmens complexos. D’una al- tra banda, el coneixement lingüístic sobre subcategorització verbal adquirit de recursos lingüís- tics anotats influeix decisivament en la qualitat de les gramàtiques per a l’anàlisi sintàctica au- tomàtica

    Selective transfer in the acquisition of english double object constrctions by brazilian learners

    Get PDF
    The present study investigates the acquisition of the English double object constructions (GOLDBERG, 1995) by Brazilian learners. We hypothesize that, due to first language (L1) influences, the prepositional ditransitive construction (John gave a book to Mary) will be acquired earlier, while the ditransitive construction (John gave Mary a book) will be part of the learner’s interlanguages (SELINKER, 1972) only at the advanced level of proficiency. We also hypothesize that learners may transfer (ODLIN, 1989) the placement of the object pronoun in pre-verbal position from their L1 to their interlanguage in early stages of acquisition (João me deu um livro / *John me gave a book). We test our hypotheses by comparing the performance of three groups of learners (beginning, intermediate, and advanced) and native speakers of English on an acceptability judgment task used as a measure of learnability and generalization. Results confirm the order of acquisition of the English double object constructions predicted for native speakers of Brazilian Portuguese. Moreover, results suggest that, although mother tongue influences may have taken place, they do not do so pervasively, but rather selectively, corroborating the proposal by Kellerman (1983)

    The Third Multilingual Surface Realisation Shared Task (SR’20):Overview and Evaluation Results

    Get PDF
    This paper presents results from the Third Shared Task on Multilingual Surface Realisation (SR’20) which was organised as part of the COLING’20 Workshop on Multilingual Surface Realisation. As in SR’18 and SR’19, the shared task comprised two tracks: (1) a Shallow Track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (2) a Deep Track where additionally, functional words and morphological information were removed. Moreover, each track had two subtracks: (a) restricted-resource, where only the data provided or approved as part of a track could be used for training models, and (b) open-resource, where any data could be used. The Shallow Track was offered in 11 languages, whereas the Deep Track in 3 ones. Systems were evaluated using both automatic metrics and direct assessment by human evaluators in terms of Readability and Meaning Similarity to reference outputs. We present the evaluation results, along with descriptions of the SR’19 tracks, data and evaluation methods, as well as brief summaries of the participating systems. For full descriptions of the participating systems, please see the separate system reports elsewhere in this volume

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    Get PDF
    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures adopted in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them.Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 tabl
    • …
    corecore