Pseudo-relatives complement of perception predicates
Pseudorelatives (PRs) are single constituents formed by a DP (the head) and an embedded clause
headed by the complementizer que (1). The relation between the head and the embedded clause is a
relation of predication. PRs do not display a restrictive reading but a situational one.
(1) He visto a [PR Juan que corría]
I.have seen a Juan that ran
'I saw Juan running'
Previous literature on pseudorelatives offers different accounts of their internal
structure, the way PRs relate to the matrix predicate, the position PRs can occupy within the matrix
clause and the function the head of the PR has within the embedded clause. The goal of this thesis is
to examine these four aspects in depth in the light of the following three new observations:
i) Previous literature only considers subject-gap PRs (1), in which the head of the
PR is the subject of the embedded predicate. However, I propose the Object-gap PR
generalization: object-gap PRs (2), in which the head of the PR is either the direct or the
indirect object of the embedded predicate, are available in languages that allow object clitic
doubling (Spanish, Greek). Languages that lack object clitic doubling (Italian, French or
Portuguese) do not allow object-gap PRs.
(2) a. He visto a María_i que *(la_i) traían en coche
I.have seen a María that her-ACC brought.3PL by car
'I saw María who was being brought by car'
b. He visto a Paco_i que *(le_i) pedían la hora unos chavales
I.have seen a Paco that le-DAT asked.3PL the time some guys
'I saw Paco who was being asked the time by some guys'
ii) The head of the PR needs to be animate. Animacy becomes a crucial factor for object-gap PRs
since if the object-head of the PR is not animate, the situational reading is not obtained (3).
(3) He visto el tren que lo ?? reparaban en cocheras/ llegaba a cocheras
I.have seen the train that lo-ACC fixed-3.PL in sheds / arrived to sheds
'I have just seen the train being fixed up in the shed / arriving at the shed'
iii) PRs can only appear in complement position of the matrix predicate.
Given the consequences of these new observations, the previous control and raising
analyses are discarded. A control analysis cannot account for object-gap PRs because the controller
can never control the direct object of the embedded predicate. The raising analysis is ruled out
because it cannot explain the mandatory presence of object clitics within the embedded clause, the
double case assignment of the head in subject-gap and indirect object-gap PRs or the motivation for
the movement of the head to its surface position. Thus, a dislocation analysis is proposed in which the
head of the PR is base-generated in the left periphery of the embedded clause. This analysis accounts
for the availability of subject-gap and object-gap PRs, the presence of the clitics in
object-gap PRs and the presence of pro in subject-gap PRs.
Further research includes an explanation for those languages that do not allow object-gap
PRs (e.g. Italian) but allow clitic left dislocation structures, the concrete properties that allow
perception predicates to select for PRs, and the secondary predication character of PRs.
The object-gap pseudorelative generalization
Previous literature contains two different points of view regarding the subject-object asymmetry related to the DP head of pseudorelatives (PRs). Some authors claim that the DP head can only be interpreted as the subject of the embedded predicate (subject-gap PRs). Other authors point to the possibility of finding other constituents (e.g. the direct object) in head position as well (object-gap PRs). In this paper I claim that certain languages only allow the DP head to be the subject of the embedded predicate, that is, they only allow subject-gap PRs, whereas other languages allow both subject-gap and object-gap PRs. Thus, the aim of this paper is to present the object-gap pseudorelative (PR) generalization, which accounts for the cross-linguistic availability of subject-gap and object-gap PRs: the availability of object-gap PRs depends on the availability of object clitic doubling. The paper is structured as follows. Section 1 introduces PRs. Section 2 presents data about subject-gap and object-gap PRs. Section 3 gives some remarks on object clitic doubling. Section 4 presents the object-gap PR generalization. Conclusions and further research issues are presented in section 5.
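The generalization stated above is a simple conditional over languages, and a short Python sketch can make its predictions explicit. This is only an illustration: the `Language` type and the `available_pr_types` function are hypothetical names, the clitic-doubling values are the ones given in the thesis abstract listed earlier (Spanish, Greek versus Italian, French, Portuguese), and the sketch assumes that every language considered has subject-gap PRs to begin with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    name: str
    object_clitic_doubling: bool  # value as reported in the abstracts


def available_pr_types(lang: Language) -> set[str]:
    """Predict the PR types allowed under the object-gap PR generalization."""
    # Assumption: subject-gap PRs are available in every language listed here.
    types = {"subject-gap"}
    # Object-gap PRs are predicted only where object clitic doubling exists.
    if lang.object_clitic_doubling:
        types.add("object-gap")
    return types


LANGUAGES = [
    Language("Spanish", True),
    Language("Greek", True),
    Language("Italian", False),
    Language("French", False),
    Language("Portuguese", False),
]

for lang in LANGUAGES:
    print(lang.name, sorted(available_pr_types(lang)))
```

Under these assumptions the sketch predicts both PR types for Spanish and Greek and only subject-gap PRs for the other three languages.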
Desambiguación de construcciones con se con aprendizaje automático
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Filosofía y Letras, Departamento de Lingüística, Lenguas Modernas, Lógica y Filosofía de la Ciencia y Teoría de la Literatura y Literatura Comparada. Defense date: 10-12-2021.
Spanish se constructions constitute a linguistic phenomenon that challenges
Natural Language Processing (NLP) tasks such as part-of-speech or dependency
relation tagging. There are three main reasons why se is a difficult topic for NLP:
first, the high frequency of se in Spanish; second, the nine
different syntactic constructions in which se appears, adding information of diverse
nature depending on the context; third, the fact that se lacks gender and number
features that could help disambiguate its type. This thesis' main goal is
to improve the state-of-the-art results on automatic morphosyntactic se analysis
on the basis of two hypotheses: the grouping (GH) and the subcategorization
frame (SFH) hypotheses. This thesis proposes a new annotation scheme for se
that connects the different constructions through a transitivity gradient (Moreno
Cabrera, 2004). The new annotation scheme is applied to the SE-corpus, a
European Spanish corpus made of 3,100 sentences containing the word se. The
SE-corpus belongs to the news, leisure and daily life domain of CORPES XXI
(Real Academia Española, 2018) and it has been manually annotated as part
of this research work. The SE-corpus is used to train different models using
UDPipe 1.2 to test whether the new annotation scheme can be learnt by the
neural networks that underlie the dependency parser. The resulting models are
evaluated on an additional gold standard test corpus made of 100 sentences
containing the form se. These sentences are also obtained from CORPES XXI.
The best model yields a LAS F-score of 86.97 points and a UAS F-score of
89.65 points. Regarding se analysis, the best model yields a LAS F-score of
82.55 points and a UAS F-score of 98.16 points. The main contributions of this
thesis are: a new annotation scheme for se adapted to Universal Dependencies'
guidelines, manual annotation guidelines for Spanish se disambiguation, the raw
and annotated versions of the SE-corpus and the best resulting model.
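The abstract reports LAS and UAS scores both overall and for se alone. As a rough illustration of what those two numbers measure, the following Python sketch computes unlabeled and labeled attachment scores over aligned tokens. It assumes gold tokenization, so that precision, recall and F-score coincide. The `Token` type, the `attachment_scores` function, the toy sentence and its relation labels are all illustrative assumptions; they are not the thesis's annotation scheme or its evaluation script.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str
    head: int    # index of the syntactic head (0 = root)
    deprel: str  # dependency relation label


def attachment_scores(gold, pred, only_form=None):
    """Return (UAS, LAS) as percentages over aligned gold/predicted tokens.

    If only_form is given, only tokens whose lowercased gold form equals it
    are scored (e.g. only_form="se").
    """
    if len(gold) != len(pred):
        raise ValueError("this sketch assumes identical tokenization")
    pairs = [(g, p) for g, p in zip(gold, pred)
             if only_form is None or g.form.lower() == only_form]
    if not pairs:
        return 0.0, 0.0
    # UAS: the head is correct. LAS: head and relation label are both correct.
    uas_hits = sum(g.head == p.head for g, p in pairs)
    las_hits = sum(g.head == p.head and g.deprel == p.deprel for g, p in pairs)
    return 100 * uas_hits / len(pairs), 100 * las_hits / len(pairs)


# Toy sentence "Se venden pisos" with hypothetical labels.
gold = [Token("Se", 2, "expl:pass"), Token("venden", 0, "root"),
        Token("pisos", 2, "nsubj")]
pred = [Token("Se", 2, "expl:impers"), Token("venden", 0, "root"),
        Token("pisos", 2, "obj")]

print(attachment_scores(gold, pred))                  # all tokens
print(attachment_scores(gold, pred, only_form="se"))  # se tokens only
```

In the toy sentence the parser attaches se to the right head but gives it the wrong label, so se counts towards UAS and not towards LAS. A pattern of this kind is one way a se-specific UAS can sit well above the se-specific LAS.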
Clasificación de construcciones con se en español: de modelos de bolsa de palabras a modelos de lenguaje
Spanish se constructions are a complex linguistic phenomenon that challenges Natural Language Processing (NLP) tasks such as part-of-speech or dependency relation tagging. Se is a high-frequency word that appears in nine different types of syntactic constructions and adds information of diverse nature depending on the context. To address the classification problem that Spanish se constructions pose in an efficient way, this study proposes a tagging system for se applied to a corpus of 2,140 sentences. This corpus is used in a classification experiment in which 9 classifiers based on machine learning models and a dependency parser are tested. Results show that pre-trained language models based on the transformer architecture reach the highest accuracy (0.83) and F-score (0.70) values.
The authors acknowledge financial support from PID2019-106827GB-I00 / AEI / 10.13039/501100011033, from the European Regional Development Fund and from the Spanish Ministry of Economy, Industry, and Competitiveness - State Research Agency, project TIN2016-76406-P (AEI/FEDER, UE).
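The abstract reports an accuracy that is noticeably higher than the F-score. One common reason for such a gap is class imbalance combined with an averaged per-class F-score, and the Python sketch below shows the arithmetic on a toy example. The abstract does not say how the F-score was averaged, so macro-averaging is an assumption here, and the function names, tag names and data are hypothetical.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the classes present in gold."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = []
    for label in set(gold):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data with hypothetical se tags: one frequent class, two rare ones.
gold = ["reflexive"] * 4 + ["passive", "impersonal"]
pred = ["reflexive"] * 4 + ["reflexive", "impersonal"]

print(round(accuracy(gold, pred), 2))
print(round(macro_f1(gold, pred), 2))
```

In this toy case a single error on a rare class barely lowers accuracy but drives that class's F1 to zero, which pulls the macro average down much further.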