    The scope of just: evidence from information-structure annotation in diachronic English corpora

    This study explores just as a focusing adverb in the Corpus of Middle English Prose and Verse and in EEBO (Early English Books Online), tracing its status and development in 14th- to 17th-century English. Automated data retrieval and analysis offer new insights into the adverb's transformation from a contextual perspective and reveal its grammaticalization cline across successive chronological periods. The analysis shows that the polysemy of the form correlates with the syntactic changes relevant to each period and is determined by information-structural considerations. To test the initial hypothesis, the text segments retrieved from the corpora had to be annotated for givenness/newness. To support automated and semi-automated procedures, the methodology relies on Discourse Representation Theory, providing corpus-tagging algorithms that take into account discourse, encyclopedic, situational and scenario contexts. Labeling the relevant constituents for their information status requires coreference resolution, carried out with the Cesax coreference editor. A further manual study of focusing just centers on its position in the XP, along with the word-order patterns attested. To identify regularities in word-order variation across the models, special attention is given to the different Focus types marked by the adverb in XPs, viz. informational, identificational, contrastive, emphatic, etc.
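    The givenness/newness tagging step can be pictured with a deliberately simplified rule: a mention whose referent has already appeared in the segment counts as given, a mention linked to an earlier referent through a related (bridging) referent counts as inferred, and anything else counts as new. This minimal sketch assumes mentions already carry referent IDs, as coreference annotation would supply; the `Mention` class, the `anchor` field and the three labels are hypothetical and reproduce neither Cesax's format nor the study's full DRT-based scheme.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Mention:
    text: str
    referent: str                  # referent ID, e.g. from coreference annotation
    anchor: Optional[str] = None   # hypothetical: earlier referent this one is inferable from


def tag_givenness(mentions: List[Mention]) -> List[Tuple[str, str]]:
    """Label each mention 'given', 'inferred' or 'new', in textual order."""
    seen = set()  # referents already introduced in the segment
    labels = []
    for m in mentions:
        if m.referent in seen:
            status = "given"       # referent mentioned before (discourse-old)
        elif m.anchor is not None and m.anchor in seen:
            status = "inferred"    # accessible via a related, earlier referent
        else:
            status = "new"         # first mention, no link to prior context
        labels.append((m.text, status))
        seen.add(m.referent)
    return labels


# Hypothetical segment with referent IDs already assigned.
segment = [
    Mention("a knight", "k1"),
    Mention("his horse", "h1", anchor="k1"),
    Mention("the knight", "k1"),
    Mention("a castle", "c1"),
]
print(tag_givenness(segment))
# [('a knight', 'new'), ('his horse', 'inferred'), ('the knight', 'given'), ('a castle', 'new')]
```

A real scheme would also need to handle the encyclopedic, situational and scenario contexts mentioned above, which this toy rule ignores.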

    "From the speaker's point of view": subjectification as pragmatic-semantic language change

    The article demonstrates the importance of subjectification processes in shaping the metatextual layer of language. Using three lexical units ("prawda" ‘true, right’, "pewnie" ‘sure, certainly’ and "szalenie" ‘extremely, madly’) as examples, the author shows how enriching their semantic structure with a subjective component led first to the emergence of new utterance-level functions and ultimately to the establishment of a new coded meaning. The study is diachronic and draws on the oldest attestations of the lexemes in question; contextual analysis is used to identify the point at which meanings with a subjective component appeared. The results unequivocally demonstrate that subjectification originates in the pragmatic domain, while its consequences are visible at the semantic level. The language material comes from both lexicographic sources and corpora. The analysis shows that subjectification correlates with formal changes, including loss of inflectional endings, loss of morphological properties, recategorization and syntactic isolation. The paper points to the need for in-depth contrastive diachronic research on subjectification.

    The polysemy of the Spanish verb sentir: a behavioral profile analysis

    This study investigates the intricate polysemy of the Spanish perception verb sentir (‘feel’), which, like the more widely studied visual perception verbs ver (‘see’) and mirar (‘look’), displays a wide range of semantic uses in various syntactic environments. The investigation is based on a corpus-based behavioral profile (BP) analysis. Besides its methodological merits as a quantitative, systematic and verifiable approach to the study of meaning, and to polysemy in particular, BP analysis offers qualitative usage-based evidence for cognitive linguistic theorizing. With regard to the polysemy of sentir, the following questions were addressed: (1) What is the prototype of each cluster of senses? (2) How are the different senses structured, i.e. how many senses should be distinguished, which senses cluster together and which should be kept separate? (3) Which senses are more closely related to each other and which are highly distinguishable? (4) Which morphosyntactic variables make them more or less distinguishable? The results show that two significant meaning clusters can be distinguished, coinciding with the division between the middle-voice uses (sentirse) and the other uses (sentir). Within these clusters, a number of meaningful subclusters emerge, which seem to coincide largely with the more general semantic categories of physical, cognitive and emotional perception.
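    As a rough illustration of what a behavioral profile is, the following sketch turns hypothetical annotated concordance lines into one vector per sense, holding the relative frequency of each level of each annotated variable (ID tag), and then finds the two most similar senses. It assumes plain Euclidean distance and shows only the first merge step of an agglomerative clustering; actual BP studies may use other distance measures and run a full hierarchical cluster analysis. The variables, levels and sense labels are invented for illustration.

```python
from collections import Counter, defaultdict
from math import dist  # Euclidean distance (Python 3.8+)


def behavioral_profiles(rows, variables):
    """Map each sense to its behavioral profile: for every variable, the
    relative frequency of each of its levels among that sense's rows."""
    levels = {v: sorted({r[v] for r in rows}) for v in variables}
    by_sense = defaultdict(list)
    for r in rows:
        by_sense[r["sense"]].append(r)
    profiles = {}
    for sense, sense_rows in by_sense.items():
        vector = []
        for v in variables:
            counts = Counter(r[v] for r in sense_rows)
            vector.extend(counts[lv] / len(sense_rows) for lv in levels[v])
        profiles[sense] = vector
    return profiles


def closest_pair(profiles):
    """Return the two senses with the most similar profiles (first merge step)."""
    senses = sorted(profiles)
    pairs = [(dist(profiles[a], profiles[b]), a, b)
             for i, a in enumerate(senses) for b in senses[i + 1:]]
    return min(pairs)[1:]


# Hypothetical annotated concordance lines for sentir.
rows = [
    {"sense": "physical", "reflexive": "no", "complement": "NP"},
    {"sense": "physical", "reflexive": "no", "complement": "NP"},
    {"sense": "emotional_state", "reflexive": "yes", "complement": "adj"},
    {"sense": "emotional_state", "reflexive": "yes", "complement": "adj"},
    {"sense": "bodily_state", "reflexive": "yes", "complement": "adj"},
    {"sense": "bodily_state", "reflexive": "yes", "complement": "NP"},
]
profiles = behavioral_profiles(rows, ["reflexive", "complement"])
print(closest_pair(profiles))  # ('bodily_state', 'emotional_state')
```

With these toy rows, the two reflexive (sentirse-like) senses group together before either joins `physical`.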

    LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese

    Although lemmatization is a very common subtask in many natural language processing tasks, there is a lack of truly cross-platform lemmatization tools specifically targeted at Portuguese, particularly for integration into projects developed in Java. To address this issue, we developed a lemmatizer, initially just for our own use, which we have decided to make publicly available. The lemmatizer presented in this document achieves an overall accuracy of over 98% when evaluated against a manually revised corpus.
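    The abstract does not describe LemPORT's internals, so the following is only a generic sketch of a lexicon-plus-suffix-rule lemmatizer and of how an accuracy figure against a manually revised (gold) corpus is computed. The lexicon entries, the two noun rules and the gold tokens are hypothetical, the sketch is in Python rather than Java, and it is not LemPORT's code.

```python
# Hypothetical lexicon of irregular forms, keyed by (word form, part of speech).
LEXICON = {("falou", "VERB"): "falar", ("fiz", "VERB"): "fazer"}
# Hypothetical noun plural rules, tried in order (longest suffix first).
NOUN_RULES = [("ões", "ão"), ("s", "")]


def lemmatize(form, pos):
    """Look the form up in the lexicon; otherwise apply suffix rules."""
    form = form.lower()
    if (form, pos) in LEXICON:
        return LEXICON[(form, pos)]
    if pos == "NOUN":
        for suffix, replacement in NOUN_RULES:
            if form.endswith(suffix):
                return form[: -len(suffix)] + replacement
    return form  # fall back to the form itself


def accuracy(gold):
    """Share of (form, pos, lemma) triples whose predicted lemma matches."""
    correct = sum(lemmatize(form, pos) == lemma for form, pos, lemma in gold)
    return correct / len(gold)


gold = [
    ("gatos", "NOUN", "gato"),
    ("canções", "NOUN", "canção"),
    ("falou", "VERB", "falar"),
    ("lápis", "NOUN", "lápis"),  # invariable noun: the plural rule gets it wrong
]
print(accuracy(gold))  # 0.75
```

Here the naive plural rule wrongly strips the invariable noun lápis, giving 3 of 4 correct; exceptions of this kind are what a fuller lexicon would have to cover.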

    Quantification and scales in change

    This volume contains thematic papers on semantic change that emerged from the second edition of Formal Diachronic Semantics, held at Saarland University. Its authors range from established scholars in the field of language change to advanced PhD students, and all contributions were selected through the same two-step peer-review process. The key foci are variability and diachronic trajectories in scale structures and quantification, but readers will also find a variety of further (and clearly non-disjoint) issues covered, including reference, modality, givenness, presuppositions, alternatives in language change, temporality and epistemic indefiniteness, as well as, in more general terms, the interfaces of semantics with syntax, pragmatics and morphology. Given the nature of the field, the contributions are primarily based on original corpus studies (in one case also on synchronic experimental data) and present a series of new findings and theoretical analyses of several languages, first and foremost from the Germanic and Romance subbranches of Indo-European (English, French, German, Italian, Spanish) and from Semitic (with an analysis of universal quantification in Biblical Hebrew).

    From Crystal-clear to Limpide: Translating English [Noun+Adj] Compound Adjectives with a Figurative-intensifying Noun into French

    English [Noun+Adj] compound adjectives containing an intensifying metaphor (e.g. crystal-clear) pose particular challenges for French translation, partly because French has no directly equivalent construction. Our study examines the morphosyntactic and conceptual-semantic translation procedures by which these challenges are resolved. We also explore the little-investigated aspect of translation variation (the number of different solutions for each item). We analyze the potential effects of two factors: the presence or absence of figurative intensification and the items’ frequency of use in English. Our results indicate that translators prefer different morphosyntactic procedures for different compound subtypes. Overall, an adjective constituent is most frequently retained, although complete reformulations with a noun or verb also occur. Semantically, the intensifying meaning is often rendered non-figuratively, depending on what is available in idiomatic French usage. Intensification is also frequently dropped. Translation variation is remarkably high, partly because of extensive use of near-synonyms. High-frequency items do not appear to converge on a smaller number of translations but instead provide more opportunities for diversification.
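    Translation variation, as defined above (the number of different solutions per item), can be computed from aligned source/target pairs. This minimal sketch also divides that count by the item's number of occurrences, since a frequent item simply has more chances to be translated in different ways; that normalization, the case and whitespace clean-up, and the example pairs are assumptions made for illustration, not the study's procedure or data.

```python
from collections import defaultdict


def translation_variation(pairs):
    """Per source item: (number of distinct renderings, distinct / occurrences)."""
    solutions = defaultdict(set)
    occurrences = defaultdict(int)
    for source, target in pairs:
        # Normalize case and whitespace so trivial differences do not count.
        solutions[source].add(" ".join(target.lower().split()))
        occurrences[source] += 1
    return {s: (len(solutions[s]), len(solutions[s]) / occurrences[s])
            for s in solutions}


# Hypothetical aligned (English compound, French rendering) pairs.
pairs = [
    ("crystal-clear", "limpide"),
    ("crystal-clear", "parfaitement clair"),
    ("crystal-clear", "Limpide "),
    ("ice-cold", "glacé"),
    ("ice-cold", "glacé"),
]
print(translation_variation(pairs))
```

The raw count captures variation as the abstract defines it; the ratio is one possible way to compare items of very different frequency.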

    Minimizers and the syntax of negation: a diachronic and comparative approach from European Portuguese

    Minimizers and their interaction with the syntax of negation seem to be an inexhaustible research topic, owing to the richness of these items and the unexpected evolutionary paths found for counterpart items across languages. The present work aims to provide some insights from European Portuguese, in particular from its early stages. Old Portuguese (OP) displayed two main groups of minimizers, both of which behaved as weak negative polarity items (NPIs). On the one hand, there was the partitive/evaluative group, which included items with a partitive reading, referring to low endpoints on a scale of size, as well as items with an evaluative reading, originating from nouns associated with low endpoints on a scale of value. Partitive/evaluative minimizers have survived to the present day. On the other hand, there was the group of indefinite minimizers, which included the items al ‘other thing/person’, cousa ‘thing’, pessoa ‘person’, homem ‘man’ and rem ‘thing’, all very productive in OP. Contrary to expectations, all indefinite minimizers had disappeared from the language by the end of the 16th century, including items that had reached the status of a quantifier, as was the case of rem. This disappearance can be explained under the hypothesis of grammar competition proposed by Kroch (1989, 1994): indefinite minimizers competed directly against the negative indefinites nada ‘nothing’ and nenhum [+hum] ‘no one’, which were in a better position to win. Indefinite minimizers and negative indefinites constituted two different constructional families, in the sense of Smet et al. (2018). While the family of indefinite minimizers was unstable and contained items with different behaviour and different degrees of grammaticalization, the family of negative indefinites was cohesive and consistent, benefiting from the so-called factor (which suggested negation), which eventually allowed these items to become strong NPIs (cf. Martins 1997, 2000). A third constructional family also competed against indefinite minimizers and negative indefinites, in particular through the item nemigalha, which originated as a member of the partitive/evaluative family. Nevertheless, competition between constructional families does not fully explain the incipient grammaticalization of most items, since competition occurred mainly between the most grammaticalized forms of each family. The OP data show that most minimizers maintained their nominal properties, allowing modification and exhibiting gender and number features. For this reason, they are analysed as base-generated in N, as nominal heads. Nonetheless, OP records a few minimizers that reached more advanced stages of grammaticalization and behave as quantifier-like elements, namely adnominal quantifiers taking a partitive PP and intransitive bare quantifiers, both projecting their own Quantifier Phrase (QP). Items such as rem, ponto and nemigalha are examples of minimizers that originated as common nouns and became heads of a QP. They follow the grammaticalization path described for other minimizers in Romance, starting as heads of NP and moving leftward to become heads of NumP (cf. Roberts & Roussou 2003). They eventually come to be merged directly as NumP heads and are reinterpreted as quantifiers. Additionally, they may start appearing as intransitive QPs, which leads to ambiguity between a quantifier and a negation-reinforcing particle in specific contexts, such as sentences with optionally transitive verbs (cf. Lucas 2007, Breitbarth et al. 2020). Parallel to the disappearance of the more grammaticalized items, a new configuration seems to emerge in the 16th century for minimizers that were still nominal heads. Partitive minimizers occurred exclusively in bare form in the 13th century. The first examples of partitive minimizers preceded by a cardinal numeral coincide with the period in which the indefinite determiner is argued to have become widespread, the 14th century (Ledgeway 2012). There seems to have been a change in the D system that resulted in the loss of bare nouns and the need for a lexically filled D. Partitive minimizers progressively came to be preceded by a cardinal numeral. I propose that this satisfied the requirement for a lexical D, with the numeral raising from the head of NumP to the head of DP. The cardinal numeral also encoded a [+quantification] feature that agreed with the [+quantification] feature present in some minimizers; where the minimizer lacked a [+quantification] feature, the cardinal numeral alone encoded it. However, the insertion of the cardinal numeral blocked the raising of minimizers to Num, the position in which they could be reinterpreted as quantifiers. An argument in favour of this hypothesis is that there are no attested cases of minimizers passing from a CARDINAL NUMERAL+MINIMIZER configuration into a quantifier configuration. Contemporary European Portuguese (CEP) shows that minimizers behaving as adnominal and intransitive bare quantifiers, such as the relatively recent puto, bola or peva, are in general recruited directly in bare form. In the few cases displaying a UM+MINIMIZER configuration (e.g. um boi, um caraças), the minimizer exhibits a more advanced stage of grammaticalization, and UM seems to be an expletive element sitting in D rather than a cardinal numeral. All in all, Old Portuguese illustrates quite well the functioning of an Incipient Jespersen Cycle (cf. Breitbarth et al. 2013): it had a few promising candidates for becoming independent negation markers, but none of them remained in the language, despite having reached advanced stages of grammaticalization.

    CoDiAJe, the Annotated Diachronic Corpus of Judeo-Spanish: Description of a Multi-alphabetic Corpus and Its Textual and Linguistic Annotations

    Judeo-Spanish differs from late 15th-century Spanish and from modern Spanish in several respects, affecting its phonetics and phonology, morphology, syntax and semantics, but the most visible difference lies in the alphabet. From the end of the 19th century, Judeo-Spanish came to be written in various alphabets: Greek, Cyrillic and, above all, Latin, in different versions. However, the Hebrew alphabet had been used since early times and was finally abandoned only in the 1940s, which means that most Judeo-Spanish texts are written in Hebrew characters. CoDiAJe is an annotated diachronic corpus of documents produced from the 16th century up to the present day, developed in TEITOK. The significance of its development is that the tool processes linguistic data in all of these alphabets, allowing users to view each text in five orthographic forms: the original version in whatever alphabet it was written, a transcription in Latin characters, an expanded form that completes abbreviations or corrects defective spellings, a version in modern Judeo-Spanish, and a version in modern Spanish orthography. CoDiAJe also lets users search not only for a specific word but for all its linguistic and orthographic variants across texts written in the different alphabets. During annotation, tags from the EAGLES tagset for Spanish were modified and new ones were created; these are steps towards an accurate tagset for Judeo-Spanish. The digitized texts are also enriched with semantic-conceptual information and with information on the affiliation of all non-Romance elements.
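    The five parallel orthographic layers per text, and search across all of them, can be modelled minimally as one record per token holding every form. This is a simplified in-memory sketch, not TEITOK's storage model; the `Token` class, its field names and the example forms are illustrative assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class Token:
    original: str       # form as written (Hebrew, Greek, Cyrillic or Latin script)
    transcription: str  # transcription in Latin characters
    expanded: str       # abbreviations completed, defective spelling corrected
    modern_js: str      # modern Judeo-Spanish orthography
    modern_es: str      # modern Spanish orthography

    def layers(self):
        """All orthographic forms of this token, in field order."""
        return [getattr(self, f.name) for f in fields(self)]


def search(tokens, query):
    """Return tokens whose form in any orthographic layer matches the query."""
    q = query.lower()
    return [t for t in tokens if any(layer.lower() == q for layer in t.layers())]


# Illustrative forms for one token ('Jew'); hypothetical, not taken from CoDiAJe.
tok = Token("ג'ודיו", "djudió", "djudió", "djudyo", "judío")
print(len(search([tok], "judío")), len(search([tok], "ג'ודיו")))  # 1 1
```

Keeping every layer on the same token is what lets a query in any one script or orthography retrieve the same occurrence.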
