    A French Corpus Annotated for Multiword Expressions with Adverbial Function

    This paper presents a French corpus annotated for multiword expressions (MWEs) with adverbial function. The corpus is designed for research on information retrieval and extraction, as well as on deep and shallow syntactic parsing. We delimit the kinds of MWEs we annotated, describe the resources and methods used for the annotation, and briefly comment on the results. The annotated corpus is available at http://infolingu.univ-mlv.fr/ under the LGPLLR license.

    Coordination and timing of speech gestures in Parkinson’s disease

    Many individuals with Parkinson's disease (PD) experience articulatory difficulties, which often have a considerable impact on their quality of life. The mechanisms underlying these difficulties are currently poorly understood. In order to learn more about these mechanisms, this dissertation examined the coordination and timing of speech gestures in PD speech. Both aspects are intrinsic to articulation, but at present it is unknown how they relate to the articulatory difficulties observed in PD speech. The studies in this dissertation address this issue using state-of-the-art methods. In the first study, the effect of levodopa on vowel articulation in PD was examined. The results suggest that the articulation of vowels is not influenced by levodopa. In the following two studies, spatial and temporal aspects of speech gestures were examined. The results from these studies suggest that both the timing of speech gestures and the coupling between them are impaired in PD. In the final study, the prevalence and nature of tongue tremor in individuals with PD were investigated. Using a computer algorithm, we found different types of tongue tremor in our data, which we believe may affect the timing of speech gestures. Together, the studies in this dissertation show that coordination and timing are indeed impaired in the speech of (at least some) individuals with PD. We believe that this impairment may be caused by malfunctioning regulatory mechanisms in PD speech.

    Adjunction in hierarchical phrase-based translation

    Cappadocian kinship

    Cappadocian kinship systems are very interesting from a sociolinguistic and anthropological perspective because of their mixture of inherited Greek and borrowed Turkish kinship terms. Precisely because the number of Turkish kinship terms differs from one variety to another, it is necessary to talk about Cappadocian kinship systems in the plural rather than about the Cappadocian kinship system in the singular. Although reference will be made to other Cappadocian varieties, this paper will focus on the kinship systems of Mišotika and Aksenitika, the two Central Cappadocian dialects still spoken today in several communities in Greece. Particular attention will be given to the use of borrowed Turkish kinship terms, which sometimes seem to coexist with their inherited Greek counterparts, e.g. mána vs. néne ‘mother’, ailfó/aelfó vs. γardáš ‘brother’, etc. In the final part of the paper, some kinship terms with obscure or hitherto unknown etymology will be discussed, e.g. káka ‘grandmother’, ižá ‘aunt’, lúva ‘uncle (father’s brother)’, etc.

    A modular architecture for systematic text categorisation

    This work examines and attempts to overcome issues caused by the lack of formal standardisation in defining text categorisation techniques and in detailing how they might be appropriately integrated with each other. Despite text categorisation’s long history, the concept of automation is relatively new, coinciding with the evolution of computing technology and the subsequent increase in the quantity and availability of electronic textual data. Nevertheless, insufficient descriptions of the diverse algorithms discovered have led to an acknowledged ambiguity when trying to replicate methods accurately, which has made reliable comparative evaluations impossible. Existing interpretations of general data mining and text categorisation methodologies are analysed in the first half of the thesis, and common elements are extracted to create a distinct set of significant stages. Their possible interactions are logically determined, and a unique universal architecture is generated that encapsulates all complexities and highlights the critical components. A variety of text-related algorithms are also comprehensively surveyed and grouped according to the stage to which they belong, in order to demonstrate how they can be mapped. The second part reviews several open-source data mining applications, placing an emphasis on their ability to handle the proposed architecture, their potential for expansion and their text processing capabilities. As these prove inflexible and too elaborate to be readily adapted, designs for a novel framework are introduced that focus on rapid prototyping through lightweight customisations and reusable atomic components. Because the existing options are inadequate, a rudimentary implementation is realised along with a selection of text categorisation modules.
Finally, a series of experiments is conducted that validates the feasibility of the outlined methodology and the importance of its composition, whilst also establishing the practicality of the framework for research purposes. The simplicity of the experiments and the results gathered clearly indicate the potential benefits that can be gained when a formalised approach is utilised.

    The ditransitive alternation in present-day German : a corpus-based analysis

    The ditransitive alternation in present-day German: a corpus-based study. Hilde De Vaere. The study is a corpus-based analysis of the ditransitive alternation in present-day German with 17 noncomplex and complex verbs, viz. geben, schicken, senden; abgeben, preisgeben, übergeben, vergeben, weitergeben, zurückgeben; einschicken, einsenden, übersenden, zurückschicken, zurücksenden; ausleihen, verleihen and verkaufen. The alternating constructions are the Indirect Object Construction (IOC) and the Prepositional Object Construction (POC). Both alternants contain a trivalent transfer verb in combination with three arguments: an AGENT in the nominative, a THEME in the accusative and a RECIPIENT-like argument. The RECIPIENT-like argument can be realised either as a dative Noun Phrase or as a Prepositional Phrase introduced by an + accusative (or, alternatively, zu + dative with the verbs schicken and senden and their complex counterparts), resulting in IOC or POC, respectively. Statistical analyses of 7400 sentences retrieved from the IDS Mannheim’s DeReKo corpus and taken from German, Swiss, Austrian and Wikipedia sources show that the alternation is associated with multiple factors that are assumed to operate simultaneously. A major conclusion of the investigation is that predictors pertaining to the principle of Harmonic Alignment of the arguments (according to which animate, pronominal, definite, given, short arguments precede inanimate, nominal, indefinite, new and long arguments) play a role in the alternation, but that other predictors are involved as well and, hence, Harmonic Alignment only partly accounts for the German data. Apart from factors such as Case Syncretism and Propernounhood of the RECIPIENT argument, which relate to a tendency towards greater transparency associated with POC, properties specifically pertaining to the verb, the three denotational classes (viz. concrete, abstract, propositional) and various senses turn out to be important factors in view of a comprehensive account of the alternation. The alternation moreover proves to be strongly verb-dependent. The two alternating constructions IOC and POC are thus shown to relate to the semantics/pragmatics interface, which requires a careful analysis of the encoded and inferred meanings that ground the alternation. Apart from the Probabilistic Approach utilised to analyse the data, the theoretical framework in which the study is embedded is an Integrative Approach which takes into account both constructionist and projectionist assumptions in the analysis of morphosyntax and alternating constructions. With regard to the issues of meaning and sense, the analysis is couched in a Three-Layer Approach to meaning, in which a difference is made between encoded linguistic content (semantics proper) and inferred linguistic content (the domain of pragmatics). Importantly, the pragmatic level is further differentiated to account for the partly highly conventionalised variation in form and meaning at the intermediate level of ‘normal language use’, in line with the theories of meaning developed by E. Coseriu and S. Levinson. IOC and POC are thus considered not two encoded constructions in their own right in German grammar, but rather two pragmatically defined ‘allostructions’ of an overarching general ‘constructeme’, which is termed the AGENT-THEME-GOAL construction. Both the verbs and the AGENT-THEME-GOAL construction contribute to the alternation with their general, underspecified meanings, but they are varyingly enriched by encyclopaedic knowledge and a range of factors that pertain to pragmatics. IOC or POC can thus be shown to be associated with a large set of statistically significant factors that interact with each other and with the AGENT-THEME-GOAL construction, i.e. the ‘constructeme’ that underpins both IOC and POC.