303 research outputs found

    Morphological Productivity in the Lexicon

    In this paper we outline a lexical organization for Turkish that makes use of lexical rules for inflections, derivations, and lexical category changes to control the proliferation of lexical entries. Lexical rules handle changes in grammatical roles, enforce type constraints, and control the mapping of subcategorization frames in valency-changing operations. A lexical inheritance hierarchy facilitates the enforcement of type constraints. Semantic compositions in inflections and derivations are constrained by the properties of the terms and predicates. The design has been tested as part of an HPSG grammar for Turkish. In terms of performance, run-time execution of the rules seems to be a far better alternative than pre-compilation. The latter causes exponential growth in the lexicon due to intensive use of inflections and derivations in Turkish. Comment: 10 pages LaTeX, {lingmacros,avm,psfig}.sty, 1 figure, 1 bibtex fil

    Superbizarre Is Not Superb: Derivational Morphology Improves BERT’s Interpretation of Complex Words

    How does the input segmentation of pretrained language models (PLMs) affect their interpretations of complex words? We present the first study investigating this question, taking BERT as the example PLM and focusing on its semantic representations of English derivatives. We show that PLMs can be interpreted as serial dual-route models, i.e., the meanings of complex words are either stored or else need to be computed from the subwords, which implies that maximally meaningful input tokens should allow for the best generalization on new words. This hypothesis is confirmed by a series of semantic probing tasks on which DelBERT (Derivation leveraging BERT), a model with derivational input segmentation, substantially outperforms BERT with WordPiece segmentation. Our results suggest that the generalization capabilities of PLMs could be further improved if a morphologically informed vocabulary of input tokens were used.
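
    A minimal sketch of what a derivational input segmentation could look like, in contrast to purely frequency-based subword splitting. The affix lists, the stem vocabulary, and the function name are hypothetical and are not taken from DelBERT; the abstract does not specify how the segmentation is computed.

```python
# Hypothetical toy inventories; the real affix and stem lists are not
# given in the abstract, so everything below is an assumption.
PREFIXES = ("super", "un", "anti")
SUFFIXES = ("ness", "able", "ize")
STEMS = {"bizarre", "read", "happy"}


def derivational_segments(word):
    """Split word into prefixes + stem + suffixes, or return [word] whole."""
    prefixes, suffixes, stem = [], [], word
    progress = True
    while stem not in STEMS and progress:
        progress = False
        # Try to peel one prefix off the left edge.
        for p in PREFIXES:
            if stem.startswith(p) and len(stem) > len(p):
                prefixes.append(p)
                stem = stem[len(p):]
                progress = True
                break
        if stem in STEMS:
            break
        # Try to peel one suffix off the right edge.
        for s in SUFFIXES:
            if stem.endswith(s) and len(stem) > len(s):
                suffixes.insert(0, s)
                stem = stem[:-len(s)]
                progress = True
                break
    # Fall back to the unsegmented word when no known stem is reached.
    if stem not in STEMS:
        return [word]
    return prefixes + [stem] + suffixes


print(derivational_segments("superbizarre"))
print(derivational_segments("unreadable"))
print(derivational_segments("superb"))
```

    In this toy setup a word is only split when a known stem is recovered, so a word that merely begins with an affix-like string is left whole.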

    Neurocognitive processing of inflected and derived words

    The representation of morphologically complex words in the mental lexicon and their neurocognitive processing has been a vigorously debated topic in psycholinguistics and the cognitive neuroscience of language. This thesis investigates the effect of stimulus modality on morphological processing, the spatiotemporal dynamics of the neural processing of inflected (e.g., work+ed) and derived (e.g., work+er) words and their interaction, using the Finnish language. Overall, the results suggest that the constituent morphemes of isolated written and spoken inflected words are accessed separately, whereas spoken derived words activate both their full form and the constituent morphemes. The processing of both spoken and written inflected words elicited larger N400 responses than monomorphemic words (Study I), whereas the responses to spoken derived words did not differ from those to monomorphemic words (Study IV). Spoken inflected words elicited a larger left-lateralized negativity and greater source strengths in the left temporal cortices than derived words (Study IV). Thus, the results suggest different cortical processing for derived and inflected words. Moreover, the neural mechanisms underlying inflection and derivation seem to be not only different, but also independent as indexed by the linear summation of the responses to derived and inflected stimuli in a combined (derivation+inflection) condition (Study III). Furthermore, the processing of meaningless, spoken derived pseudowords was more difficult than for existing derived words, indexed by a larger N400-type effect for the pseudowords. However, no differences were observed between meaningful derived pseudowords and existing derived words (Study II). The results of Study II suggest that semantic compatibility between morphemes seems to have a crucial role in a successful morphological analysis.
As a methodological note, time-locking the auditory event-related potentials/fields (ERP/ERF) to the suffix onset revealed the processes related to morphological analysis more precisely (Studies II and IV), which also enables comparison of the neural processes in different modalities (Study I).
The processing in the brain of morphologically complex words, i.e., words formed from several independent units of meaning, has been the subject of lively debate in psycholinguistics and the cognitive neuroscience of language. Such words include, for example, inflected (työ + tä) and derived (työ + tön) words. The debate has centered in particular on the extent to which such words are retrieved from memory as wholes and on the role of the mechanism that strips suffixes (such as tä and tön) from word stems. It has likewise remained unclear how the mechanisms for processing heard and read language differ from each other. This dissertation examined the neural processing of inflected and derived words, the interaction between them, and the effect of sensory modality on the processing of word morphology. On the basis of the dissertation research, it can be assumed that the morphemes of inflected words (työ + tä) are processed separately while a word is read or heard, whereas for derived words both the whole-word representation (työtön) and the individual morphemes (työ + tön) are activated. Combining an inflectional suffix with a word stem activates left-hemisphere LAN and N400 responses more strongly. The MEG experiment revealed a neural response in the temporal lobe region that is related specifically to inflection. According to the results, the neural mechanisms underlying inflection and derivation are distinct and, in addition, at least partly independent of each other. The latter observation was supported by the finding that the voltage responses elicited in the combined condition could be explained by the simultaneous summation of the voltage responses elicited separately by derivation and inflection.
Differences were also observed between the mechanisms of reading and hearing morphologically complex words. This is probably because a heard word is processed continuously as it unfolds in time, whereas in the visual modality the linguistic information probably becomes available as a whole more quickly. For this reason, studies of morphological processes must pay particular attention to the point in time at which morphological information becomes available to the brain, depending on the sensory modality. The method of suffix-locked responses developed in the dissertation helps in this comparison.

    Induction, Semantic Validation and Evaluation of a Derivational Morphology Lexicon for German

    This thesis is about computational morphology for German derivation. Derivation is a word formation process that creates new words from existing ones, where the base and the derived word share the same stem. Mostly, derivation is conducted by means of relatively regular affixation rules, as in to bake - bakery. In German, derivation is highly productive, thus leading to high linguistic variability which can be employed to express similar facts in different ways, as derivationally related words are often also semantically related (or transparent). However, linguistic variance is a challenge for computational applications, particularly in semantic processing: It makes it more difficult to automatically grasp the meaning of texts and to match similar information onto each other. Thus, computational systems require linguistic knowledge. We develop methods to induce and represent derivational knowledge, and to apply it in language processing. The main outcome of our study is DErivBase, a German derivational lexicon. It groups derivationally related words (words that are derived from the same stem) into derivational families. To achieve high quality and high coverage, we induce DErivBase by combining rule-based and data-driven methods: We implement linguistic derivation rules to define derivational processes, and feed lemmas extracted from a German corpus into the rules to derive new lemmas. All words that are connected - directly or indirectly - by such rules are considered a derivational family. As mentioned above, a derivational relationship often implies a semantic relationship, but this is not always the case. Semantic drifts can cause semantically unrelated (opaque) derivational relations, such as to depart - department. Capturing the difference between transparent and opaque relations is important from a linguistic as well as a practical point of view.
Thus, we conduct a semantic refinement of DErivBase, i.e., we determine which lemma pairs are derivationally and semantically related, and which are not. We establish a second, semantically validated version of our lexicon, where families are sub-clustered according to semantic coherence, using supervised machine learning methods: We learn a binary classifier based on features that arise from structural information about the derivation rules, and from distributional information about the semantic relatedness of lemmas. Accordingly, the derivational families are subdivided into semantically coherent clusters. To demonstrate the utility of the two lexicon versions, we evaluate them on three extrinsic - and in the broadest sense, semantic - tasks. The underlying assumption for applying DErivBase to semantic tasks is that derivational relatedness is a reasonable approximation to semantic relatedness, since derivation is often semantically transparent. Our three experiments are the following: (1) We incorporate DErivBase into distributional semantic models to overcome sparsity problems and to improve the prediction quality of the underlying model. We test this method, which we call derivational smoothing, for semantic similarity prediction, and for synonym choice. (2) We employ DErivBase to model a psycholinguistic experiment that examines priming effects of transparent and opaque derivations to draw conclusions about the mental lexical representation in German. Derivational information is again incorporated into a distributional model, but this time, it introduces a kind of morphological generalisation. (3) In order to solve the task of Recognising Textual Entailment, we integrate DErivBase into a matching-based entailment system by means of a query expansion.
Assuming that derivational relationships between two texts suggest entailment rather than non-entailment, this expansion increases the chance of a lexical overlap, which should improve the system's entailment predictions. The incorporation of DErivBase indeed improves the performance of the underlying systems in each task; however, its suitability differs across settings. In experiment 1, the semantically validated lexicon yields improvements over the purely morphological lexicon, and the more coarse-grained similarity prediction profits more from DErivBase than the synonym choice. In experiment 2, purely morphological information clearly outperforms the other lexicon version, as the latter cannot model opaque derivations. On the entailment task in experiment 3, DErivBase has only minor impact, because textual entailment is hard to solve by addressing only one linguistic phenomenon. In sum, our findings show that the induction of a high-quality, high-coverage derivational lexicon is beneficial for very different applications in computational linguistics. It might be worthwhile to further investigate the semantic aspects of derivation to better understand its impact on language and thus, on language processing.
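
    The family-induction step described in this abstract (words connected directly or indirectly by derivation rules form one family) can be sketched as a connected-components computation. The three toy rules, the lemma list, and the choice to accept a derived form only when it is attested in the lemma list are all assumptions made for illustration; they are not DErivBase's actual rules or filtering.

```python
from collections import defaultdict


# Hypothetical toy derivation rules: each returns a candidate lemma or None.
def verb_to_agent_noun(lemma):
    if lemma.islower() and lemma.endswith("en"):
        return (lemma[:-2] + "er").capitalize()
    return None


def verb_to_adjective(lemma):
    if lemma.islower() and lemma.endswith("en"):
        return lemma[:-2] + "bar"
    return None


def noun_to_feminine(lemma):
    if lemma[0].isupper() and lemma.endswith("er"):
        return lemma + "in"
    return None


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def induce_families(lemmas, rules):
    parent = {lemma: lemma for lemma in lemmas}
    for lemma in lemmas:
        for rule in rules:
            derived = rule(lemma)
            # Assumption: keep a link only if the derived form is attested.
            if derived in parent:
                parent[find(parent, lemma)] = find(parent, derived)
    families = defaultdict(set)
    for lemma in lemmas:
        families[find(parent, lemma)].add(lemma)
    return sorted(sorted(family) for family in families.values())


LEMMAS = ["lesen", "Leser", "Leserin", "lesbar",
          "trinken", "Trinker", "trinkbar", "Haus"]
RULES = [verb_to_agent_noun, verb_to_adjective, noun_to_feminine]

for family in induce_families(LEMMAS, RULES):
    print(family)
```

    In this toy run, Leserin joins the family of lesen only indirectly, via Leser, which is the "directly or indirectly" case the abstract mentions.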

    A Task-based Evaluation of French Morphological Resources and Tools

    Morphology is a key component for many Language Technology applications. However, morphological relations, especially those relying on the derivation and compounding processes, are often addressed in a superficial manner. In this article, we focus on assessing the relevance of deep and motivated morphological knowledge in Natural Language Processing applications. We first describe an annotation experiment whose goal is to evaluate the role of morphology for one task, namely Question Answering (QA). We then highlight the kind of linguistic knowledge that is necessary for this particular task and propose a qualitative analysis of morphological phenomena in order to identify the morphological processes that are most relevant. Based on this study, we perform an intrinsic evaluation of existing tools and resources for French morphology, in order to quantify their coverage. Our conclusions provide helpful insights for using and building appropriate morphological resources and tools that could have a significant impact on the application performance.

    Simple Subject-Verb Agreement: a Morphosyntactic Path to Arabic Variations

    The analytic aim of this dissertation is to formally model aspects of Arabic subject-verb agreement, in particular verbal agreement with simple subject DPs. It aims to define how φ-agreement is formally manifested across the Arabic varieties, more specifically, Standard Arabic and the current dialects, and to chart how these varieties interrelate. In other words, this thesis hopes to advance the overall understanding of subject-verb agreement in Arabic and contribute to a clearer and simpler view of a number of specific syntactic phenomena. Most important of all, the order of the subject DP relative to the verbal predicate influences the possible subject-verb agreement choices attested in Standard Arabic (SA), whereby a subject-verb (SV) order shows full agreement in all φ-features, but a verb-subject (VS) order shows only partial agreement, typically, in Gender and Person. Nonetheless, full subject-verb agreement in VS order is robustly found in different dialects of the Arab world, in which the Number feature is obligatory. Remarkably, not only is the partial agreement attested in SA absent in the modern dialects, but also Gender and Number morphology distinctions may often be minimized. On the one hand, masculine agreement is syncretic whenever the agreement relation is established between a verbal predicate and dual or plural subject DPs, whether they are masculine or feminine. On the other hand, plural and dual nouns trigger plural agreement on the agreeing verbal predicate; the plural number is syncretic whenever the subject DP is plural or dual. What’s more, the Arabic (traditional) texts have an abundance of examples that do not conform to the SA norm of agreement and whose well-formedness is unquestionable, suggesting that the agreement asymmetry may not be absolute. These observations urge an in-depth investigation, assuming that they may present profound paradoxes when analyzed via the standard Agree-based mechanism.
Despite the dissimilarity between SA and the modern dialects in terms of subject-verb agreement, these varieties are mostly alike in other matters. For these reasons, I believe that any account of subject-verb agreement must take these points into consideration. To my knowledge, there has been no detailed analysis devoted to the interrelation between the standard variety and the modern dialects in terms of subject-verb agreement. So, believing that any syntactic account of subject-verb agreement in Arabic ought to be flexible enough to cover the various agreement phenomena, I argue that the various (often outwardly non-canonical) agreement patterns in Arabic are manifestations of the core syntactic Agree mechanism. Their agreement behavior is often attributed to a fundamental mismatch between the syntactic and morphological components, subject to variety/dialect-specific requirements. In simple terms, taking the core properties of the Agree-based system of feature valuation (Chomsky, 2000 et seq.), the assumptions in Distributed Morphology (Halle & Marantz, 1993; 1994; Halle, 1994, among others), and the feature geometry advocated by Harley & Ritter (2002), among others, I posit that these agreement patterns attest very general conditions on the agreement and φ-feature manifestations in Arabic, defined in terms of restrictions on T’s φ-Probe that agrees with the subject DP. Overall, given the formulation of the conditions advanced, the agreement facts across the Arabic varieties, I believe, arise naturally and predictably from the interaction of Agree, conditions on T’s φ-Probe, and postsyntactic requirements.

    Meaning versus Grammar

    This volume investigates the complicated relationship between grammar, computation, and meaning in natural languages. It details conditions under which meaning-driven processing of natural language is feasible, discusses an operational and accessible implementation of the grammatical cycle for Dutch, and offers analyses of a number of further conjectures about constituency and entailment in natural language.