
    A Language-Independent Approach to Extracting Derivational Relations from an Inflectional Lexicon

    In this paper, we describe and evaluate an unsupervised method for acquiring pairs of lexical entries belonging to the same morphological family, i.e., derivationally related words, starting from a purely inflectional lexicon. Our approach relies on transformation rules that relate lexical entries with one another, and which are automatically extracted from the inflected lexicon based on surface form analogies and on part-of-speech information. It is generic enough to be applied to any language with a mainly concatenative derivational morphology. Results were obtained and evaluated on English, French, German and Spanish. Precision results are satisfying, and our French results compare favorably with another resource, although its construction relied on manually developed lexicographic information whereas our approach only requires an inflectional lexicon.
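
    The idea of extracting transformation rules from surface-form analogies and part-of-speech information can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the authors' algorithm: the shared stem is approximated by the longest common prefix, and the function name, thresholds and toy lexicon are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def extract_rules(lexicon, min_stem=4, min_support=2):
    """Group lemma pairs by the suffix-and-POS rewrite that relates them.

    Assumption: the longest common prefix stands in for the shared stem,
    which only suits concatenative (suffixing) morphology.
    """
    rules = defaultdict(list)
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon), 2):
        # Length of the longest common prefix of the two lemmas.
        n = 0
        while n < min(len(w1), len(w2)) and w1[n] == w2[n]:
            n += 1
        if n < min_stem:
            continue  # too little shared material to suggest a relation
        rule = ((w1[n:], p1), (w2[n:], p2))
        rules[rule].append((w1, w2))
    # Keep only rules attested by at least min_support pairs.
    return {r: pairs for r, pairs in rules.items() if len(pairs) >= min_support}


# Toy lexicon of (lemma, part of speech); hypothetical data for illustration.
TOY_LEXICON = {
    ("compute", "V"), ("computation", "N"),
    ("derive", "V"), ("derivation", "N"),
    ("combine", "V"), ("combination", "N"),
    ("table", "N"),
}

rules = extract_rules(TOY_LEXICON)
for rule, pairs in rules.items():
    print(rule, pairs)
```

    On this toy input a single rule survives the support threshold, relating nouns in "-ation" to verbs in "-e". A realistic system would need a far larger lexicon and some filtering of accidental prefix overlaps.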

    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages

    The Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages contain 17 papers that were presented at the conference, held in Dubrovnik, Croatia, 4-6 October 2010.

    DeLex, a freely available, large-scale and linguistically grounded morphological lexicon for German

    We introduce DeLex, a freely available, large-scale and linguistically grounded morphological lexicon for German developed within the Alexina framework. We extracted lexical information from the German Wiktionary and developed a morphological inflection grammar for German, based on a linguistically sound model of inflectional morphology. Although the development of DeLex involved some manual work, we show that it represents a good tradeoff between development cost, lexical coverage and resource accuracy.

    A Semi-automatic and Low Cost Approach to Build Scalable Lemma-based Lexical Resources for Arabic Verbs

    This work presents a method that enables the Arabic NLP community to build scalable lexical resources. The proposed method is low-cost, time-efficient, scalable and extensible. Its extensibility shows in the fact that it is incremental in two respects: processing resources and generating lexicons. The method starts from a corpus and proceeds in three steps. First, tokens are drawn from the corpus and lemmatized. Second, finite state transducers (FSTs) are generated semi-automatically. Finally, the FSTs are used to produce all possible inflected verb forms with their full morphological features. One of the algorithm's strengths is its ability to generate transducers with 184 transitions, which would be very cumbersome to design manually. A second strength is a new inflection scheme for Arabic verbs, which increases the efficiency of the FST generation algorithm. The experiments use a representative corpus of Modern Standard Arabic. The number of semi-automatically generated transducers is 171. The coverage of the resulting open lexical resources is high: they cover more than 70% of Arabic verbs. The built resources contain 16,855 verb lemmas and 11,080,355 fully vocalized, partially vocalized and unvocalized inflected verb forms. All these resources are being made public and are currently used as an open package in the Unitex framework, available under the LGPL license.

    A distributional semantic study on German event nominalizations

    We present the results of a large-scale corpus-based comparison of two German event nominalization patterns: deverbal nouns in -ung (e.g., die Evaluierung, 'the evaluation') and nominal infinitives (e.g., das Evaluieren, 'the evaluating'). Among the many available event nominalization patterns for German, we selected these two because they are both highly productive and challenging from the semantic point of view. Both patterns are known to keep a tight relation with the event denoted by the base verb, but with different nuances. Our study targets a better understanding of the differences in their semantic import. The key notion of our comparison is that of semantic transparency, and we propose a usage-based characterization of the relationship between derived nominals and their bases. Using methods from distributional semantics, we bring to bear two concrete measures of transparency which highlight different nuances: the first one, cosine, detects nominalizations which are semantically similar to their bases; the second one, distributional inclusion, detects nominalizations which are used in a subset of the contexts of the base verb. We find that only the inclusion measure helps in characterizing the difference between the two types of nominalizations, in relation to the traditionally considered variable of relative frequency (Hay, 2001). Finally, the distributional analysis allows us to frame our comparison in the broader coordinates of the inflection vs. derivation cline.
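
    The contrast between the two transparency measures can be made concrete with a small sketch. It assumes sparse context vectors stored as dictionaries. The inclusion score below is a simple weighted-overlap ratio chosen for illustration, since the abstract does not state the exact formula used. The vectors and names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse context vectors (dicts)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def inclusion(narrow, broad):
    """Share of narrow's context weight that falls on contexts of broad.

    Assumption: a plain weighted-overlap ratio; it equals 1.0 when every
    context of `narrow` is also a context of `broad`.
    """
    total = sum(narrow.values())
    shared = sum(w for c, w in narrow.items() if broad.get(c, 0.0) > 0)
    return shared / total if total else 0.0


# Hypothetical context weights for a base verb and a derived nominal.
verb = {"project": 4.0, "carefully": 2.0, "results": 3.0, "team": 1.0}
nominal = {"project": 2.0, "results": 1.0}

print(cosine(nominal, verb))
print(inclusion(nominal, verb))
print(inclusion(verb, nominal))
```

    Cosine is symmetric, whereas inclusion is directional: here the nominal's contexts are fully contained in the verb's, but not the other way round. That asymmetry is what lets an inclusion measure capture subset-of-contexts behaviour.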

    Morphology Within the Parallel Architecture Framework : the Centrality of the Lexicon Below the Word Level

    The Parallel Architecture (PA) framework (Jackendoff 2002, 2007, Culicover & Jackendoff 2005) is one of the most complete constraint-based linguistic theories that encompasses phonology, syntax and semantics. However, it lacks a fully developed model of word formation. More recently, a theory called Relational Morphology (RM) (Jackendoff & Audring 2020) has been developed that integrates into the PA. The current study shows how the Slot Structure model (SSM) (Benavides 2003, 2009, 2010), which is compatible with the PA and is based on the dual-route model and percolation of features (Pinker 1999, 2006; Huang & Pinker 2010), can provide a better account of morphology than RM, and can also be incorporated into the PA, thus helping to make it a more explanatory framework. Spanish data are used as the basis to demonstrate the implementation of the SSM. The current paper demonstrates two key problems for RM, namely inconsistent and confusing coindexation and a proliferation of schemas, and shows that these issues do not arise in the SSM. Overall, the paper points out significant drawbacks in the RM framework, while at the same time showing how the PA's morphological component can be enriched with the SSM.

    Grapho-morphological awareness in Spanish L2 reading

    This paper contributes to the literature on the transferability of grapho-morphological awareness (GMA) for L2 learners by analyzing L2 learners' morphology knowledge at the word and text level. GMA helps readers to identify grammatical categories, infer meanings of unfamiliar words, and access stored lexical information (Koda, 2008). Previous research indicates that L2 GMA is influenced by L1 GMA (Fender 2003; Hancin-Bhatt & Nagy, 1994; Koda, 2000; Ramirez et al., 2010; Schiff & Calif, 2007). In this paper, native speakers of Spanish (n=30) and native speakers of English learning Spanish as an L2 (n=46) completed four tasks: two timed lexical decision tasks (LDT) in English (only English speakers) and Spanish; three short passages followed by multiple choice questions; a cloze task; and an interview to discuss their answers. L2 learners show a native-like word recognition pattern (Clahsen & Felser, 2006a, 2006b), providing evidence for language-specific morphological processing. L2 learners could recognize and decompose words into morphemes and lexemes through the different tasks, which implies that they neither ignore morphology nor follow a whole-word reading approach. However, this ability did not always help them to access the right word meaning. Also, orthographically similar words from L1 and L2 interfere with word recognition of inflected and derived words. Despite showing interference in inflected words during the timed LDT, they show greater control during the interviews. However, derivational morphology is more difficult for L2 learners since they do not know derivational constraints either implicitly or explicitly. The results suggest that intermediate L2 learners with an alphabetic writing system in their L1 can go beyond transfer in an alphabetic L2, and that the relationship between proficiency and GMA might be reciprocal (Kuo & Anderson, 2008).

    Analogical classification in formal grammar

    The organization of the lexicon, and especially the relations between groups of lexemes, is a strongly debated topic in linguistics. Some authors have insisted on the lack of any structure of the lexicon. In this vein, Di Sciullo & Williams (1987: 3) claim that “[t]he lexicon is like a prison – it contains only the lawless, and the only thing that its inmates have in common is lawlessness”. In the alternative view, the lexicon is assumed to have a rich structure that captures all regularities and partial regularities that exist between lexical entries. Two very different schools of linguistics have insisted on the organization of the lexicon. On the one hand, for theories like HPSG (Pollard & Sag 1994), but also some versions of construction grammar (Fillmore & Kay 1995), the lexicon is assumed to have a very rich structure which captures common grammatical properties between its members. In this approach, a type hierarchy organizes the lexicon according to common properties between items. For example, Koenig (1999: 4, among others), working from an HPSG perspective, claims that the lexicon “provides a unified model for partial regularities, medium-size generalizations, and truly productive processes”. On the other hand, from the perspective of usage-based linguistics, several authors have drawn attention to the fact that lexemes which share morphological or syntactic properties tend to be organized in clusters of surface (phonological or semantic) similarity (Bybee & Slobin 1982; Skousen 1989; Eddington 1996). This approach, often called analogical, has developed highly accurate computational and non-computational models that can predict the classes to which lexemes belong. Like the organization of lexemes in type hierarchies, analogical relations between items help speakers to make sense of intricate systems, and reduce apparent complexity (Köpcke & Zubin 1984).
    Despite this core commonality, and despite the fact that most linguists seem to agree that analogy plays an important role in language, there has been remarkably little work on bringing together these two approaches. Formal grammar traditions have been very successful in capturing grammatical behaviour, but, in the process, have downplayed the role analogy plays in linguistics (Anderson 2015). In this work, I aim to change this state of affairs: first, by providing an explicit formalization of how analogy interacts with grammar, and second, by showing that analogical effects and relations closely mirror the structures in the lexicon. I will show that both formal grammar approaches and usage-based analogical models capture mutually compatible relations in the lexicon.