    The Perfective Past Tense in Greek Child Language

    Lexical information from a minimalist point of view

    Simplicity as a methodological orientation applies to linguistic theory just as to any other field of research: ‘Occam’s razor’ is the label for the basic heuristic maxim according to which an adequate analysis must ultimately be reduced to indispensable specifications. In this sense, conceptual economy has been a strict and stimulating guideline in the development of Generative Grammar from the very beginning. Halle’s (1959) argument discarding the level of taxonomic phonemics in order to unify two otherwise separate phonological processes is an early characteristic example; a more general notion is that of an evaluation metric introduced in Chomsky (1957, 1975), which systematically relates the relative simplicity of alternative linguistic descriptions to the quest for explanatory adequacy of the theory underlying the descriptions to be evaluated. Further proposals along these lines include the theory of markedness developed in Chomsky and Halle (1968), Kean (1975, 1981), and others; the notion of underspecification proposed, e.g., in Archangeli (1984) and Farkas (1990); and the concept of default values and related notions. An important step promoting this general orientation was the idea of Principles and Parameters developed in Chomsky (1981, 1986), which reduced the notion of language-particular rule systems to universal principles, subject merely to parametrization with restricted options, largely related to properties of particular lexical items. On this account, the notion of a simplicity metric is to be dispensed with, as competing analyses of relevant data are now supposed to be essentially excluded by the restrictive system of principles.

    A broad-coverage distributed connectionist model of visual word recognition

    In this study we describe a distributed connectionist model of morphological processing, covering a realistically sized sample of the English language. The purpose of this model is to explore how effects of discrete, hierarchically structured morphological paradigms can arise as a result of the statistical sub-regularities in the mapping between word forms and word meanings. We present a model that learns to produce at its output a realistic semantic representation of a word when presented with a distributed representation of its orthography. After training, in three experiments, we compare the outputs of the model with the lexical decision latencies for large sets of English nouns and verbs. We show that the model has developed detailed representations of morphological structure, giving rise to effects analogous to those observed in visual lexical decision experiments. In addition, we show how the association between word form and word meaning also gives rise to recently reported differences between regular and irregular verbs, even in their completely regular present-tense forms. We interpret these results as underlining the key importance for lexical processing of the statistical regularities in the mappings between form and meaning.
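    The form-to-meaning mapping described above can be illustrated, very loosely, with a minimal sketch. Assumptions: a hypothetical four-word lexicon, a slot-based letter encoding, three invented semantic features, and a single layer of sigmoid units trained with a batch delta-rule update. The model in the study is far larger and is not claimed to work this way; the sketch only shows how a distributed orthography-to-semantics mapping can pick up a statistical sub-regularity such as word-final -ed signalling PAST.

```python
import math

SEMANTIC_FEATURES = ["WALK", "TALK", "PAST"]
# Hypothetical toy lexicon: word form -> binary semantic feature vector.
LEXICON = {
    "walk": [1, 0, 0],
    "walked": [1, 0, 1],
    "talk": [0, 1, 0],
    "talked": [0, 1, 1],
}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
MAX_LEN = 6
N_IN = MAX_LEN * len(ALPHABET) + 1  # letter-in-slot units plus one bias unit


def encode(word):
    """Return indices of the active units in a slot-based orthographic vector."""
    active = [pos * len(ALPHABET) + ALPHABET.index(ch)
              for pos, ch in enumerate(word[:MAX_LEN])]
    return active + [N_IN - 1]  # the bias unit is always on


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, active):
    # Each semantic unit sums the weights of the active orthographic units.
    return [sigmoid(sum(row[i] for i in active)) for row in weights]


def train(lexicon, epochs=1000, lr=0.1):
    weights = [[0.0] * N_IN for _ in SEMANTIC_FEATURES]
    data = [(encode(w), target) for w, target in lexicon.items()]
    for _ in range(epochs):
        # Batch delta rule: accumulate (target - output) for every active input, then update.
        grads = [[0.0] * N_IN for _ in SEMANTIC_FEATURES]
        for active, target in data:
            outputs = forward(weights, active)
            for j, (t, y) in enumerate(zip(target, outputs)):
                for i in active:
                    grads[j][i] += t - y
        for j, row in enumerate(weights):
            for i in range(N_IN):
                row[i] += lr * grads[j][i]
    return weights


def predict(weights, word):
    return forward(weights, encode(word))
```

    In this toy setup, the unseen form jumped should activate PAST above 0.5, since the letter units for the final -ed are the only inputs that end up with positive weights into that unit.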

    Inducing the Cross-Disciplinary Usage of Morphological Language Data Through Semantic Modelling

    Despite enormous technological advances in data creation and management, the vast majority of language data still exists as digital single-use artefacts that are inaccessible to further research. At the same time, the advent of digitisation in science has increased the possibilities for knowledge acquisition through the computational application of linguistic information in various disciplines. The purpose of this thesis, therefore, is to create the preconditions for the cross-disciplinary usage of morphological language data, as a sub-area of linguistic data, so as to induce shared reusability for every research area that relies on such data. This involves providing morphological data on the Web under an open license, and it needs to take the prevalent diversity of data compilation into account. Various representation standards have emerged across individual disciplines, which has led to heterogeneous data that differ with regard to complexity, scope and data formats. This situation requires a unifying foundation that enables direct reusability. To fill the gap of missing open data and to overcome the presence of isolated datasets, a semantic data modelling approach is applied. Rooted in the Linked Open Data (LOD) paradigm, it pursues the creation of data as uniquely identifiable resources that are realised as URIs, accessible on the Web, available under an open license, interlinked with other resources, and compliant with Linked Data representation standards such as the RDF format. Each resource then contributes to the LOD cloud, in which all resources are interconnected. This unification results from ontologically shared bases that formally define the classification of resources and their relations to other resources in a semantically interoperable manner.
Subsequently, the possibility of creating semantically structured data has sparked the formation of the Linguistic Linked Open Data (LLOD) research community and of an LOD sub-cloud containing primarily language resources. Over the last decade, ontologies have emerged mainly for the domain of lexical language data, which has led to a significant increase in Linked Data-based linguistic datasets. However, an equivalent model for morphological data is still missing, resulting in a lack of this type of language data within the LLOD cloud. This thesis presents six publications that are concerned with the peculiarities of morphological data and the exploration of their semantic representation as an enabler of cross-disciplinary reuse. The Multilingual Morpheme Ontology (MMoOn Core), together with an architectural framework for creating morphemic datasets as RDF resources, is proposed as the first comprehensive domain representation model adhering to the LOD paradigm. It will be shown that MMoOn Core permits the joint representation of heterogeneous data sources such as interlinear glossed texts, inflection tables, the outputs of morphological analysers, lists of morphemic glosses or word-formation rules, all of which are labelled alike as “morphological data” across different research areas. Evidence for the applicability and adequacy of the semantic modelling entailed by the MMoOn Core ontology is provided by two datasets that were transformed from tabular data into RDF: the Hebrew Morpheme Inventory and the Xhosa RDF dataset. Both further demonstrate how their integration into the LLOD cloud, by interlinking them with external language resources, yields insights that could not be obtained from the initial source data. Altogether, the research conducted in this thesis establishes the foundation for interoperable data exchange and the enrichment of morphological language data.
It strives to achieve the broader goal of advancing language data-driven research by overcoming data barriers and discipline boundaries.
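    The conversion of tabular morpheme data into RDF resources described above can be sketched in plain Python by emitting N-Triples lines. Assumptions: the one-row table, the column names, and the example.org namespaces are hypothetical and are not the actual MMoOn Core vocabulary; only the rdf:type and rdfs:label IRIs are the standard W3C ones. Interlinking with external resources would add further triples whose objects are external IRIs.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical namespaces for illustration only; these are NOT the MMoOn Core IRIs.
VOCAB = "http://example.org/morph-vocab#"
DATASET = "http://example.org/hebrew-morphemes/"

# Hypothetical tabular source: one morpheme per row (Hebrew masculine plural suffix).
SOURCE_TABLE = """id,form,gloss
suffix-pl-m,ים,PL.M
"""


def iri(value):
    return f"<{value}>"


def literal(text, lang=None):
    # Escape the characters N-Triples requires to be escaped inside string literals.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def row_to_triples(row):
    # Every morpheme becomes a uniquely identifiable resource with its own IRI.
    subject = iri(DATASET + row["id"])
    return [
        (subject, iri(RDF_TYPE), iri(VOCAB + "Morph")),
        (subject, iri(RDFS_LABEL), literal(row["form"], "he")),
        (subject, iri(VOCAB + "gloss"), literal(row["gloss"])),
    ]


def table_to_ntriples(table_text):
    rows = csv.DictReader(io.StringIO(table_text))
    triples = [t for row in rows for t in row_to_triples(row)]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(table_to_ntriples(SOURCE_TABLE))
```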

    Prosodic form and identity effects in German

    Identity effects in phonology are deviations from regular phonological form (i.e. canonical patterns) which are due to the relatedness between words. More specifically, identity effects are those deviations which serve to enhance similarity in the surface phonological form of morphologically related words. In rule-based generative phonology the effects in question are described by means of the cycle. For example, the stress on the second syllable of cond[ɛ]nsation, as opposed to the stresslessness of the second syllable of comp[ə]nsation, is described by applying the stress rules first to the stems, yielding condénse and cómpensàte. The stress rules are then reapplied to the affixed words, and the initial stress assignment (i.e. stress on the second syllable in condense, but not in compensate) leaves its mark in the output form (cf. Chomsky and Halle 1968). A second example is provided by German words like lie[p]los 'unloving', which show the effects of neutralization in coda position (i.e. only voiceless obstruents may occur in coda position) even though the obstruent should 'regularly' be syllabified in head position (i.e. bl is a well-formed syllable head in German). Here the stem is syllabified on an initial cycle, obstruent devoicing applies (i.e. lie[p]), and this structure is left intact when affixation applies (i.e. lie[p]los) (cf. Hall 1992). As a result, the stem of lie[p]los is identical to the base lie[p].
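    The cyclic analysis of lie[p]los can be sketched as two derivations over segment lists, one cyclic and one non-cyclic. This is a simplified illustration, not Hall's (1992) formalization: the segment inventory, the small fragment of legal German onsets, and the helper names are assumptions, and orthographic letters stand in for phonological segments.

```python
VOWELS = {"ie", "a", "e", "i", "o", "u"}
LEGAL_ONSETS = {"", "b", "l", "p", "s", "bl", "pl"}  # hypothetical fragment of German onsets
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}


def syllabify(segments):
    """Split segments into syllables, giving each syllable the longest legal onset."""
    nuclei = [i for i, s in enumerate(segments) if s in VOWELS]
    boundaries = [0]
    for left, right in zip(nuclei, nuclei[1:]):
        cluster = segments[left + 1:right]
        # k = number of consonants left in the coda; the smallest k gives the longest onset.
        for k in range(len(cluster) + 1):
            if "".join(cluster[k:]) in LEGAL_ONSETS:
                boundaries.append(left + 1 + k)
                break
    boundaries.append(len(segments))
    return [segments[a:b] for a, b in zip(boundaries, boundaries[1:])]


def final_devoicing(syllables):
    """Devoice obstruents after the nucleus (coda position); assumes each syllable has a vowel."""
    out = []
    for syl in syllables:
        nucleus = next(i for i, s in enumerate(syl) if s in VOWELS)
        out.append(syl[:nucleus + 1] + [DEVOICE.get(s, s) for s in syl[nucleus + 1:]])
    return out


def flatten(syllables):
    return [s for syl in syllables for s in syl]


def derive_noncyclic(stem, suffix):
    # Rules apply once, to the fully affixed word.
    return "".join(flatten(final_devoicing(syllabify(stem + suffix))))


def derive_cyclic(stem, suffix):
    # Cycle 1: syllabify the stem on its own and apply devoicing.
    stem_out = flatten(final_devoicing(syllabify(stem)))
    # Cycle 2: affixation; the stem's structure is left intact, only the suffix is processed.
    suffix_out = flatten(final_devoicing(syllabify(suffix)))
    return "".join(stem_out + suffix_out)


print(derive_cyclic(["l", "ie", "b"], ["l", "o", "s"]))
print(derive_noncyclic(["l", "ie", "b"], ["l", "o", "s"]))
```

    The non-cyclic derivation syllabifies the whole word, puts b in the onset of the second syllable and so keeps it voiced (lieblos), whereas the cyclic derivation devoices it on the stem cycle and leaves that structure intact (lieplos).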

    Cappadocian in the social media era

    Until very recently, Cappadocian Greek seemed to have disappeared without a trace. Linguists and dialectologists even believed it had become extinct altogether. However, one Cappadocian variety, Mišótika, is still spoken in some villages and towns in the decentralized administrations of Macedonia and Thrace, Epirus and Western Macedonia, and Thessaly and Central Greece. The dialect is undergoing attrition under the growing pressure of Standard Modern Greek and its regional varieties and is in effect being re-Hellenized. Even the oldest speakers freely use Greek instead of Mišótika words and expressions, and attrition is noticeable at the phonological, morphological and syntactic levels. As a result, there are now many semi-speakers or even would-be speakers whose speech lies somewhere on a continuum from Mišótika with Standard or Regional Modern Greek elements to Standard or Regional Modern Greek with Mišótika elements, in both cases mostly words and phrases. Over the past ten years, we have witnessed a growing interest in Mišótika as a marker of (Mišótika) Cappadocian identity. Speakers feel more confident speaking their language in public, for instance at the annual Gavoustima, where theatrical plays in Mišótika are now regularly performed by the syllogos of Néo Agionéri (to the amusement and also the bewilderment of the audience). Remarkably and very fortunately, Mišótika is now also used on social media. I will concentrate here on Facebook, especially on the page called Έναρξη Διδασκαλίας Εκμάθησης Μυστιώτικου Ιδιώματος ('Start of Teaching and Learning the Mišótika Dialect'; group 470281169768316 on FB). The title is identical to the subtitle of Thomas Fates’ book Χ͜ιογός α ας χαρίσ̌’, which is a sort of “Teach Yourself Mišótika” and in which, interestingly, a special orthography for Mišótika has been developed.
I will discuss the kind of information found on the FB page: questions, questionnaires, explanations of words and short phrases, folktales and other short stories, audio and video clips, etc. Particular attention will be paid to the problems of using the Greek alphabet to write Mišótika in relation to the ongoing phonological attrition, and also to the uncertainty surrounding the interpretation of linguistic phenomena in Mišótika.

    Feature-based inheritance networks for computational lexicons

    The virtues of viewing the lexicon as an inheritance network are its succinctness and its tendency to highlight significant clusters of linguistic properties. Two practical advantages follow from its succinctness, namely ease of maintenance and ease of modification. In this paper we present a feature-based foundation for lexical inheritance. We argue that the feature-based foundation is both more economical and expressively more powerful than non-feature-based systems. It is more economical because it employs only mechanisms already assumed to be present elsewhere in the grammar (viz., in the feature system), and it is more expressive because feature systems are more expressive than other mechanisms used in expressing lexical inheritance (cf. DATR). The lexicon furthermore allows the use of default unification, as defined by Bouma. These claims are buttressed in sections sketching the opportunities for lexical description in feature-based lexicons in two central lexical topics, inflection and derivation. Briefly, we argue that the central notion of paradigm may be defined in feature structures, and that in this fashion it may be more satisfactorily (in fact, immediately) linked to the syntactic information. Our discussion of derivation is more programmatic, but here, too, we argue that feature structures of a suitably rich sort provide a foundation for the definition of lexical rules. We illustrate the theoretical claims with applications to German lexis. This work is currently being implemented in a natural language understanding effort (DISCO) at the German Artificial Intelligence Center (Deutsches Forschungszentrum für Künstliche Intelligenz).
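    Default unification in an inheritance lexicon can be sketched with feature structures as nested dictionaries. This is a deliberately simplified variant and an assumption, not Bouma's actual definition: strict information always wins over inherited defaults, and defaults fill in only the features the strict structure leaves open. Structure sharing (reentrancy) is not modelled, and the German verb class and lexemes below are hypothetical examples.

```python
import copy


class UnificationFailure(Exception):
    pass


def unify(a, b):
    """Strict unification: atomic values must match; dicts unify feature by feature."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = copy.deepcopy(a)
        for feat, val in b.items():
            result[feat] = unify(result[feat], val) if feat in result else copy.deepcopy(val)
        return result
    if a == b:
        return a
    raise UnificationFailure(f"{a!r} vs {b!r}")


def default_unify(strict, default):
    """Simplified default unification: strict values survive, defaults fill open features."""
    if isinstance(strict, dict) and isinstance(default, dict):
        result = copy.deepcopy(strict)
        for feat, val in default.items():
            if feat in result:
                result[feat] = default_unify(result[feat], val)
            else:
                result[feat] = copy.deepcopy(val)
        return result
    return strict  # on any clash (atom vs. atom or atom vs. structure), the strict value wins


# Hypothetical inheritance: a weak-verb class and two lexemes that inherit from it.
VERB_DEFAULTS = {"cat": "verb", "infl": {"class": "weak", "past_suffix": "te"}}
SAGEN = {"stem": "sag"}  # regular: inherits everything (sag-te)
SINGEN = {"stem": "sing", "infl": {"class": "strong", "past_suffix": ""}}  # overrides

print(default_unify(SAGEN, VERB_DEFAULTS))
print(default_unify(SINGEN, VERB_DEFAULTS))
```

    Strict unification of SINGEN with the weak-verb defaults fails on the class feature, while default unification lets the lexeme's own strong-verb specification override the inherited values.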