
    The Porter stemming algorithm: then and now

    Purpose: In 1980, Porter presented a simple algorithm for stemming English language words. This paper summarises the main features of the algorithm, and highlights its role not just in modern information retrieval research, but also in a range of related subject domains. Design: Review of literature and research involving use of the Porter algorithm. Findings: The algorithm has been widely adopted and extended so that it has become the standard approach to word conflation for information retrieval in a wide range of languages. Value: The 1980 paper in Program by Porter describing his algorithm has been highly cited. This paper provides a context for the original paper as well as an overview of its subsequent use.
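    To make "word conflation" concrete, here is a minimal Python sketch of only Step 1a of Porter's algorithm, the plural-suffix rules SSES -> SS, IES -> I, SS -> SS and S -> (removed), where the longest matching suffix wins. The later steps, with their measure-based conditions, are omitted, so this is not a complete stemmer. The names `porter_step_1a` and `conflate` are illustrative, not from any library.

```python
from collections import defaultdict

# Step 1a rules from Porter (1980), ordered longest suffix first.
# Only the first matching rule is applied.
STEP_1A_RULES = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]


def porter_step_1a(word: str) -> str:
    """Apply only Step 1a of the Porter stemmer to a lowercase word."""
    for suffix, replacement in STEP_1A_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


def conflate(words):
    """Group words that share the same (Step 1a only) stem."""
    groups = defaultdict(list)
    for w in words:
        groups[porter_step_1a(w)].append(w)
    return dict(groups)


if __name__ == "__main__":
    for w in ["caresses", "ponies", "ties", "caress", "cats"]:
        print(w, "->", porter_step_1a(w))
    print(conflate(["cat", "cats", "pony", "ponies"]))
```

    With Step 1a alone, "cat" and "cats" conflate, but "pony" and "ponies" do not ("pony" vs "poni"). In the full algorithm, a later step (1c) rewrites a final y to i when the stem contains a vowel, which brings the two together.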

    Small clause results revisited

    The main purpose of this paper is to show that argument structure constructions like complex telic path of motion constructions (John walked to the store) or complex resultative constructions (The dog barked the chickens awake) are not to be regarded as "theoretical entities" (Jackendoff (1997b); Goldberg (1995)). As an alternative to these semanticocentric accounts, I argue that their epiphenomenal status can be shown if and only if we take into account some important insights from three syntactically oriented works: (i) Hoekstra's (1988, 1992) analysis of small clause results, (ii) Hale & Keyser's (1993f.) configurational theory of argument structure, and (iii) Mateu & Rigau's (1999; i.p.) syntactic account of Talmy's (1991) typological distinction between 'satellite-framed languages' (e.g., English, German, Dutch) and 'verb-framed languages' (e.g., Catalan, Spanish, French). In particular, it is argued that the formation of the above-mentioned constructions involves a process of conflating two different syntactic argument structures, carried out via a 'generalized transformation'. Accordingly, the so-called 'lexical subordination process' (Levin & Rapoport (1988)) is argued to be a syntactic operation rather than a semantic one. Because we assume that the parametric variation involved in the constructions under study cannot be explained in purely semantic terms (Mateu & Rigau (1999)), Talmy's (1991) typological distinction is argued to be better stated in lexical-syntactic terms.

    Lexical typology : a programmatic sketch

    The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology. The goal of Lexical Typology is to investigate crosslinguistically significant patterns of interaction between lexicon and grammar.

    Conflation: a new type of accelerated expansion

    In the framework of scalar-tensor theories of gravity, we construct a new kind of cosmological model that conflates inflation and ekpyrosis. During a phase of conflation, the universe undergoes accelerated expansion, but with crucial differences compared to ordinary inflation. In particular, the potential energy is negative, which is of interest for supergravity and string theory, where both negative potentials and the required scalar-tensor couplings are rather natural. A distinguishing feature of the model is that, for a large parameter range, it does not significantly amplify adiabatic scalar and tensor fluctuations, and in particular does not lead to eternal inflation and the associated infinities. We also show how density fluctuations in accord with current observations may be generated by adding a second scalar field to the model. Conflation may be viewed as complementary to the recently proposed anamorphic universe of Ijjas and Steinhardt. Comment: 22 pages, 6 figures, replaced with published version.

    Thesaurus based automatic keyphrase indexing

    We propose a new method that enhances automatic keyphrase extraction by using semantic information on terms and phrases gleaned from a domain-specific thesaurus. We evaluate the results against keyphrase sets assigned by a state-of-the-art keyphrase extraction system and those assigned by six professional indexers.
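    The abstract does not spell out the method, so the following is only a minimal sketch of the general idea of controlled-vocabulary indexing, not the paper's algorithm. Candidate n-grams are matched against a thesaurus, non-preferred synonyms are mapped to their preferred descriptor, and descriptors are ranked by how often they occur. The toy `THESAURUS` entries and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy thesaurus: each term maps to its preferred descriptor,
# so synonyms ("corn", "erosion of soil") collapse onto one index term.
THESAURUS = {
    "maize": "maize",
    "corn": "maize",
    "soil erosion": "soil erosion",
    "erosion of soil": "soil erosion",
    "irrigation": "irrigation",
}


def candidate_phrases(text, max_len=3):
    """Yield all lowercase word n-grams of length 1..max_len."""
    tokens = re.findall(r"[a-z]+", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i : i + n])


def thesaurus_keyphrases(text, top_k=3):
    """Rank thesaurus descriptors by how often the text mentions them."""
    counts = Counter()
    for phrase in candidate_phrases(text):
        descriptor = THESAURUS.get(phrase)
        if descriptor is not None:
            counts[descriptor] += 1
    return [d for d, _ in counts.most_common(top_k)]


if __name__ == "__main__":
    doc = ("Corn yields fell as soil erosion increased. "
           "Maize needs irrigation where erosion of soil is severe.")
    print(thesaurus_keyphrases(doc))
```

    Mapping synonyms onto one descriptor is what lets "corn" and "maize" reinforce each other here. A frequency-only ranking is a simplification. A real system would also weigh features such as phrase position and specificity.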

    The Place of the Mihanović Psalter in the Fourteenth-Century Revisions of the Church Slavonic Psalter

    Modern scholarship on the textual history of Church Slavonic biblical translation recognizes two distinct revisions of the Church Slavonic Psalter from the early fourteenth century: Redaction III (sometimes called the ‘Athonite’ redaction) and Redaction IV, known only in the Norov Psalter manuscript. Although they are both attested from the same period and in manuscripts of similar Bulgarian provenance, these two redactions are in some respects systematically different in their linguistic character, their approach to translational issues and their Greek textual basis. In the light of A.A. Turilov’s observation that the Mihanović Psalter, possibly the earliest witness to Redaction III, is written in the same hand as the greater part of the Norov Psalter, this paper examines the textual antecedents of the two redactions and the importance of the Mihanović Psalter as a link between them.

    Sampling from Stochastic Finite Automata with Applications to CTC Decoding

    Stochastic finite automata arise naturally in many language and speech processing tasks. They include stochastic acceptors, which represent certain probability distributions over random strings. We consider the problem of efficient sampling: drawing random string variates from the probability distribution represented by stochastic automata and transformations of those. We show that path-sampling is effective and can be efficient if the epsilon-graph of a finite automaton is acyclic. We provide an algorithm that ensures this by conflating epsilon-cycles within strongly connected components. Sampling is also effective in the presence of non-injective transformations of strings. We illustrate this in the context of decoding for Connectionist Temporal Classification (CTC), where the predictive probabilities yield auxiliary sequences which are transformed into shorter labeling strings. We can sample efficiently from the transformed labeling distribution and use this in two different strategies for finding the most probable CTC labeling.
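    The sketch below illustrates only the CTC case, not the paper's general automaton algorithm or its epsilon-cycle conflation. It assumes the usual CTC setup, in which each frame's symbol is drawn independently from its own distribution. Each sampled frame-level path is mapped to a labeling by the standard, non-injective CTC collapse (merge repeated symbols, then delete blanks), and the most frequent labeling is used as a Monte Carlo estimate of the most probable one. The blank symbol "-" and all function names are assumptions for illustration.

```python
import random
from collections import Counter

BLANK = "-"  # assumed blank symbol for this sketch


def ctc_collapse(path):
    """Map a frame-level path to a labeling: merge repeats, then drop blanks."""
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)


def sample_path(frame_probs, rng):
    """Draw one path, sampling each frame from its own categorical distribution."""
    path = []
    for probs in frame_probs:
        symbols = list(probs)
        weights = [probs[s] for s in symbols]
        path.append(rng.choices(symbols, weights=weights, k=1)[0])
    return path


def most_frequent_labeling(frame_probs, n_samples=10000, seed=0):
    """Estimate the most probable labeling from sampled, collapsed paths."""
    rng = random.Random(seed)
    counts = Counter(
        ctc_collapse(sample_path(frame_probs, rng)) for _ in range(n_samples)
    )
    return counts.most_common(1)[0]  # (labeling, count)


def best_path_labeling(frame_probs):
    """Baseline: collapse the single most probable frame-level path."""
    return ctc_collapse([max(p, key=p.get) for p in frame_probs])


if __name__ == "__main__":
    frames = [{"a": 0.4, BLANK: 0.6}, {"a": 0.4, BLANK: 0.6}]
    print("best path:", repr(best_path_labeling(frames)))
    print("sampled:", most_frequent_labeling(frames))
```

    In this two-frame example, the best path is "--" (probability 0.36), which collapses to the empty labeling. The labeling "a" instead receives the mass of "a-", "-a" and "aa" (0.24 + 0.24 + 0.16 = 0.64). Sampling recovers this labeling, while best-path decoding misses it.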

    Why "consciousness" means what it does.

    “Consciousness” seems to be a polysemic, ambiguous term. Because of this, theorists have sought to distinguish the different kinds of phenomena that “consciousness” denotes, leading to a proliferation of terms for different kinds of consciousness. However, some philosophers, univocalists about consciousness, argue that “consciousness” is not polysemic or ambiguous. By drawing upon the history of philosophy and psychology, and some resources from semantic theory, univocalism about consciousness is shown to be implausible. This finding is important, for if we accept the univocalist account then we are less likely to subject our thought and talk about the mind to the kind of critical analysis that it needs. The exploration of the semantics of “consciousness” offered here, by way of contrast, clarifies and fine-tunes our thought and talk about consciousness and conscious mentality and explains why “consciousness” means what it does, and why it means a number of different, but related, things.