
    Detecting and ordering adjectival scalemates

    This paper presents a pattern-based method that can be used to infer adjectival scales, such as , from a corpus. Specifically, the proposed method uses lexical patterns to automatically identify and order pairs of scalemates, followed by a filtering phase in which unrelated pairs are discarded. For the filtering phase, several different similarity measures are implemented and compared. The model presented in this paper is evaluated using the current standard, along with a novel evaluation set, and shown to be at least as good as the current state of the art. Comment: Paper presented at MAPLEX 2015, February 9-10, Yamagata, Japan (http://lang.cs.tut.ac.jp/maplex2015/).

    Mining Domain-Specific Thesauri from Wikipedia: A case study

    Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a professional thesaurus for agriculture, we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore, it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly constructed manual counterparts.

    Enroller: an experiment in aggregating resources

    This chapter describes a collaborative project between e-scientists and humanists working to create an online repository of linguistic data sets and tools. Corpora, dictionaries, and a thesaurus are brought together to enable a new method of research. It combines our most advanced knowledge in both computing and linguistic research techniques.

    The civilizing process in London’s Old Bailey

    The jury trial is a critical point where the state and its citizens come together to define the limits of acceptable behavior. Here we present a large-scale quantitative analysis of trial transcripts from the Old Bailey that reveals a major transition in the nature of this defining moment. By coarse-graining the spoken word testimony into synonym sets and dividing the trials based on indictment, we demonstrate the emergence of semantically distinct violent and nonviolent trial genres. We show that although in the late 18th century the semantic content of trials for violent offenses is functionally indistinguishable from that for nonviolent ones, a long-term, secular trend drives the system toward increasingly clear distinctions between violent and nonviolent acts. We separate this process into the shifting patterns that drive it, determine the relative effects of bureaucratic change and broader cultural shifts, and identify the synonym sets most responsible for the eventual genre distinguishability. This work provides a new window onto the cultural and institutional changes that accompany the monopolization of violence by the state, described in qualitative historical analysis as the civilizing process.

    Retrieving with good sense

    Although always present in text, word sense ambiguity only recently came to be regarded as a potentially solvable problem for information retrieval. The growth of interest in word senses resulted from new directions taken in disambiguation research. This paper first outlines this research and surveys the resulting efforts in information retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt from the research, most notably a notion of the circumstances under which disambiguation may prove useful to retrieval.

    Diachronic and synchronic thesauruses

    No abstract available