    LocLinkVis: A Geographic Information Retrieval-Based System for Large-Scale Exploratory Search

    In this paper we present LocLinkVis (Locate-Link-Visualize), a system that supports exploratory information access to a document collection based on geo-referencing and visualization. It uses a gazetteer containing representations of places ranging from countries to buildings to recognize toponyms, disambiguate them into places, and visualize the resulting spatial footprints. Comment: SEM'1

    Quantitative Perspectives on Fifty Years of the Journal of the History of Biology

    Journal of the History of Biology provides a fifty-year long record for examining the evolution of the history of biology as a scholarly discipline. In this paper, we present a new dataset and preliminary quantitative analysis of the thematic content of JHB from the perspectives of geography, organisms, and thematic fields. The geographic diversity of authors whose work appears in JHB has increased steadily since 1968, but the geographic coverage of the content of JHB articles remains strongly lopsided toward the United States, United Kingdom, and western Europe and has diversified much less dramatically over time. The taxonomic diversity of organisms discussed in JHB increased steadily between 1968 and the late 1990s but declined in later years, mirroring broader patterns of diversification previously reported in the biomedical research literature. Finally, we used a combination of topic modeling and nonlinear dimensionality reduction techniques to develop a model of multi-article fields within JHB. We found evidence for directional changes in the representation of fields on multiple scales. The diversity of JHB with regard to the representation of thematic fields has increased overall, with most of that diversification occurring in recent years. Drawing on the dataset generated in the course of this analysis, as well as web services in the emerging digital history and philosophy of science ecosystem, we have developed an interactive web platform for exploring the content of JHB, and we provide a brief overview of the platform in this article. As a whole, the data and analyses presented here provide a starting-place for further critical reflection on the evolution of the history of biology over the past half-century. Comment: 45 pages, 14 figures, 4 tables

    Distantly Labeling Data for Large Scale Cross-Document Coreference

    Cross-document coreference, the problem of resolving entity mentions across multi-document collections, is crucial to automated knowledge base construction and data mining tasks. However, the scarcity of large labeled data sets has hindered supervised machine learning research for this task. In this paper we develop and demonstrate an approach based on "distantly-labeling" a data set from which we can train a discriminative cross-document coreference model. In particular we build a dataset of more than a million people mentions extracted from 3.5 years of New York Times articles, leverage Wikipedia for distant labeling with a generative model (and measure the reliability of such labeling); then we train and evaluate a conditional random field coreference model that has factors on cross-document entities as well as mention-pairs. This coreference model obtains high accuracy in resolving mentions and entities that are not present in the training data, indicating applicability to non-Wikipedia data. Given the large amount of data, our work is also an exercise demonstrating the scalability of our approach. Comment: 16 pages, submitted to ECML 201

    A constraint-based approach to noun phrase coreference resolution in German newspaper text

    In this paper, we investigate the usefulness of a wide range of features for the resolution of nominal coreference, both as hard constraints (i.e., completely removing elements from the list of possible candidates) and as soft constraints (where an accumulation of violations makes it less likely that a candidate is chosen as the antecedent). We present a state-of-the-art system based on such constraints, with weights estimated by a maximum entropy model, that uses lexical information to resolve cases of coreferent bridging.

    Remarks on deixis

    The prevailing conception of deixis is oriented to the idea of 'concrete' physical and perceptual characteristics of the situation of speech. Signs standardly adduced as typical deictics are I, you, here, now, this, that. I and you are defined as meaning "the person producing the utterance in question" and "the person spoken to", here and now as meaning "where the speaker is at utterance time" and "at the moment the utterance is made" (also, "at the place/time of the speech exchange"); similarly, the meanings of this and that are as a rule defined via proximity to speaker's physical location. The elements used in such definitions form the conceptual framework of most of the general characterisations of deixis in the literature. [...] There is much in the literature, of course, that goes far beyond this framework. A great variety of elements, mostly with very abstract meanings, have been found to share deictic characteristics although they do not fit into the personnel-place-time-of-utterance schema. The adequacy of that schema is also called into question by many observations to the effect that the use of such standard deictics as here, now, this, that cannot really be accounted for on its basis, and by the far-reaching possibilities of orienting deictics to reference points in situations other than the situation of speech, to 'deictic centers' other than the speaker. [...] Analyses along the lines of the standard conception regularly acknowledge the existence of deviations from the assumed basic meanings. One traditional solution attributes them to speaker's "subjectivity", or to differences between "physical" and "psychological" space or time; in a similar vein, metaphorical extensions may be said to be at play, or a distinction between prototypical and non-prototypical meanings invoked.
Quite apart from the question of the relative merits of these explanatory principles, which I do not wish to discuss here, the problem with all such accounts is that the definitions of the assumed basic meanings themselves are founded on axiom rather than analysis of situated use. The logical alternative, of course, is to set out for more abstract and comprehensive meaning definitions from the start. In fact, a number of recent, discourse-oriented treatments of the demonstratives proceed this way; they view those elements as processing instructions rather than signs with inherently spatial denotation (Isard 1975, Hawkins 1978, Kirsner 1979, Linde 1979, Ehlich 1982).