LocLinkVis: A Geographic Information Retrieval-Based System for Large-Scale Exploratory Search
In this paper we present LocLinkVis (Locate-Link-Visualize), a system which
supports exploratory information access to a document collection based on
geo-referencing and visualization. It uses a gazetteer which contains
representations of places ranging from countries to buildings, and that is used
to recognize toponyms, disambiguate them into places, and to visualize the
resulting spatial footprints. Comment: SEM'1
Quantitative Perspectives on Fifty Years of the Journal of the History of Biology
Journal of the History of Biology provides a fifty-year-long record for
examining the evolution of the history of biology as a scholarly discipline. In
this paper, we present a new dataset and preliminary quantitative analysis of
the thematic content of JHB from the perspectives of geography, organisms, and
thematic fields. The geographic diversity of authors whose work appears in JHB
has increased steadily since 1968, but the geographic coverage of the content
of JHB articles remains strongly lopsided toward the United States, United
Kingdom, and western Europe and has diversified much less dramatically over
time. The taxonomic diversity of organisms discussed in JHB increased steadily
between 1968 and the late 1990s but declined in later years, mirroring broader
patterns of diversification previously reported in the biomedical research
literature. Finally, we used a combination of topic modeling and nonlinear
dimensionality reduction techniques to develop a model of multi-article fields
within JHB. We found evidence for directional changes in the representation of
fields on multiple scales. The diversity of JHB with regard to the
representation of thematic fields has increased overall, with most of that
diversification occurring in recent years. Drawing on the dataset generated in
the course of this analysis, as well as web services in the emerging digital
history and philosophy of science ecosystem, we have developed an interactive
web platform for exploring the content of JHB, and we provide a brief overview
of the platform in this article. As a whole, the data and analyses presented
here provide a starting-place for further critical reflection on the evolution
of the history of biology over the past half-century. Comment: 45 pages, 14 figures, 4 tables
Distantly Labeling Data for Large Scale Cross-Document Coreference
Cross-document coreference, the problem of resolving entity mentions across
multi-document collections, is crucial to automated knowledge base construction
and data mining tasks. However, the scarcity of large labeled data sets has
hindered supervised machine learning research for this task. In this paper we
develop and demonstrate an approach based on "distantly-labeling" a data set
from which we can train a discriminative cross-document coreference model. In
particular we build a dataset of more than a million people mentions extracted
from 3.5 years of New York Times articles, leverage Wikipedia for distant
labeling with a generative model (and measure the reliability of such
labeling); then we train and evaluate a conditional random field coreference
model that has factors on cross-document entities as well as mention-pairs.
This coreference model obtains high accuracy in resolving mentions and entities
that are not present in the training data, indicating applicability to
non-Wikipedia data. Given the large amount of data, our work is also an
exercise demonstrating the scalability of our approach. Comment: 16 pages, submitted to ECML 201
A constraint-based approach to noun phrase coreference resolution in German newspaper text
In this paper, we investigate the usefulness of a wide range of features in the resolution of nominal coreference, both as hard constraints (i.e., completely removing elements from the list of possible candidates) and as soft constraints (where an accumulation of violations of soft constraints makes it less likely that a candidate is chosen as the antecedent). We present a state-of-the-art system based on such constraints and weights estimated with a maximum entropy model, using lexical information to resolve cases of coreferent bridging.
Remarks on deixis
The prevailing conception of deixis is oriented to the idea of 'concrete' physical and perceptual characteristics of the situation of speech. Signs standardly adduced as typical deictics are I, you, here, now, this, that. I and you are defined as meaning "the person producing the utterance in question" and "the person spoken to", here and now as meaning "where the speaker is at utterance time" and "at the moment the utterance is made" (also, "at the place/time of the speech exchange"); similarly, the meanings of this and that are as a rule defined via proximity to speaker's physical location. The elements used in such definitions form the conceptual framework of most of the general characterisations of deixis in the literature. [...] There is much in the literature, of course, that goes far beyond this framework. A great variety of elements, mostly with very abstract meanings, have been found to share deictic characteristics although they do not fit into the personnel-place-time-of-utterance schema. The adequacy of that schema is also called into question by many observations to the effect that the use of such standard deictics as here, now, this, that cannot really be accounted for on its basis, and by the far-reaching possibilities of orienting deictics to reference points in situations other than the situation of speech, to 'deictic centers' other than the speaker. [...] Analyses along the lines of the standard conception regularly acknowledge the existence of deviations from the assumed basic meanings. One traditional solution attributes them to speaker's "subjectivity", or to differences between "physical" and "psychological" space or time; in a similar vein, metaphorical extensions may be said to be at play, or a distinction between prototypical and non-prototypical meanings invoked.
Quite apart from the question of the relative merits of these explanatory principles, which I do not wish to discuss here, the problem with all such accounts is that the definitions of the assumed basic meanings themselves are founded on axiom rather than analysis of situated use. The logical alternative, of course, is to set out for more abstract and comprehensive meaning definitions from the start. In fact, a number of recent, discourse-oriented, treatments of the demonstratives proceed this way; they view those elements as processing instructions rather than signs with inherently spatial denotation (Isard 1975, Hawkins 1978, Kirsner 1979, Linde 1979, Ehlich 1982).