6,388 research outputs found
Historical collaborative geocoding
The latest developments in digital have provided large data sets that can
increasingly easily be accessed and used. These data sets often contain
indirect localisation information, such as historical addresses. Historical
geocoding is the process of transforming the indirect localisation information
to direct localisation that can be placed on a map, which enables spatial
analysis and cross-referencing. Many efficient geocoders exist for current
addresses, but they do not deal with the temporal aspect and are based on a
strict hierarchy (..., city, street, house number) that is hard or impossible
to use with historical data. Indeed historical data are full of uncertainties
(temporal aspect, semantic aspect, spatial precision, confidence in historical
source, ...) that can not be resolved, as there is no way to go back in time to
check. We propose an open source, open data, extensible solution for geocoding
that is based on the building of gazetteers composed of geohistorical objects
extracted from historical topographical maps. Once the gazetteers are
available, geocoding an historical address is a matter of finding the
geohistorical object in the gazetteers that is the best match to the historical
address. The matching criteriae are customisable and include several dimensions
(fuzzy semantic, fuzzy temporal, scale, spatial precision ...). As the goal is
to facilitate historical work, we also propose web-based user interfaces that
help geocode (one address or batch mode) and display over current or historical
topographical maps, so that they can be checked and collaboratively edited. The
system is tested on Paris city for the 19-20th centuries, shows high returns
rate and is fast enough to be used interactively.Comment: WORKING PAPE
The DIGMAP geo-temporal web gazetteer service
This paper presents the DIGMAP geo-temporal Web gazetteer service, a system providing access to names of places, historical periods, and associated geo-temporal information. Within the DIGMAP project, this gazetteer serves as the unified repository of geographic and temporal information, assisting in the recognition and disambiguation of geo-temporal expressions over text, as well as in resource searching and indexing. We describe the data integration methodology, the handling of temporal information and some of the applications that use the gazetteer. Initial evaluation results show that the proposed system can adequately support several tasks related to geo-temporal information extraction and retrieval
Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
We describe the CoNLL-2003 shared task: language-independent named entity
recognition. We give background information on the data sets (English and
German) and the evaluation method, present a general overview of the systems
that have taken part in the task and discuss their performance
Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity
In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system’s architecture and compare its performance with a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands)
Recommended from our members
Linking early geospatial documents, one place at a time: annotation of geographic documents with Recogito
Recogito is an open source tool for the semi-automatic annotation of place references in maps and texts. It was developed as part of the Pelagios 3 research project, which aims to build up a comprehensive directory of places referred to in early maps and geographic writing predating the year 1492. Pelagios 3 focuses specifically on sources from the Classical Latin, Greek and Byzantine periods; on Mappae Mundi and narrative texts from the European Medieval period; on Late Medieval Portolans; and on maps and texts from the early Islamic and early Chinese traditions. Since the start of the project in September 2013, the team has harvested more than 120,000 toponyms, manually verifying almost 60,000 of them. Furthermore, the team held two public annotation workshops supported through the Open Humanities Awards 2014. In these workshops, a mixed audience of students and academics of different backgrounds used Recogito to add several thousand contributions on each workshop day.
A number of benefits arise out of this work: on the one hand, the digital identification of places – and the names used for them – makes the documents' contents amenable to information retrieval technology, i.e. documents become more easily search- and discoverable to users than through conventional metadata-based search alone. On the other hand, the documents are opened up to new forms of re-use. For example, it becomes possible to “map” and compare the narrative of texts, and the contents of maps with modern day tools like Web maps and GIS; or to analyze and contrast documents’ geographic properties, toponymy and spatial relationships. Seen in a wider context, we argue that initiatives such as ours contribute to the growing ecosystem of the “Graph of Humanities Data” that is gathering pace in the Digital Humanities (linking data about people, places, events, canonical references, etc.), which has the potential to open up new avenues for computational and quantitative research in a variety of fields including History, Geography, Archaeology, Classics, Genealogy and Modern Languages
Semi-supervised sequence tagging with bidirectional language models
Pre-trained word embeddings learned from unlabeled text have become a
standard component of neural network architectures for NLP tasks. However, in
most cases, the recurrent network that operates on word-level representations
to produce context sensitive representations is trained on relatively little
labeled data. In this paper, we demonstrate a general semi-supervised approach
for adding pre- trained context embeddings from bidirectional language models
to NLP systems and apply it to sequence labeling tasks. We evaluate our model
on two standard datasets for named entity recognition (NER) and chunking, and
in both cases achieve state of the art results, surpassing previous systems
that use other forms of transfer or joint learning with additional labeled data
and task specific gazetteers.Comment: To appear in ACL 201
KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition
KnowNER is a multilingual Named Entity Recognition (NER) system that
leverages different degrees of external knowledge. A novel modular framework
divides the knowledge into four categories according to the depth of knowledge
they convey. Each category consists of a set of features automatically
generated from different information sources (such as a knowledge-base, a list
of names or document-specific semantic annotations) and is used to train a
conditional random field (CRF). Since those information sources are usually
multilingual, KnowNER can be easily trained for a wide range of languages. In
this paper, we show that the incorporation of deeper knowledge systematically
boosts accuracy and compare KnowNER with state-of-the-art NER approaches across
three languages (i.e., English, German and Spanish) performing amongst
state-of-the art systems in all of them
- …