16 research outputs found
On assigning place names to geography related web pages
In this paper, we attempt to give spatial semantics to web pages by assigning them place names. The assignment task is divided into three sub-problems: place name extraction, place name disambiguation, and place name assignment. We propose approaches to each of these sub-problems. In particular, we have modified GATE, a well-known named entity extraction tool, to perform place name extraction using a US Census gazetteer. We have also proposed a rule-based place name disambiguation method and a place name assignment method capable of assigning place names to web page segments. We evaluated the proposed disambiguation and assignment methods on a web page collection referenced by the DLESE metadata collection, comparing the results with manually disambiguated and manually assigned place names. The results show that the proposed disambiguation method works well for geo/geo ambiguities, and the preliminary results of the assignment method are promising given the existence of geo/non-geo ambiguities among place names.
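A rule-based geo/geo disambiguation step of the kind described above can be sketched as follows. This is an illustration of the general technique, not the paper's actual rules; the mini-gazetteer and the "containing state mentioned nearby" heuristic are assumptions for the example.

```python
# Illustrative geo/geo disambiguation rule (not the paper's exact method):
# when a place name matches several gazetteer entries, prefer the candidate
# whose containing state is also mentioned in the surrounding text.

# Hypothetical mini-gazetteer: place name -> list of (state, lat, lon) candidates.
GAZETTEER = {
    "Springfield": [
        ("Illinois", 39.80, -89.64),
        ("Massachusetts", 42.10, -72.59),
        ("Missouri", 37.21, -93.29),
    ],
}

def disambiguate(place, context_text):
    """Return the (state, lat, lon) candidate whose state appears in the
    surrounding text, falling back to the first gazetteer entry."""
    candidates = GAZETTEER.get(place, [])
    for state, lat, lon in candidates:
        if state.lower() in context_text.lower():
            return (state, lat, lon)
    return candidates[0] if candidates else None
```

For example, `disambiguate("Springfield", "near Boston, Massachusetts")` resolves to the Massachusetts candidate, while a context with no state cue falls back to the first gazetteer entry.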
One model to rule them all: unified classification model for geotagging websites
The paper presents a novel approach to finding the regional scopes (geotagging) of websites. It relies on a single binary classification model per region type to perform the multi-label classification and uses a variety of features that have not yet been used together for machine-learning based regional classification of websites. The evaluation demonstrates the advantage of our "one model per region type" method versus the traditional "one model per region" approach.
The paper presents a novel approach to finding the regional scopes (geotagging) of websites. Unlike traditional approaches, which generally train a separate classification model for each class (region), the proposed method trains a single model that is used for all regions of the same type (e.g. cities). This is made possible by "relative" features, which indicate how a selected region compares to the other candidate regions for a given website. The classification system uses a variety of features of different natures that have not yet been used together for machine-learning based regional classification of websites. The evaluation demonstrates the advantage of the "one model per region type" method over the traditional "one model per region" approach. A separate experiment demonstrates that the proposed classifier can successfully detect regions that were not present in the training set, which is impossible for traditional approaches.
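The "relative features" idea can be sketched in a few lines. This is my reading of the general mechanism, not the authors' implementation: each (website, region) pair gets features comparing that region to the other candidates for the same website, so one binary model can score any region, including regions unseen at training time.

```python
# Sketch of the "one model per region type" idea (assumed details, not the
# authors' system): relative features let a single binary classifier score
# every candidate region of a given type for a website.

def relative_features(mention_counts, region):
    """mention_counts: dict region -> how often it is mentioned on the site.
    Returns features describing `region` relative to the other candidates."""
    total = sum(mention_counts.values()) or 1
    best = max(mention_counts.values(), default=0) or 1
    c = mention_counts.get(region, 0)
    return {
        "share_of_mentions": c / total,   # fraction of all region mentions
        "ratio_to_best": c / best,        # 1.0 iff this is the top region
    }

def is_in_scope(features, threshold=0.5):
    """Stand-in for the trained binary model: one decision rule applied
    uniformly to every region of the same type."""
    return features["ratio_to_best"] >= threshold

counts = {"Moscow": 8, "Kazan": 2, "Perm": 1}
```

Because the features are relative rather than tied to a region identity, the same rule applies unchanged to a city that never occurred in training data, which is the property the separate experiment above tests.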
Improving the geospatial consistency of digital libraries metadata
Consistency is an essential aspect of the quality of metadata. Inconsistent metadata records are harmful: given a themed query, the set of retrieved metadata records would contain descriptions of unrelated or irrelevant resources, and may even omit some resources considered obvious. This is even worse when the description of the location is inconsistent. Inconsistent spatial descriptions may yield invisible or hidden geographical resources that cannot be retrieved by means of spatially themed queries. Therefore, ensuring spatial consistency should be a primary goal when reusing, sharing and developing georeferenced digital collections. We present a methodology able to detect geospatial inconsistencies in metadata collections based on the combination of spatial ranking, reverse geocoding, geographic knowledge organization systems and information-retrieval techniques. This methodology has been applied to a collection of metadata records describing maps and atlases belonging to the Library of Congress. The proposed approach was able to automatically identify inconsistent metadata records (870 out of 10,575) and propose fixes to most of them (91.5%). These results support the ability of the proposed methodology to assess the impact of spatial inconsistency on the retrievability and visibility of metadata records and to improve their spatial consistency.
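One elementary kind of geospatial inconsistency check can be illustrated concretely. This is a toy example of the category of problem, not the paper's full methodology: a record is flagged when its point coordinates fall outside its own declared bounding box (field names are assumptions).

```python
# Toy geospatial consistency check (an illustration, not the paper's full
# pipeline): flag a metadata record whose point coordinates lie outside
# its own declared bounding box.

def bbox_consistent(record):
    """record: dict with 'lat', 'lon' and a bounding box
    (west, south, east, north) in decimal degrees."""
    west, south, east, north = record["bbox"]
    return west <= record["lon"] <= east and south <= record["lat"] <= north

records = [
    {"id": "map-001", "lat": 38.9, "lon": -77.0, "bbox": (-79.5, 36.5, -75.2, 39.5)},
    {"id": "map-002", "lat": 48.8, "lon": 2.35, "bbox": (-79.5, 36.5, -75.2, 39.5)},
]
inconsistent = [r["id"] for r in records if not bbox_consistent(r)]
```

The methodology described above goes much further (spatial ranking, reverse geocoding, knowledge organization systems) in order to detect subtler mismatches between textual place descriptions and coordinates, not just box containment.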
Linking archival data to location: A case study at the UK National Archives
Purpose
The National Archives (TNA) is the UK Government's official archive. It stores and maintains records spanning over 1,000 years in both physical and digital form. Much of the information held by TNA includes references to place, and user queries to TNA's online catalogue frequently involve searches for location. The purpose of this paper is to illustrate how TNA have extracted the geographic references in their historic data to improve access to the archives.
Design/methodology/approach
To be able to quickly enhance the existing archival data with geographic information, existing technologies from Natural Language Processing (NLP) and Geographical Information Retrieval (GIR) have been utilised and adapted to historical archives.
Findings
Enhancing the archival records with geographic information has enabled TNA to quickly develop a number of case studies highlighting how geographic information can improve access to large-scale archival collections. The use of existing methods from the GIR domain and technologies such as OpenLayers made it possible to quickly implement this process in a way that is easily transferable to other institutions.
Practical implications
The methods and technologies described in this paper can be adapted by other archives to similarly enhance access to their historic data. Also, the data-sharing methods described can be used to enable the integration of knowledge held at different archival institutions.
Originality/value
Place is one of the core dimensions of TNA's archival data. Many of the records held make reference to place data (wills, legislation, court cases), and approximately one fifth of users' searches involve place names. However, there are still a number of open questions regarding the adaptation of existing GIR methods to the history domain. This paper presents an overview of available GIR methods and the challenges in applying them to historical data.
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points-of-interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the worldwide coverage of its users and the real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts have been devoted to the new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim to offer an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing the Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions. (Accepted to TKDE; 30 pages, 1 figure.)
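To make the network-based input family concrete, here is a classic baseline for the user home-location task: predict the most common known location among a user's neighbours in the follow/mention graph. This is a toy sketch of a well-known strategy family such surveys cover, not any specific paper's model.

```python
from collections import Counter

# Classic network-based home-location baseline (toy sketch): a user's home
# is predicted as the majority location among neighbours whose locations
# are already known.

def predict_home(user, graph, known_locations):
    """graph: user -> set of neighbours; known_locations: user -> city."""
    votes = Counter(
        known_locations[n] for n in graph.get(user, ()) if n in known_locations
    )
    return votes.most_common(1)[0][0] if votes else None

graph = {"alice": {"bob", "carol", "dave"}}
known = {"bob": "London", "carol": "London", "dave": "Paris"}
```

Content-based and context-based strategies would instead draw on the text of the tweets and their metadata, which is exactly the input decomposition the survey organizes its review around.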
Extracting Geospatial Entities from Wikipedia
This paper addresses the challenge of extracting geospatial data from the article text of the English Wikipedia. In the first phase of our work, we create a training corpus and select a set of word-based features to train a Support Vector Machine (SVM) for the task of geospatial named entity recognition. As a test corpus, we target Wikipedia articles about battles and wars, as these have a high incidence of geospatial content. The SVM recognizes place names in the corpus with a very high recall, close to 100%, with an acceptable precision. The set of geospatial NEs is then fed into a geocoding and resolution process, whose goal is to determine the correct coordinates for each place name. As many place names are ambiguous, and do not immediately geocode to a single location, we present a data structure and algorithm to resolve ambiguity based on sentence and article context, so the correct coordinates can be selected. We achieve an f-measure of 82%, and create a set of geospatial entities for each article, combining the place names, spatial locations, and an assumed point geometry. These entities can enable geospatial search on, and geovisualization of, Wikipedia.
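Context-based toponym resolution of the kind described above can be sketched as follows. The details here are assumptions for illustration, not the authors' algorithm: among a place name's candidate coordinates, choose the one closest on average to the places already resolved in the same article.

```python
import math

# Sketch of context-based toponym resolution (assumed details): pick the
# candidate coordinate pair closest on average to the already-resolved
# places in the same article.

def dist(a, b):
    # Rough planar distance in degrees; adequate for ranking candidates
    # in a toy example, though real systems use great-circle distance.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def resolve(candidates, context_points):
    """candidates: list of (lat, lon); context_points: resolved (lat, lon)s."""
    if not context_points:
        return candidates[0]
    return min(
        candidates,
        key=lambda c: sum(dist(c, p) for p in context_points) / len(context_points),
    )

# "Springfield" in an article that already resolved a place near Boston:
candidates = [(39.80, -89.64), (42.10, -72.59)]  # Illinois vs Massachusetts
context = [(42.36, -71.06)]
```

With a context point near Boston, the Massachusetts candidate wins; with no context, the function falls back to the first candidate.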
Automatic Generation of Geospatial Metadata for Web Resources
Web resources that are not part of any Spatial Data Infrastructure can be an important source of information. However, the incorporation of Web resources within a Spatial Data Infrastructure requires a significant effort to create metadata. This work presents an extensible architecture for the automatic characterisation of Web resources and a strategy for the assignment of their geographic scope. The implemented prototype automatically generates geospatial metadata for Web pages. The metadata model conforms to the Common Element Set, a set of core properties encouraged by the OGC Catalogue Service Specification to permit the minimal implementation of a catalogue service independent of an application profile. The experiments consisted of creating metadata for the Web pages of providers of geospatial Web resources; the pages were gathered by a Web crawler focused on OGC Web Services. Manual revision of the results showed that the applied coverage estimation method produces acceptable results for more than 80% of the tested Web resources.
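A minimal form of geographic coverage estimation can be sketched like this. It is a toy stand-in for the prototype's method, with assumed inputs: take the coordinates of the place names found on a page and emit a bounding box usable as the coverage element of a metadata record.

```python
# Toy coverage estimation (assumed details, not the prototype's method):
# derive a bounding box from the coordinates of place names found on a page.

def coverage_bbox(points):
    """points: (lat, lon) pairs extracted from the page. Returns
    (west, south, east, north) in decimal degrees, or None if empty."""
    if not points:
        return None
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (min(lons), min(lats), max(lons), max(lats))

# Page mentioning, say, Zaragoza and Madrid:
bbox = coverage_bbox([(41.65, -0.88), (40.42, -3.70)])
```

A real estimator would additionally weight mentions, discard outliers, and map the result into the Common Element Set coverage property, but the bounding-box output is the shape of the artifact being produced.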
Enriching the Digital Library Experience: Innovations With Named Entity Recognition and Geographic Information System Technologies
Digital libraries are seeking innovative ways to share their resources and enhance user experience. To this end, numerous openly available technologies can be exploited. For this project, NER technology was applied to a subset of the Documenting the American South (DocSouth) digital collections. Personal and location names were hand-annotated to produce a gold standard, and GATE, a text engineering tool, was run under two conditions: a default-settings baseline and a test run that included gazetteers built from DocSouth's Colonial and State Records collection. Overall, GATE's performance is promising, and numerous strategies for improvement are discussed. Next, the derived location annotations were georeferenced and stored in a geodatabase through automated processes, and a prototype for a web-based map search was developed using the Google Maps API. This project showcases innovations in automated NER coupled with GIS technologies, and strongly supports further investment in applying these techniques across DocSouth and other digital libraries.
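The georeferencing-to-map step in a pipeline like this can be sketched as a small data transformation. Field names here are illustrative assumptions: location annotations that already carry coordinates are turned into GeoJSON point features, a format that web maps built on the Google Maps API, Leaflet, or OpenLayers can display.

```python
# Sketch of turning georeferenced NER annotations into map-ready features
# (illustrative field names, not the project's actual schema).

def to_geojson(annotations):
    """annotations: list of dicts with 'name', 'lat', 'lon'.
    Returns a GeoJSON FeatureCollection of point features."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                # GeoJSON orders coordinates as [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [a["lon"], a["lat"]]},
                "properties": {"name": a["name"]},
            }
            for a in annotations
        ],
    }

fc = to_geojson([{"name": "Raleigh", "lat": 35.78, "lon": -78.64}])
```

Serialized with `json.dumps`, such a collection can be served directly to the map front end, which is essentially what the automated geodatabase-to-web-map step accomplishes at scale.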