Spatial Search Strategies for Open Government Data: A Systematic Comparison
The increasing availability of open government datasets on the Web calls for
ways to enable their efficient access and searching. There is, however, an
overall lack of understanding regarding which spatial search strategies would
perform best in this context. To address this gap, this work has assessed the
impact of different spatial search strategies on performance and user relevance
judgment. We harvested machine-readable spatial datasets and their metadata
from three English-based open government data portals, performed metadata
enhancement, developed a prototype and performed both a theoretical and
user-based evaluation. The results highlight that (i) switching between area of
overlap and Hausdorff distance for spatial similarity computation does not have
any substantial impact on performance; and (ii) the use of Hausdorff distance
induces slightly better user relevance ratings than the use of area of overlap.
The data collected and the insights gleaned may serve as a baseline against
which future work can compare.
Comment: Paper accepted to GIR'19: 13th Workshop on Geographic Information
Retrieval (Lyon, France
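The two spatial similarity measures compared in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that each dataset footprint is an axis-aligned bounding box, that area of overlap is normalized by the area of the union, and that the Hausdorff distance is approximated from the four box corners only. All function names are hypothetical.

```python
from math import hypot

# A bounding box is a tuple (min_x, min_y, max_x, max_y). Assumption for this sketch.

def overlap_similarity(a, b):
    """Intersection area divided by union area (one possible normalization)."""
    width = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    height = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def corners(box):
    """Return the four corner points of a bounding box."""
    x0, y0, x1, y1 = box
    return [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]

def directed_hausdorff(ps, qs):
    """Largest distance from a point in ps to its nearest point in qs."""
    return max(min(hypot(px - qx, py - qy) for qx, qy in qs) for px, py in ps)

def hausdorff(ps, qs):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(ps, qs), directed_hausdorff(qs, ps))

query_box = (0, 0, 2, 2)
dataset_box = (1, 1, 3, 3)
overlap = overlap_similarity(query_box, dataset_box)
distance = hausdorff(corners(query_box), corners(dataset_box))
```

Note that the two measures point in opposite directions: a larger overlap means more similar footprints, whereas a larger Hausdorff distance means less similar ones, so a ranking function would need to invert one of them.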
Using WordNet for query expansion: ADAPT @ FIRE 2016 microblog track
User-generated content on social websites such as Twitter is known to be an
important source of real-time information on significant events as they occur,
for example natural disasters. Our participation in the FIRE 2016 Microblog
track seeks to exploit WordNet as an external resource for synonym-based query
expansion to support improved matching between search topics and the target
Tweet collection. The results of our participation show that this is an
effective method for use with a standard BM25-based information retrieval
system for this task.
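A rough Python sketch of synonym-based query expansion feeding a BM25 ranker is given here for illustration. It is not the ADAPT system: the small `SYNONYMS` table is a hypothetical stand-in for WordNet lookups, the toy documents are invented, and the BM25 variant and parameter values (`k1`, `b`) are assumptions.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for WordNet synsets.
SYNONYMS = {"flood": ["inundation", "deluge"], "help": ["aid", "assistance"]}

def expand_query(terms, synonyms=SYNONYMS):
    """Append each term's synonyms to the query, keeping the original order."""
    expanded = list(terms)
    for term in terms:
        for syn in synonyms.get(term, []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score every tokenized document against the query with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Invented toy collection of tokenized tweets.
docs = [
    ["flood", "warning", "issued"],
    ["inundation", "in", "the", "city"],
    ["football", "match", "tonight"],
]
plain = bm25_scores(["flood"], docs)
expanded = bm25_scores(expand_query(["flood"]), docs)
```

In this toy collection the second tweet shares no term with the original query, so it can only receive a non-zero score after expansion.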
Toponym Disambiguation in Information Retrieval
In recent years, geography has acquired great importance in the context of
Information Retrieval (IR) and, in general, of the automated processing of
information in text. Mobile devices that are able to surf the web and at the
same time inform about their position are now a common reality, together
with applications that can exploit this data to provide users with locally
customised information, such as directions or advertisements. Therefore,
it is important to deal properly with the geographic information that is
included in electronic texts. Most of this information takes the form of
place names, or toponyms.
Toponym ambiguity represents an important issue in Geographical Information
Retrieval (GIR), due to the fact that queries are geographically constrained.
There has been a struggle to find specific geographical IR methods
that actually outperform traditional IR techniques. Toponym ambiguity
may constitute a relevant factor in the inability of current GIR systems to
take advantage of geographical knowledge. Recently, some Ph.D. theses
have dealt with Toponym Disambiguation (TD) from different perspectives,
from the development of resources for the evaluation of Toponym Disambiguation
(Leidner (2007)) to the use of TD to improve geographical scope
resolution (Andogah (2010)). The Ph.D. thesis presented here introduces
a TD method based on WordNet and carries out a detailed study of the
relationship of Toponym Disambiguation to some IR applications, such as
GIR, Question Answering (QA) and Web retrieval.
The work presented in this thesis starts with an introduction to the applications
in which TD may prove useful, together with an analysis of the
ambiguity of toponyms in news collections. It would not be possible to
study the ambiguity of toponyms without studying the resources that are
used as placename repositories; these resources are the equivalent of language
dictionaries, which provide the different meanings of a given word.
Buscaldi, D. (2010). Toponym Disambiguation in Information Retrieval [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8912
Palanci
Firsthand Opiates Abuse on Social Media: Monitoring Geospatial Patterns of Interest Through a Digital Cohort
In the last decade drug overdose deaths reached staggering proportions in the
US. Besides the raw yearly death count, which is worrisome per se, an alarming
picture comes from the steep acceleration of this rate, which increased by 21%
from 2015 to 2016. While traditional public health surveillance suffers from
its own biases and limitations, digital epidemiology offers a new lens to
extract signals from Web and Social Media that might be complementary to
official statistics. In this paper we present a computational approach to
identify a digital cohort that might provide an updated and complementary view
on the opioid crisis. We introduce an information retrieval algorithm suitable
to identify relevant subspaces of discussion on social media, for mining data
from users showing explicit interest in discussions about opioid consumption in
Reddit. Moreover, despite the pseudonymous nature of the user base, almost 1.5
million users were geolocated at the US state level, in good agreement with
the census population distribution. A measure of prevalence of
interest in opiate consumption has been estimated at the state level, producing
a novel indicator with information that is not entirely encoded in the standard
surveillance. Finally, we provide a domain-specific vocabulary
containing informal lexicon and street nomenclature extracted from user-generated
content that can be used by researchers and practitioners to implement novel
digital public health surveillance methodologies for supporting policy makers
in fighting the opioid epidemic.
Comment: Proceedings of the 2019 World Wide Web Conference (WWW '19
Query Expansion by Combining the Vector Space Method and WordNet in an Information Retrieval System
One of the methods often used to measure document relevance in information retrieval systems is the vector space model. One way to develop this method further is to expand its query vector. The expansion is performed by applying WordNet to the terms that make up the query, in the hope that the results of the system can be improved.
Automatically organising images using concept hierarchies
In this paper we discuss the use of concept hierarchies for image retrieval. Concept hierarchies are an approach to automatically organizing a set of documents based upon a set of concepts derived from the documents themselves. Co-occurrence between terms associated with image captions and a statistical relation called subsumption are used to generate term clusters, which are organized hierarchically. Previously, the approach has been studied for document retrieval, and results have shown that automatically generating hierarchies can help users with their search task. In this paper we present an implementation of concept hierarchies for image retrieval, together with a preliminary ad-hoc evaluation. Although our approach requires more investigation, initial results from a prototype system are promising and would appear to provide a useful summary of the search results.
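The subsumption relation mentioned in this abstract can be sketched in Python as follows. The abstract does not state the exact test used, so this assumes a commonly used formulation: term x subsumes term y when P(x|y) is at least a threshold and P(y|x) is below 1, with probabilities estimated from caption co-occurrence. The 0.8 threshold, the function name, and the toy captions are assumptions.

```python
from itertools import permutations

def subsumption_pairs(captions, threshold=0.8):
    """Return sorted (parent, child) pairs where parent subsumes child.

    captions: list of token lists, one per image caption.
    """
    # Map each term to the set of caption indices that contain it.
    docs_with = {}
    for i, terms in enumerate(captions):
        for t in set(terms):
            docs_with.setdefault(t, set()).add(i)

    pairs = []
    for x, y in permutations(docs_with, 2):
        both = len(docs_with[x] & docs_with[y])
        p_x_given_y = both / len(docs_with[y])
        p_y_given_x = both / len(docs_with[x])
        # x is the more general term: it appears in (nearly) every caption
        # that contains y, but y does not appear wherever x does.
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.append((x, y))
    return sorted(pairs)

# Invented toy captions.
captions = [
    ["animal", "dog"],
    ["animal", "cat"],
    ["animal", "dog", "puppy"],
    ["animal"],
]
hierarchy_edges = subsumption_pairs(captions)
```

Each returned pair can be read as a parent-to-child edge, from which a hierarchy is assembled by keeping, for every child, its most specific parent.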
Geodata source retrieval by multilingual/semantic query expansion: the Case of Google Translate and WordNet version 3.1
In this article, we examined the potential of the current version of WordNet and the Google Translate API to enhance the quality of geodata source retrieval in the Dutch geoinformation portal (PDOK) using semantic keywords for the geographic phenomena requested. Keywords were gathered from real users’ natural-language questions and compiled in an English corpus. These keywords were then expanded using WordNet and the Google Translate API. Lastly, the results of query expansion were evaluated against a manual gold standard using information retrieval metrics. Our study shows that query expansion helps users by producing good alternative queries.
A Method for the Construction and Application of the Term Hierarchy Relationship Residing in Relevance Feedback
In the field of information retrieval, the term frequency information contained in relevance feedback has been widely used. However, the analysis and application of term frequency does not cover the semantic meaning of the terms, which could make the retrieval results deviate from the user’s search goal. To take the semantic meaning of terms into account, Wille (1992) proposed a structured view of the relationships among terms in retrieved documents. To enhance the effectiveness of information retrieval by exploiting this term hierarchy relationship, this study first developed a query expansion method that extracts and applies the information contained in relevance feedback, and then conducted formal tests to verify the efficiency of the method in re-ranking the retrieved documents. The results of the formal tests show that the proposed query expansion method is more effective than Rocchio’s query expansion algorithm. The contribution of this study is to show that the term hierarchy relationship contained in relevance feedback is applicable to retrieval, and to demonstrate how this information can be applied.
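For orientation, the Rocchio baseline named in this abstract can be sketched in Python as follows. This illustrates only the classic relevance feedback formula, not the term hierarchy method the study proposes. The sparse-dictionary vector representation, the weights `alpha`, `beta` and `gamma`, and the toy vectors are assumptions.

```python
from collections import defaultdict

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio update on sparse term-weight vectors (dicts).

    new_q = alpha * q + beta * mean(relevant) - gamma * mean(non_relevant)
    Terms whose final weight is not positive are dropped.
    """
    new_q = defaultdict(float)
    for term, weight in query.items():
        new_q[term] += alpha * weight
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, weight in doc.items():
                new_q[term] += coeff * weight / len(docs)
    return {t: w for t, w in new_q.items() if w > 0}

# Invented toy feedback: the user is interested in the animal, not the car.
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}, {"cat": 1.0, "wildlife": 1.0}]
non_relevant = [{"jaguar": 1.0, "car": 1.0}]
expanded_query = rocchio(query, relevant, non_relevant)
```

The update relies on term weights alone, which is the limitation the abstract points to: it has no notion of how the added terms relate to one another semantically.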
Automatic tagging and geotagging in video collections and communities
Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We overview three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition, and the data set released. For each task, a reference algorithm is presented that was used within MediaEval 2010, and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.