4 research outputs found
Spatial Search Strategies for Open Government Data: A Systematic Comparison
The increasing availability of open government datasets on the Web calls for
ways to enable their efficient access and searching. There is however an
overall lack of understanding regarding spatial search strategies which would
perform best in this context. To address this gap, this work has assessed the
impact of different spatial search strategies on performance and user relevance
judgment. We harvested machine-readable spatial datasets and their metadata
from three English-based open government data portals, performed metadata
enhancement, developed a prototype and performed both a theoretical and
user-based evaluation. The results highlight that (i) switching between area of
overlap and Hausdorff distance for spatial similarity computation does not have
any substantial impact on performance; and (ii) the use of Hausdorff distance
induces slightly better user relevance ratings than the use of area of overlap.
The data collected and the insights gleaned may serve as a baseline against
which future work can compare.
Comment: Paper accepted to GIR'19: 13th Workshop on Geographic Information
Retrieval (Lyon, France)
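The two spatial similarity measures compared in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that dataset and query footprints are axis-aligned bounding boxes in planar coordinates, that the boxes are treated as filled regions, and that area of overlap is normalised by the area of the union. The type `Box` and all function names are hypothetical.

```python
from math import hypot
from typing import List, Tuple

# Assumed footprint representation: (min_x, min_y, max_x, max_y).
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of an axis-aligned box."""
    return (box[2] - box[0]) * (box[3] - box[1])


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0.0 if they are disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def overlap_similarity(query: Box, doc: Box) -> float:
    """Overlap area divided by union area (an assumed normalisation)."""
    inter = overlap_area(query, doc)
    union = area(query) + area(doc) - inter
    return inter / union if union > 0 else 0.0


def point_to_box(x: float, y: float, box: Box) -> float:
    """Euclidean distance from a point to a filled box (0.0 if inside)."""
    dx = max(box[0] - x, 0.0, x - box[2])
    dy = max(box[1] - y, 0.0, y - box[3])
    return hypot(dx, dy)


def corners(box: Box) -> List[Tuple[float, float]]:
    """The four corner points of a box."""
    return [(box[0], box[1]), (box[2], box[1]),
            (box[0], box[3]), (box[2], box[3])]


def hausdorff_distance(a: Box, b: Box) -> float:
    """Hausdorff distance between two filled boxes.

    For convex regions the farthest point of one region from the other
    is always a vertex, so checking the corners is sufficient.
    """
    a_to_b = max(point_to_box(x, y, b) for x, y in corners(a))
    b_to_a = max(point_to_box(x, y, a) for x, y in corners(b))
    return max(a_to_b, b_to_a)
```

Note the opposite orientation of the two measures: a larger overlap similarity means a closer match, whereas a smaller Hausdorff distance means a closer match, so a ranking function would sort the two in opposite directions.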
A systematic comparison of spatial search strategies for open government datasets
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Datasets produced or collected by governments are being made publicly available
for re-use. Open government data portals help realize such reuse by providing lists
of datasets and links to access those datasets. This ensures that users can search,
inspect and use the data easily.
With the rapidly increasing size of datasets in open government data portals,
as is the case with the web, finding relevant datasets with a query of a few
keywords is a challenge. Furthermore, those data portals consist not only of textual
information but also of georeferenced data that needs to be searched properly. Currently,
most popular open government data portals, such as data.gov.uk and data.gov.ie, lack
support for simultaneous thematic and spatial search. Moreover, the use of query
expansion has not yet been studied for open government datasets.
In this study we have assessed the performance of different spatial search strategies
and query expansions, and their impact on user relevance judgment. To evaluate those
strategies we harvested machine-readable spatial datasets and their metadata from
three English-based open government data portals, performed metadata enhancement,
developed a prototype and performed theoretical and user evaluations.
According to the results of the evaluations, the keyword-based search strategy returned
a limited number of results but had the highest relevance rating. On the other hand,
aggregated spatial and thematic search increased the number of results over the baseline
keyword-based strategy, at the cost of a 1 second increase in response time and a decreased
relevance rating. Moreover, for three of the four query terms, strategies based on
WordNet Synonyms query expansion had first seven results with higher relevance ratings
than all other strategies except the keyword-based baseline strategy.
Regarding the use of Hausdorff distance and area of overlap, since documents
were returned as results only if they overlap with the query, the number of results
returned was the same for both spatial similarity measures. However, strategies using
Hausdorff distance had a higher relevance rating and average mean than area of overlap
based strategies in three of the four queries.
In conclusion, while the spatial search strategies assessed in this study can be
used to improve the existing keyword-based OGD search approaches, we recommend
that OGD developers also consider using WordNet Synonyms based query expansion
and Hausdorff distance as a way of improving relevant spatial data discovery in open
government datasets using few keywords and a tolerable response time.
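The synonym-based query expansion recommended in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's prototype: the small `TOY_SYNONYMS` table is a hypothetical stand-in for a WordNet lookup, and the function names and the simple whole-word matching rule are likewise assumed.

```python
from typing import Dict, List

# Hypothetical stand-in for a WordNet synonym lookup.
TOY_SYNONYMS: Dict[str, List[str]] = {
    "road": ["street", "highway"],
    "school": ["academy"],
}


def expand_query(query: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return the query words followed by their synonyms, without duplicates."""
    terms: List[str] = []
    for word in query.lower().split():
        for candidate in [word] + synonyms.get(word, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def matches(terms: List[str], metadata_text: str) -> bool:
    """True if any term appears as a whole word in the metadata text."""
    words = set(metadata_text.lower().split())
    return any(term in words for term in terms)


# A dataset titled with a synonym is missed by the plain keyword
# but found by the expanded query.
title = "Street lighting locations"
plain_hit = matches(["road"], title)
expanded_hit = matches(expand_query("road", TOY_SYNONYMS), title)
```

Expansion widens recall by design, which is consistent with the trade-off the abstract reports between the number of results and their relevance ratings.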
Relevance ranking based on geographic and social information.
Geographic Information Retrieval (GIR) is a research field that develops and enables the construction of search engines for content distributed over the Internet that involves some geographic context. Geographic search engines, the artifacts produced by GIR research, can be specified to work in many different contexts (e.g., sports, public service examinations), seeking proper handling of the type of document involved. Currently, the scientific community and industry are concentrating efforts on building geographic search engines focused on finding news distributed over the Internet. However, search engines (geographic or otherwise) focused on news should consider the credibility of the information the news contains when ranking it. Unfortunately, in most cases, this does not happen. Measuring the credibility of news is a costly and complex task, since it requires knowledge of the reported facts. As a result, search engines end up leaving to the user the responsibility of deciding whether to trust what is being read. In this context, this dissertation proposes a relevance ranking method focused on news and based on information collected from social networks, in order to assign a degree of credibility to news items and thus rank them. The credibility value of a news item is calculated from the affinity that the users who shared it on their social network have with the locations mentioned in the item. Finally, the proposed relevance ranking is integrated into a news search and reading tool called GeoSEn News, which supports queries through various spatial operations and allows results to be viewed from different perspectives. This tool was used to evaluate the proposed method through experiments using data collected from the social network Twitter and from news media across Brazil.
The evaluation showed promising results and confirmed the feasibility of building a relevance ranking based on information collected from social networks.
Cape