4 research outputs found

    Spatial Search Strategies for Open Government Data: A Systematic Comparison

    The increasing availability of open government datasets on the Web calls for ways to enable efficient access to them and efficient search over them. However, there is an overall lack of understanding of which spatial search strategies perform best in this context. To address this gap, this work assesses the impact of different spatial search strategies on performance and on user relevance judgment. We harvested machine-readable spatial datasets and their metadata from three English-language open government data portals, performed metadata enhancement, developed a prototype, and carried out both a theoretical and a user-based evaluation. The results highlight that (i) switching between area of overlap and Hausdorff distance for spatial similarity computation has no substantial impact on performance; and (ii) the use of Hausdorff distance induces slightly better user relevance ratings than the use of area of overlap. The data collected and the insights gleaned may serve as a baseline against which future work can compare. Comment: Paper accepted to GIR'19: 13th Workshop on Geographic Information Retrieval (Lyon, France).
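The two spatial similarity measures compared above can be sketched for axis-aligned bounding boxes as follows. This is a minimal illustration, not the paper's implementation: the (min_x, min_y, max_x, max_y) box representation, the union-normalised overlap ratio, and the 1 / (1 + d) mapping from Hausdorff distance to a similarity score are all assumptions.

```python
from math import hypot

# Assumed representation: an axis-aligned box (min_x, min_y, max_x, max_y).
Box = tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def overlap_similarity(query: Box, doc: Box) -> float:
    """Intersection area divided by union area, giving a score in [0, 1]."""
    inter = overlap_area(query, doc)
    union = area(query) + area(doc) - inter
    return inter / union if union > 0 else 0.0


def point_to_box(x: float, y: float, b: Box) -> float:
    """Euclidean distance from a point to a filled box (0 if inside)."""
    dx = max(b[0] - x, 0.0, x - b[2])
    dy = max(b[1] - y, 0.0, y - b[3])
    return hypot(dx, dy)


def corners(b: Box) -> list[tuple[float, float]]:
    return [(b[0], b[1]), (b[0], b[3]), (b[2], b[1]), (b[2], b[3])]


def hausdorff(a: Box, b: Box) -> float:
    """Hausdorff distance between two filled boxes.

    The distance to a convex region is a convex function, so the farthest
    point of one box from the other is always a corner; checking the four
    corners in each direction is therefore exact.
    """
    a_to_b = max(point_to_box(x, y, b) for x, y in corners(a))
    b_to_a = max(point_to_box(x, y, a) for x, y in corners(b))
    return max(a_to_b, b_to_a)


def hausdorff_similarity(query: Box, doc: Box) -> float:
    """Turn a distance into a score in (0, 1]; 1 / (1 + d) is an assumption."""
    return 1.0 / (1.0 + hausdorff(query, doc))
```

With a 10 x 10 query box at the origin, a box that contains the query but extends 10 units beyond it scores higher on overlap (0.5) than a box shifted diagonally by 5 units (about 0.14), yet the shifted box is closer under Hausdorff distance (about 7.07 versus 10). The two measures can therefore order the same candidates differently even when they retrieve the same set.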

    A systematic comparison of spatial search strategies for open government datasets

    Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies. Datasets produced or collected by governments are being made publicly available for reuse. Open government data (OGD) portals support such reuse by providing a list of datasets and links to access them, so that users can search, inspect and use the data easily. As the number of datasets in these portals grows rapidly, finding relevant datasets with a query of a few keywords becomes a challenge, just as it is on the Web. Furthermore, these portals contain not only textual information but also georeferenced data that must be searched properly. Currently, the most popular OGD portals, such as data.gov.uk and data.gov.ie, lack support for simultaneous thematic and spatial search. Moreover, the use of query expansion has not been studied for open government datasets. In this study we assessed the performance of different spatial search strategies and query expansions and their impact on user relevance judgment. To evaluate these strategies, we harvested machine-readable spatial datasets and their metadata from three English-language open government data portals, performed metadata enhancement, developed a prototype, and carried out both a theoretical and a user evaluation. According to the evaluation results, the keyword-based search strategy returned a limited number of results but received the highest relevance rating. On the other hand, aggregated spatial and thematic search returned more results than the keyword-based baseline, at the cost of a 1-second increase in response time and a lower relevance rating. Moreover, for three of the four query terms, strategies based on WordNet synonym query expansion produced first seven results with higher relevance ratings than all other strategies except the keyword-based baseline.
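Synonym-based query expansion of the kind described above can be illustrated with a small lookup table. The table below is invented and only stands in for WordNet (a real system might retrieve synonyms through a WordNet interface such as NLTK's corpus reader); the whole-word matching rule and the sample documents are likewise assumptions.

```python
# Hypothetical synonym table standing in for WordNet; entries are invented.
SYNONYMS: dict[str, list[str]] = {
    "river": ["stream", "watercourse"],
    "school": ["college", "academy"],
}


def expand_query(terms: list[str], synonyms: dict[str, list[str]]) -> list[str]:
    """Return the original terms followed by their synonyms, without duplicates."""
    expanded: list[str] = []
    for term in terms:
        for candidate in [term, *synonyms.get(term.lower(), [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def keyword_match(doc_text: str, terms: list[str]) -> int:
    """Count how many query terms occur as whole words in the document text."""
    words = set(doc_text.lower().split())
    return sum(1 for t in terms if t.lower() in words)


# Hypothetical dataset titles used as documents.
docs = {
    "d1": "River flow gauges for England",
    "d2": "Stream water quality samples",
    "d3": "Primary school locations",
}

baseline = [d for d, text in docs.items() if keyword_match(text, ["river"]) > 0]
expanded_terms = expand_query(["river"], SYNONYMS)
expanded = [d for d, text in docs.items() if keyword_match(text, expanded_terms) > 0]
```

Here the baseline query "river" matches only d1, while the expanded query also matches d2 through the synonym "stream". This is how expansion can increase the number of results for a few-keyword query; whether the extra results are relevant is what the user evaluation measures.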
Regarding Hausdorff distance and area of overlap: because documents were returned as results only if they overlapped the query, both spatial similarity measures returned the same number of results. However, strategies using Hausdorff distance received higher relevance ratings and higher average means than strategies based on area of overlap in three of the four queries. In conclusion, while the spatial search strategies assessed in this study can be used to improve existing keyword-based OGD search approaches, we recommend that OGD developers also consider WordNet synonym-based query expansion and Hausdorff distance as ways of improving the discovery of relevant spatial data in open government datasets with few keywords and a tolerable response time.
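The observation about identical result counts follows from filtering before ranking. The sketch below assumes, for illustration only, that candidates must intersect the query box and are then scored by a weighted sum of a thematic score (the fraction of query terms found) and a spatial similarity. The weight, the scoring formula, and the two toy similarity functions (overlap_fraction and centre_closeness) are hypothetical stand-ins, not the dissertation's area-of-overlap and Hausdorff implementations.

```python
from math import hypot
from typing import Callable

Box = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y), assumed
SpatialSim = Callable[[Box, Box], float]


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def overlap_fraction(q: Box, d: Box) -> float:
    """Toy spatial score: share of the query box covered by the document box."""
    w = min(q[2], d[2]) - max(q[0], d[0])
    h = min(q[3], d[3]) - max(q[1], d[1])
    return max(0.0, w) * max(0.0, h) / ((q[2] - q[0]) * (q[3] - q[1]))


def centre_closeness(q: Box, d: Box) -> float:
    """Toy spatial score: 1 / (1 + distance between box centres)."""
    dx = (q[0] + q[2]) / 2 - (d[0] + d[2]) / 2
    dy = (q[1] + q[3]) / 2 - (d[1] + d[3]) / 2
    return 1.0 / (1.0 + hypot(dx, dy))


def search(query_box: Box, terms: list[str], docs: dict[str, tuple[Box, str]],
           spatial_sim: SpatialSim, weight: float = 0.5) -> list[str]:
    """Keep documents whose box intersects the query, then rank them by a
    weighted sum of a thematic score and the chosen spatial similarity."""
    scored = []
    for doc_id, (box, text) in docs.items():
        if not intersects(query_box, box):
            continue  # the filter alone decides which documents are returned
        words = set(text.lower().split())
        thematic = sum(t.lower() in words for t in terms) / len(terms)
        score = weight * thematic + (1 - weight) * spatial_sim(query_box, box)
        scored.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical documents: (bounding box, metadata text).
docs = {
    "b": ((0, 0, 20, 20), "river flood zones"),
    "c": ((20, 20, 30, 30), "river basins"),  # disjoint from the query box
    "d": ((4, 4, 6, 6), "river weirs"),
}
query = (0, 0, 10, 10)
by_overlap = search(query, ["river"], docs, overlap_fraction)
by_centre = search(query, ["river"], docs, centre_closeness)
```

Swapping the spatial similarity reorders b and d but cannot add or drop a document, because the intersection test alone decides membership. Only the ranking changes, and with it the relevance users perceive near the top of the list.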

    Relevance Ranking Based on Geographic and Social Information

    Geographic Information Retrieval (GIR) is a research field that develops and enables the construction of search engines that retrieve information with a geographic context available on the Internet. Geographic search engines, the artifacts produced in the GIR field, can be designed to work in many different contexts (e.g., sports, public-service examinations), each calling for an appropriate treatment of the document type handled. Nowadays, the scientific community and industry are concentrating efforts on building geographic search engines for finding news distributed across the Internet. However, search engines (geographic or otherwise) that focus on news should take the credibility of the information in each item into account when ranking results. Unfortunately, in most cases this does not happen. Measuring news credibility is a complex and costly task, since it requires knowledge of the reported facts. As a result, search engines leave it to the user to decide whether to trust what is being read. In this context, this work proposes a relevance ranking method focused on news and based on information collected from social networks, which assigns each news item a degree of credibility and ranks items accordingly. A news item's credibility value is calculated from the affinity between the users who shared it on their social network and the locations mentioned in the item. Finally, the proposed relevance ranking is integrated into GeoSEn News, a tool for searching and reading news that supports queries through various spatial operations and lets users visualize results from different perspectives. This tool was used to evaluate the proposed method in experiments with data collected from the social network Twitter and from news media across Brazil. The evaluation yielded promising results and demonstrated the feasibility of building a relevance ranking based on information collected from social networks.
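The credibility computation described above can be sketched as follows. The abstract does not say how user-location affinity is measured or how it is aggregated, so the affinity table, the user and news identifiers, and the choice of averaging (first over the locations a news item mentions, then over the users who shared it) are all hypothetical.

```python
from statistics import mean

# Hypothetical affinity scores in [0, 1] between users and locations; how the
# dissertation derives affinity is not described here, so treat these as given.
AFFINITY: dict[str, dict[str, float]] = {
    "ana": {"Recife": 0.9, "Natal": 0.1},
    "bruno": {"Recife": 0.7},
    "carla": {"Natal": 0.8},
}


def user_affinity(user: str, locations: list[str],
                  affinity: dict[str, dict[str, float]]) -> float:
    """Mean affinity of one user to the locations a news item mentions."""
    scores = affinity.get(user, {})
    return mean(scores.get(loc, 0.0) for loc in locations)


def credibility(sharers: list[str], locations: list[str],
                affinity: dict[str, dict[str, float]]) -> float:
    """Average, over the users who shared the item, of their location affinity."""
    if not sharers or not locations:
        return 0.0
    return mean(user_affinity(u, locations, affinity) for u in sharers)


# Hypothetical news items: (users who shared it, locations it mentions).
news = {
    "n1": (["ana", "bruno"], ["Recife"]),
    "n2": (["ana"], ["Natal"]),
    "n3": (["carla", "bruno"], ["Natal"]),
}
ranking = sorted(news, key=lambda n: credibility(*news[n], AFFINITY), reverse=True)
```

With these numbers, n1 (shared by two users with strong ties to Recife) ranks first, n3 second, and n2, shared only by a user with weak ties to Natal, last.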

    Relevance ranking in Geographical Information Retrieval
