
    Spatial Search Strategies for Open Government Data: A Systematic Comparison

    The increasing availability of open government datasets on the Web calls for ways to enable their efficient access and searching. However, there is an overall lack of understanding of which spatial search strategies perform best in this context. To address this gap, this work assesses the impact of different spatial search strategies on performance and user relevance judgment. We harvested machine-readable spatial datasets and their metadata from three English-language open government data portals, performed metadata enhancement, developed a prototype, and carried out both a theoretical and a user-based evaluation. The results highlight that (i) switching between area of overlap and Hausdorff distance for spatial similarity computation has no substantial impact on performance; and (ii) the use of Hausdorff distance yields slightly better user relevance ratings than the use of area of overlap. The data collected and the insights gleaned may serve as a baseline against which future work can be compared.
    Comment: Paper accepted to GIR'19: 13th Workshop on Geographic Information Retrieval (Lyon, France).
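
    The two spatial similarity measures compared above can be sketched in Python. This is a minimal sketch, not the authors' prototype: it assumes that each dataset footprint and each query area is an axis-aligned bounding box in a planar coordinate system, and that the overlap measure is normalised as intersection over union (the abstract does not say how overlap is normalised). `BBox`, `overlap_similarity` and `hausdorff_distance` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in planar coordinates (hypothetical representation)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def area(self) -> float:
        return (self.max_x - self.min_x) * (self.max_y - self.min_y)

    def corners(self):
        return [(self.min_x, self.min_y), (self.min_x, self.max_y),
                (self.max_x, self.min_y), (self.max_x, self.max_y)]


def overlap_similarity(a: BBox, b: BBox) -> float:
    """Intersection area divided by union area (1.0 = identical, 0.0 = disjoint)."""
    w = max(0.0, min(a.max_x, b.max_x) - max(a.min_x, b.min_x))
    h = max(0.0, min(a.max_y, b.max_y) - max(a.min_y, b.min_y))
    inter = w * h
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def _point_to_box(p, box: BBox) -> float:
    """Euclidean distance from a point to the nearest point of a filled box (0 if inside)."""
    dx = max(box.min_x - p[0], 0.0, p[0] - box.max_x)
    dy = max(box.min_y - p[1], 0.0, p[1] - box.max_y)
    return math.hypot(dx, dy)


def hausdorff_distance(a: BBox, b: BBox) -> float:
    """Symmetric Hausdorff distance between two filled rectangles.

    For convex regions the point of one region farthest from the other is
    always a vertex, so checking the four corners of each box is exact.
    """
    d_ab = max(_point_to_box(p, b) for p in a.corners())
    d_ba = max(_point_to_box(p, a) for p in b.corners())
    return max(d_ab, d_ba)
```

    For example, a query box `BBox(0, 0, 2, 2)` and a dataset box `BBox(1, 1, 3, 3)` share one unit square, giving an overlap similarity of 1/7, and their Hausdorff distance is sqrt(2). Candidate datasets would be ranked by descending overlap or by ascending Hausdorff distance.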

    Using WordNet for query expansion: ADAPT @ FIRE 2016 microblog track

    User-generated content on social websites such as Twitter is known to be an important source of real-time information on significant events as they occur, for example natural disasters. Our participation in the FIRE 2016 Microblog track seeks to exploit WordNet as an external resource for synonym-based query expansion, to support improved matching between search topics and the target Tweet collection. The results of our participation show that this is an effective method when used with a standard BM25-based information retrieval system for this task.
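
    Synonym-based query expansion followed by BM25 ranking can be sketched as follows. This is an illustrative sketch under stated assumptions, not the ADAPT system: the `SYNONYMS` table is a hypothetical stand-in for WordNet lookups (a real system might read synsets through NLTK's WordNet interface, which requires the corpus to be downloaded), tweets are assumed to be pre-tokenised, and the BM25 variant with a `log(1 + ...)` IDF and the parameters `k1=1.2`, `b=0.75` are common defaults rather than values reported for this run.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet synonym lookups.
SYNONYMS = {
    "earthquake": ["quake", "temblor"],
    "rescue": ["deliverance", "saving"],
}


def expand_query(terms, synonyms=SYNONYMS):
    """Append the synonyms of each query term, skipping duplicates."""
    expanded = list(terms)
    for t in terms:
        for s in synonyms.get(t, []):
            if s not in expanded:
                expanded.append(s)
    return expanded


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenised document for the given query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

    With the query `["earthquake"]`, neither tweet in `[["quake", "hits", "nepal"], ["football", "match", "tonight"]]` scores above zero; after expansion, the synonym `quake` matches the first tweet, which is the kind of vocabulary mismatch that expansion targets.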

    Toponym Disambiguation in Information Retrieval

    In recent years, geography has acquired great importance in the context of Information Retrieval (IR) and, more generally, of the automated processing of information in text. Mobile devices that can browse the web while reporting their position are now commonplace, together with applications that exploit this data to provide users with locally customised information, such as directions or advertisements. It is therefore important to deal properly with the geographic information included in electronic texts. Most of this information is contained in place names, or toponyms. Toponym ambiguity is an important issue in Geographical Information Retrieval (GIR), because queries are geographically constrained. It has proven difficult to find specific geographical IR methods that actually outperform traditional IR techniques, and toponym ambiguity may be a relevant factor in the inability of current GIR systems to take advantage of geographical knowledge. Recently, some Ph.D. theses have dealt with Toponym Disambiguation (TD) from different perspectives, from the development of resources for the evaluation of Toponym Disambiguation (Leidner, 2007) to the use of TD to improve geographical scope resolution (Andogah, 2010). The Ph.D. thesis presented here introduces a TD method based on WordNet and carries out a detailed study of the relationship of Toponym Disambiguation to some IR applications, such as GIR, Question Answering (QA) and Web retrieval. The work presented in this thesis starts with an introduction to the applications in which TD may be useful, together with an analysis of the ambiguity of toponyms in news collections. It would not be possible to study the ambiguity of toponyms without studying the resources that are used as place-name repositories; these resources are the equivalent of language dictionaries, which provide the different meanings of a given word.
    Buscaldi, D. (2010). Toponym Disambiguation in Information Retrieval [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8912

    Firsthand Opiates Abuse on Social Media: Monitoring Geospatial Patterns of Interest Through a Digital Cohort

    In the last decade, drug overdose deaths in the US have reached staggering proportions. Beyond the raw yearly death count, which is worrisome in itself, an alarming picture emerges from the steep acceleration of the death rate, which increased by 21% from 2015 to 2016. While traditional public health surveillance suffers from its own biases and limitations, digital epidemiology offers a new lens to extract signals from the Web and social media that may complement official statistics. In this paper we present a computational approach to identify a digital cohort that might provide an updated and complementary view of the opioid crisis. We introduce an information retrieval algorithm suitable for identifying relevant subspaces of discussion on social media, in order to mine data from users showing explicit interest in discussions about opioid consumption on Reddit. Moreover, despite the pseudonymous nature of the user base, almost 1.5 million users were geolocated at the US state level, with a distribution in good agreement with that of the census population. A measure of prevalence of interest in opiate consumption has been estimated at the state level, producing a novel indicator with information that is not entirely encoded in standard surveillance. Finally, we provide a domain-specific vocabulary containing informal lexicon and street nomenclature extracted from user-generated content, which researchers and practitioners can use to implement novel digital public health surveillance methodologies that support policy makers in fighting the opioid epidemic.
    Comment: Proceedings of the 2019 World Wide Web Conference (WWW '19).

    Query Expansion by Combining the Vector Space Method and WordNet in an Information Retrieval System

    One of the methods frequently used to measure document relevance in information retrieval systems is the vector space model. One way to develop this method further is to expand the query vector. The expansion is performed by applying WordNet to the terms that make up the query, in the hope that the system's results can be improved.
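
    A vector space model with a WordNet-expanded query vector might look like the sketch below. It assumes TF-IDF weighting with a smoothed IDF, cosine similarity, and a reduced weight (`synonym_weight`, default 0.5) for added synonyms; none of these choices is specified in the abstract, and the `SYNONYMS` table and function names are hypothetical stand-ins for WordNet lookups and the authors' system.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet synonym lookups.
SYNONYMS = {"car": ["automobile", "auto"]}


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) for tokenised documents, plus the IDF table."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF so that terms present in every document keep some weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]
    return vecs, idf


def query_vector(terms, idf, synonym_weight=0.5):
    """Original terms get weight 1; synonyms get a reduced weight (assumption)."""
    q = Counter()
    for t in terms:
        q[t] += 1.0
        for s in SYNONYMS.get(t, []):
            q[s] += synonym_weight
    # Terms absent from the collection get IDF 0 and contribute nothing.
    return {t: w * idf.get(t, 0.0) for t, w in q.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

    Setting `synonym_weight=0.0` reproduces the unexpanded query, so a query for `car` scores zero against a document containing only `automobile`, while the default expansion gives that document a positive cosine score.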

    Automatically organising images using concept hierarchies

    In this paper we discuss the use of concept hierarchies for image retrieval. Concept hierarchies are an approach to automatically organize a set of documents based upon a set of concepts derived from the documents themselves. Co-occurrence between terms associated with image captions and a statistical relation called subsumption are used to generate term clusters, which are organized hierarchically. The approach has previously been studied for document retrieval, and results have shown that automatically generated hierarchies can help users with their search task. In this paper we present an implementation of concept hierarchies for image retrieval, together with a preliminary ad hoc evaluation. Although our approach requires further investigation, initial results from a prototype system are promising and appear to provide a useful summary of the search results.
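
    Subsumption can be illustrated with a short sketch. It assumes the definition used in earlier concept-hierarchy work on document retrieval, where term x subsumes term y when P(x|y) >= 0.8 and P(y|x) < 1, with probabilities estimated from co-occurrence; whether this paper uses the same threshold is an assumption, and treating each caption as a single set of terms is a simplification. `subsumption_pairs` is a hypothetical helper.

```python
from itertools import permutations


def subsumption_pairs(captions, threshold=0.8):
    """Return (parent, child) pairs where parent subsumes child.

    x subsumes y if P(x|y) >= threshold and P(y|x) < 1, with probabilities
    estimated from caption-level co-occurrence counts.
    """
    docs = [set(c) for c in captions]
    terms = set().union(*docs)
    freq = {t: sum(t in d for d in docs) for t in terms}  # captions containing t
    pairs = []
    for x, y in permutations(sorted(terms), 2):
        both = sum(x in d and y in d for d in docs)
        if both == 0:
            continue
        p_x_given_y = both / freq[y]
        p_y_given_x = both / freq[x]
        if p_x_given_y >= threshold and p_y_given_x < 1:
            pairs.append((x, y))
    return pairs
```

    For the captions `["animal", "dog"]`, `["animal", "cat"]` and `["animal", "dog", "park"]`, it returns `("animal", "cat")`, `("animal", "dog")`, `("animal", "park")` and `("dog", "park")`; linking these parent-child pairs yields the hierarchy, and the last pair shows how very small samples can produce spurious links.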

    Geodata source retrieval by multilingual/semantic query expansion: the Case of Google Translate and WordNet version 3.1

    In this article, we examined the potential of the current version of WordNet and the Google Translate API to enhance the quality of geodata source retrieval in the Dutch geoinformation portal (PDOK), using semantic keywords for the requested geographic phenomena. Keywords were gathered from real users' natural-language questions and extracted into an English corpus. These keywords were then expanded using WordNet and the Google Translate API. Finally, the query expansion results were evaluated against a manual gold standard using information retrieval metrics. Our study shows that query expansion helps users by producing good alternative reformulations of their queries.
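
    The evaluation against a manual gold standard can be sketched with set-based precision, recall and F1. The abstract does not name the metrics used, so these three are an assumption, as are case-insensitive matching and the hypothetical function name `evaluate_expansion`.

```python
def evaluate_expansion(expanded, gold):
    """Precision, recall and F1 of expanded keywords against a manual gold standard."""
    predicted = {t.lower() for t in expanded}
    reference = {t.lower() for t in gold}
    hits = len(predicted & reference)  # keywords present in both sets
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

    For expanded keywords `["river", "stream", "watercourse", "rivier"]` and a gold standard of `["river", "stream", "brook"]`, the function returns a precision of 0.5, a recall of 2/3 and an F1 of 4/7.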

    A Method for the Construction and Application of the Term Hierarchy Relationship Residing in Relevance Feedback

    In the field of information retrieval, the term frequency information contained in relevance feedback has been widely used. However, the analysis and application of term frequency do not capture the semantic meaning of the terms, which can make the retrieval results deviate from the user's search goal. To account for the semantic meaning of terms, Wille (1992) proposed a structured view of the relationships among the terms in the retrieved documents. To enhance the effectiveness of information retrieval using this term hierarchy relationship information, this study developed a query expansion method that first extracts and applies the information contained in relevance feedback, and then conducted formal tests to verify the method's efficiency in re-ranking the retrieved documents. The results of the formal tests show that the proposed query expansion method is more effective than Rocchio's query expansion algorithm. The contribution of this study is to reveal the applicability of the term hierarchy relationship information contained in relevance feedback and to demonstrate how this information can be applied.
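
    The Rocchio baseline that the proposed method is compared against can be sketched as below; this is not the authors' term-hierarchy method, which the abstract does not describe in enough detail to reproduce. Vectors are sparse term-weight dictionaries, and the weights `alpha=1.0`, `beta=0.75`, `gamma=0.15` and the clipping of negative weights to zero are conventional textbook choices, not values from the study.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio query update over sparse term-weight vectors (dicts).

    q' = alpha * q + beta * mean(relevant) - gamma * mean(nonrelevant)
    """
    new_q = defaultdict(float)
    for t, w in query.items():
        new_q[t] += alpha * w
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue
        for d in docs:
            for t, w in d.items():
                new_q[t] += coef * w / len(docs)  # contributes to the centroid term
    # Negative weights are conventionally clipped to zero (dropped here).
    return {t: w for t, w in new_q.items() if w > 0}
```

    For the query `{"jaguar": 1.0}`, relevant documents `{"jaguar": 1.0, "car": 1.0}` and `{"car": 1.0, "engine": 1.0}`, and a non-relevant document `{"jaguar": 1.0, "cat": 1.0}`, the updated query weights `jaguar` at 1.225, `car` at 0.75 and `engine` at 0.375, and drops `cat`.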

    Automatic tagging and geotagging in video collections and communities

    Automatically generated tags and geotags hold great promise for improving access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition and the data set released. For each task, a reference algorithm used within MediaEval 2010 is presented, together with comments on lessons learned. The Tagging Task, Professional, involves automatically matching episodes in a Dutch television collection with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web, involves automatically predicting the tags that users assign to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.