    A Wikipedia powered state-based approach to automatic search query enhancement

    This paper describes the development and testing of a novel Automatic Search Query Enhancement (ASQE) algorithm, the Wikipedia N Sub-state Algorithm (WNSSA), which utilises Wikipedia as the sole data source for prior knowledge. The algorithm is built upon the concept of iterative states and sub-states, harnessing Wikipedia's data set and link information to identify recurring terms that aid term selection and weighting during enhancement. It is designed to prevent query drift by referring back to the user's original search intent, persisting the original query between internal states alongside the selected enhancement terms. The developed algorithm has been shown to improve both short and long queries by providing a better understanding of the query and the available data. The proposed algorithm was compared against five existing ASQE algorithms that utilise Wikipedia as the sole data source, showing an average Mean Average Precision (MAP) improvement of 0.273 over the tested algorithms.
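    To make the state/sub-state idea concrete, the following minimal Python sketch iterates over internal states, re-queries a knowledge source with the original query plus the terms selected so far, and keeps the most frequently recurring new terms. The `fetch_terms` callable stands in for the Wikipedia lookup and the toy data are invented; this is an illustration of the described mechanism under those assumptions, not the WNSSA implementation.

```python
from collections import Counter
from typing import Callable

def enhance_query(original_query: str,
                  fetch_terms: Callable[[str], list[str]],
                  n_states: int = 3,
                  top_k: int = 1) -> str:
    """State-based enhancement: each state re-queries the knowledge source
    with the original query plus the terms selected so far, keeping the
    most frequently recurring new terms (a simplification of the WNSSA)."""
    selected: list[str] = []
    for _ in range(n_states):
        current = " ".join([original_query, *selected])
        counts = Counter(fetch_terms(current))
        seen = set(current.lower().split())
        new_terms = [t for t, _ in counts.most_common() if t.lower() not in seen]
        selected.extend(new_terms[:top_k])
    # The original query is persisted verbatim in every state to limit query drift.
    return " ".join([original_query, *selected])

# Toy stand-in for Wikipedia-derived term lists (entirely invented):
toy_source = {"jaguar": ["cat", "car", "cat", "animal"],
              "jaguar cat": ["felidae", "panthera", "cat"]}
print(enhance_query("jaguar", lambda q: toy_source.get(q, []), n_states=2))
# -> "jaguar cat felidae"
```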

    A qualitative analysis of the Wikipedia N-Substate Algorithm's Enhancement Terms

    [EN] Automatic Search Query Enhancement (ASQE) is the process of modifying a user-submitted search query and identifying terms that can be added or removed to enhance the relevance of documents retrieved from a search engine. ASQE differs from other enhancement approaches in that no human interaction is required. ASQE algorithms typically rely on a source of a priori knowledge to aid the process of identifying relevant enhancement terms. This paper describes the results of a qualitative analysis of the enhancement terms generated by the Wikipedia N-Substate Algorithm (WNSSA) for ASQE. The WNSSA utilises Wikipedia as the sole source of a priori knowledge during the query enhancement process. As each Wikipedia article typically represents a single topic, the WNSSA's enhancement process maps the user's original search query to Wikipedia articles relevant to that query. If this mapping is performed correctly, a collection of potentially relevant terms and acronyms becomes accessible for ASQE. This paper reviews the results of a qualitative analysis performed on the individual enhancement terms generated for each of the 50 test topics from the TREC-9 Web Topic collection. The contributions of this paper include: (a) a qualitative analysis of generated WNSSA search query enhancement terms and (b) an analysis of the concepts represented in the TREC-9 Web Topics, detailing interpretation issues during the query-to-Wikipedia article mapping performed by the WNSSA.
    Goslin, K. and Hofmann, M. (2019). "A qualitative analysis of the Wikipedia N-Substate Algorithm's Enhancement Terms." Journal of Computer-Assisted Linguistic Research 3(3): 67-77. https://doi.org/10.4995/jclr.2019.11159
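    The query-to-article mapping step discussed above can be sketched as follows, using the public MediaWiki search API. The title-word heuristic for candidate terms is our own simplification for illustration, not the WNSSA's actual term selection.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def map_query_to_articles(query: str, limit: int = 5) -> list[str]:
    """Titles of Wikipedia articles that the search engine maps to the query."""
    params = {"action": "query", "list": "search", "srsearch": query,
              "srlimit": limit, "format": "json"}
    resp = requests.get(API, params=params, timeout=10)
    resp.raise_for_status()
    return [hit["title"] for hit in resp.json()["query"]["search"]]

def candidate_terms(query: str) -> list[str]:
    """Naive candidates: words from mapped article titles not already in the query."""
    seen = set(query.lower().split())
    terms: list[str] = []
    for title in map_query_to_articles(query):
        for word in title.split():
            if word.lower() not in seen and word.lower() not in terms:
                terms.append(word.lower())
    return terms

# e.g. candidate_terms("query expansion") may surface title words such as
# "information" and "retrieval" from related articles.
```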

    LODE: Linking Digital Humanities Content to the Web of Data

    Numerous digital humanities projects maintain their data collections in the form of text, images, and metadata. While data may be stored in many formats, from plain text to XML to relational databases, the use of the resource description framework (RDF) as a standardized representation has gained considerable traction during the last five years. Almost every digital humanities meeting has at least one session concerned with RDF and linked data. While most existing work in linked data has focused on improving algorithms for entity matching, the aim of the LinkedHumanities project is to build digital humanities tools that work "out of the box," enabling their use by humanities scholars, computer scientists, librarians, and information scientists alike. With this paper, we report on the Linked Open Data Enhancer (LODE) framework developed as part of the LinkedHumanities project. LODE supports non-technical users in enriching a local RDF repository with high-quality data from the Linked Open Data cloud. LODE links and enhances the local RDF repository without compromising the quality of the data. In particular, LODE supports the user in the enhancement and linking process by providing intuitive user interfaces and by suggesting high-quality linking candidates using tailored matching algorithms. We hope that the LODE framework will prove useful to digital humanities scholars, complementing other digital humanities tools.
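    As a rough illustration of the linking step such a tool automates, the sketch below proposes owl:sameAs candidates for locally labelled resources by exact-label lookup against the public DBpedia endpoint, using rdflib and SPARQLWrapper. The exact-match heuristic and example resource are our own simplifications; LODE's actual matching algorithms are more sophisticated, and in LODE the user confirms candidates through the UI rather than having them asserted automatically.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import OWL, RDFS
from SPARQLWrapper import SPARQLWrapper, JSON

def dbpedia_candidates(label: str, limit: int = 3) -> list[str]:
    """Look up DBpedia resources whose English rdfs:label matches exactly."""
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?s WHERE {{ ?s rdfs:label "{label}"@en . }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return [row["s"]["value"] for row in rows]

def suggest_links(local: Graph) -> Graph:
    """Assert owl:sameAs candidates for every labelled local resource.
    (LODE instead presents candidates in a UI for the user to confirm.)"""
    for subject, label in local.subject_objects(RDFS.label):
        for candidate in dbpedia_candidates(str(label)):
            local.add((subject, OWL.sameAs, URIRef(candidate)))
    return local

g = Graph()
g.add((URIRef("http://example.org/person/goethe"), RDFS.label,
       Literal("Johann Wolfgang von Goethe")))
print(suggest_links(g).serialize(format="turtle"))
```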

    A comparison of automatic search query enhancement algorithms that utilise Wikipedia as a source of a priori knowledge

    This paper describes the benchmarking and analysis of five Automatic Search Query Enhancement (ASQE) algorithms that utilise Wikipedia as the sole source of a priori knowledge. The contributions of this paper include: 1) a comprehensive review of current ASQE algorithms that utilise Wikipedia as the sole source of a priori knowledge; 2) benchmarking of five existing ASQE algorithms using the TREC-9 Web Topics on the ClueWeb12 data set and 3) analysis of the results from the benchmarking process to identify the strengths and weaknesses of each algorithm. During the benchmarking process, 2,500 relevance assessments were performed. Results of these tests are analysed using Average Precision @10 per query and Mean Average Precision @10 per algorithm. From this analysis we show that the scope of a priori knowledge utilised during enhancement and the term weighting methods available from Wikipedia can further aid the ASQE process. Although the approaches taken by the algorithms are still relevant, an over-dependence on the weighting schemes and data sources used can easily impact the results of an ASQE algorithm.
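    For reference, the two measures can be computed as in the short sketch below. It assumes binary relevance judgements and one common normalisation for AP@k (dividing by min(number of relevant documents, k)); the paper's exact variant may differ.

```python
def average_precision_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Average of precision values at each relevant rank within the top k."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k) if relevant else 0.0

def mean_average_precision_at_k(runs: list[tuple[list[str], set[str]]], k: int = 10) -> float:
    """MAP@k over one (ranking, relevant-set) pair per query."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)

# Relevant docs d1 and d3 retrieved at ranks 1 and 4: (1/1 + 2/4) / 2 = 0.75
print(average_precision_at_k(["d1", "d2", "d9", "d3"], {"d1", "d3"}))
```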

    A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web

    Over the past decade, rapid advances in web technologies, coupled with innovative models of spatial data collection and consumption, have generated robust growth in geo-referenced information, resulting in spatial information overload. Increasing 'geographic intelligence' in traditional text-based information retrieval has become a prominent approach to respond to this issue and to fulfill users' spatial information needs. Numerous efforts in the Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the Linking Open Data initiative have converged in a constellation of open knowledge bases, freely available online. In this article, we survey these open knowledge bases, focusing on their geospatial dimension. Particular attention is devoted to the crucial issue of the quality of geo-knowledge bases, as well as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic Network, is outlined as our contribution to this area. Research directions in information integration and Geographic Information Retrieval (GIR) are then reviewed, with a critical discussion of their current limitations and future prospects.

    Semantic enrichment for enhancing LAM data and supporting digital humanities. Review article

    With the rapid development of the digital humanities (DH) field, demand for historical and cultural heritage data has generated deep interest in the data provided by libraries, archives, and museums (LAMs). To enhance the quality and discoverability of LAM data while enabling a self-sustaining ecosystem, “semantic enrichment” has become a strategy increasingly used by LAMs in recent years. This article introduces a number of semantic enrichment methods and efforts that can be applied to LAM data at various levels, aiming to support deeper and wider exploration and use of LAM data in DH research. The real cases, research projects, experiments, and pilot studies shared in this article demonstrate the potential of LAM data, whether structured, semi-structured, or unstructured, regardless of the types of original artifacts that carry the data. Following their roadmaps would encourage more effective initiatives and strengthen the effort to maximize LAM data's discoverability, use- and reuse-ability, and value in the mainstream of DH and the Semantic Web.
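    One concrete enrichment pattern of this kind is reconciling a plain literal value in a LAM record against Wikidata so the record gains a machine-actionable identifier. The sketch below uses the public wbsearchentities API; the record layout is invented, and taking the first search hit is a naive disambiguation strategy used only for illustration.

```python
from typing import Optional
import requests

def reconcile(name: str) -> Optional[str]:
    """Best-matching Wikidata QID for a name string, or None."""
    resp = requests.get("https://www.wikidata.org/w/api.php",
                        params={"action": "wbsearchentities", "search": name,
                                "language": "en", "format": "json"},
                        timeout=10)
    resp.raise_for_status()
    hits = resp.json().get("search", [])
    return hits[0]["id"] if hits else None

record = {"title": "Faust", "creator": "Johann Wolfgang von Goethe"}
record["creator_wikidata"] = reconcile(record["creator"])  # e.g. "Q5879"
print(record)
```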

    Conference Programme v1 & Book of Abstracts

    7th INTERNATIONAL CONFERENCE ON MEANING AND KNOWLEDGE REPRESENTATION MKR2018 @ ITB Dublin, 4, 5 and 6 July 2018

    Developing Knowledge Models of Social Media: A Case Study on LinkedIn

    User Generated Content (UGC) exchanged via large social networks is considered a very important source of knowledge about all aspects of social engagement (e.g. interests, events, personal information, personal preferences, social experience, skills, etc.). However, this data is inherently unstructured or semi-structured. In this paper, we describe the results of a case study on LinkedIn Ireland public profiles. The study investigated how the available knowledge could be harvested from LinkedIn in a novel way by developing and applying a reusable knowledge model using linked open data vocabularies and the Semantic Web. In addition, the paper discusses the crawling and data normalisation strategies that we developed so that high-quality metadata could be extracted from LinkedIn public profiles. Apart from the search engine on LinkedIn.com itself, there are no well-known publicly available endpoints that allow users to query knowledge concerning the interests of individuals on LinkedIn. In particular, we present a system that extracts and converts information from raw web pages of LinkedIn public profiles into a machine-readable, interoperable format using data mining and Semantic Web technologies. The outcomes of our research can be summarised as follows: (1) a reusable knowledge model that can represent LinkedIn public user and company profiles using linked data vocabularies and structured data; (2) a public SPARQL endpoint to access structured data about Irish industry and public profiles; (3) a scalable data crawling strategy and a mashup-based data normalisation approach. The data mining and knowledge representation approaches proposed in this paper are evaluated in four ways: (1) we evaluate metadata quality using automated techniques, such as data completeness and data linkage; (2) data accuracy is evaluated via user studies, in particular by comparing manually entered metadata fields with the metadata that was automatically extracted; (3) user-perceived metadata quality is measured by asking users to rate the automatically extracted metadata in user studies; (4) finally, the paper discusses how the extracted metadata suits a user interface design. Overall, the evaluations show that the extracted metadata is of high quality and meets the requirements of a data visualisation user interface.
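    As a rough sketch of what such a knowledge model can look like, the snippet below represents a public profile with the FOAF vocabulary in rdflib. The profile fields, the example.org namespace, and the hasSkill property are invented for illustration (the paper's actual vocabulary choices may differ); the resulting graph is the kind of structured data a public SPARQL endpoint would serve.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/vocab/")  # hypothetical local vocabulary

profile = {"name": "Jane Doe",
           "url": "https://www.linkedin.com/in/janedoe",  # invented profile
           "skills": ["Semantic Web", "Data Mining"]}

g = Graph()
person = URIRef(profile["url"])
g.add((person, RDF.type, FOAF.Person))
g.add((person, FOAF.name, Literal(profile["name"])))
for skill in profile["skills"]:
    g.add((person, EX.hasSkill, Literal(skill)))  # FOAF has no standard skill term

print(g.serialize(format="turtle"))
```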