311 research outputs found

    Opportunistic linked data querying through approximate membership metadata

    Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers, at the cost of higher bandwidth. Increasing a client's efficiency means lowering the number of requests, which can be achieved, among other ways, through additional metadata in responses. We noted that typical SPARQL query evaluations against Triple Pattern Fragments require a significant portion of membership subqueries, which check the presence of a specific triple, rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing HTTP requests, such functions make it possible to achieve full result recall earlier when temporarily allowing lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer HTTP requests with only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.

    What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way

    From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite the many performance measurements carried out in these evaluation campaigns, the scientific community is still quite unsure about the impact individual system features and their weights have on the overall system performance. In order to overcome this explanatory gap, we first determined optimal feature configurations using the Sequential Model-based Algorithm Configuration (SMAC) program and applied its output to a BM25-based search engine. We then ran an ablation study to systematically assess the individual contributions of relevant system features: BM25 parameters, query type and weighting schema, query expansion, stop word filtering, and keyword boosting. For evaluation, we employed the gold standard data from the three TREC-PM installments and assessed the effectiveness of different features using the commonly shared infNDCG metric. Comment: Accepted for SIGIR2020, 10 pages

    An efficient query optimization strategy for spatio-temporal queries in video databases

    Cataloged from PDF version of article. Interest in multimedia database management systems has grown rapidly due to the need to store huge volumes of multimedia data in computer systems. An important building block of a multimedia database system is the query processor, and a query optimizer embedded in the query processor is needed to answer user queries efficiently. The query optimization problem has been widely studied for conventional database systems; however, it is a new research area for multimedia database systems. Due to the differences in query processing strategies, query optimization techniques used in multimedia database systems are different from those used in traditional databases. In this paper, a query optimization strategy is proposed for processing spatio-temporal queries in video database systems. The proposed strategy includes reordering algorithms to be applied to the query execution tree. The performance results obtained by testing the reordering algorithms on different query sets are also presented. (C) 2003 Elsevier Inc. All rights reserved.

    Optimizing Analytical Queries over Semantic Web Sources


    Novel techniques for location-cloaked applications

    Location cloaking has been shown to be cost-effective in mitigating location privacy and safety risks. This strategy, however, has a significant impact on the applications that rely on location information. They may suffer efficiency loss; some may not even work with reduced location resolution. This research investigates two problems. 1) How to process location-cloaked queries. Processing such queries incurs significantly more workload for both server and client. While the server needs to retrieve more query results and transmit them to the client, the client downloading these results wastes its battery power because most of them are useless. To address these problems, we propose a suite of novel techniques including query decomposition, scheduling, and personalized air indexing. These techniques are integrated into a single unified platform that is capable of handling various types of queries. 2) How a node V can verify whether or not another node P is indeed located in the cloaking region it claims. This problem is challenging because the process of location verification may allow V to refine P's location within the region. We identify two types of attacks, transmission coverage attack and distance bounding attack. In the former, V refines a cloaking region by adjusting its transmission range to partially overlap with the region, whereas in the latter, it does so by measuring the round trip time of its communication with P. We present two corresponding counter strategies and, building on top of them, propose a novel technique that allows P to participate in location verification while providing a certain level of guarantee that its cloaking region will not be refined during the process.

    Efficient and effective retrieval using Higher-Order proximity models

    Information Retrieval systems are widely used to retrieve documents that are relevant to a user's information need. Systems leveraging proximity heuristics to estimate the relevance of a document have been shown to be effective. However, the computational cost of proximity-based models is rarely considered, which is an important concern over large-scale document collections. The large-scale collections also make collection-based evaluation challenging since only a small number of documents are judged given the limited budget. Effectiveness, efficiency and reliable evaluation are coherent components that should be considered when developing a good retrieval system. This thesis makes several contributions from the three aspects. Many proximity-based retrieval models are effective, but it is also important to find efficient solutions to extract proximity features, especially for models using higher-order proximity statistics. We therefore propose a one-pass algorithm based on the PlaneSweep approach. We demonstrate that the new one-pass algorithm reduces the cost of capturing a full dependency relation of a query, regardless of the input representations. Although our proposed methods can capture higher-order proximity features efficiently, the trade-offs between effectiveness and efficiency when using proximity-based models remain largely unexplored. We consider different variants of proximity statistics and demonstrate that using local proximity statistics can achieve an improved trade-off between effectiveness and efficiency. Another important aspect in IR is reliable system comparisons. We conduct a series of experiments that explore the interaction between pooling and evaluation depth, interactions between evaluation metrics and evaluation depth and also correlations between two different evaluation metrics. We show that different evaluation configurations on large test collections, where only a limited number of relevance labels are available, can lead to different system comparison conclusions. We also demonstrate the pitfalls of choosing an arbitrary evaluation depth regardless of the metrics employed and the pooling depth of the test collections. Lastly, we provide suggestions on the evaluation configurations for the reliable comparisons of retrieval systems on large test collections. On these large test collections, a shallow judgment pool may be employed as assumed budgets are often limited, which may lead to an imprecise evaluation of system performance, especially when a deep evaluation metric is used. We propose an estimation framework for estimating deep metric scores on shallow judgment pools. With an initial shallow judgment pool, rank-level estimators are designed to estimate the effectiveness gain at each ranking. Based on the rank-level estimations, we propose an optimization framework to obtain a more precise score estimate.

    Temporal multimodal video and lifelog retrieval

    The past decades have seen exponential growth of both consumption and production of data, with multimedia such as images and videos contributing significantly to said growth. The widespread proliferation of smartphones has provided everyday users with the ability to consume and produce such content easily. As the complexity and diversity of multimedia data have grown, so has the need for more complex retrieval models which address the information needs of users. Finding relevant multimedia content is central in many scenarios, from internet search engines and medical retrieval to querying one's personal multimedia archive, also called lifelog. Traditional retrieval models have often focused on queries targeting small units of retrieval, yet users usually remember temporal context and expect results to include this. However, there is little research into enabling these information needs in interactive multimedia retrieval. In this thesis, we aim to close this research gap by making several contributions to multimedia retrieval with a focus on two scenarios, namely video and lifelog retrieval. We provide a retrieval model for complex information needs with temporal components, including a data model for multimedia retrieval, a query model for complex information needs, and a modular and adaptable query execution model which includes novel algorithms for result fusion. The concepts and models are implemented in vitrivr, an open-source multimodal multimedia retrieval system, which covers all aspects from extraction to query formulation and browsing. vitrivr has proven its usefulness in evaluation campaigns and is now used in two large-scale interdisciplinary research projects. We show the feasibility and effectiveness of our contributions in two ways: firstly, through results from user-centric evaluations which pit different user-system combinations against one another. Secondly, we perform a system-centric evaluation by creating a new dataset for temporal information needs in video and lifelog retrieval with which we quantitatively evaluate our models. The results show significant benefits for systems that enable users to specify more complex information needs with temporal components. Participation in interactive retrieval evaluation campaigns over multiple years provides insight into possible future developments and challenges of such campaigns.