
    Analogy Queries in Information Systems - A New Challenge (pre-publication copy, to appear in JIKM volume 12, issue 3)

    Abstract: Despite the tremendous progress in Web-related technologies, interfaces for accessing the Web or large information systems have largely stayed at the level of keyword search and categorical browsing. This paper introduces analogy queries as one of the essential techniques required to bridge the gap between today's interfaces and future interaction paradigms. The intuitive concept of analogies is directly derived from human cognition and communication practices, and is in fact often considered to be the core concept of human cognition. In brief, analogies form abstract relationships between concepts, which can be used to efficiently exchange information and knowledge needs, or to transmit even complex concepts including important connotations, in a strictly human-centered and natural fashion. Building analogy-enabled information systems opens up a number of interesting scientific challenges, e.g., how does communication using analogies work? How can this process be represented? How can information systems understand what a user-provided analogy actually means? How can analogies be discovered? This paper aims at discussing some of these questions and is intended as a cornerstone of future research efforts.

    Crowdsourcing for Query Processing on Web Data: A Case Study on the Skyline Operator

    In recent years, crowdsourcing has become a powerful tool for bringing human intelligence into information processing. This is especially important for Web data, which, in contrast to well-maintained databases, is almost always incomplete and may be distributed over a variety of sources. Crowdsourcing makes it possible to tackle many problems that are not yet attainable using machine-based algorithms alone: in particular, it allows database operators to be evaluated on incomplete data, as human workers can provide missing values at runtime. As this can quickly become costly, elaborate optimization is required. In this paper, we showcase how such optimizations can be performed for the popular skyline operator for preference queries. We present some heuristics-based approaches and compare them to crowdsourcing-based approaches using sophisticated optimization techniques, with a special focus on result correctness.
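    The core idea of the abstract above can be illustrated with a minimal sketch: a naive Pareto skyline filter over possibly incomplete records, where missing values are resolved on demand by a pluggable callback. In a crowdsourced setting that callback would post a question to human workers; here it is a stub. All names (`resolve`, `dominates`, `skyline`) are illustrative, not taken from the paper, and none of the paper's cost optimizations appear here.

    ```python
    # Minimal skyline (Pareto) filter over incomplete data, lower-is-better
    # semantics in every dimension. None marks a missing attribute value;
    # resolve(record, i) is a stand-in for asking the crowd for that value.

    def dominates(a, b, resolve):
        """True if record a Pareto-dominates record b: a is no worse than b in
        every dimension and strictly better in at least one."""
        strictly_better = False
        for i in range(len(a)):
            x = a[i] if a[i] is not None else resolve(a, i)  # crowd lookup
            y = b[i] if b[i] is not None else resolve(b, i)
            if x > y:
                return False          # a is worse somewhere -> no dominance
            if x < y:
                strictly_better = True
        return strictly_better

    def skyline(records, resolve):
        """Naive O(n^2) skyline: keep every record no other record dominates."""
        return [r for r in records
                if not any(dominates(s, r, resolve) for s in records if s is not r)]

    records = [[1, 2], [2, 1], [3, 3], [2, None]]
    print(skyline(records, lambda rec, i: 2))  # stubbed "crowd answer" of 2
    ```

    Every call to `resolve` would cost money with real workers, which is exactly why the paper's optimizations aim to minimize how often it is invoked.
    
    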

    Valentine: Evaluating Matching Techniques for Dataset Discovery

    Data scientists today search large data lakes to discover and integrate datasets. In order to bring together disparate data sources, dataset discovery methods rely on some form of schema matching: the process of establishing correspondences between datasets. Traditionally, schema matching has been used to find matching pairs of columns between a source and a target schema. However, the use of schema matching in dataset discovery methods differs from its original use. Nowadays, schema matching serves as a building block for indicating and ranking inter-dataset relationships. Surprisingly, although a discovery method's success relies highly on the quality of the underlying matching algorithms, the latest discovery methods employ existing schema matching algorithms in an ad hoc fashion due to the lack of openly available datasets with ground truth, reference method implementations, and evaluation metrics. In this paper, we aim to rectify the problem of evaluating the effectiveness and efficiency of schema matching methods for the specific needs of dataset discovery. To this end, we propose Valentine, an extensible open-source experiment suite to execute and organize large-scale automated matching experiments on tabular data. Valentine includes implementations of seminal schema matching methods that we either implemented from scratch (due to the absence of open source code) or imported from open repositories. The contributions of Valentine are: i) the definition of four schema matching scenarios as encountered in dataset discovery methods, ii) a principled dataset fabrication process tailored to the scope of dataset discovery methods, and iii) the most comprehensive evaluation of schema matching techniques to date, offering insight into the strengths and weaknesses of existing techniques that can serve as a guide for employing schema matching in future dataset discovery methods.
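    To make the notion of "establishing correspondences between datasets" concrete, here is a toy matcher that scores every (source column, target column) pair and ranks the candidate correspondences. It uses plain name similarity via the standard library's `difflib`; the seminal matchers Valentine actually implements are far more elaborate, so treat this purely as an illustration of the matching task.

    ```python
    # Toy schema matcher: rank column correspondences by name similarity.
    from difflib import SequenceMatcher

    def match_schemas(source_cols, target_cols, threshold=0.6):
        """Return (source, target, score) triples, best matches first."""
        pairs = []
        for s in source_cols:
            for t in target_cols:
                # ratio() in [0, 1]: 1.0 means identical strings
                score = SequenceMatcher(None, s.lower(), t.lower()).ratio()
                if score >= threshold:
                    pairs.append((s, t, round(score, 2)))
        return sorted(pairs, key=lambda p: -p[2])

    print(match_schemas(["customer_id", "full_name"], ["cust_id", "name"]))
    ```

    A discovery method would then use such ranked correspondences to decide which datasets in a lake are related, which is exactly where the paper argues matching quality becomes critical.
    
    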

    Just ask a human? - Controlling Quality in Relational Similarity and Analogy Processing using the Crowd

    Abstract: Advancing semantically meaningful and human-centered interaction paradigms for large information systems is one of the central challenges of current information system research. Here, systems which can capture different notions of 'similarity' between entities promise to be particularly interesting. While simple entity similarity has been addressed numerous times, relational similarity between entities, and especially the closely related challenge of processing analogies, remain hard to approach algorithmically due to the semantic ambiguity often involved in these tasks. In this paper, we therefore employ human workers via crowd-sourcing to establish a performance baseline. Then, we further improve on this baseline by combining the feedback of multiple workers in a meaningful fashion. Due to the ambiguous nature of analogies and relational similarity, traditional crowd-sourcing quality control techniques are less effective, and we therefore develop novel techniques that respect the intrinsic consensual nature of the task at hand. This work further paves the way for building true hybrid systems in which human workers and heuristic algorithms combine their individual strengths.
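    The consensus idea the abstract alludes to can be sketched in a few lines: since ambiguous analogy tasks have no fixed gold standard, a worker's weight is estimated from how often they agree with the overall majority, and final answers are then chosen by weighted vote. This is a generic stand-in, not the paper's actual quality control technique; all names are illustrative.

    ```python
    # Consensus-based aggregation of crowd answers without a gold standard.
    from collections import Counter, defaultdict

    def consensus(answers):
        """answers: {question: {worker: label}} -> {question: chosen label}."""
        # Step 1: unweighted majority label per question.
        majority = {q: Counter(votes.values()).most_common(1)[0][0]
                    for q, votes in answers.items()}
        # Step 2: weight each worker by their agreement rate with the majority.
        agreement = defaultdict(list)
        for q, votes in answers.items():
            for worker, label in votes.items():
                agreement[worker].append(label == majority[q])
        weight = {w: sum(hits) / len(hits) for w, hits in agreement.items()}
        # Step 3: re-decide each question by weighted vote.
        return {q: max(set(votes.values()),
                       key=lambda lbl: sum(weight[w]
                                           for w, l in votes.items() if l == lbl))
                for q, votes in answers.items()}
    ```

    Weighting by consensus rather than by a ground-truth test set is what makes such schemes applicable to inherently ambiguous tasks like analogy judgments.
    
    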

    Efficient Computation of Trade-Off Skylines

    When selecting alternatives from large amounts of data, trade-offs play a vital role in everyday decision making. In databases, this is primarily reflected by the top-k retrieval paradigm. But recently it has been convincingly argued that it is almost impossible for users to provide meaningful scoring functions for top-k retrieval, subsequently leading to the adoption of the skyline paradigm. Here, users just specify the relevant attributes in a query, and all suboptimal alternatives are filtered out following the Pareto semantics. Up to now, however, the intuitive concept of compensation could not be used in skyline queries, which also contributes to the often unmanageably large result set sizes. In this paper, we discuss an innovative and efficient method for computing skylines that allows the use of qualitative trade-offs. Such trade-offs compare examples from the database on a focused subset of attributes. Thus, users can provide information on how much they are willing to sacrifice to gain an improvement in some other attribute(s). Our contribution is the design of the first skyline algorithm allowing for qualitative compensation across attributes. Moreover, we provide a novel trade-off representation structure to speed up retrieval. Indeed, our experiments show efficient performance, allowing for focused skyline sets in practical applications. Moreover, we show that the number of necessary object comparisons can be reduced by an order of magnitude using our indexing techniques.
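    A minimal sketch can show how a single qualitative trade-off shrinks a skyline, assuming lower values are better. A trade-off `(i, max_loss, j)` states that the user accepts losing up to `max_loss` in attribute `i` for any strict gain in attribute `j`. The paper's actual semantics and index structures are more involved; this only illustrates the effect of compensation on dominance.

    ```python
    # Dominance amended by one qualitative trade-off (lower is better).

    def tradeoff_dominates(a, b, i, max_loss, j):
        """a dominates b under the trade-off: a strictly gains in attribute j,
        sacrifices at most max_loss in attribute i, and is no worse elsewhere."""
        if a[j] >= b[j]:                       # must compensate with a gain in j
            return False
        for k in range(len(a)):
            if k == i:
                if a[k] - b[k] > max_loss:     # sacrifice in i exceeds the budget
                    return False
            elif k != j and a[k] > b[k]:       # no other attribute may get worse
                return False
        return True

    def refine_skyline(points, i, max_loss, j):
        """Drop points that become dominated once the trade-off applies."""
        return [r for r in points
                if not any(tradeoff_dominates(s, r, i, max_loss, j)
                           for s in points if s is not r)]

    # Pareto skyline of three incomparable points; the user accepts losing
    # up to 2 units in attribute 0 for any gain in attribute 1.
    print(refine_skyline([[1, 5], [3, 2], [5, 1]], 0, 2, 1))
    ```

    Note how points that survive plain Pareto filtering are pruned once compensation is allowed, which is precisely the paper's motivation for trade-off skylines.
    
    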

    Efficient Skyline Refinement using Trade-Offs

    Skyline queries have received a lot of attention due to their intuitive query formulation. Following the concept of Pareto optimality, all 'best' database items satisfying different aspects of the query are returned to the user. However, this often results in huge result set sizes. Users face the same problem in everyday life: when confronted with too large a variety of choices, they tend to focus only on some aspects of the attribute space at a time and try to figure out acceptable compromises between these attributes. Such trade-offs are not reflected by the Pareto paradigm. Incorporating them into user preferences and adjusting skyline results accordingly thus needs special algorithms beyond traditional skylining. In this paper, we propose a novel algorithm for efficiently incorporating such typical trade-off information into preference orders. Our experiments on both real-world and synthetic data sets show the impact of our techniques: impractically large skylines efficiently become manageable with a minimum amount of user interaction. Additionally, we design a method to elicit especially interesting trade-offs promising a high reduction in skyline size. At any point, the user can choose whether to provide individual trade-offs or accept those suggested by the system. The benefit of incorporating trade-offs into the strict Pareto semantics is clear: result sets become manageable while additionally becoming more focused on the users' information needs.