
    Improving package recommendations through query relaxation

    Recommendation systems aim to identify items that are likely to be of interest to users. In many cases, users are interested in package recommendations as collections of items. For example, a dietitian may wish to derive a dietary plan as a collection of recipes that is nutritionally balanced, and a travel agent may want to produce a vacation package as a coordinated collection of travel and hotel reservations. Recent work has explored extending recommendation systems to support packages of items. These systems need to solve complex combinatorial problems, enforcing various properties and constraints defined on sets of items. Introducing constraints on packages makes recommendation queries harder to evaluate, but also harder to express: Queries that are under-specified produce too many answers, whereas queries that are over-specified frequently miss interesting solutions. In this paper, we study query relaxation techniques that target package recommendation systems. Our work offers three key insights: First, even when the original query result is not empty, relaxing constraints can produce preferable solutions. Second, a solution due to relaxation can only be preferred if it improves some property specified by the query. Third, relaxation should not treat all constraints as equals: some constraints are more important to the users than others. Our contributions are threefold: (a) we define the problem of deriving package recommendations through query relaxation, (b) we design and experimentally evaluate heuristics that relax query constraints to derive interesting packages, and (c) we present a crowd study that evaluates the sensitivity of real users to different kinds of constraints and demonstrates that query relaxation is a powerful tool in diversifying package recommendations
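The first two insights in the abstract above (relaxation can help even when the original result is non-empty, and a relaxed solution is only preferred if it improves a queried property) can be illustrated with a small brute-force sketch. This is not the paper's heuristic: the item schema `(name, cost, rating)`, the budget constraint, the `slack` parameter, and the function names are all assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Item = Tuple[str, float, float]  # hypothetical schema: (name, cost, rating)
Package = Tuple[Item, ...]


def total_rating(package: Package) -> float:
    return sum(item[2] for item in package)


def best_package(items: List[Item], size: int, budget: float) -> Optional[Package]:
    """Exhaustively find the package of `size` items with the highest total
    rating whose total cost stays within `budget` (None if none is feasible)."""
    feasible = [
        p for p in combinations(items, size)
        if sum(item[1] for item in p) <= budget
    ]
    if not feasible:
        return None
    return max(feasible, key=total_rating)


def relax_budget(items: List[Item], size: int, budget: float,
                 slack: float) -> Optional[Package]:
    """Relax the budget by a fraction `slack`, but keep the relaxed package
    only if it strictly improves the total rating (assumed objective)."""
    original = best_package(items, size, budget)
    relaxed = best_package(items, size, budget * (1 + slack))
    if original is None:
        return relaxed
    if relaxed is not None and total_rating(relaxed) > total_rating(original):
        return relaxed
    return original


items = [("a", 10, 3), ("b", 12, 4), ("c", 20, 9), ("d", 9, 2)]
strict = best_package(items, size=2, budget=30)
relaxed = relax_budget(items, size=2, budget=30, slack=0.1)
print([i[0] for i in strict], [i[0] for i in relaxed])
```

With these made-up items, the strict query already has an answer, yet allowing a 10% budget overrun admits a package with a higher total rating. Exhaustive enumeration is only workable for tiny inputs; it stands in here for whatever search strategy a real system would use.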

    A Model-Driven Approach to Automate Data Visualization in Big Data Analytics

    In big data analytics, advanced analytic techniques operate on big data sets to complement the role of traditional OLAP for decision making. To enable companies to benefit from these techniques despite the lack of in-house technical skills, the H2020 TOREADOR Project adopts a model-driven architecture for streamlining analysis processes, from data preparation to visualization. In this paper we propose a new approach named SkyViz focused on the visualization area, in particular on (i) how to specify the user's objectives and describe the dataset to be visualized, (ii) how to translate this specification into a platform-independent visualization type, and (iii) how to concretely implement this visualization type on the target execution platform. To support step (i) we define a visualization context based on seven prioritizable coordinates for assessing the user's objectives and conceptually describing the data to be visualized. To automate step (ii) we propose a skyline-based technique that translates a visualization context into a set of most-suitable visualization types. Finally, to automate step (iii) we propose a skyline-based technique that, with reference to a specific platform, finds the best bindings between the columns of the dataset and the graphical coordinates used by the visualization type chosen by the user. SkyViz can be transparently extended to include more visualization types on the one hand, and more visualization coordinates on the other. The paper is completed by an evaluation of SkyViz based on a case study excerpted from the pilot applications of the TOREADOR Project

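    Two-Way Recommendation System for Thesis Supervisor Selection Using Historical Data and Skyline View Queries (Sistem Rekomendasi Dua Arah untuk Pemilihan Dosen Pembimbing Menggunakan Data Histori dan Skyline View Queries)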

    Selecting a thesis supervisor is one of the factors that affect the completion of the final thesis. When choosing a supervisor, students often do not yet clearly understand their own capabilities or the topic they will research, so the supervisors they propose generally do not take these factors into account. This selection process also causes some supervisors to be proposed by many students while others are proposed by few, even when they have similar scientific backgrounds. At the same time, supervisors are generally never asked what kind of students they would prefer for the research topics they offer. Existing recommendation systems usually consider the preferences of only one party, either the supervisor or the student. This research develops a two-way recommendation system that considers both supervisor and student preferences using skyline view queries. Skyline view queries recommend dominant supervisors to a student based on the student's preferences, and dominant students to a supervisor based on the supervisor's preferences. To acquire preferences from both parties, text mining and clustering techniques are applied to the historical academic scores and research topics of graduated students, which serve as a reference for students who are about to choose a supervisor. Experimental results show that combining skyline view queries with academic profiles and historical data can overcome the problem of too many students proposing the same supervisor, and can give recommendations that match the academic ability and preferences of both students and supervisors

    VegaProf: Profiling Vega Visualizations

    Vega is a popular domain-specific language (DSL) for visualization specification. At runtime, Vega's DSL is first transformed into a dataflow graph and then functions to render visualization primitives. While the Vega abstraction of implementation details simplifies visualization creation, it also makes Vega visualizations challenging to debug and profile without adequate tools. Our formative interviews with three practitioners at Sigma Computing showed that existing developer tools are not suited for visualization profiling as they are disconnected from the semantics of the Vega DSL specification and its resulting dataflow graph. We introduce VegaProf, the first performance profiler for Vega visualizations. VegaProf effectively instruments the Vega library by associating the declarative specification with its compilation and execution. Using interactive visualizations, VegaProf enables visualization engineers to interactively profile visualization performance at three abstraction levels: function, dataflow graph, and visualization specification. Our evaluation through two use cases and feedback from five visualization engineers at Sigma Computing shows that VegaProf makes visualization profiling tractable and actionable.
    Comment: Submitted to EuroVis'2

    Discrimination-aware data transformations

    The extensive use of people-related data in automated decision processes might amplify inequities already implicit in real-world data. The development of technological solutions that satisfy nondiscrimination requirements is therefore one of the main challenges for the data management and data analytics communities. Nondiscrimination can be characterized in terms of different properties, like fairness, diversity, and coverage. Such properties should be achieved through a holistic approach, incrementally enforcing nondiscrimination constraints along all the stages of the data processing life-cycle, through individually independent choices rather than as a constraint on the final result. In this respect, the design of discrimination-aware solutions for the initial phases of the data processing pipeline (like data preparation) is extremely relevant: the sooner you spot the problem, the fewer problems you will get in the last analytical steps of the chain. In this PhD thesis, we are interested in nondiscrimination constraints defined in terms of coverage. Coverage aims at guaranteeing that the input dataset includes enough examples for each (protected) category of interest, thus increasing diversity to limit the introduction of bias during the next analytical steps. While coverage constraints have been mainly used for repairing raw datasets, we investigate their effects on data transformations, during data preparation, through query execution. To this aim, we propose coverage-based queries, as a means to achieve coverage constraint satisfaction on the result of data transformations defined in terms of selection-based queries, and specific algorithms for their processing. The proposed solutions rely on query rewriting, a key approach for enforcing specific constraints while guaranteeing transparency and avoiding disparate treatment discrimination. As far as we know and according to recent surveys in this domain, no other solutions addressing coverage-based rewriting during data transformations have been proposed so far. To guarantee a good compromise between efficiency and accuracy, both precise and approximate algorithms for coverage-based query processing are proposed. The results of an extensive experimental evaluation, carried out on both synthetic and real datasets, show the effectiveness and the efficiency of the proposed approaches. Coverage-based queries can be easily integrated in relational machine learning data processing environments; to show their applicability, we integrate some of the designed algorithms in a machine learning data processing Python toolkit
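To make the idea of a coverage constraint on a selection query concrete, here is a minimal sketch. It is not the thesis's algorithm: it assumes a single selection predicate of the form `score >= threshold`, a hypothetical row schema `(score, category)`, and a naive rewriting that lowers the threshold as little as possible until every protected category has the required number of rows in the result.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = Tuple[float, str]  # hypothetical schema: (score, protected category)


def coverage_ok(rows: List[Row], required: Dict[str, int]) -> bool:
    """True if every category appears at least the required number of times."""
    counts = Counter(category for _, category in rows)
    return all(counts[c] >= k for c, k in required.items())


def rewrite_threshold(rows: List[Row], threshold: float,
                      required: Dict[str, int]) -> Optional[float]:
    """Return the largest t <= threshold such that the query `score >= t`
    satisfies the coverage constraint, or None if no such t exists."""
    # Only the original threshold and the smaller scores present in the data
    # can change the result set, so they are the only candidates to try.
    candidates = sorted(
        {threshold} | {score for score, _ in rows if score < threshold},
        reverse=True,
    )
    for t in candidates:
        selected = [row for row in rows if row[0] >= t]
        if coverage_ok(selected, required):
            return t
    return None


rows = [(90, "A"), (85, "A"), (80, "B"), (70, "A"), (65, "B"), (50, "B")]
print(rewrite_threshold(rows, 75, {"A": 2, "B": 2}))
```

In this made-up data, the original query `score >= 75` returns only one row from category `B`; lowering the threshold to 65 is the smallest change that yields two rows per category. Rewriting the query rather than editing the result keeps the selection rule the same for every row, which is in the spirit of the transparency argument in the abstract.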

    A Review and Characterization of Progressive Visual Analytics

    Progressive Visual Analytics (PVA) has gained increasing attention over the past years. It brings the user into the loop during otherwise long-running and non-transparent computations by producing intermediate partial results. These partial results can be shown to the user for early and continuous interaction with the emerging end result even while it is still being computed. Yet as clear-cut as this fundamental idea seems, the existing body of literature puts forth various interpretations and instantiations that have created a research domain of competing terms, various definitions, as well as long lists of practical requirements and design guidelines spread across different scientific communities. This makes it more and more difficult to get a succinct understanding of PVA’s principal concepts, let alone an overview of this increasingly diverging field. The review and discussion of PVA presented in this paper address these issues and provide (1) a literature collection on this topic, (2) a conceptual characterization of PVA, as well as (3) a consolidated set of practical recommendations for implementing and using PVA-based visual analytics solutions
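The core idea described above, producing intermediate partial results while a long computation is still running, can be sketched with a generator. This is only an illustration under simple assumptions: the computation is a mean over a list, and the function name `progressive_mean` and the chunking scheme are hypothetical, not taken from the reviewed literature.

```python
from typing import Iterator, List, Tuple


def progressive_mean(values: List[float],
                     chunk_size: int) -> Iterator[Tuple[float, float]]:
    """Yield (fraction_processed, running_mean) after each chunk, so a caller
    can display or act on the estimate before the full pass completes."""
    total = 0.0
    seen = 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)
        seen += len(chunk)
        yield seen / len(values), total / seen


for fraction, estimate in progressive_mean([2, 4, 6, 8, 10, 12], chunk_size=2):
    # A real PVA system would refresh a view here and let the user
    # steer or cancel the computation based on the partial result.
    print(f"{fraction:.0%} processed, current mean = {estimate}")
```

Because the generator hands control back to the caller after every chunk, the consumer decides whether to keep going, which is the "user in the loop" aspect the abstract emphasizes.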

    A systematic literature review of skyline query processing over data stream

    Recently, skyline query processing over data streams has gained a lot of attention, especially from the database community, owing to its unique challenges. Skyline queries aim at pruning the search space of a potentially large multi-dimensional set of objects by keeping only those objects that are not worse than any other. Although an abundance of skyline query processing techniques have been proposed, there is a lack of a Systematic Literature Review (SLR) on current research works pertinent to skyline query processing over data streams. In this regard, this paper provides a comparative study of the state-of-the-art approaches over the period between 2000 and 2022, with the main aim of helping readers understand the key issues that are essential to consider when processing skyline queries over streaming data. Seven digital databases were reviewed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) procedures. After applying both the inclusion and exclusion criteria, 23 primary papers were further examined. The results show that the identified skyline approaches are driven by the need to expedite skyline query processing, mainly because data streams are time varying (time sensitive), continuous, real time, volatile, and unrepeatable. Although these skyline approaches are tailor-made for data streams with a common aim, their solutions vary to suit the various aspects being considered, which include the type of skyline query, type of streaming data, type of sliding window, query processing technique, indexing technique, as well as the data stream environment employed. In this paper, a comprehensive taxonomy is developed along with the key aspects of each reported approach, while several open issues and challenges related to the topic being reviewed are highlighted as recommendations for future research directions
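The notion of keeping only objects that are "not worse than any other" is usually formalized as dominance: one object dominates another if it is no worse in every dimension and strictly better in at least one. The sketch below is a naive quadratic skyline over a static list, assuming smaller values are better in every dimension; it does not reflect any of the stream-specific techniques surveyed, and the hotel data is made up.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assumption: smaller is better everywhere)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to the beach).
hotels = [(50, 8), (60, 2), (80, 1), (90, 3), (55, 9)]
print(skyline(hotels))  # [(50, 8), (60, 2), (80, 1)]
```

Here `(90, 3)` is dominated by `(60, 2)` and `(55, 9)` by `(50, 8)`, so both are pruned. Over a stream, the same test would have to be re-evaluated as objects enter and leave a sliding window, which is where the specialized approaches reviewed in the paper come in.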