    SeeDB: automatically generating query visualizations

    Data analysts operating on large volumes of data often rely on visualizations to interpret the results of queries. However, finding the right visualization for a query is a laborious and time-consuming task. We demonstrate SeeDB, a system that partially automates this task: given a query, SeeDB explores the space of all possible visualizations, and automatically identifies and recommends to the analyst those visualizations it finds to be most "interesting" or "useful". In our demonstration, conference attendees will see SeeDB in action for a variety of queries on multiple real-world datasets.
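
    The abstract does not spell out what "interesting" means; in the published SeeDB work, a visualization's utility is its deviation: how strongly the aggregate distribution of the query's result differs from that of the dataset as a whole. The sketch below is a minimal illustration of that idea; the column names, the candidate space (one grouping dimension crossed with one aggregate), and the KL-divergence utility are assumptions, not SeeDB's exact implementation.

        # Minimal sketch of deviation-based visualization ranking in the spirit
        # of SeeDB. Column names, the KL-divergence utility, and the candidate
        # space (grouping dimension x aggregate) are illustrative assumptions.
        import numpy as np
        import pandas as pd

        def normalized_agg(df, dim, measure, agg):
            """Aggregate `measure` by `dim` and normalize to a probability vector."""
            s = df.groupby(dim)[measure].agg(agg)
            return s / s.sum()

        def rank_visualizations(full, subset, dims, measures, aggs=("mean", "sum")):
            """Score each (dim, measure, agg) candidate by how much the query
            subset's distribution deviates from the full dataset's."""
            scored = []
            for dim in dims:
                for measure in measures:
                    for agg in aggs:
                        p = normalized_agg(subset, dim, measure, agg)
                        q = normalized_agg(full, dim, measure, agg)
                        p, q = p.align(q, fill_value=1e-9)     # align group keys
                        kl = float(np.sum(p * np.log(p / q)))  # KL(subset || full)
                        scored.append((kl, dim, measure, agg))
            return sorted(scored, reverse=True)  # highest deviation first

        # Usage: contrast one query's rows against the whole table, then plot
        # the top-ranked (dim, measure, agg) triples as bar charts.
        # full_df = pd.read_csv("census.csv")
        # subset_df = full_df[full_df["age"] < 30]
        # print(rank_visualizations(full_df, subset_df,
        #                           dims=["sex", "education"],
        #                           measures=["hours_per_week"])[:3])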

    GenBase: A Complex Analytics Genomics Benchmark

    This paper introduces a new benchmark, designed to test database management system (DBMS) performance on a mix of data management tasks (joins, filters, etc.) and complex analytics (regression, singular value decomposition, etc.). Such mixed workloads are prevalent in a number of application areas, including most science workloads and web analytics. As a specific use case, we have chosen genomics data for our benchmark, and have constructed a collection of typical tasks in this area. In addition to being representative of a mixed data management and analytics workload, this benchmark is also meant to scale to large dataset sizes and multiple nodes across a cluster. Besides presenting this benchmark, we have run it on a variety of storage systems including traditional row stores, newer column stores, Hadoop, and an array DBMS. We present performance numbers on all systems on single and multiple nodes, and show that performance differs by orders of magnitude between the various solutions. In addition, we demonstrate that most platforms have scalability issues. We also test offloading the analytics onto a coprocessor. The intent of this benchmark is to focus research interest in this area; to this end, all of our data, data generators, and scripts are available on our website.
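
    To make the workload mix concrete, the following sketch pairs a relational step (filter and join over a genomics table) with a complex-analytics step (SVD and least-squares regression). The schema, sizes, and prediction target are invented for illustration and are not the benchmark's actual specification.

        # Illustrative mixed workload in the GenBase spirit: a relational
        # filter/join feeding complex analytics (SVD and regression). The
        # schema and target are invented, not the benchmark's definition.
        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(0)
        genes = pd.DataFrame({"gene_id": range(200),
                              "is_target": rng.integers(0, 2, 200)})
        expr = pd.DataFrame(rng.normal(size=(200, 50)),
                            columns=[f"patient_{i}" for i in range(50)])
        expr["gene_id"] = genes["gene_id"]

        # Data-management half: filter the metadata table, join with expression.
        joined = expr.merge(genes[genes["is_target"] == 1], on="gene_id")
        X = joined.drop(columns=["gene_id", "is_target"]).to_numpy()

        # Analytics half: singular value decomposition and least-squares fit.
        U, S, Vt = np.linalg.svd(X, full_matrices=False)
        y = rng.normal(size=X.shape[1])                 # e.g. patient phenotype
        coef, *_ = np.linalg.lstsq(X.T, y, rcond=None)  # genes -> phenotype
        print("top singular values:", S[:3])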

    Neural Interactive Collaborative Filtering

    In this paper, we study collaborative filtering in an interactive setting, in which the recommender agent iterates between making recommendations and updating the user profile based on the interactive feedback. The most challenging problem in this scenario is how to suggest items when the user profile has not been well established, i.e., how to recommend for cold-start users or warm-start users with drifting tastes. Existing approaches either rely on an overly pessimistic linear exploration strategy or adopt meta-learning-based algorithms in a purely exploitative way. In this work, to quickly catch up with the user's interests, we propose to represent the exploration policy with a neural network and directly learn it from the feedback data. Specifically, the exploration policy is encoded in the weights of multi-channel stacked self-attention neural networks and trained with efficient Q-learning by maximizing users' overall satisfaction in the recommender system. The key insight is that a satisfying recommendation triggered by exploration can be viewed as an exploration bonus (delayed reward) for its contribution to improving the quality of the user profile. The proposed exploration policy, which balances learning the user profile against making accurate recommendations, can therefore be directly optimized by maximizing users' long-term satisfaction with reinforcement learning. Extensive experiments and analysis conducted on three benchmark collaborative filtering datasets demonstrate the advantage of our method over state-of-the-art methods.
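
    In outline, the approach amounts to a Q-network over the user's interaction history, with self-attention weights encoding the exploration policy and temporal-difference updates letting exploratory picks earn delayed reward when they improve the profile. The sketch below compresses this to a single attention channel with tiny dimensions and a synthetic reward; those simplifications, and the absence of a target network, are assumptions rather than the paper's architecture.

        # Compressed sketch of a learned exploration policy for interactive
        # collaborative filtering: a self-attention Q-network over the user's
        # interaction history, trained with one-step Q-learning. A single
        # attention channel, synthetic rewards, and the lack of a target
        # network are simplifications, not the paper's architecture.
        import torch
        import torch.nn as nn

        N_ITEMS, DIM, GAMMA = 100, 32, 0.9

        class AttnQNet(nn.Module):
            def __init__(self):
                super().__init__()
                self.item_emb = nn.Embedding(N_ITEMS, DIM)
                self.attn = nn.MultiheadAttention(DIM, num_heads=4, batch_first=True)
                self.q_head = nn.Linear(DIM, N_ITEMS)   # one Q-value per item

            def forward(self, history):                  # (batch, t) item ids
                h = self.item_emb(history)
                h, _ = self.attn(h, h, h)                # attend over the history
                return self.q_head(h.mean(dim=1))        # (batch, N_ITEMS)

        net = AttnQNet()
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)

        history = torch.randint(0, N_ITEMS, (1, 5))      # items seen so far
        action = net(history).argmax(dim=1)              # recommend greedily
        reward = torch.rand(1)                           # observed satisfaction
        next_hist = torch.cat([history, action.unsqueeze(1)], dim=1)

        # TD update toward r + gamma * max_a' Q(s', a'): satisfaction earned
        # after an exploratory pick flows back as delayed reward.
        q = net(history).gather(1, action.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = reward + GAMMA * net(next_hist).max(dim=1).values
        loss = nn.functional.mse_loss(q, target)
        opt.zero_grad(); loss.backward(); opt.step()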

    Adaptive-Step Graph Meta-Learner for Few-Shot Graph Classification

    Graph classification aims to extract accurate information from graph-structured data for classification and is becoming more and more important in the graph learning community. Although Graph Neural Networks (GNNs) have been successfully applied to graph classification tasks, most of them overlook the scarcity of labeled graph data in many applications. For example, in bioinformatics, obtaining protein graph labels usually requires laborious experiments. Recently, few-shot learning has been explored to alleviate this problem given only a few labeled graph samples of the test classes. The sub-structures shared between training classes and test classes are essential in few-shot graph classification. Existing methods assume that the test classes belong to the same set of super-classes clustered from the training classes. However, according to our observations, the label spaces of training classes and test classes usually do not overlap in real-world scenarios. As a result, existing methods fail to capture the local structures of unseen test classes well. To overcome this limitation, we propose a direct method to capture the sub-structures with a well-initialized meta-learner within a few adaptation steps. More specifically, (1) we propose a novel framework consisting of a graph meta-learner, which uses GNN-based modules for fast adaptation on graph data, and a step controller for the robustness and generalization of the meta-learner; (2) we provide a quantitative analysis of the framework and give a graph-dependent upper bound on the generalization error; (3) extensive experiments on real-world datasets demonstrate that our framework achieves state-of-the-art results on several few-shot graph classification tasks compared to baselines.
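
    The inner loop the framework describes is MAML-style: clone a meta-learned initialization, take a few gradient steps on the support set of a new class, and let a controller decide when to stop. The sketch below substitutes a plain MLP for the GNN-based module and a loss-improvement threshold for the learned step controller; both substitutions are assumptions made for brevity.

        # Schematic few-shot adaptation with an adaptive step controller.
        # A plain MLP stands in for the GNN-based module, and a simple
        # loss-improvement threshold stands in for the learned controller.
        import copy
        import torch
        import torch.nn as nn

        def adapt(meta_model, support_x, support_y, lr=0.01, max_steps=10, tol=1e-3):
            """Clone the meta-initialization and fine-tune on the support set
            until the controller decides the loss has stopped improving."""
            model = copy.deepcopy(meta_model)
            opt = torch.optim.SGD(model.parameters(), lr=lr)
            prev = float("inf")
            for step in range(max_steps):
                loss = nn.functional.cross_entropy(model(support_x), support_y)
                if prev - loss.item() < tol:      # step controller: stop early
                    break
                opt.zero_grad(); loss.backward(); opt.step()
                prev = loss.item()
            return model, step

        meta_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
        sx, sy = torch.randn(9, 16), torch.randint(0, 3, (9,))  # 3-way, 3-shot
        adapted, steps = adapt(meta_model, sx, sy)
        print("stopped at inner step", steps)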

    Infrastructure for model management and model diagnosis

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 147-159).
    Building ML-based workflows in the real world is a trial-and-error, iterative process where an ML developer builds tens to hundreds of workflows before arriving at one that meets some task-specific acceptance criteria. This iterative process of workflow building is laborious for several reasons, including the large variety of available ML models, the time required to train the workflow, the difficulty of keeping track of workflows built during the modeling process, and the time required for debugging trained workflows. In this thesis, we are primarily interested in two problems with the repetitive modeling process: first, how to manage ML-based workflows generated over multiple iterations of the modeling process, and second, how to efficiently debug or diagnose trained ML-based workflows. In this work, we study these questions from a systems perspective and propose novel software systems and techniques to address them. Specifically, our contributions are: (1) we propose MODELDB, a system to track provenance and performance of ML-based workflows; (2) we propose MISTIQUE, a system to store ML-based workflow intermediates in order to speed up model debugging tasks; and (3) we provide examples of new diagnostic techniques that can be designed using the data in MISTIQUE.
    by Manasi Vartak, Ph.D.
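
    The pattern MODELDB supports can be pictured as a small provenance store: every modeling iteration logs its configuration and metrics so that runs can be queried and compared later. The toy client below illustrates that pattern only; its API is hypothetical and is not ModelDB's actual interface.

        # Toy provenance store in the spirit of MODELDB: each modeling
        # iteration logs its configuration and metrics for later comparison.
        # The client API here is hypothetical, not ModelDB's real interface.
        import json
        import sqlite3
        import time

        class RunTracker:
            def __init__(self, path="runs.db"):
                self.db = sqlite3.connect(path)
                self.db.execute("""CREATE TABLE IF NOT EXISTS runs
                    (id INTEGER PRIMARY KEY, ts REAL, config TEXT, metrics TEXT)""")

            def log_run(self, config, metrics):
                """Record one workflow iteration: hyperparameters plus results."""
                self.db.execute(
                    "INSERT INTO runs (ts, config, metrics) VALUES (?, ?, ?)",
                    (time.time(), json.dumps(config), json.dumps(metrics)))
                self.db.commit()

            def best(self, metric):
                """Return the logged (config, metrics) pair maximizing `metric`."""
                rows = self.db.execute("SELECT config, metrics FROM runs").fetchall()
                return max(rows, key=lambda r: json.loads(r[1]).get(metric, -1e9))

        tracker = RunTracker(":memory:")
        tracker.log_run({"model": "rf", "depth": 8}, {"auc": 0.81})
        tracker.log_run({"model": "gbm", "lr": 0.1}, {"auc": 0.86})
        print(tracker.best("auc")[0])   # config of the highest-AUC run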

    Visualizing database queries

    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. Cataloged from PDF version of thesis. Includes bibliographical references (pages 50-52).
    Data analysts operating on large volumes of data often rely on visualizations to interpret the results of queries. However, finding the right visualization for a query is a laborious and time-consuming task. We propose SEEDB, a system that partially automates this task: given a query, SEEDB explores the space of all possible visualizations, and automatically identifies and recommends to the analyst those visualizations it finds to be most "interesting" or "useful".
    by Manasi Vartak, S.M.