    SeeDB: automatically generating query visualizations

    Data analysts operating on large volumes of data often rely on visualizations to interpret the results of queries. However, finding the right visualization for a query is a laborious and time-consuming task. We demonstrate SeeDB, a system that partially automates this task: given a query, SeeDB explores the space of all possible visualizations, and automatically identifies and recommends to the analyst those visualizations it finds to be most "interesting" or "useful". In our demonstration, conference attendees will see SeeDB in action for a variety of queries on multiple real-world datasets.
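
    The abstract does not spell out what "interesting" means; in the published SeeDB work, a visualization's utility is its deviation: how strongly the aggregate distribution of the query's result differs from that of the dataset as a whole. The sketch below is a minimal illustration of that idea; the column names, the candidate space (one grouping dimension crossed with one aggregate), and the KL-divergence utility are assumptions, not SeeDB's exact implementation.

        # Minimal sketch of deviation-based visualization ranking in the spirit
        # of SeeDB. Column names, the KL-divergence utility, and the candidate
        # space (grouping dimension x aggregate) are illustrative assumptions.
        import numpy as np
        import pandas as pd

        def normalized_agg(df, dim, measure, agg):
            """Aggregate `measure` by `dim` and normalize to a probability vector."""
            s = df.groupby(dim)[measure].agg(agg)
            return s / s.sum()

        def rank_visualizations(full, subset, dims, measures, aggs=("mean", "sum")):
            """Score each (dim, measure, agg) candidate by how much the query
            subset's distribution deviates from the full dataset's."""
            scored = []
            for dim in dims:
                for measure in measures:
                    for agg in aggs:
                        p = normalized_agg(subset, dim, measure, agg)
                        q = normalized_agg(full, dim, measure, agg)
                        p, q = p.align(q, fill_value=1e-9)     # align group keys
                        kl = float(np.sum(p * np.log(p / q)))  # KL(subset || full)
                        scored.append((kl, dim, measure, agg))
            return sorted(scored, reverse=True)  # highest deviation first

        # Usage: contrast one query's rows against the whole table, then plot
        # the top-ranked (dim, measure, agg) triples as bar charts.
        # full_df = pd.read_csv("census.csv")
        # subset_df = full_df[full_df["age"] < 30]
        # print(rank_visualizations(full_df, subset_df,
        #                           dims=["sex", "education"],
        #                           measures=["hours_per_week"])[:3])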

    GenBase: A Complex Analytics Genomics Benchmark

    This paper introduces a new benchmark, designed to test database management system (DBMS) performance on a mix of data management tasks (joins, filters, etc.) and complex analytics (regression, singular value decomposition, etc.). Such mixed workloads are prevalent in a number of application areas, including most science workloads and web analytics. As a specific use case, we have chosen genomics data for our benchmark, and have constructed a collection of typical tasks in this area. In addition to being representative of a mixed data management and analytics workload, this benchmark is also meant to scale to large dataset sizes and multiple nodes across a cluster. Besides presenting this benchmark, we have run it on a variety of storage systems including traditional row stores, newer column stores, Hadoop, and an array DBMS. We present performance numbers on all systems on single and multiple nodes, and show that performance differs by orders of magnitude between the various solutions. In addition, we demonstrate that most platforms have scalability issues. We also test offloading the analytics onto a coprocessor. The intent of this benchmark is to focus research interest in this area; to this end, all of our data, data generators, and scripts are available on our website.
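
    To make the workload mix concrete, the following sketch pairs a relational step (filter and join over a genomics table) with a complex-analytics step (SVD and least-squares regression). The schema, sizes, and prediction target are invented for illustration and are not the benchmark's actual specification.

        # Illustrative mixed workload in the GenBase spirit: a relational
        # filter/join feeding complex analytics (SVD and regression). The
        # schema and target are invented, not the benchmark's definition.
        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(0)
        genes = pd.DataFrame({"gene_id": range(200),
                              "is_target": rng.integers(0, 2, 200)})
        expr = pd.DataFrame(rng.normal(size=(200, 50)),
                            columns=[f"patient_{i}" for i in range(50)])
        expr["gene_id"] = genes["gene_id"]

        # Data-management half: filter the metadata table, join with expression.
        joined = expr.merge(genes[genes["is_target"] == 1], on="gene_id")
        X = joined.drop(columns=["gene_id", "is_target"]).to_numpy()

        # Analytics half: singular value decomposition and least-squares fit.
        U, S, Vt = np.linalg.svd(X, full_matrices=False)
        y = rng.normal(size=X.shape[1])                 # e.g. patient phenotype
        coef, *_ = np.linalg.lstsq(X.T, y, rcond=None)  # genes -> phenotype
        print("top singular values:", S[:3])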

    Neural Interactive Collaborative Filtering

    In this paper, we study collaborative filtering in an interactive setting, in which the recommender agent iterates between making recommendations and updating the user profile based on the interactive feedback. The most challenging problem in this scenario is how to suggest items when the user profile has not been well established, i.e., how to recommend for cold-start users or warm-start users with drifting tastes. Existing approaches either rely on an overly pessimistic linear exploration strategy or adopt meta-learning-based algorithms in a purely exploitative way. In this work, to quickly catch up with the user's interests, we propose to represent the exploration policy with a neural network and directly learn it from the feedback data. Specifically, the exploration policy is encoded in the weights of multi-channel stacked self-attention neural networks and trained with efficient Q-learning by maximizing users' overall satisfaction in the recommender system. The key insight is that a satisfying recommendation triggered by exploration can be viewed as an exploration bonus (delayed reward) for its contribution to improving the quality of the user profile. The proposed exploration policy, which balances learning the user profile against making accurate recommendations, can therefore be directly optimized by maximizing users' long-term satisfaction with reinforcement learning. Extensive experiments and analysis conducted on three benchmark collaborative filtering datasets demonstrate the advantage of our method over state-of-the-art methods.
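
    In outline, the approach amounts to a Q-network over the user's interaction history, with self-attention weights encoding the exploration policy and temporal-difference updates letting exploratory picks earn delayed reward when they improve the profile. The sketch below compresses this to a single attention channel with tiny dimensions and a synthetic reward; those simplifications, and the absence of a target network, are assumptions rather than the paper's architecture.

        # Compressed sketch of a learned exploration policy for interactive
        # collaborative filtering: a self-attention Q-network over the user's
        # interaction history, trained with one-step Q-learning. A single
        # attention channel, synthetic rewards, and the lack of a target
        # network are simplifications, not the paper's architecture.
        import torch
        import torch.nn as nn

        N_ITEMS, DIM, GAMMA = 100, 32, 0.9

        class AttnQNet(nn.Module):
            def __init__(self):
                super().__init__()
                self.item_emb = nn.Embedding(N_ITEMS, DIM)
                self.attn = nn.MultiheadAttention(DIM, num_heads=4, batch_first=True)
                self.q_head = nn.Linear(DIM, N_ITEMS)   # one Q-value per item

            def forward(self, history):                  # (batch, t) item ids
                h = self.item_emb(history)
                h, _ = self.attn(h, h, h)                # attend over the history
                return self.q_head(h.mean(dim=1))        # (batch, N_ITEMS)

        net = AttnQNet()
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)

        history = torch.randint(0, N_ITEMS, (1, 5))      # items seen so far
        action = net(history).argmax(dim=1)              # recommend greedily
        reward = torch.rand(1)                           # observed satisfaction
        next_hist = torch.cat([history, action.unsqueeze(1)], dim=1)

        # TD update toward r + gamma * max_a' Q(s', a'): satisfaction earned
        # after an exploratory pick flows back as delayed reward.
        q = net(history).gather(1, action.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = reward + GAMMA * net(next_hist).max(dim=1).values
        loss = nn.functional.mse_loss(q, target)
        opt.zero_grad(); loss.backward(); opt.step()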

    Adaptive-Step Graph Meta-Learner for Few-Shot Graph Classification

    Graph classification aims to extract accurate information from graph-structured data for classification and is becoming more and more important in the graph learning community. Although Graph Neural Networks (GNNs) have been successfully applied to graph classification tasks, most of them overlook the scarcity of labeled graph data in many applications. For example, in bioinformatics, obtaining protein graph labels usually requires laborious experiments. Recently, few-shot learning has been explored to alleviate this problem given only a few labeled graph samples of the test classes. The sub-structures shared between training classes and test classes are essential in few-shot graph classification. Existing methods assume that the test classes belong to the same set of super-classes clustered from the training classes. However, according to our observations, the label spaces of training classes and test classes usually do not overlap in real-world scenarios. As a result, existing methods fail to capture the local structures of unseen test classes well. To overcome this limitation, we propose a direct method to capture the sub-structures with a well-initialized meta-learner within a few adaptation steps. More specifically, (1) we propose a novel framework consisting of a graph meta-learner, which uses GNN-based modules for fast adaptation on graph data, and a step controller for the robustness and generalization of the meta-learner; (2) we provide a quantitative analysis of the framework and give a graph-dependent upper bound on the generalization error; (3) extensive experiments on real-world datasets demonstrate that our framework achieves state-of-the-art results on several few-shot graph classification tasks compared to baselines.
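
    The inner loop the framework describes is MAML-style: clone a meta-learned initialization, take a few gradient steps on the support set of a new class, and let a controller decide when to stop. The sketch below substitutes a plain MLP for the GNN-based module and a loss-improvement threshold for the learned step controller; both substitutions are assumptions made for brevity.

        # Schematic few-shot adaptation with an adaptive step controller.
        # A plain MLP stands in for the GNN-based module, and a simple
        # loss-improvement threshold stands in for the learned controller.
        import copy
        import torch
        import torch.nn as nn

        def adapt(meta_model, support_x, support_y, lr=0.01, max_steps=10, tol=1e-3):
            """Clone the meta-initialization and fine-tune on the support set
            until the controller decides the loss has stopped improving."""
            model = copy.deepcopy(meta_model)
            opt = torch.optim.SGD(model.parameters(), lr=lr)
            prev = float("inf")
            for step in range(max_steps):
                loss = nn.functional.cross_entropy(model(support_x), support_y)
                if prev - loss.item() < tol:      # step controller: stop early
                    break
                opt.zero_grad(); loss.backward(); opt.step()
                prev = loss.item()
            return model, step

        meta_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
        sx, sy = torch.randn(9, 16), torch.randint(0, 3, (9,))  # 3-way, 3-shot
        adapted, steps = adapt(meta_model, sx, sy)
        print("stopped at inner step", steps)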

    Infrastructure for model management and model diagnosis

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 147-159).
    Building ML-based workflows in the real world is a trial-and-error, iterative process where an ML developer builds tens to hundreds of workflows before arriving at one that meets some task-specific acceptance criteria. This iterative process of workflow building is laborious for several reasons, including the large variety of available ML models, the time required to train the workflow, the difficulty of keeping track of workflows built during the modeling process, and the time required for debugging trained workflows. In this thesis, we are primarily interested in two problems with the repetitive modeling process: first, how to manage ML-based workflows generated over multiple iterations of the modeling process, and second, how to efficiently debug or diagnose trained ML-based workflows. In this work, we study these questions from a systems perspective and propose novel software systems and techniques to address them. Specifically, our contributions are: (1) we propose MODELDB, a system to track provenance and performance of ML-based workflows; (2) we propose MISTIQUE, a system to store ML-based workflow intermediates in order to speed up model debugging tasks; and (3) we provide examples of new diagnostic techniques that can be designed using the data in MISTIQUE.
    by Manasi Vartak, Ph.D.
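
    The pattern MODELDB supports can be pictured as a small provenance store: every modeling iteration logs its configuration and metrics so that runs can be queried and compared later. The toy client below illustrates that pattern only; its API is hypothetical and is not ModelDB's actual interface.

        # Toy provenance store in the spirit of MODELDB: each modeling
        # iteration logs its configuration and metrics for later comparison.
        # The client API here is hypothetical, not ModelDB's real interface.
        import json
        import sqlite3
        import time

        class RunTracker:
            def __init__(self, path="runs.db"):
                self.db = sqlite3.connect(path)
                self.db.execute("""CREATE TABLE IF NOT EXISTS runs
                    (id INTEGER PRIMARY KEY, ts REAL, config TEXT, metrics TEXT)""")

            def log_run(self, config, metrics):
                """Record one workflow iteration: hyperparameters plus results."""
                self.db.execute(
                    "INSERT INTO runs (ts, config, metrics) VALUES (?, ?, ?)",
                    (time.time(), json.dumps(config), json.dumps(metrics)))
                self.db.commit()

            def best(self, metric):
                """Return the logged (config, metrics) pair maximizing `metric`."""
                rows = self.db.execute("SELECT config, metrics FROM runs").fetchall()
                return max(rows, key=lambda r: json.loads(r[1]).get(metric, -1e9))

        tracker = RunTracker(":memory:")
        tracker.log_run({"model": "rf", "depth": 8}, {"auc": 0.81})
        tracker.log_run({"model": "gbm", "lr": 0.1}, {"auc": 0.86})
        print(tracker.best("auc")[0])   # config of the highest-AUC run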

    Visualizing database queries

    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. Cataloged from PDF version of thesis. Includes bibliographical references (pages 50-52).
    Data analysts operating on large volumes of data often rely on visualizations to interpret the results of queries. However, finding the right visualization for a query is a laborious and time-consuming task. We propose SEEDB, a system that partially automates this task: given a query, SEEDB explores the space of all possible visualizations, and automatically identifies and recommends to the analyst those visualizations it finds to be most "interesting" or "useful".
    by Manasi Vartak, S.M.