85,329 research outputs found

    DataHub: Collaborative Data Science & Dataset Version Management at Scale

    Get PDF
    Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale.Comment: 7 page

    Measuring and Managing Answer Quality for Online Data-Intensive Services

    Full text link
    Online data-intensive services parallelize query execution across distributed software components. Interactive response time is a priority, so online query executions return answers without waiting for slow running components to finish. However, data from these slow components could lead to better answers. We propose Ubora, an approach to measure the effect of slow running components on the quality of answers. Ubora randomly samples online queries and executes them twice. The first execution elides data from slow components and provides fast online answers; the second execution waits for all components to complete. Ubora uses memoization to speed up mature executions by replaying network messages exchanged between components. Our systems-level implementation works for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the EasyRec Recommendation Engine, and the OpenEphyra question answering system. Ubora computes answer quality much faster than competing approaches that do not use memoization. With Ubora, we show that answer quality can and should be used to guide online admission control. Our adaptive controller processed 37% more queries than a competing controller guided by the rate of timeouts.Comment: Technical Repor

    Abmash: Mashing Up Legacy Web Applications by Automated Imitation of Human Actions

    Get PDF
    Many business web-based applications do not offer applications programming interfaces (APIs) to enable other applications to access their data and functions in a programmatic manner. This makes their composition difficult (for instance to synchronize data between two applications). To address this challenge, this paper presents Abmash, an approach to facilitate the integration of such legacy web applications by automatically imitating human interactions with them. By automatically interacting with the graphical user interface (GUI) of web applications, the system supports all forms of integrations including bi-directional interactions and is able to interact with AJAX-based applications. Furthermore, the integration programs are easy to write since they deal with end-user, visual user-interface elements. The integration code is simple enough to be called a "mashup".Comment: Software: Practice and Experience (2013)

    Escrow: A large-scale web vulnerability assessment tool

    Get PDF
    The reliance on Web applications has increased rapidly over the years. At the same time, the quantity and impact of application security vulnerabilities have grown as well. Amongst these vulnerabilities, SQL Injection has been classified as the most common, dangerous and prevalent web application flaw. In this paper, we propose Escrow, a large-scale SQL Injection detection tool with an exploitation module that is light-weight, fast and platform-independent. Escrow uses a custom search implementation together with a static code analysis module to find potential target web applications. Additionally, it provides a simple to use graphical user interface (GUI) to navigate through a vulnerable remote database. Escrow is implementation-agnostic, i.e. It can perform analysis on any web application regardless of the server-side implementation (PHP, ASP, etc.). Using our tool, we discovered that it is indeed possible to identify and exploit at least 100 databases per 100 minutes, without prior knowledge of their underlying implementation. We observed that for each query sent, we can scan and detect dozens of vulnerable web applications in a short space of time, while providing a means for exploitation. Finally, we provide recommendations for developers to defend against SQL injection and emphasise the need for proactive assessment and defensive coding practices
    corecore