85,329 research outputs found
DataHub: Collaborative Data Science & Dataset Version Management at Scale
Relational databases have limited support for data collaboration, where teams
collaboratively curate and analyze large datasets. Inspired by software version
control systems like git, we propose (a) a dataset version control system,
giving users the ability to create, branch, merge, difference and search large,
divergent collections of datasets, and (b) a platform, DataHub, that gives
users the ability to perform collaborative data analysis building on this
version control system. We outline the challenges in providing dataset version
control at scale.Comment: 7 page
Measuring and Managing Answer Quality for Online Data-Intensive Services
Online data-intensive services parallelize query execution across distributed
software components. Interactive response time is a priority, so online query
executions return answers without waiting for slow running components to
finish. However, data from these slow components could lead to better answers.
We propose Ubora, an approach to measure the effect of slow running components
on the quality of answers. Ubora randomly samples online queries and executes
them twice. The first execution elides data from slow components and provides
fast online answers; the second execution waits for all components to complete.
Ubora uses memoization to speed up mature executions by replaying network
messages exchanged between components. Our systems-level implementation works
for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the
EasyRec Recommendation Engine, and the OpenEphyra question answering system.
Ubora computes answer quality much faster than competing approaches that do not
use memoization. With Ubora, we show that answer quality can and should be used
to guide online admission control. Our adaptive controller processed 37% more
queries than a competing controller guided by the rate of timeouts.Comment: Technical Repor
Abmash: Mashing Up Legacy Web Applications by Automated Imitation of Human Actions
Many business web-based applications do not offer applications programming
interfaces (APIs) to enable other applications to access their data and
functions in a programmatic manner. This makes their composition difficult (for
instance to synchronize data between two applications). To address this
challenge, this paper presents Abmash, an approach to facilitate the
integration of such legacy web applications by automatically imitating human
interactions with them. By automatically interacting with the graphical user
interface (GUI) of web applications, the system supports all forms of
integrations including bi-directional interactions and is able to interact with
AJAX-based applications. Furthermore, the integration programs are easy to
write since they deal with end-user, visual user-interface elements. The
integration code is simple enough to be called a "mashup".Comment: Software: Practice and Experience (2013)
Escrow: A large-scale web vulnerability assessment tool
The reliance on Web applications has increased rapidly over the years. At the same time, the quantity and impact of application security vulnerabilities have grown as well. Amongst these vulnerabilities, SQL Injection has been classified as the most common, dangerous and prevalent web application flaw. In this paper, we propose Escrow, a large-scale SQL Injection detection tool with an exploitation module that is light-weight, fast and platform-independent. Escrow uses a custom search implementation together with a static code analysis module to find potential target web applications. Additionally, it provides a simple to use graphical user interface (GUI) to navigate through a vulnerable remote database. Escrow is implementation-agnostic, i.e. It can perform analysis on any web application regardless of the server-side implementation (PHP, ASP, etc.). Using our tool, we discovered that it is indeed possible to identify and exploit at least 100 databases per 100 minutes, without prior knowledge of their underlying implementation. We observed that for each query sent, we can scan and detect dozens of vulnerable web applications in a short space of time, while providing a means for exploitation. Finally, we provide recommendations for developers to defend against SQL injection and emphasise the need for proactive assessment and defensive coding practices
- …