AutoPipeline: Synthesize Data Pipelines By-Target Using Reinforcement Learning and Search
Recent work has made significant progress in helping users to automate single
data preparation steps, such as string-transformations and table-manipulation
operators (e.g., Join, GroupBy, and Pivot). In this work, we propose to
automate multiple such steps end-to-end, by synthesizing complex data pipelines
with both string transformations and table-manipulation operators. We propose a
novel "by-target" paradigm that allows users to easily specify the desired
pipeline, which is a significant departure from the traditional by-example
paradigm. With by-target, users provide input tables (e.g., CSV or JSON
files) and point to a "target table" (e.g., an existing database table or
BI dashboard) to show what the output of the desired pipeline should
schematically look like. While the problem is seemingly underspecified, our
unique insight is that implicit table constraints, such as functional
dependencies (FDs) and keys, can be exploited to significantly constrain the
search space and make the problem tractable.
We develop an Auto-Pipeline system that learns to synthesize pipelines using
reinforcement learning and search. Experiments on large numbers of real
pipelines crawled from GitHub suggest that Auto-Pipeline can successfully
synthesize 60-70% of these complex pipelines (up to 10 steps) in 10-20 seconds
on average.
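
To illustrate how implicit constraints can prune candidate pipelines, the following is a minimal sketch, not the Auto-Pipeline implementation. It assumes the keys and FDs of the target table are already known and checks whether a candidate output table satisfies them. The function names (`is_key`, `holds_fd`, `satisfies_target_constraints`) and the sample columns are hypothetical.

```python
def is_key(rows, cols):
    """Return True if the values in `cols` uniquely identify every row."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in cols)
        if value in seen:
            return False
        seen.add(value)
    return True


def holds_fd(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in `rows`."""
    mapping = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        # Each left-hand value must always map to the same right-hand value.
        if mapping.setdefault(left, right) != right:
            return False
    return True


def satisfies_target_constraints(rows, keys, fds):
    """Check a candidate output against constraints taken from the target."""
    return (all(is_key(rows, k) for k in keys)
            and all(holds_fd(rows, lhs, rhs) for lhs, rhs in fds))


# Hypothetical candidate output produced by some pipeline.
candidate = [
    {"order_id": 1, "zip": "98052", "city": "Redmond"},
    {"order_id": 2, "zip": "98052", "city": "Redmond"},
    {"order_id": 3, "zip": "10001", "city": "New York"},
]

# Hypothetical constraints assumed to hold on the target table.
target_keys = [("order_id",)]
target_fds = [(("zip",), ("city",))]

ok = satisfies_target_constraints(candidate, target_keys, target_fds)
```

In a search over pipelines, a check like this could be used to discard candidates whose output violates the target's constraints before exploring them further.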
Example-Driven User Intent Discovery: Empowering Users to Cross the SQL Barrier Through Query by Example
Traditional data systems require specialized technical skills: users
need to understand the data organization and write precise queries to access
data. As a result, novice users who lack technical expertise face hurdles in
perusing and analyzing data. Existing tools assist in formulating queries
through keyword search, query recommendation, and query auto-completion, but
still require some technical expertise. An alternative method for accessing
data is Query by Example (QBE), where users express their data exploration
intent simply by providing examples of their intended data. We study a
state-of-the-art QBE system called SQuID, and contrast it with traditional SQL
querying. Our comparative user studies demonstrate that users with varying
expertise are significantly more effective and efficient with SQuID than SQL.
We find that SQuID eliminates the barriers of studying the database schema,
formalizing task semantics, and writing syntactically correct SQL queries, and
thus substantially alleviates the need for technical expertise in data
exploration.
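
The general idea of Query by Example can be sketched as follows. This is a deliberately simplified illustration and not SQuID's algorithm: it assumes the user's examples name values in one known column, and it infers a filter from the attribute values that all matching rows share. The function names and the sample `movies` table are hypothetical.

```python
def infer_filters(rows, example_col, examples):
    """Return the column/value pairs shared by every row matching the examples."""
    matched = [r for r in rows if r[example_col] in examples]
    if not matched:
        return {}
    shared = {}
    for col in matched[0]:
        if col == example_col:
            continue
        values = {r[col] for r in matched}
        # Keep a column only if all example rows agree on its value.
        if len(values) == 1:
            shared[col] = values.pop()
    return shared


def run_query(rows, filters):
    """Return all rows that satisfy every inferred filter."""
    return [r for r in rows if all(r[c] == v for c, v in filters.items())]


# Hypothetical table the user wants to explore.
movies = [
    {"title": "Alien", "genre": "sci-fi", "decade": 1970},
    {"title": "Blade Runner", "genre": "sci-fi", "decade": 1980},
    {"title": "Annie Hall", "genre": "comedy", "decade": 1970},
    {"title": "The Matrix", "genre": "sci-fi", "decade": 1990},
]

# The user supplies two example titles instead of writing SQL.
filters = infer_filters(movies, "title", {"Alien", "Blade Runner"})
results = [r["title"] for r in run_query(movies, filters)]
```

Here the two examples agree only on `genre`, so the inferred filter generalizes to other rows with the same genre without the user inspecting the schema or writing a query.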