Recommending Comprehensive Solutions for Programming Tasks by Mining Crowd Knowledge
Developers often search for relevant code examples on the web for their
programming tasks. Unfortunately, they face two major problems. First, the
search is impaired due to a lexical gap between their query (task description)
and the information associated with the solution. Second, the retrieved
solution may not be comprehensive, i.e., the code segment may lack a succinct
explanation. These problems force developers to browse dozens of documents in
order to synthesize an appropriate solution. To address these two problems, we
propose CROKAGE (Crowd Knowledge Answer Generator), a tool that takes the
description of a programming task (the query) and provides a comprehensive
solution for the task. Our solutions contain not only relevant code examples
but also their succinct explanations. Our proposed approach expands the task
description with relevant API classes from Stack Overflow Q&A threads, thereby
mitigating the lexical gap. Furthermore, unlike earlier studies, we perform
natural language processing on the top-quality answers and return programming
solutions that contain both code examples and code explanations. We evaluate
our approach using 48 programming queries and show that it outperforms six
baselines, including the state of the art, by a statistically
significant margin. Furthermore, our evaluation with 29 developers using 24
tasks (queries) confirms the superiority of CROKAGE over the state-of-the-art tool
in terms of relevance of the suggested code examples, benefit of the code
explanations, and the overall solution quality (code + explanation).
Comment: Accepted at ICPC, 12 pages, 201
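The abstract does not detail the expansion step, but the idea of augmenting a task description with associated API classes can be sketched as follows; the API names and association scores below are invented for illustration, and the real CROKAGE pipeline is considerably more involved:

```python
# Hypothetical sketch of CROKAGE-style query expansion. The scoring of API
# classes against a query (here a plain dict) would, in practice, be mined
# from Stack Overflow Q&A threads.
def expand_query(query, api_class_scores, top_k=2):
    """Append the top-k API classes most associated with the query terms."""
    ranked = sorted(api_class_scores.items(), key=lambda kv: kv[1], reverse=True)
    expansion = [cls for cls, _ in ranked[:top_k]]
    return query + " " + " ".join(expansion)

# Illustrative association scores (invented numbers)
scores = {"HttpURLConnection": 0.91, "URL": 0.74, "Scanner": 0.12}
print(expand_query("read content from a web page", scores))
# -> read content from a web page HttpURLConnection URL
```

The expanded query then contains API vocabulary likely to appear near the solution, which is what narrows the lexical gap between a natural-language task description and code-centric answers.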
Bootstrapping Cookbooks for APIs from Crowd Knowledge on Stack Overflow
Well-established libraries typically have API documentation. However, they
frequently lack examples and explanations, which can hinder their effective
reuse. Stack Overflow is a question-and-answer website oriented to
issues related to software development. Despite the increasing adoption of
Stack Overflow, the information related to a particular topic (e.g., an API) is
spread across the website. Thus, Stack Overflow still lacks organization of the
crowd knowledge available on it. Our target goal is to address the problem of
the poor quality documentation for APIs by providing an alternative artifact to
document them based on the crowd knowledge available on Stack Overflow, called
crowd cookbook. A cookbook is a recipe-oriented book, and we refer to ours as
a crowd cookbook since it contains content generated by a crowd. The cookbooks
are meant to be used through an exploration process, i.e., browsing.
In this paper, we present a semi-automatic approach that organizes the crowd
knowledge available on Stack Overflow to build cookbooks for APIs. We have
generated cookbooks for three APIs widely used by the software development
community: SWT, LINQ and QT. We have also defined desired properties that crowd
cookbooks must meet, and we conducted an evaluation of the cookbooks against
these properties with human subjects. The results showed that the cookbooks
built using our approach, in general, meet those properties. As a highlight,
most of the recipes were considered appropriate to be in the cookbooks and have
self-contained information. We concluded that our approach is capable of
producing adequate cookbooks automatically, which can be as useful as manually
produced cookbooks. This opens an opportunity for API designers to enrich
existing cookbooks with different points of view from the crowd, or even to
generate initial versions of new cookbooks.
Comment: Accepted at Information and Software Technology (journal), Elsevier. 16 pages.
A Survey on Data Collection for Machine Learning: a Big Data -- AI Integration Perspective
Data collection is a major bottleneck in machine learning and an active
research topic in multiple communities. There are largely two reasons data
collection has recently become a critical issue. First, as machine learning is
becoming more widely used, we are seeing new applications that do not
necessarily have enough labeled data. Second, unlike traditional machine
learning, deep learning techniques automatically generate features, which saves
feature engineering costs, but in return may require larger amounts of labeled
data. Interestingly, recent research in data collection comes not only from the
machine learning, natural language, and computer vision communities, but also
from the data management community due to the importance of handling large
amounts of data. In this survey, we perform a comprehensive study of data
collection from a data management point of view. Data collection largely
consists of data acquisition, data labeling, and improvement of existing data
or models. We provide a research landscape of these operations, provide
guidelines on which technique to use when, and identify interesting research
challenges. The integration of machine learning and data management for data
collection is part of a larger trend of Big data and Artificial Intelligence
(AI) integration and opens many opportunities for new research.
Comment: 20 pages.
Computational Thinking with the Web Crowd using CodeMapper
It has been argued that computational thinking should precede computer
programming in the course of a career in computing. This argument is the basis
for the slogan "logic first, syntax later" and the development of many
programming languages that strip away cryptic syntax, such as Scratch, Blockly,
and Visual Logic. The goal is to focus on the structuring of the semantic relationships
among the logical building blocks to yield solutions to computational problems.
While this approach is helping novice programmers and early learners, the gap
between computational thinking and professional programming using high level
languages such as C++, Python and Java is quite wide. It is wide enough that
about one third of students in introductory college computer science classes
drop out or fail. In this paper, we introduce a new programming platform, called
CodeMapper, in which learners are able to build computational logic in
independent modules and aggregate them to create complex modules. CodeMapper
is an abstract development environment in which rapid visual
prototyping of small to substantially large systems is possible by combining
already developed independent modules in logical steps. The challenge we
address involves supporting a visual development environment in which
"annotated code snippets" authored by the masses in social computing sites such
as SourceForge, StackOverflow or GitHub can also be used as-is in prototypes
and mapped to real executable programs. CodeMapper thus facilitates a soft
transition from visual programming to syntax-driven programming without having
to practice syntax too heavily.
Comment: 8 pages.
Iris: A Conversational Agent for Complex Tasks
Today's conversational agents are restricted to simple standalone commands.
In this paper, we present Iris, an agent that draws on human conversational
strategies to combine commands, allowing it to perform more complex tasks that
it has not been explicitly designed to support: for example, composing one
command to "plot a histogram" with another to first "log-transform the data".
To enable this complexity, we introduce a domain specific language that
transforms commands into automata that Iris can compose, sequence, and execute
dynamically by interacting with a user through natural language, as well as a
conversational type system that manages what kinds of commands can be combined.
We have designed Iris to help users with data science tasks, a domain that
requires support for command combination. In evaluation, we find that data
scientists complete a predictive modeling task significantly faster (2.6 times
speedup) with Iris than a modern non-conversational programming environment.
Iris supports the same kinds of commands as today's agents, but empowers users
to weave together these commands to accomplish complex goals.
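As a rough illustration of command composition, the two example commands above can be modeled as composable functions. This is only a sketch in plain Python: the actual Iris system uses a domain-specific language that compiles commands into automata, plus a conversational type system, none of which is shown here.

```python
import math

# Illustrative command registry; the registered bodies are stand-ins
# (the histogram "command" just packages the data instead of plotting)
COMMANDS = {
    "log-transform the data": lambda xs: [math.log(x) for x in xs],
    "plot a histogram": lambda xs: {"kind": "histogram", "data": xs},
}

def compose(outer, inner):
    """Feed one command's result into another, mimicking Iris-style combination."""
    return lambda xs: COMMANDS[outer](COMMANDS[inner](xs))

# "plot a histogram" of the log-transformed data, as in the abstract's example
plot_logged = compose("plot a histogram", "log-transform the data")
result = plot_logged([1.0, math.e, math.e ** 2])
print(result["kind"], result["data"])
```

A conversational type system would additionally check, before composing, that the inner command's output type matches the outer command's expected input.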
Using StackOverflow content to assist in code review
An important goal for programmers is to minimize the cost of identifying and
correcting defects in source code. Code review is commonly used for identifying
programming defects. However, manual code review has some shortcomings: a) it
is time-consuming, and b) outcomes are subjective and depend on the skills of
the reviewers. An automated approach for assisting in code reviews is thus highly
desirable. We present a tool for assisting in code review and results from our
experiments evaluating the tool in different scenarios. The tool leverages
content available from professional programmer support forums (e.g.
StackOverflow.com) to determine potential defectiveness of a given piece of
source code. The defectiveness is expressed on the scale of {Likely defective,
Neutral, Unlikely to be defective}. The basic idea employed in the tool is to:
a) identify a set P of discussion posts on StackOverflow such that each p in P
contains source code fragment(s) which sufficiently resemble the input code C
being reviewed, and b) determine the likelihood of C being defective by
considering all p in P. A novel aspect of our approach is to use document fingerprinting
for comparing two pieces of source code. Our choice of document fingerprinting
technique is inspired by source code plagiarism detection tools where it has
proven to be very successful. In the experiments that we performed to verify
the effectiveness of our approach, source code samples from more than 300
GitHub open source repositories were taken as input. A precision of more than
90% in identifying correct/relevant results was achieved.
Comment: Keywords: Code Review, StackOverflow, Software Development, Crowd Knowledge, Automated Software Engineering
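The abstract does not specify the fingerprinting scheme's parameters, but a winnowing-style scheme in the spirit of plagiarism detectors such as MOSS can be sketched as follows; the k-gram size and window size are illustrative guesses, not the paper's values:

```python
# Minimal winnowing-style document fingerprinting sketch: hash all k-grams of
# the normalized code, then keep only the minimum hash in each sliding window.
def fingerprints(code, k=5, window=4):
    text = "".join(code.split()).lower()  # normalize whitespace and case
    hashes = [hash(text[i:i + k]) for i in range(len(text) - k + 1)]
    picks = set()
    for i in range(len(hashes) - window + 1):
        picks.add(min(hashes[i:i + window]))
    return picks

def similarity(a, b):
    """Jaccard overlap of two fingerprint sets."""
    fa, fb = fingerprints(a), fingerprints(b)
    return len(fa & fb) / len(fa | fb) if fa | fb else 0.0

snippet = "for (int i = 0; i < n; i++) sum += a[i];"
clone = "for (int j = 0; j < n; j++) sum += a[j];"
print(similarity(snippet, snippet))  # identical code -> 1.0
print(similarity(snippet, clone) > similarity(snippet, "return x * y;"))  # -> True
```

Because winnowing keeps only window minima, renamed variables still leave long runs of shared k-grams whose fingerprints survive in both versions, which is what makes such schemes robust for near-clone retrieval.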
Crowdsourced Behavior-Driven Development: Implementing Microservices through Microtasks
Key to the effectiveness of crowdsourcing approaches for software engineering
is workflow design, describing how complex work is organized into small,
relatively independent microtasks. In this paper, we introduce a
Behavior-Driven Development (BDD) workflow for accomplishing programming work
through self-contained microtasks, implemented as a preconfigured environment
called Crowd Microservices. In our approach, a client, acting on behalf of a
software team, describes a microservice as a set of endpoints with paths,
requests, and responses. A crowd then implements the endpoints, identifying
individual endpoint behaviors which they test, implement, and debug, creating
new functions and interacting with persistence APIs as needed. To evaluate our
approach, we conducted a feasibility study in which a small crowd worked to
implement a small ToDo microservice. The crowd created an implementation with
only four defects, completing 350 microtasks and implementing 13 functions. We
discuss the implications of these findings for incorporating crowdsourced
programming contributions into traditional software projects.
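The endpoint-based description mentioned above might, hypothetically, take a shape like the following; all field names and behaviors are invented for illustration, since the abstract does not show the actual format used by Crowd Microservices:

```python
# Hypothetical client-side description of one endpoint of a ToDo microservice.
# Each listed behavior would become a self-contained microtask: write a failing
# test for it, implement until the test passes, then debug.
todo_service = {
    "endpoints": [
        {
            "path": "/todos",
            "method": "POST",
            "request": {"title": "string"},
            "response": {"id": "int", "title": "string", "done": "bool"},
            "behaviors": [
                "creating a todo returns it with done == False",
                "creating a todo without a title is rejected",
            ],
        }
    ]
}

# Enumerate the microtasks this description would generate
for ep in todo_service["endpoints"]:
    for behavior in ep["behaviors"]:
        print(ep["method"], ep["path"], "->", behavior)
```

Keeping each behavior independently testable is what lets crowd workers pick up, complete, and hand off microtasks without global knowledge of the service.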
RoboBrain: Large-Scale Knowledge Engine for Robots
In this paper we introduce a knowledge engine, which learns and shares
knowledge representations, for robots to carry out a variety of tasks. Building
such an engine brings with it the challenge of dealing with multiple data
modalities including symbols, natural language, haptic senses, robot
trajectories, visual features and many others. The knowledge stored in
the engine comes from multiple sources including physical interactions that
robots have while performing tasks (perception, planning and control),
knowledge bases from the Internet and learned representations from several
robotics research groups.
We discuss various technical aspects and associated challenges such as
modeling the correctness of knowledge, inferring latent information and
formulating different robotic tasks as queries to the knowledge engine. We
describe the system architecture and how it supports different mechanisms for
users and robots to interact with the engine. Finally, we demonstrate its use
in three important research areas: grounding natural language, perception, and
planning, which are the key building blocks for many robotic tasks. This
knowledge engine is a collaborative effort and we call it RoboBrain.
Comment: 10 pages, 9 figures.
Computing trading strategies based on financial sentiment data using evolutionary optimization
In this paper we apply evolutionary optimization techniques to compute
optimal rule-based trading strategies based on financial sentiment data. The
sentiment data was extracted from the social media service StockTwits to
capture the level of bullishness or bearishness of the online trading
community towards certain stocks. Numerical results for all stocks from the Dow
Jones Industrial Average (DJIA) index are presented, and a comparison to
classical risk-return portfolio selection is provided.
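As a toy illustration of the general idea, a single-threshold sentiment rule can be tuned with a (1+1) evolution strategy on synthetic data. The paper's actual rules, StockTwits data, and optimizer are far richer than this sketch; every number below is fabricated.

```python
import random

random.seed(7)

# Synthetic (bullishness, next-day return) pairs: days with bullishness above
# 0.6 get a positive expected return, the rest a slightly negative one
days = []
for _ in range(200):
    s = random.random()
    r = (0.01 if s > 0.6 else -0.005) + random.gauss(0, 0.002)
    days.append((s, r))

def profit(threshold):
    """Rule: hold the stock only on days whose bullishness exceeds the threshold."""
    return sum(r for s, r in days if s > threshold)

# (1+1) evolution strategy: mutate the threshold, keep the mutant
# whenever it earns at least as much as the incumbent
theta = 0.5
for _ in range(300):
    cand = min(1.0, max(0.0, theta + random.gauss(0, 0.05)))
    if profit(cand) >= profit(theta):
        theta = cand

print(round(theta, 2))  # the planted optimum for this synthetic data is near 0.6
```

Real rule-based strategies would evolve richer structures (multiple indicators, entry and exit conditions) rather than one scalar, but the mutate-evaluate-select loop is the same.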
CraftAssist Instruction Parsing: Semantic Parsing for a Minecraft Assistant
We propose a large-scale semantic parsing dataset focused on
instruction-driven communication with an agent in Minecraft. We describe the
data collection process, which yields an additional 35K human-generated
instructions with their semantic annotations. We report the performance of
three baseline models and find that, while a dataset of this size helps us
train a usable instruction parser, it still poses interesting generalization
challenges, which we hope will drive the development of better and more robust models.