53,614 research outputs found

    Recommending Comprehensive Solutions for Programming Tasks by Mining Crowd Knowledge

    Developers often search the web for code examples relevant to their programming tasks. Unfortunately, they face two major problems. First, the search is impaired by a lexical gap between their query (the task description) and the information associated with the solution. Second, the retrieved solution may not be comprehensive, i.e., the code segment might lack a succinct explanation. These problems force developers to browse dozens of documents in order to synthesize an appropriate solution. To address these two problems, we propose CROKAGE (Crowd Knowledge Answer Generator), a tool that takes the description of a programming task (the query) and provides a comprehensive solution for the task. Our solutions contain not only relevant code examples but also succinct explanations of them. Our approach expands the task description with relevant API classes from Stack Overflow Q&A threads, thereby mitigating the lexical gap problem. Furthermore, we perform natural language processing on the top-quality answers and, unlike earlier studies, return programming solutions that contain both code examples and code explanations. We evaluate our approach using 48 programming queries and show that it outperforms six baselines, including the state of the art, by a statistically significant margin. Furthermore, our evaluation with 29 developers using 24 tasks (queries) confirms the superiority of CROKAGE over the state-of-the-art tool in terms of the relevance of the suggested code examples, the benefit of the code explanations, and the overall solution quality (code + explanation). Comment: Accepted at ICPC, 12 pages, 201

    Bootstrapping Cookbooks for APIs from Crowd Knowledge on Stack Overflow

    Well-established libraries typically have API documentation. However, this documentation frequently lacks examples and explanations, which can make effective reuse of the libraries difficult. Stack Overflow is a question-and-answer website oriented to issues related to software development. Despite the increasing adoption of Stack Overflow, the information related to a particular topic (e.g., an API) is spread across the website. Thus, Stack Overflow still lacks organization of the crowd knowledge available on it. Our goal is to address the problem of poor-quality API documentation by providing an alternative artifact, called a crowd cookbook, that documents APIs based on the crowd knowledge available on Stack Overflow. A cookbook is a recipe-oriented book, and we refer to our cookbook as a crowd cookbook since it contains content generated by a crowd. The cookbooks are meant to be used through an exploration process, i.e., browsing. In this paper, we present a semi-automatic approach that organizes the crowd knowledge available on Stack Overflow to build cookbooks for APIs. We have generated cookbooks for three APIs widely used by the software development community: SWT, LINQ and QT. We have also defined desired properties that crowd cookbooks must meet, and we conducted an evaluation of the cookbooks against these properties with human subjects. The results showed that the cookbooks built using our approach, in general, meet those properties. As a highlight, most of the recipes were considered appropriate to be in the cookbooks and to contain self-contained information. We concluded that our approach is capable of producing adequate cookbooks automatically, which can be as useful as manually produced cookbooks. This opens an opportunity for API designers to enrich existing cookbooks with the different points of view from the crowd, or even to generate initial versions of new cookbooks. Comment: Accepted at Information and Software Technology - Journal - Elsevier. 16 page

    A Survey on Data Collection for Machine Learning: a Big Data -- AI Integration Perspective

    Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a critical issue. First, as machine learning becomes more widely used, we are seeing new applications that do not necessarily have enough labeled data. Second, unlike traditional machine learning, deep learning techniques automatically generate features, which saves feature engineering costs but in return may require larger amounts of labeled data. Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management community, due to the importance of handling large amounts of data. In this survey, we perform a comprehensive study of data collection from a data management point of view. Data collection largely consists of data acquisition, data labeling, and improvement of existing data or models. We provide a research landscape of these operations, provide guidelines on which technique to use when, and identify interesting research challenges. The integration of machine learning and data management for data collection is part of a larger trend of Big Data and Artificial Intelligence (AI) integration and opens many opportunities for new research. Comment: 20 page

    Computational Thinking with the Web Crowd using CodeMapper

    It has been argued that computational thinking should precede computer programming in the course of a career in computing. This argument is the basis for the slogan "logic first, syntax later" and for the development of many programming languages that remove cryptic syntax, such as Scratch!, Blockly and Visual Logic. The goal is to focus on structuring the semantic relationships among the logical building blocks to yield solutions to computational problems. While this approach is helping novice programmers and early learners, the gap between computational thinking and professional programming using high-level languages such as C++, Python and Java is quite wide. It is wide enough for about one third of students in their first college computer science classes to drop out or fail. In this paper, we introduce a new programming platform, called CodeMapper, in which learners are able to build computational logic in independent modules and aggregate them to create complex modules. CodeMapper is an abstract development environment in which rapid visual prototyping of small to substantially large systems is possible by combining already developed independent modules in logical steps. The challenge we address involves supporting a visual development environment in which "annotated code snippets" authored by the masses on social computing sites such as SourceForge, StackOverflow or GitHub can also be used as-is in prototypes and mapped to real executable programs. CodeMapper thus facilitates a soft transition from visual programming to syntax-driven programming without having to practice syntax too heavily. Comment: 8 page

    Iris: A Conversational Agent for Complex Tasks

    Today's conversational agents are restricted to simple standalone commands. In this paper, we present Iris, an agent that draws on human conversational strategies to combine commands, allowing it to perform more complex tasks that it has not been explicitly designed to support: for example, composing one command to "plot a histogram" with another to first "log-transform the data". To enable this complexity, we introduce a domain-specific language that transforms commands into automata that Iris can compose, sequence, and execute dynamically by interacting with a user through natural language, as well as a conversational type system that manages what kinds of commands can be combined. We have designed Iris to help users with data science tasks, a domain that requires support for command combination. In evaluation, we find that data scientists complete a predictive modeling task significantly faster (a 2.6 times speedup) with Iris than with a modern non-conversational programming environment. Iris supports the same kinds of commands as today's agents, but empowers users to weave together these commands to accomplish complex goals.

    Using StackOverflow content to assist in code review

    An important goal for programmers is to minimize the cost of identifying and correcting defects in source code. Code review is commonly used for identifying programming defects. However, manual code review has some shortcomings: a) it is time consuming, and b) its outcomes are subjective and depend on the skills of the reviewers. An automated approach for assisting in code reviews is thus highly desirable. We present a tool for assisting in code review and results from our experiments evaluating the tool in different scenarios. The tool leverages content available from professional programmer support forums (e.g. StackOverflow.com) to determine the potential defectiveness of a given piece of source code. The defectiveness is expressed on the scale of {Likely defective, Neutral, Unlikely to be defective}. The basic idea employed in the tool is to: a) identify a set P of discussion posts on StackOverflow such that each p in P contains source code fragment(s) which sufficiently resemble the input code C being reviewed; b) determine the likelihood of C being defective by considering all p in P. A novel aspect of our approach is the use of document fingerprinting for comparing two pieces of source code. Our choice of document fingerprinting technique is inspired by source code plagiarism detection tools, where it has proven to be very successful. In the experiments that we performed to verify the effectiveness of our approach, source code samples from more than 300 GitHub open source repositories were taken as input. A precision of more than 90% in identifying correct/relevant results has been achieved. Comment: Keywords: Code Review, StackOverflow, Software Development, Crowd Knowledge, Automated Software Engineerin

    Crowdsourced Behavior-Driven Development: Implementing Microservices through Microtasks

    Key to the effectiveness of crowdsourcing approaches for software engineering is workflow design, which describes how complex work is organized into small, relatively independent microtasks. In this paper, we introduce a Behavior-Driven Development (BDD) workflow for accomplishing programming work through self-contained microtasks, implemented as a preconfigured environment called Crowd Microservices. In our approach, a client, acting on behalf of a software team, describes a microservice as a set of endpoints with paths, requests, and responses. A crowd then implements the endpoints, identifying individual endpoint behaviors which they test, implement, and debug, creating new functions and interacting with persistence APIs as needed. To evaluate our approach, we conducted a feasibility study in which a small crowd worked to implement a small ToDo microservice. The crowd created an implementation with only four defects, completing 350 microtasks and implementing 13 functions. We discuss the implications of these findings for incorporating crowdsourced programming contributions into traditional software projects.

    RoboBrain: Large-Scale Knowledge Engine for Robots

    In this paper we introduce a knowledge engine, which learns and shares knowledge representations, for robots to carry out a variety of tasks. Building such an engine brings with it the challenge of dealing with multiple data modalities including symbols, natural language, haptic senses, robot trajectories, visual features and many others. The knowledge stored in the engine comes from multiple sources including physical interactions that robots have while performing tasks (perception, planning and control), knowledge bases from the Internet and learned representations from several robotics research groups. We discuss various technical aspects and associated challenges such as modeling the correctness of knowledge, inferring latent information and formulating different robotic tasks as queries to the knowledge engine. We describe the system architecture and how it supports different mechanisms for users and robots to interact with the engine. Finally, we demonstrate its use in three important research areas: grounding natural language, perception, and planning, which are the key building blocks for many robotic tasks. This knowledge engine is a collaborative effort and we call it RoboBrain. Comment: 10 pages, 9 figure

    Computing trading strategies based on financial sentiment data using evolutionary optimization

    In this paper we apply evolutionary optimization techniques to compute optimal rule-based trading strategies based on financial sentiment data. The sentiment data was extracted from the social media service StockTwits to capture the level of bullishness or bearishness of the online trading community towards certain stocks. Numerical results for all stocks from the Dow Jones Industrial Average (DJIA) index are presented and a comparison to classical risk-return portfolio selection is provided.

    CraftAssist Instruction Parsing: Semantic Parsing for a Minecraft Assistant

    We propose a large-scale semantic parsing dataset focused on instruction-driven communication with an agent in Minecraft. We describe the data collection process, which yields an additional 35K human-generated instructions with their semantic annotations. We report the performance of three baseline models and find that while a dataset of this size helps us train a usable instruction parser, it still poses interesting generalization challenges, which we hope will help develop better and more robust models.