222 research outputs found

    Automatic query reformulation for code search using crowdsourced knowledge

    Get PDF
    Ministry of Education, Singapore under its Academic Research Funding Tier

    Supporting Source Code Search with Context-Aware and Semantics-Driven Query Reformulation

    Get PDF
    Software bugs and failures cost trillions of dollars every year, and could even lead to deadly accidents (e.g., Therac-25 accident). During maintenance, software developers fix numerous bugs and implement hundreds of new features by making necessary changes to the existing software code. Once an issue report (e.g., bug report, change request) is assigned to a developer, she chooses a few important keywords from the report as a search query, and then attempts to find out the exact locations in the software code that need to be either repaired or enhanced. As a part of this maintenance, developers also often select ad hoc queries on the fly, and attempt to locate the reusable code from the Internet that could assist them either in bug fixing or in feature implementation. Unfortunately, even the experienced developers often fail to construct the right search queries. Even if the developers come up with a few ad hoc queries, most of them require frequent modifications which cost significant development time and efforts. Thus, construction of an appropriate query for localizing the software bugs, programming concepts or even the reusable code is a major challenge. In this thesis, we overcome this query construction challenge with six studies, and develop a novel, effective code search solution (BugDoctor) that assists the developers in localizing the software code of interest (e.g., bugs, concepts and reusable code) during software maintenance. In particular, we reformulate a given search query (1) by designing novel keyword selection algorithms (e.g., CodeRank) that outperform the traditional alternatives (e.g., TF-IDF), (2) by leveraging the bug report quality paradigm and source document structures which were previously overlooked and (3) by exploiting the crowd knowledge and word semantics derived from Stack Overflow Q&A site, which were previously untapped. Our experiment using 5000+ search queries (bug reports, change requests, and ad hoc queries) suggests that our proposed approach can improve the given queries significantly through automated query reformulations. Comparison with 10+ existing studies on bug localization, concept location and Internet-scale code search suggests that our approach can outperform the state-of-the-art approaches with a significant margin

    RACK: Code Search in the IDE Using Crowdsourced Knowledge

    Get PDF
    Traditional code search engines often do not perform well with natural language queries since they mostly apply keyword matching. These engines thus require carefully designed queries containing information about programming APIs for code search. Unfortunately, existing studies suggest that preparing an effective query for code search is both challenging and time consuming for the developers. In this paper, we propose a novel code search tool--RACK--that returns relevant source code for a given code search query written in natural language text. The tool first translates the query into a list of relevant API classes by mining keyword-API associations from the crowdsourced knowledge of Stack Overflow, and then applies the reformulated query to GitHub code search API for collecting relevant results. Once a query related to a programming task is submitted, the tool automatically mines relevant code snippets from thousands of open-source projects, and displays them as a ranked list within the context of the developer's programming environment--the IDE. Tool page: http://www.usask.ca/~masud.rahman/rackComment: The 39th International Conference on Software Engineering (Companion volume) (ICSE 2017), pp. 51--54, Buenos Aires, Argentina, May, 201

    A Systematic Review of Automated Query Reformulations in Source Code Search

    Full text link
    Fixing software bugs and adding new features are two of the major maintenance tasks. Software bugs and features are reported as change requests. Developers consult these requests and often choose a few keywords from them as an ad hoc query. Then they execute the query with a search engine to find the exact locations within software code that need to be changed. Unfortunately, even experienced developers often fail to choose appropriate queries, which leads to costly trials and errors during a code search. Over the years, many studies attempt to reformulate the ad hoc queries from developers to support them. In this systematic literature review, we carefully select 70 primary studies on query reformulations from 2,970 candidate studies, perform an in-depth qualitative analysis (e.g., Grounded Theory), and then answer seven research questions with major findings. First, to date, eight major methodologies (e.g., term weighting, term co-occurrence analysis, thesaurus lookup) have been adopted to reformulate queries. Second, the existing studies suffer from several major limitations (e.g., lack of generalizability, vocabulary mismatch problem, subjective bias) that might prevent their wide adoption. Finally, we discuss the best practices and future opportunities to advance the state of research in search query reformulations.Comment: 81 pages, accepted at TOSE

    Boosting API Recommendation with Implicit Feedback

    Get PDF
    Developers often need to use appropriate APIs to program efficiently, but it is usually a difficult task to identify the exact one they need from a vast of candidates. To ease the burden, a multitude of API recommendation approaches have been proposed. However, most of the currently available API recommenders do not support the effective integration of users' feedback into the recommendation loop. In this paper, we propose a framework, BRAID (Boosting RecommendAtion with Implicit FeeDback), which leverages learning-to-rank and active learning techniques to boost recommendation performance. By exploiting users' feedback information, we train a learning-to-rank model to re-rank the recommendation results. In addition, we speed up the feedback learning process with active learning. Existing query-based API recommendation approaches can be plugged into BRAID. We select three state-of-the-art API recommendation approaches as baselines to demonstrate the performance enhancement of BRAID measured by Hit@k (Top-k), MAP, and MRR. Empirical experiments show that, with acceptable overheads, the recommendation performance improves steadily and substantially with the increasing percentage of feedback data, comparing with the baselines.Comment: 15 pages, 4 figure

    Anticipating Information Needs Based on Check-in Activity

    Full text link
    In this work we address the development of a smart personal assistant that is capable of anticipating a user's information needs based on a novel type of context: the person's activity inferred from her check-in records on a location-based social network. Our main contribution is a method that translates a check-in activity into an information need, which is in turn addressed with an appropriate information card. This task is challenging because of the large number of possible activities and related information needs, which need to be addressed in a mobile dashboard that is limited in size. Our approach considers each possible activity that might follow after the last (and already finished) activity, and selects the top information cards such that they maximize the likelihood of satisfying the user's information needs for all possible future scenarios. The proposed models also incorporate knowledge about the temporal dynamics of information needs. Using a combination of historical check-in data and manual assessments collected via crowdsourcing, we show experimentally the effectiveness of our approach.Comment: Proceedings of the 10th ACM International Conference on Web Search and Data Mining (WSDM '17), 201
    • …
    corecore