307 research outputs found

    An Unsupervised Approach for Discovering Relevant Tutorial Fragments for APIs

    Full text link
    Developers increasingly rely on API tutorials to facilitate software development. However, it remains a challenging task for them to discover relevant API tutorial fragments explaining unfamiliar APIs. Existing supervised approaches suffer from the heavy burden of manually preparing corpus-specific annotated data and features. In this study, we propose a novel unsupervised approach, namely Fragment Recommender for APIs with PageRank and Topic model (FRAPT). FRAPT can well address two main challenges lying in the task and effectively determine relevant tutorial fragments for APIs. In FRAPT, a Fragment Parser is proposed to identify APIs in tutorial fragments and replace ambiguous pronouns and variables with related ontologies and API names, so as to address the pronoun and variable resolution challenge. Then, a Fragment Filter employs a set of nonexplanatory detection rules to remove non-explanatory fragments, thus address the non-explanatory fragment identification challenge. Finally, two correlation scores are achieved and aggregated to determine relevant fragments for APIs, by applying both topic model and PageRank algorithm to the retained fragments. Extensive experiments over two publicly open tutorial corpora show that, FRAPT improves the state-of-the-art approach by 8.77% and 12.32% respectively in terms of F-Measure. The effectiveness of key components of FRAPT is also validated.Comment: 11 pages, 8 figures, In Proc. of 39rd IEEE International Conference on Software Engineering (ICSE'17

    A more accurate model for finding tutorial segments explaining APIs

    Get PDF
    Developers prefer to utilize third-party libraries when they implement some functionalities and Application Programming Interfaces (APIs) are frequently used by them. Facing an unfamiliar API, developers tend to consult tutorials as learning resources. Unfortunately, the segments explaining a specific API scatter across tutorials. Hence, it remains a challenging issue to find the relevant segments. In this study, we propose a more accurate model to find the exact tutorial fragments explaining APIs. This new model consists of a text classifier with domain specific features. More specifically, we discover two important indicators to complement traditional text based features, namely co-occurrence APIs and knowledge based API extensions. In addition, we incorporate Word2Vec, a semantic similarity metric to enhance the new model. Extensive experiments over two publicly available tutorial datasets show that our new model could find up to 90% fragments explaining APIs and improve the state-of-the-art model by up to 30% in terms of F-measure.Comment: 11 pages, 11 figures, In Proc. of 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER'16), pp.157-16

    ASSESSING THE QUALITY OF SOFTWARE DEVELOPMENT TUTORIALS AVAILABLE ON THE WEB

    Get PDF
    Both expert and novice software developers frequently access software development resources available on the Web in order to lookup or learn new APIs, tools and techniques. Software quality is affected negatively when developers fail to find high-quality information relevant to their problem. While there is a substantial amount of freely available resources that can be accessed online, some of the available resources contain information that suffers from error proneness, copyright infringement, security concerns, and incompatible versions. Use of such toxic information can have a strong negative effect on developer’s efficacy. This dissertation focuses specifically on software tutorials, aiming to automatically evaluate the quality of such documents available on the Web. In order to achieve this goal, we present two contributions: 1) scalable detection of duplicated code snippets; 2) automatic identification of valid version ranges. Software tutorials consist of a combination of source code snippets and natural language text. The code snippets in a tutorial can originate from different sources, perhaps carrying stringent licensing requirements or known security vulnerabilities. Developers, typically unaware of this, can reuse these code snippets in their project. First, in this thesis, we present our work on a Web-scale code clone search technique that is able to detect duplicate code snippets between large scale document and source code corpora in order to trace toxic code snippets. As software libraries and APIs evolve over time, existing software development tutorials can become outdated. It is difficult for software developers and especially novices to determine the expected version of the software implicit in a specific tutorial in order to decide whether the tutorial is applicable to their software development environment. To overcome this challenge, in this thesis we present a novel technique for automatic identification of the valid version range of software development tutorials on the Web

    Recognizing and understanding user behaviors from screencasts

    Get PDF
    User interacts with computers or mobile devices, leading to user behaviors on screen. In the context of software engineering, analyzing user behavior enables many applications such as intelligent bug fix, code completion and knowledge recommendation for developers. Such technique can be extended to more general knowledge worker environment, in which users have to manipulate devices according to specific guidelines. Existing works rely heavily on software instrumentation to obtain user actions from operation systems, which is hard to deploy and maintain. In addition, considering the security and privacy of some scenarios, non-intrusive is the major requirement to be included in the system. In this work, we leverage Computer Vision and Natural Language Processing techniques to recognize and understand user behaviors from screencasts, which is a non-intrusive and cross-platform method. We first recognize 10 categories of low level user actions such as mouse moving and type text, then summarize them to higher level abstractions (i.e. line-granularity coding steps). We also try to interpret user interaction with applications by multi-task learning and generate structured language descriptions (i.e. command, widget and location). Finally, unsupervised learning method is introduced for GUI linting problem, which is taken as a case study of user behavior analysis. To train the deep neural networks, we collect diverse video data from YouTube, Twitch and Bugzilla, and manually label them to build the dataset. The experiment results demonstrate the high performance of proposed method, and the user study validate the practical applications of many downstream tasks

    Mining Semantic Loop Idioms

    Get PDF
    corecore