Searching, Selecting, and Synthesizing Source Code Components
As programmers develop software, they instinctively sense that source code exists that could be reused if found: many programming tasks are common to software projects across different domains. Oftentimes, a programmer will attempt to create new software from this existing source code, such as third-party libraries or code from online repositories. Unfortunately, several major challenges make it difficult to locate relevant source code and to reuse it. First, there is a fundamental mismatch between the high-level intent reflected in descriptions of source code and the low-level implementation details. This mismatch is known as the concept assignment problem, and refers to the frequent case in which the keywords in comments or identifiers do not match the features implemented in the code. Second, even if relevant source code is found, programmers must invest significant intellectual effort in understanding how to reuse the functions, classes, and other components it contains. These components may be specific to a particular application and difficult to reuse.

One key source of information that programmers use to understand source code is the set of relationships among its components. These relationships are typically structural data, such as function calls or class instantiations. Structural data has repeatedly been suggested as an alternative to textual analysis for search and reuse; however, no comprehensive strategy yet exists for locating relevant and reusable source code. In my research program, I harness this structural data in a unified approach to creating and evolving software from existing components. For locating relevant source code, I present a search engine for finding applications based on their underlying Application Programming Interface (API) calls, and a technique for finding chains of relevant function invocations in repositories of millions of lines of code.
Next, for reusing source code, I introduce a system to facilitate building software prototypes from existing packages, and an approach to detecting similar software applications.
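The idea of locating applications through their underlying API calls can be illustrated with a simple set-overlap ranking. This is a minimal sketch, not the search engine described in the abstract; the function names and the use of Jaccard similarity are illustrative assumptions:

```python
def api_jaccard(calls_a, calls_b):
    """Jaccard similarity between two applications' sets of API calls
    (illustrative scoring; the actual system may weight calls differently)."""
    a, b = set(calls_a), set(calls_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_apps(query_calls, corpus):
    """Rank applications in `corpus` (name -> list of API calls) by
    overlap with the API calls of interest."""
    return sorted(corpus,
                  key=lambda name: api_jaccard(query_calls, corpus[name]),
                  reverse=True)
```

For example, an application sharing half of its API calls with the query scores 0.5, and applications are returned in descending order of that score.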
SUPPORTING DEVELOPER-ONBOARDING WITH ENHANCED RESOURCE FINDING AND VISUAL EXPLORATION
Understanding the basic structure of a code base and of a development team is essential to getting new developers up to speed in a software development project. Developers do so through early experimentation with code and the creation of mental models of the technical and social structures of a project. However, getting up to speed in a new project can be challenging due to difficulties in: finding the right place to begin exploration, expanding the focus to determine the resources relevant to a task, and identifying dependencies across project elements to gain a high-level overview of project structures. In this thesis, I first identified six challenges that developers face during onboarding, drawing on recent research studies and informal interviews with developers. To address these challenges, I implemented automated tool support with enhanced resource finding and visual exploration. Specifically, I proposed six functional requirements for supporting developer onboarding. I then extended the project tool Tesseract to support these functionalities, helping novice developers find relevant resources (files, developers, bugs, etc.) and understand project structures when joining a new project. To understand how these functionalities support developers' onboarding process, I conducted a user study with typical onboarding tasks requiring early experimentation and the internalization of project structures. The results indicated that enhanced search features, the ability to explore semantic relationships across repositories, and network-centric visualizations of project structures were very effective in supporting onboarding.
HopSkipJumpAttack: A Query-Efficient Decision-Based Attack
The goal of a decision-based adversarial attack on a trained model is to
generate adversarial examples based solely on observing output labels returned
by the targeted model. We develop HopSkipJumpAttack, a family of algorithms
based on a novel estimate of the gradient direction using binary information at
the decision boundary. The proposed family includes both untargeted and
targeted attacks optimized for ℓ2 and ℓ∞ similarity metrics,
respectively. Theoretical analysis is provided for the proposed algorithms and
the gradient direction estimate. Experiments show HopSkipJumpAttack requires
significantly fewer model queries than Boundary Attack. It also achieves
competitive performance in attacking several widely-used defense mechanisms.
(HopSkipJumpAttack was named Boundary Attack++ in a previous version of the
preprint.)
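The gradient-direction estimate described above can be sketched as a Monte Carlo average of random unit perturbations weighted by the binary decision of the model near the boundary. This is a rough illustration of the idea under stated assumptions, not the paper's exact estimator; the oracle `phi`, the sample count, and the baseline subtraction are illustrative:

```python
import numpy as np

def estimate_gradient_direction(phi, x, delta=0.01, n_samples=100, rng=None):
    """Estimate the gradient direction at a boundary point x using only
    a binary oracle phi(x') in {+1, -1} (a sketch of decision-based
    gradient-direction estimation)."""
    rng = np.random.default_rng(rng)
    d = x.size
    # Draw random directions on the unit sphere.
    u = rng.standard_normal((n_samples, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # Query the binary oracle at perturbed points.
    signs = np.array([phi(x + delta * ui) for ui in u], dtype=float)
    # Subtract the mean response as a simple variance-reduction baseline.
    signs -= signs.mean()
    v = (signs[:, None] * u).mean(axis=0)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

With a linear decision boundary, the estimate aligns with the true normal direction as the number of samples grows, even though each query returns only a label.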
Integrating multiple document features in language models for expert finding
We argue that expert finding is sensitive to multiple document features in an organizational intranet. These document features include multiple levels of association between experts and a query topic, from the sentence and paragraph levels up to the document level; document authority information, such as the PageRank, indegree, and URL length of documents; and internal document structures that indicate the experts' relationship with the content of documents. Our assumption is that expert finding can benefit substantially from the incorporation of these document features. However, existing language modeling approaches for expert finding have not sufficiently taken them into account. We propose a novel language modeling approach for expert finding that integrates multiple document features. Our experiments on two large-scale TREC Enterprise Track datasets, i.e., the W3C and CSIRO datasets, demonstrate that the nature of the two organizational intranets and the two types of expert finding tasks, i.e., key contact finding for CSIRO and knowledgeable person finding for W3C, influence the effectiveness of different document features. Our work provides insights into which document features work for certain types of expert finding tasks, and helps design expert finding strategies that are effective for different scenarios. Our main contribution is to develop an effective formal method for modeling multiple document features in expert finding, and to conduct a systematic investigation of their effects. It is worth noting that our approach achieves better results in terms of MAP than previous language-model-based approaches and the best automatic runs in the TREC 2006 and TREC 2007 expert search tasks, respectively.
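The general idea of integrating multiple levels of expert-topic association with a document-authority prior can be sketched as a linear interpolation of per-level scores. This is an illustrative sketch under stated assumptions, not the paper's formal model; the level names, weights, and the multiplicative prior are hypothetical:

```python
def expert_score(level_scores, weights, doc_prior):
    """Combine association scores at several document levels (e.g.
    sentence, paragraph, document) with a document-authority prior
    (e.g. normalized PageRank), via linear interpolation.
    level_scores: dict level -> association score for (expert, query)
    weights: dict level -> interpolation weight (assumed to sum to 1)
    doc_prior: authority score of the supporting document"""
    mixed = sum(weights[level] * level_scores[level] for level in level_scores)
    return doc_prior * mixed
```

Ranking candidate experts by this combined score lets stronger fine-grained associations (e.g. at the sentence level) and more authoritative documents both raise an expert's rank.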