Opportunities in Software Engineering Research for Web API Consumption
Nowadays, invoking third-party code increasingly involves calling web
services via their web APIs, as opposed to the more traditional scenario of
downloading a library and invoking the library's API. However, there are also
new challenges for developers calling these web APIs. In this paper, we
highlight a broad set of these challenges and argue for resulting opportunities
for software engineering research to support developers in consuming web APIs.
We outline two specific research threads in this context: (1) web API
specification curation, which enables us to know the signatures of web APIs,
and (2) static analysis capable of extracting the URLs, HTTP methods, and other
properties of web API calls. Furthermore, we present new work on combining (1) and
(2) to provide IDE support for application developers consuming web APIs. As
web APIs are used broadly, research in supporting the consumption of web APIs
offers exciting opportunities.
Comment: Erik Wittern and Annie Ying are both first authors
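To make research thread (2) concrete, here is a minimal sketch of statically extracting the HTTP method and URL of web API calls. It is not the authors' analysis. It assumes Python client code that calls the `requests` library directly with a string-literal URL. The function name `extract_api_calls` is hypothetical, and URLs assembled at runtime (concatenation, f-strings, variables) are deliberately skipped. A practical analysis would need to resolve those as well.

```python
import ast

# HTTP methods recognized as attribute names on the `requests` module
# (assumption: client code calls requests.<method>(...) directly).
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def extract_api_calls(source: str) -> list[tuple[str, str]]:
    """Return (HTTP method, URL) pairs for calls like requests.get("https://...").

    Hypothetical helper: only string-literal URLs passed as the first
    positional argument are recovered; dynamically built URLs are ignored.
    """
    calls = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the syntactic shape `requests.<method>("<literal url>", ...)`.
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "requests"
            and func.attr in HTTP_METHODS
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            calls.append((func.attr.upper(), node.args[0].value))
    return calls
```

Pairs recovered this way could then be checked against a curated API specification, which is the combination of threads (1) and (2) that the abstract describes.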
Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks
Malware still constitutes a major threat in the cybersecurity landscape, partly
due to the widespread use of infection vectors such as documents. These
infection vectors hide embedded malicious code from victim users,
facilitating the use of social engineering techniques to infect their machines.
Research has shown that machine-learning algorithms provide effective detection
mechanisms against such threats, but the existence of an arms race in
adversarial settings has recently challenged such systems. In this work, we
focus on malware embedded in PDF files as a representative case of such an arms
race. We start by providing a comprehensive taxonomy of the different
approaches used to generate PDF malware, and of the corresponding
learning-based detection systems. We then categorize threats specifically
targeted against learning-based PDF malware detectors, using a well-established
framework in the field of adversarial machine learning. This framework allows
us to categorize known vulnerabilities of learning-based PDF malware detectors
and to identify novel attacks that may threaten such systems, along with the
potential defense mechanisms that can mitigate the impact of such threats. We
conclude the paper by discussing how such findings highlight promising research
directions towards tackling the more general challenge of designing robust
malware detectors in adversarial settings.
Community standards for open cell migration data
Cell migration research has become a high-content field. However, the quantitative information encapsulated in these complex and high-dimensional datasets is not fully exploited, owing to the diversity of experimental protocols and non-standardized output formats. In addition, the datasets are typically not open for reuse. Making the data open and Findable, Accessible, Interoperable, and Reusable (FAIR) will enable meta-analysis, data integration, and data mining. Standardized data formats and controlled vocabularies are essential for building a suitable infrastructure for that purpose, but they are not available in the cell migration domain. Here we present standardization efforts by the Cell Migration Standardisation Organisation (CMSO), an open, community-driven organization that facilitates the development of standards for cell migration data. This work will foster the development of improved algorithms and tools and enable secondary analysis of public datasets, ultimately unlocking new knowledge of the complex biological process of cell migration.
From Query to Usable Code: An Analysis of Stack Overflow Code Snippets
Enriched by natural language texts, Stack Overflow code snippets are an
invaluable code-centric knowledge base of small units of source code. Besides
being useful for software developers, these annotated snippets can potentially
serve as the basis for automated tools that provide working code solutions to
specific natural language queries.
With the goal of developing automated tools with the Stack Overflow snippets
and surrounding text, this paper investigates the following questions: (1) How
usable are the Stack Overflow code snippets? and (2) When using text search
engines for matching on the natural language questions and answers around the
snippets, what percentage of the top results contain usable code snippets?
A total of 3M code snippets are analyzed across four languages: C#, Java,
JavaScript, and Python. Python and JavaScript proved to be the languages for
which the most code snippets are usable. Conversely, Java and C# proved to be
the languages with the lowest usability rate. Further qualitative analysis of
usable Python snippets shows the characteristics of the answers that solve the
original question. Finally, we use Google search to investigate the alignment
of usability and the natural language annotations around code snippets, and
explore how to make snippets in Stack Overflow an adequate base for future
automatic program generation.
Comment: 13th IEEE/ACM International Conference on Mining Software
Repositories, 11 pages
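As a rough illustration of measuring a per-language usability rate, the sketch below scores Python snippets with a deliberately simplified criterion: a snippet counts as "usable" if it parses as Python 3. This is an assumption. The paper's actual usability criteria may be stricter (for example, also requiring that the code runs), and the helper names `is_parsable` and `usability_rate` are hypothetical.

```python
import ast


def is_parsable(snippet: str) -> bool:
    """Return True if the snippet is syntactically valid Python 3.

    Simplified proxy for "usable" (assumption, not the paper's definition).
    """
    try:
        ast.parse(snippet)
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return False
    return True


def usability_rate(snippets: list[str]) -> float:
    """Fraction of snippets that parse; 0.0 for an empty list."""
    if not snippets:
        return 0.0
    return sum(is_parsable(s) for s in snippets) / len(snippets)
```

For example, a shell command pasted into an answer (`$ pip install requests`) or a lone loop header (`for x in items:`) fails this check, while a complete statement such as `print('hi')` passes.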
DyPyBench: A Benchmark of Executable Python Software
Python has emerged as one of the most popular programming languages,
extensively utilized in domains such as machine learning, data analysis, and
web applications. Python's dynamic nature and extensive usage make it an
attractive candidate for dynamic program analysis. However, unlike for other
popular languages, there currently is no comprehensive benchmark suite of
executable Python projects, which hinders the development of dynamic analyses.
This work addresses this gap by presenting DyPyBench, the first benchmark of
Python projects that is large scale, diverse, ready to run (i.e., with fully
configured and prepared test suites), and ready to analyze (by integrating with
the DynaPyt dynamic analysis framework). The benchmark encompasses 50 popular
open-source projects from various application domains, with a total of 681k
lines of Python code, and 30k test cases. DyPyBench enables various
applications in testing and dynamic analysis, of which we explore three in this
work: (i) Gathering dynamic call graphs and empirically comparing them to
statically computed call graphs, which exposes and quantifies limitations of
existing call graph construction techniques for Python. (ii) Using DyPyBench to
build a training data set for LExecutor, a neural model that learns to predict
values that otherwise would be missing at runtime. (iii) Using dynamically
gathered execution traces to mine API usage specifications, which establishes a
baseline for future work on specification mining for Python. We envision
DyPyBench to provide a basis for other dynamic analyses and for studying the
runtime behavior of Python code.
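The first application, gathering dynamic call graphs, can be sketched with the standard-library hook `sys.setprofile`, which reports an event each time a Python function is entered. This is a swapped-in technique for illustration: DyPyBench itself integrates with the DynaPyt framework, whose API is not used here. The helper names `collect_call_graph` and `_function_name` are hypothetical, and the sketch only observes the current thread.

```python
import sys
from types import FrameType


def _function_name(frame: FrameType) -> str:
    """Label a frame as '<module>.<function>' (hypothetical naming scheme)."""
    return f"{frame.f_globals.get('__name__', '?')}.{frame.f_code.co_name}"


def collect_call_graph(entry, *args, **kwargs) -> set[tuple[str, str]]:
    """Run entry(*args, **kwargs) and record observed caller -> callee edges."""
    edges: set[tuple[str, str]] = set()

    def profiler(frame: FrameType, event: str, arg) -> None:
        # "call" fires when a Python-level function is entered;
        # C-level calls arrive as "c_call" and are ignored here.
        if event == "call" and frame.f_back is not None:
            edges.add((_function_name(frame.f_back), _function_name(frame)))

    sys.setprofile(profiler)
    try:
        entry(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always uninstall the hook, even on exceptions
    return edges
```

Given a statically computed edge set for the same code, the set difference `dynamic_edges - static_edges` lists calls that actually happened but that the static analysis missed, which is one way to quantify the limitations the abstract mentions.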
Overcoming Language Dichotomies: Toward Effective Program Comprehension for Mobile App Development
Mobile devices and platforms have become an established target for modern
software developers due to performant hardware and a large and growing user
base numbering in the billions. Despite their popularity, the software
development process for mobile apps comes with a set of unique, domain-specific
challenges rooted in program comprehension. Many of these challenges stem from
developer difficulties in reasoning about different representations of a
program, a phenomenon we define as a "language dichotomy". In this paper, we
reflect upon the various language dichotomies that contribute to open problems
in program comprehension and development for mobile apps. Furthermore, to help
guide the research community towards effective solutions for these problems, we
provide a roadmap of directions for future work.
Comment: Invited Keynote Paper for the 26th IEEE/ACM International Conference
on Program Comprehension (ICPC'18)