12,174 research outputs found
Dynamic Information Flow Analysis in Ruby
With the rapid increase in usage of the internet and online applications, there is a huge demand for applications to handle data privacy and integrity. Applications are already complex with business logic; adding the data safety logic would make them more complicated. The more complex the code becomes, the more possibilities it opens for security-critical bugs. To solve this conundrum, we can push this data safety handling feature to the language level rather than the application level. With a secure language, developers can write their application without having to worry about data security.
This project introduces dynamic information flow analysis in Ruby. I extend the JRuby implementation, which is a widely used implementation of Ruby written in Java. Information flow analysis classifies variables used in the program into different security levels and monitors the data flow across levels. Ruby currently supports data integrity by a tainting mechanism. This project extends this tainting mechanism to handle implicit data flows, enabling it to protect confidentiality as well as integrity. Experimental results based on Ruby benchmarks are presented in this paper, which show that: This project protects confidentiality but at the cost of 1.2 - 10 times slowdown in execution time
What Causes My Test Alarm? Automatic Cause Analysis for Test Alarms in System and Integration Testing
Driven by new software development processes and testing in clouds, system
and integration testing nowadays tends to produce enormous number of alarms.
Such test alarms lay an almost unbearable burden on software testing engineers
who have to manually analyze the causes of these alarms. The causes are
critical because they decide which stakeholders are responsible to fix the bugs
detected during the testing. In this paper, we present a novel approach that
aims to relieve the burden by automating the procedure. Our approach, called
Cause Analysis Model, exploits information retrieval techniques to efficiently
infer test alarm causes based on test logs. We have developed a prototype and
evaluated our tool on two industrial datasets with more than 14,000 test
alarms. Experiments on the two datasets show that our tool achieves an accuracy
of 58.3% and 65.8%, respectively, which outperforms the baseline algorithms by
up to 13.3%. Our algorithm is also extremely efficient, spending about 0.1s per
cause analysis. Due to the attractive experimental results, our industrial
partner, a leading information and communication technology company in the
world, has deployed the tool and it achieves an average accuracy of 72% after
two months of running, nearly three times more accurate than a previous
strategy based on regular expressions.Comment: 12 page
CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning
To accelerate software development, much research has been performed to help
people understand and reuse the huge amount of available code resources. Two
important tasks have been widely studied: code retrieval, which aims to
retrieve code snippets relevant to a given natural language query from a code
base, and code annotation, where the goal is to annotate a code snippet with a
natural language description. Despite their advancement in recent years, the
two tasks are mostly explored separately. In this work, we investigate a novel
perspective of Code annotation for Code retrieval (hence called `CoaCor'),
where a code annotation model is trained to generate a natural language
annotation that can represent the semantic meaning of a given code snippet and
can be leveraged by a code retrieval model to better distinguish relevant code
snippets from others. To this end, we propose an effective framework based on
reinforcement learning, which explicitly encourages the code annotation model
to generate annotations that can be used for the retrieval task. Through
extensive experiments, we show that code annotations generated by our framework
are much more detailed and more useful for code retrieval, and they can further
improve the performance of existing code retrieval models significantly.Comment: 10 pages, 2 figures. Accepted by The Web Conference (WWW) 201
Stack Overflow: A Code Laundering Platform?
Developers use Question and Answer (Q&A) websites to exchange knowledge and
expertise. Stack Overflow is a popular Q&A website where developers discuss
coding problems and share code examples. Although all Stack Overflow posts are
free to access, code examples on Stack Overflow are governed by the Creative
Commons Attribute-ShareAlike 3.0 Unported license that developers should obey
when reusing code from Stack Overflow or posting code to Stack Overflow. In
this paper, we conduct a case study with 399 Android apps, to investigate
whether developers respect license terms when reusing code from Stack Overflow
posts (and the other way around). We found 232 code snippets in 62 Android apps
from our dataset that were potentially reused from Stack Overflow, and 1,226
Stack Overflow posts containing code examples that are clones of code released
in 68 Android apps, suggesting that developers may have copied the code of
these apps to answer Stack Overflow questions. We investigated the licenses of
these pieces of code and observed 1,279 cases of potential license violations
(related to code posting to Stack overflow or code reuse from Stack overflow).
This paper aims to raise the awareness of the software engineering community
about potential unethical code reuse activities taking place on Q&A websites
like Stack Overflow.Comment: In proceedings of the 24th IEEE International Conference on Software
Analysis, Evolution, and Reengineering (SANER
Automatic generation of hardware Tree Classifiers
Machine Learning is growing in popularity and spreading across different fields for various applications. Due to this trend, machine learning algorithms use different hardware platforms and are being experimented to obtain high test accuracy and throughput. FPGAs are well-suited hardware platform for machine learning because of its re-programmability and lower power consumption. Programming using FPGAs for machine learning algorithms requires substantial engineering time and effort compared to software implementation. We propose a software assisted design flow to program FPGA for machine learning algorithms using our hardware library. The hardware library is highly parameterized and it accommodates Tree Classifiers. As of now, our library consists of the components required to implement decision trees and random forests. The whole automation is wrapped around using a python script which takes you from the first step of having a dataset and design choices to the last step of having a hardware descriptive code for the trained machine learning model
- …
