Automated Crowdturfing Attacks and Defenses in Online Review Systems
Malicious crowdsourcing forums are gaining traction as a means of spreading
misinformation online, but are limited by the costs of hiring and managing
human workers. In this paper, we identify a new class of attacks that leverage
deep learning language models (Recurrent Neural Networks or RNNs) to automate
the generation of fake online reviews for products and services. Not only are
these attacks cheap and therefore more scalable, but they can also control the
rate of content output to eliminate the signature burstiness that makes
crowdsourced campaigns easy to detect.
Using Yelp reviews as an example platform, we show how a two-phase review
generation and customization attack can produce reviews that state-of-the-art
statistical detectors cannot distinguish from genuine ones. We conduct a
survey-based user study to show that these reviews not only evade human
detection but also score highly on "usefulness" metrics as rated by users.
Finally, we develop novel automated defenses against these attacks by
leveraging the lossy transformation introduced by the RNN training and
generation cycle. We consider countermeasures against our mechanisms and show
that they produce unattractive cost-benefit tradeoffs for attackers and that
they can be further curtailed by simple constraints imposed by online service
providers.
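The abstract does not say which statistic the defense computes. As a minimal sketch, assuming the lossy transformation shows up as a shift in character-frequency distributions, one could score each review by the KL divergence between its smoothed character distribution and one estimated from known-genuine reviews. The alphabet, smoothing constant, and threshold are illustrative assumptions, not the authors' parameters, and all function names are hypothetical.

```python
import math
from collections import Counter

# Illustrative alphabet (an assumption): lowercase letters plus common punctuation.
ALPHABET = "abcdefghijklmnopqrstuvwxyz .,!?'"


def char_distribution(texts, alphabet=ALPHABET, smoothing=1.0):
    """Laplace-smoothed character distribution over `alphabet`; other characters are ignored."""
    counts = Counter(ch for text in texts for ch in text.lower() if ch in alphabet)
    total = sum(counts.values()) + smoothing * len(alphabet)
    return {ch: (counts[ch] + smoothing) / total for ch in alphabet}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q share keys, and smoothing keeps every q[ch] > 0."""
    return sum(p[ch] * math.log(p[ch] / q[ch]) for ch in p)


def divergence_score(review, reference):
    """How far one review's character distribution is from the reference distribution."""
    return kl_divergence(char_distribution([review]), reference)


def flag_reviews(genuine_reviews, candidates, threshold):
    """Return (review, score) pairs whose score exceeds a hypothetical threshold."""
    reference = char_distribution(genuine_reviews)
    flagged = []
    for review in candidates:
        score = divergence_score(review, reference)
        if score > threshold:
            flagged.append((review, score))
    return flagged
```

In practice the reference corpus would need to be large, and a single per-review divergence is a weak signal on short texts; a deployed detector would more plausibly feed such scores into a trained classifier.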
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs
Binary code analysis allows analyzing binary code without having access to
the corresponding source code. A binary, after disassembly, is expressed in an
assembly language. This inspires us to approach binary analysis by leveraging
ideas and techniques from Natural Language Processing (NLP), a rich area
focused on processing text of various natural languages. We notice that binary
code analysis and NLP share many analogous topics, such as semantics
extraction, summarization, and classification. This work uses these ideas to
address two important code similarity comparison problems: (I) given a pair of
basic blocks for different instruction set architectures (ISAs), determining
whether their semantics are similar; and (II) given a piece of code of
interest, determining whether it is contained in another piece of assembly code
for a different ISA. The solutions to these two problems have many applications,
such as cross-architecture vulnerability discovery and code plagiarism
detection. We implement a prototype system INNEREYE and perform a comprehensive
evaluation. A comparison between our approach and existing approaches to
Problem I shows that our system outperforms them in terms of accuracy,
efficiency, and scalability. Case studies using the system demonstrate that
our solution to Problem II is effective. Moreover, this research showcases how
to apply ideas and techniques from NLP to large-scale binary code analysis.
Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium 201
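The abstract does not describe INNEREYE's model, so the sketch below swaps in the simplest NLP-style stand-in for Problem I: represent each normalized instruction by an embedding vector, average the vectors of a basic block, and call two blocks similar when their cosine similarity clears a threshold. The embedding table, instruction spellings, and threshold are hypothetical values chosen for illustration only.

```python
import math

# Hypothetical, hand-picked instruction embeddings for illustration only. A real
# system would learn vectors for normalized instructions from large corpora.
EMBEDDINGS = {
    "x86:mov reg, [mem]": [0.90, 0.10, 0.00],
    "x86:add reg, imm": [0.10, 0.90, 0.10],
    "arm:ldr reg, [mem]": [0.85, 0.15, 0.05],
    "arm:add reg, reg, imm": [0.15, 0.85, 0.10],
    "arm:b label": [0.00, 0.10, 0.95],
}


def block_vector(instructions):
    """Average the instruction embeddings of a basic block (a simplification)."""
    vectors = [EMBEDDINGS[ins] for ins in instructions]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similar_blocks(block_a, block_b, threshold=0.9):
    """Problem I as a thresholded similarity test between two basic blocks."""
    return cosine(block_vector(block_a), block_vector(block_b)) >= threshold
```

Averaging discards instruction order; a sequence model trained on pairs of known-equivalent blocks would capture more structure, at the cost of needing labeled cross-ISA training data.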
Shape-Based Plagiarism Detection for Flowchart Figures in Texts
Plagiarism is a well-known phenomenon in the academic arena. Copying other
people's work is considered a serious offence that needs to be checked. Many
plagiarism detection systems, such as Turnitin, have been developed to provide
these checks. Most, if not all, discard figures and charts before checking for
plagiarism. Discarding figures and charts leaves loopholes that people can take
advantage of: figures and charts can easily be plagiarized without current
plagiarism detection systems noticing. Very few papers address flowchart
plagiarism detection. Therefore, there is a need for a system that detects
plagiarism in figures and charts. This paper presents a method for detecting
flowchart figure plagiarism based on shape-based image processing and
multimedia retrieval. The method retrieved flowcharts ranked by similarity
according to different matching sets.
Comment: 12 pages
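To illustrate the ranked-retrieval step only, here is a minimal sketch that assumes an upstream image-processing stage has already counted the shapes (rectangles, diamonds, ovals, and so on) in each flowchart. It ranks stored flowcharts by normalized histogram intersection of those counts. The similarity measure, shape labels, and figure names are assumptions for illustration, not the paper's method.

```python
from collections import Counter


def shape_similarity(query, candidate):
    """Normalized histogram intersection of two shape-count profiles, in [0, 1]."""
    shapes = set(query) | set(candidate)
    overlap = sum(min(query.get(s, 0), candidate.get(s, 0)) for s in shapes)
    total = max(sum(query.values()), sum(candidate.values()))
    return overlap / total if total else 0.0


def rank_flowcharts(query, database):
    """Rank stored flowcharts by similarity to the query, most similar first."""
    scores = [(name, shape_similarity(query, shapes)) for name, shapes in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical shape counts, as if produced by an upstream shape detector.
query = Counter(rectangle=4, diamond=2, oval=2)
database = {
    "fig_a": Counter(rectangle=4, diamond=2, oval=2),
    "fig_b": Counter(rectangle=3, diamond=1, oval=2),
    "fig_c": Counter(parallelogram=5, oval=1),
}
ranking = rank_flowcharts(query, database)
```

Shape counts alone ignore how shapes are connected, so two unrelated flowcharts with the same inventory would tie; matching the arrow topology as well would be a natural refinement.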
Citation sentence reuse behavior of scientists: A case study on massive bibliographic text dataset of computer science
Our current knowledge of scholarly plagiarism is largely based on the
similarity between full-text research articles. In this paper, we propose a
novel conceptualization of scholarly plagiarism in the form of reuse of
explicit citation sentences in scientific research articles. Note that while
full-text plagiarism is an indicator of gross-level behavior, copying of
citation sentences is a more nuanced micro-scale phenomenon observed even among
well-known researchers. The current work poses several interesting questions
and attempts to answer them by empirically investigating a large bibliographic
text dataset from computer science containing millions of lines of citation
sentences. In particular, we report evidence of massive copying behavior. We
also present several striking real examples throughout the paper to showcase
widespread adoption of this undesirable practice. In contrast to popular
perception, we find that copying tendency increases as an author matures.
Copying behavior is observed in all fields of computer science; however, the
theoretical fields show more copying than the applied fields.
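The abstract does not specify how copied citation sentences are identified. A minimal sketch, assuming near-duplicate detection is acceptable: strip bracketed citation markers, tokenize, and compare word-trigram sets with Jaccard similarity, reporting pairs from different papers above a threshold. The threshold, shingle size, and function names are hypothetical choices for illustration.

```python
import re
from itertools import combinations


def normalize(sentence):
    """Lowercase, drop bracketed citation markers such as [12], keep word tokens."""
    sentence = re.sub(r"\[[^\]]*\]", " ", sentence)
    return re.findall(r"[a-z0-9]+", sentence.lower())


def shingles(tokens, n=3):
    """Set of word n-grams (n-word shingles) for a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_reused(citation_sentences, threshold=0.8, n=3):
    """Return (paper_a, paper_b, score) for near-duplicate sentences in different papers."""
    prepared = [(pid, shingles(normalize(text), n)) for pid, text in citation_sentences]
    pairs = []
    for (pa, ga), (pb, gb) in combinations(prepared, 2):
        if pa == pb:
            continue  # reuse within the same paper is not counted
        score = jaccard(ga, gb)
        if score >= threshold:
            pairs.append((pa, pb, round(score, 3)))
    return pairs
```

This all-pairs loop is quadratic in the number of sentences; at the scale of millions of citation sentences, one would typically use MinHash with locality-sensitive hashing to find candidate pairs first.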