R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections
The influence of Deep Learning on image identification and natural language
processing has attracted enormous attention globally. The convolutional neural
network that can learn without prior extraction of features fits well in
response to the rapid iteration of Android malware. The traditional solution
for detecting Android malware requires continuous learning through
pre-extracted features to maintain high performance of identifying the malware.
To reduce the manpower spent on feature engineering, and to avoid extracting
pre-selected features altogether, we have developed a coloR-inspired
convolutional neuRal networks (CNN)-based AndroiD malware Detection (R2-D2)
system. The system converts the bytecode of classes.dex from an Android archive
file to RGB color codes and stores it as a fixed-size color image. The color
image is input to the convolutional neural network for automatic feature
extraction and training. The data was collected from Jan. 2017 to Aug. 2017.
During this period, we collected approximately 2 million benign
and malicious Android apps for our experiments with the help of our research
partner Leopard Mobile Inc. Our experimental results demonstrate that the
proposed system provides accurate security analysis of Android apps. Furthermore, we
keep our research results and experiment materials on http://R2D2.TWMAN.ORG.
Comment: Version 2018/11/15, IEEE BigData 2018, Seattle, WA, USA, Dec 10-13,
2018. (Accepted
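The bytecode-to-image step described in the R2-D2 abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the image size or how files longer or shorter than the image are handled, so the function name `bytes_to_rgb_pixels`, the `side` parameter, and the truncate-or-zero-pad policy are all assumptions.

```python
def bytes_to_rgb_pixels(data: bytes, side: int = 4) -> list[list[tuple[int, int, int]]]:
    """Map raw bytes to a side x side grid of (R, G, B) pixels.

    Assumption: each group of three consecutive bytes becomes one pixel,
    and the input is truncated or zero-padded to fit the fixed image size.
    """
    needed = side * side * 3
    # Truncate long inputs and pad short ones with zero bytes (hypothetical policy).
    buf = data[:needed].ljust(needed, b"\x00")
    # Slicing bytes and calling tuple() yields integer channel values in 0..255.
    pixels = [tuple(buf[i:i + 3]) for i in range(0, needed, 3)]
    # Reshape the flat pixel list into rows of `side` pixels each.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Usage: the first bytes of a dex file header ("dex\n035\0") as a 2 x 2 image.
image = bytes_to_rgb_pixels(b"dex\n035\x00", side=2)
```

In practice the resulting grid would be written out with an imaging library and fed to the CNN; that part is omitted here because the abstract gives no details about it.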
Data-Driven and Deep Learning Methodology for Deceptive Advertising and Phone Scams Detection
The advance of smartphones and cellular networks boosts the need for mobile
advertising and targeted marketing. However, it also triggers previously unseen
security threats. We found that phone scams using fake calling numbers with
very short lifetimes are increasingly popular and have been used to trick
users. The harm is worldwide. On the other hand, deceptive advertising
(deceptive ads), fake ads that trick users into installing unnecessary apps via
either alluring or daunting texts and pictures, is an emerging threat that
seriously harms the reputation of the advertiser. To counter these two
new threats, the conventional blacklist (or whitelist) approach and the machine
learning approach with predefined features have proven useless.
Nevertheless, building on the success of deep learning in developing highly
intelligent programs, our system can efficiently and effectively detect phone
scams and deceptive ads by taking advantage of our unified framework on deep
neural network (DNN) and convolutional neural network (CNN). The proposed
system has been deployed for operational use and the experimental results
proved the effectiveness of our proposed system. Furthermore, we keep our
research results and release experiment material on
http://DeceptiveAds.TWMAN.ORG and http://PhoneScams.TWMAN.ORG if there is any
update.
Comment: 6 pages, TAAI 2017 version
ELECTRA is a Zero-Shot Learner, Too
Recently, for few-shot or even zero-shot learning, the new paradigm
"pre-train, prompt, and predict" has achieved remarkable results compared
with the "pre-train, fine-tune" paradigm. After the success of prompt-based
GPT-3, a series of masked language model (MLM)-based (e.g., BERT, RoBERTa)
prompt learning methods became popular and widely used. However, another
efficient pre-trained discriminative model, ELECTRA, has probably been
neglected. In this paper, we attempt to accomplish several NLP tasks in the
zero-shot scenario using our proposed novel replaced token detection
(RTD)-based prompt learning method. Experimental results show that the ELECTRA
model based on RTD-prompt learning achieves surprisingly strong,
state-of-the-art zero-shot performance. Numerically, compared to
MLM-RoBERTa-large and MLM-BERT-large, our RTD-ELECTRA-large achieves average
improvements of about 8.4% and 13.7%, respectively, across all 15 tasks.
Especially on the SST-2 task, our
RTD-ELECTRA-large achieves an astonishing 90.1% accuracy without any training
data. Overall, compared to the pre-trained masked language models, the
pre-trained replaced token detection model performs better in zero-shot
learning. The source code is available at:
https://github.com/nishiwen1214/RTD-ELECTRA.
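One plausible reading of RTD-based prompting is sketched below: each candidate label word is placed into a prompt, a discriminator scores how likely each token is to have been replaced, and the label whose word looks least "replaced" wins. The abstract does not spell out the procedure, so this reading is an assumption, as are the `[LABEL]` slot, the template, and the label words. The `toy_scorer` is a hand-written stand-in for illustration only; a real run would call a pre-trained ELECTRA discriminator instead.

```python
from typing import Callable, Sequence

# Hypothetical scorer interface: one "replaced" probability per input token.
ReplacedScorer = Callable[[Sequence[str]], Sequence[float]]


def rtd_zero_shot(text_tokens: Sequence[str],
                  template_tokens: Sequence[str],
                  label_words: dict[str, str],
                  scorer: ReplacedScorer) -> tuple[str, dict[str, float]]:
    """Pick the label whose word the discriminator finds least likely replaced."""
    slot = list(template_tokens).index("[LABEL]")  # position of the label slot
    scores = {}
    for label, word in label_words.items():
        prompt = list(template_tokens)
        prompt[slot] = word                        # fill the slot with a candidate
        tokens = list(text_tokens) + prompt
        probs = scorer(tokens)
        scores[label] = probs[len(text_tokens) + slot]
    return min(scores, key=scores.get), scores


def toy_scorer(tokens: Sequence[str]) -> list[float]:
    """Stand-in discriminator (NOT ELECTRA): hard-coded scores for the demo."""
    positive_context = "loved" in tokens
    out = []
    for tok in tokens:
        if tok == "great":
            out.append(0.1 if positive_context else 0.9)
        elif tok == "terrible":
            out.append(0.9 if positive_context else 0.1)
        else:
            out.append(0.05)
    return out


label, scores = rtd_zero_shot(
    ["I", "loved", "this", "film", "."],
    ["It", "was", "[LABEL]", "."],
    {"positive": "great", "negative": "terrible"},
    toy_scorer,
)
```

Passing the scorer in as a function keeps the selection logic independent of any particular model library.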
Robustness Study of Free-Text Speaker Identification and Verification
Usable free-text speaker identification and verification systems must exhibit robustness under varying operational conditions. We studied the degree of robustness provided by various signal processing techniques: spectrum subtraction, bandpass liftering, RASTA filtering, ISDCN, and stereo database normalization. The experiments were performed on a widely used, challenging long distance telephone database. This database consists of data recorded at two different sites, with data from one site much poorer in quality than the other; further, the recording equipment had been inadvertently changed for the latter half of the sessions, resulting in a significantly changed environment. Our study identifies the combination of techniques that provides consistent and significant improvements; our results surpass other published results on the same task. We further verified the results on two other databases and achieved consistent improvements. Detailed results of exhaustive experimentation are presented along with appropriate discussions.
Low Complexity CELP Speech Coding at 4.8 kbps
Low bit rate, high quality speech coding is a vital part of voice telecommunication systems. The introduction of CELP (1984) (Codebook Excited Linear Prediction) speech coding provided a feasible way to compress speech data to 4.8 kbps with high quality, but the formidable computational complexity required for real-time processing has prevented its wide application. In this thesis, we reduce the computational complexity to 5 MIPS (million instructions per second), which can be handled by even inexpensive DSP chips, while maintaining the same high quality. We hope our contribution can finally make CELP coding a widely applicable technology.
A New Deterministic Codebook Structure for CELP Speech Coding
Low bit rate, high quality speech coding is a vital part of voice telecommunication systems. The introduction of CELP (1984) (Codebook Excited Linear Prediction) speech coding provided a feasible way to compress speech data to 4.8 kbps with high quality, but the formidable computational complexity required for real-time processing has prevented its wide application. Using the new deterministic codebook, we reduce the computational complexity of the codebook search, which originally accounted for 2/3 of the total computational complexity, to a negligible level. Based on this reduction, we produce an algorithm with a complexity of about 5 MIPS. It can be implemented on even inexpensive DSP chips, while maintaining the same high quality. In addition to extremely simple encoding and decoding schemes, this codebook also provides optimal error tolerance and does not require codebook storage. We hope that this contribution can finally make CELP speech coding a widely applicable and practical technology.
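To illustrate why the codebook search dominates CELP complexity, the sketch below shows a generic exhaustive analysis-by-synthesis search: every codevector is passed through the synthesis filter and correlated with the target, so the work grows with codebook size times subframe length times filter length. This is a textbook-style illustration only, not the deterministic codebook of the thesis, whose structure the abstract does not describe; the function names and the toy values are hypothetical.

```python
def filter_codevector(code: list[float], h: list[float]) -> list[float]:
    """Convolve a codevector with the synthesis-filter impulse response h,
    truncated to the subframe length."""
    n = len(code)
    return [
        sum(h[j] * code[i - j] for j in range(min(i + 1, len(h))))
        for i in range(n)
    ]


def search_codebook(target: list[float],
                    codebook: list[list[float]],
                    h: list[float]) -> tuple[int, float]:
    """Return (index, gain) of the codevector minimizing the squared error
    between the target and the gain-scaled filtered codevector."""
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for k, code in enumerate(codebook):
        y = filter_codevector(code, h)            # costly step, repeated per entry
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue                              # skip all-zero responses
        corr = sum(t * v for t, v in zip(target, y))
        score = corr * corr / energy              # maximizing this minimizes error
        if score > best_score:
            best_index, best_gain, best_score = k, corr / energy, score
    return best_index, best_gain


# Toy usage: the target is twice the filtered second codevector.
h = [1.0, 0.5]
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
index, gain = search_codebook([0.0, 2.0, 1.0, 0.0], codebook, h)
```

A structured codebook can avoid most of the per-entry filtering and correlation in this loop, which is where the savings claimed in the abstract would come from.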