Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection
In this paper, we introduce and evaluate PROPEDEUTICA, a novel methodology
and framework for efficient and effective real-time malware detection,
leveraging the best of conventional machine learning (ML) and deep learning
(DL) algorithms. In PROPEDEUTICA, all software processes in the system start
execution subjected to a conventional ML detector for fast classification. If a
piece of software receives a borderline classification, it is subjected to
further analysis via more computationally expensive but more accurate DL methods,
using our newly proposed DL algorithm DEEPMALWARE. Further, we introduce delays
to the execution of software subjected to deep learning analysis as a way to
"buy time" for DL analysis and to rate-limit the impact of possible malware in
the system. We evaluated PROPEDEUTICA with a set of 9,115 malware samples and
877 commonly used benign software samples from various categories for the
Windows OS. Our results show that the false positive rate for conventional ML
methods can reach 20%, and for modern DL methods it is usually below 6%.
However, the classification time for DL can be 100X longer than conventional ML
methods. PROPEDEUTICA improved the detection F1-score from 77.54% (conventional
ML method) to 90.25%, and reduced the detection time by 54.86%. Moreover, the
percentage of software subjected to DL analysis was approximately 40% on
average, and applying delays to software under DL analysis reduced the
detection time by approximately 10%. Finally, we found and discussed a
discrepancy between the detection accuracy offline (analysis after all traces
are collected) and on-the-fly (analysis in tandem with trace collection). Our
insights show that conventional ML and modern DL-based malware detectors in
isolation cannot meet the needs of efficient and effective malware detection:
high accuracy, low false positive rate, and short classification time.
Comment: 17 pages, 7 figures
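The two-stage "fast and slow" routing described in the abstract can be sketched as a simple triage function. This is an illustrative sketch only: the scoring functions are stand-ins for real classifiers, and the borderline band (`low`, `high`) is an assumed parameter, not PROPEDEUTICA's actual configuration.

```python
def fast_ml_score(features):
    # Stand-in for a cheap conventional ML classifier.
    # Returns a malware probability in [0, 1].
    return sum(features) / max(len(features), 1)

def slow_dl_score(features):
    # Stand-in for the expensive, more accurate DL model
    # (DEEPMALWARE in the paper); placeholder scoring for the demo.
    return fast_ml_score(features)

def classify(features, low=0.3, high=0.7):
    """Route a process trace: decide immediately when the fast score is
    confident; otherwise fall back to the slow DL path. The borderline
    band [low, high] is a hypothetical threshold choice."""
    p = fast_ml_score(features)
    if p < low:
        return "benign", "fast"
    if p > high:
        return "malware", "fast"
    # Borderline case: PROPEDEUTICA would delay the process here
    # to "buy time" for the DL analysis.
    p = slow_dl_score(features)
    return ("malware" if p > 0.5 else "benign"), "slow"
```

The key design point is that only borderline samples pay the DL cost, which is how the framework keeps the average classification time low while retaining DL accuracy where it matters.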
Improving Question Generation with Sentence-level Semantic Matching and Answer Position Inferring
Taking an answer and its context as input, sequence-to-sequence models have
made considerable progress on question generation. However, we observe that
these approaches often generate wrong question words or keywords and copy
answer-irrelevant words from the input. We believe that the lack of global
question semantics and the insufficient use of answer position-awareness are the key root
causes. In this paper, we propose a neural question generation model with two
concrete modules: sentence-level semantic matching and answer position
inferring. Further, we enhance the initial state of the decoder by leveraging
the answer-aware gated fusion mechanism. Experimental results demonstrate that
our model outperforms the state-of-the-art (SOTA) models on SQuAD and MARCO
datasets. Owing to its generality, our work also improves the existing models
significantly.
Comment: Revised version of a paper accepted to the Thirty-fourth AAAI Conference on Artificial Intelligence
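The answer-aware gated fusion mentioned in the abstract can be sketched as a learned gate that blends a passage encoding with an answer-aware vector when initializing the decoder. The function name, the element-wise gating form, and the scalar weights below are assumptions for illustration, not the paper's actual architecture.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(h_passage, h_answer, w_p, w_a, b):
    """Element-wise gated fusion (hypothetical form):
    g = sigmoid(w_p * h_p + w_a * h_a + b)
    fused = g * h_p + (1 - g) * h_a
    All arguments are equal-length vectors of floats."""
    fused = []
    for hp, ha, wp, wa, bb in zip(h_passage, h_answer, w_p, w_a, b):
        g = sigmoid(wp * hp + wa * ha + bb)
        fused.append(g * hp + (1.0 - g) * ha)
    return fused
```

With zero weights the gate is 0.5 and the fused state is the average of the two inputs; training moves the gate toward whichever source is more informative per dimension.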