79,856 research outputs found

Mimic Learning to Generate a Shareable Network Intrusion Detection Model

Author: Baza Mohamed
Fouda Mostafa M.
Mahmoud Mohamed
Nabil Mahmoud
Shafee Ahmed
Talbert Douglas A.
Publication date: 18/02/2020

Purveyors of malicious network attacks continue to increase the complexity and the sophistication of their techniques, and their ability to evade detection continues to improve as well. Hence, intrusion detection systems must also evolve to meet these increasingly challenging threats. Machine learning is often used to support this needed improvement. However, training a good prediction model can require a large set of labelled training data. Such datasets are difficult to obtain because privacy concerns prevent the majority of intrusion detection agencies from sharing their sensitive data. In this paper, we propose the use of mimic learning to enable the transfer of intrusion detection knowledge through a teacher model trained on private data to a student model. This student model provides a means of publicly sharing knowledge extracted from private data without sharing the data itself. Our results confirm that the proposed scheme can produce a student intrusion detection model that mimics the teacher model without requiring access to the original dataset.

arXiv.org e-Print Archive
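
The teacher/student transfer described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the nearest-centroid classifier, the two-feature synthetic "flows", and every function name here are assumptions chosen to keep the example self-contained. The teacher is fit on private labelled data, it labels a separate public unlabelled set, and the student is fit only on those public samples and teacher-assigned labels.

```python
import random

def train_centroids(samples, labels):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )

def mimic(private_x, private_y, public_x):
    """Train a teacher on private data, then a student on teacher-labelled public data."""
    teacher = train_centroids(private_x, private_y)
    pseudo_labels = [predict(teacher, x) for x in public_x]
    student = train_centroids(public_x, pseudo_labels)  # never sees private data
    return teacher, student

def make_flows(rng, n, center):
    """Hypothetical 2-feature traffic records scattered around `center`."""
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(n)]

rng = random.Random(0)
private_x = make_flows(rng, 100, 0.0) + make_flows(rng, 100, 5.0)
private_y = ["benign"] * 100 + ["attack"] * 100
public_x = make_flows(rng, 100, 0.0) + make_flows(rng, 100, 5.0)  # unlabelled

teacher, student = mimic(private_x, private_y, public_x)

# Measure how often the student reproduces the teacher's decisions.
probe = make_flows(rng, 50, 0.0) + make_flows(rng, 50, 5.0)
agreement = sum(predict(teacher, x) == predict(student, x) for x in probe) / len(probe)
print(f"teacher/student agreement: {agreement:.2f}")
```

Only the student's centroids would be shared in this sketch; the private samples and their labels stay with the data owner.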

EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models

Author: Anderson Hyrum S.
Roth Phil
Publication date: 16/04/2018

This paper describes EMBER: a labeled benchmark dataset for training machine learning models to statically detect malicious Windows portable executable files. The dataset includes features extracted from 1.1M binary files: 900K training samples (300K malicious, 300K benign, 300K unlabeled) and 200K test samples (100K malicious, 100K benign). To accompany the dataset, we also release open source code for extracting features from additional binaries so that additional sample features can be appended to the dataset. This dataset fills a void in the information security machine learning community: a benign/malicious dataset that is large, open and general enough to cover several interesting use cases. We enumerate several use cases that we considered when structuring the dataset. Additionally, we demonstrate one use case wherein we compare a baseline gradient boosted decision tree model trained using LightGBM with default settings to MalConv, a recently published end-to-end (featureless) deep learning model for malware detection. Results show that even without hyper-parameter optimization, the baseline EMBER model outperforms MalConv. The authors hope that the dataset, code and baseline model provided by EMBER will help invigorate machine learning research for malware detection, in much the same way that benchmark datasets have advanced computer vision research

arXiv.org e-Print Archive

Multiclass Road Sign Detection using Multiplicative Kernel

Author: Zadrija Valentina
Šegvić Siniša
Publication date: 01/10/2013

We consider the problem of multiclass road sign detection using a classification function with a multiplicative kernel composed of two kernels. We show that problems of detection and within-foreground classification can be jointly solved by using one kernel to measure object-background differences and another one to account for within-class variations. The main idea behind this approach is that road signs from different foreground variations can share features that discriminate them from backgrounds. The classification function training is accomplished using SVM, thus feature sharing is obtained through support vector sharing. Training yields a family of linear detectors, where each detector corresponds to a specific foreground training sample. The redundancy among detectors is alleviated using k-medoids clustering. Finally, we report detection and classification results on a set of road sign images obtained from a camera on a moving vehicle. Comment: Part of the Proceedings of the Croatian Computer Vision Workshop, CCVW 2013, Year

arXiv.org e-Print Archive
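
To make the multiplicative-kernel idea in the abstract above concrete, here is a small sketch. The specific kernel choices (a linear kernel on image features for object-background differences, an RBF kernel on a foreground descriptor for within-class variation), the function names, and the hand-picked support coefficients are all assumptions for illustration; they do not come from a trained SVM or from the paper. The sketch shows why a linear first kernel yields one linear detector per foreground descriptor.

```python
import math

def k_within(theta_a, theta_b, gamma=0.5):
    """Within-class kernel over foreground descriptors (assumed RBF)."""
    d2 = sum((a - b) ** 2 for a, b in zip(theta_a, theta_b))
    return math.exp(-gamma * d2)

def k_between(x_a, x_b):
    """Object-background kernel over image features (assumed linear)."""
    return sum(a * b for a, b in zip(x_a, x_b))

def k_mult(x_a, theta_a, x_b, theta_b):
    """Multiplicative kernel: product of the two component kernels."""
    return k_between(x_a, x_b) * k_within(theta_a, theta_b)

def decision(support, x, theta, bias=0.0):
    """Kernel decision value; `support` holds (coef, features, descriptor) triples."""
    return sum(c * k_mult(xs, ts, x, theta) for c, xs, ts in support) + bias

def linear_detector(support, theta):
    """Collapse the kernel expansion into one weight vector for a fixed descriptor."""
    dim = len(support[0][1])
    w = [0.0] * dim
    for c, xs, ts in support:
        weight = c * k_within(ts, theta)  # shared support vectors, reweighted per theta
        for i in range(dim):
            w[i] += weight * xs[i]
    return w

# Hypothetical support set: coefficients stand in for alpha_s * y_s.
support = [
    (0.8, [1.0, 0.0, 2.0], [0.0, 1.0]),
    (-0.5, [0.0, 1.0, 1.0], [1.0, 1.0]),
    (0.3, [2.0, 1.0, 0.0], [1.0, 0.0]),
]
theta = [0.0, 1.0]      # descriptor of one foreground training sample
x = [1.0, 2.0, 3.0]     # features of a candidate window

w = linear_detector(support, theta)
print(decision(support, x, theta), k_between(w, x))
```

The two printed values agree because, with a linear object-background kernel, the sum over support vectors factors into a single dot product with `w`.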

Differentially Private Collaborative Intrusion Detection Systems For VANETs

Author: Zhang Tao
Zhu Quanyan
Publication date: 02/05/2020

A vehicular ad hoc network (VANET) is an enabling technology in modern transportation systems for providing safety and valuable information, yet it is vulnerable to a number of attacks, from passive eavesdropping to active interference. Intrusion detection systems (IDSs) are important devices that can mitigate the threats by detecting malicious behaviors. Furthermore, collaboration among vehicles in VANETs can improve the detection accuracy by communicating their experiences between nodes. To this end, distributed machine learning is a suitable framework for the design of scalable and implementable collaborative detection algorithms over VANETs. One fundamental barrier to collaborative learning is the privacy concern that arises as nodes exchange data among themselves. A malicious node can obtain sensitive information about other nodes by inferring it from the observed data. In this paper, we propose a privacy-preserving machine-learning based collaborative IDS (PML-CIDS) for VANETs. The proposed algorithm applies the alternating direction method of multipliers (ADMM) to a class of empirical risk minimization (ERM) problems and trains a classifier to detect intrusions in the VANETs. We use differential privacy to capture the privacy notion of the PML-CIDS and propose a method of dual variable perturbation to provide dynamic differential privacy. We analyze theoretical performance and characterize the fundamental tradeoff between the security and privacy of the PML-CIDS. We also conduct numerical experiments using the NSL-KDD dataset to corroborate the results on the detection accuracy, security-privacy tradeoffs, and design

arXiv.org e-Print Archive
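
The combination of ADMM and dual variable perturbation mentioned in the abstract above can be illustrated on a toy problem. This is a simplified sketch under several assumptions: it uses global-consensus ADMM rather than a decentralized vehicle network, a scalar quadratic loss per node instead of a classifier's empirical risk, and an uncalibrated Laplace noise scale, so it makes no differential privacy claim. All names are hypothetical.

```python
import random

def laplace(rng, scale):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    if scale <= 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def consensus_admm(local_targets, rho=1.0, iterations=100, noise_scale=0.0, seed=0):
    """Consensus ADMM where node i minimizes 0.5 * (x - a_i)**2.

    Before each primal update the node's dual variable is perturbed with
    Laplace noise (a stand-in for dual variable perturbation).
    """
    rng = random.Random(seed)
    n = len(local_targets)
    x = [0.0] * n   # local primal variables
    y = [0.0] * n   # local dual variables
    z = 0.0         # shared consensus variable
    for _ in range(iterations):
        for i, a in enumerate(local_targets):
            noisy_dual = y[i] + laplace(rng, noise_scale)
            # Closed-form minimizer of 0.5*(x-a)^2 + noisy_dual*(x-z) + rho/2*(x-z)^2
            x[i] = (a - noisy_dual + rho * z) / (1.0 + rho)
        # Consensus update uses the unperturbed duals.
        z = sum(x[i] + y[i] / rho for i in range(n)) / n
        for i in range(n):
            y[i] += rho * (x[i] - z)
    return z

targets = [1.0, 2.0, 6.0]
exact = consensus_admm(targets)                    # no noise
noisy = consensus_admm(targets, noise_scale=0.1)   # perturbed duals
print(exact, noisy)
```

Without noise the consensus variable converges to the minimizer of the summed losses, which for these quadratics is the mean of the targets; increasing `noise_scale` trades accuracy for more obfuscation of each node's updates.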

Constrained Generative Adversarial Network Ensembles for Sharable Synthetic Data Generation

Author: Bigelow Matthew
Dikici Engin
Erdal Barbaros Selnur
Prevedello Luciano M.
White Richard D.
Publication date: 28/02/2020

The sharing of medical imaging datasets between institutions, and even inside the same institution, is limited by various regulations/legal barriers. Although these limitations are necessities for protecting patient privacy and setting strict boundaries for data ownership, medical research projects that require large datasets suffer considerably as a result. Machine learning has been revolutionized by the emerging deep neural network approaches over recent years, making the data-related limitations an even larger problem, as these novel techniques commonly require immense imaging datasets. This paper introduces constrained Generative Adversarial Network ensembles (cGANe) to address this problem by altering the representation of the imaging data while retaining the significant information, enabling the reproduction of similar research results elsewhere with the sharable data. Accordingly, a framework representing the generation of a cGANe is described, and the approach is validated for the generation of synthetic 3D brain metastatic region data from T1-weighted contrast-enhanced MRI studies. For 90% brain metastases (BM) detection sensitivity, our previously reported detection algorithm produced on average 9.12 false-positive BM detections per patient after training with the original data, while producing 9.53 false-positives after training with the cGANe generated synthetic data. Although the applicability of the introduced approach needs further validation studies with a range of medical imaging data types, the results suggest that the BM-detection algorithm can achieve comparable performance by using cGANe generated synthetic data. Hence, the generalization of the proposed approach for various modalities may occur in the near future.

arXiv.org e-Print Archive

Taxonomy driven indicator scoring in MISP threat intelligence platforms

Author: Dulaunoy Alexandre
Iklody Andras
Mokaddem Sami
Wagener Gerard
Publication date: 08/02/2019

The IT security community has recently been facing a change of trend from closed to open working groups and from restrictive information to full information disclosure and sharing. One major driver of this trend change is the number of incidents and various indicators of compromise (IoC) that appear on a daily basis, which can only be faced and solved in a collaborative way. Sharing information is key to staying on top of the threats. To cover the need for a medium for information sharing, different initiatives were taken, such as the Open Source Threat Intelligence and Sharing Platform called MISP. In its current state, this sharing and collection platform has become far more than a malware information sharing platform. It includes all kinds of IoCs, malware and vulnerabilities, but also financial threat or fraud information. Hence, the volume of information is increasing and evolving. In this paper we present implemented distributed data interaction methods for MISP followed by a generic scoring model for decaying information that is shared within MISP communities. As the MISP community members do not have the same objectives, use cases and implementations of the scoring model are discussed. A commonly encountered use case in practice is the detection of indicators of compromise in operational networks. Comment: 10 pages, 13 figures. arXiv admin note: substantial text overlap with arXiv:1803.1105

arXiv.org e-Print Archive
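
The abstract above refers to a scoring model for decaying information without giving its form. As a purely illustrative sketch, the function below implements one possible polynomial decay in which an indicator starts at a base score and reaches zero at the end of a chosen lifetime. The formula, the parameter names (`base_score`, `lifetime`, `decay_speed`), and the example values are assumptions, not taken from the abstract.

```python
def decayed_score(base_score, elapsed, lifetime, decay_speed=1.0):
    """Score of an indicator `elapsed` time units after it was last seen.

    Hypothetical polynomial decay:
        base_score * (1 - (elapsed / lifetime) ** (1 / decay_speed))
    clamped to base_score before the sighting and to 0 after `lifetime`.
    """
    if lifetime <= 0 or decay_speed <= 0:
        raise ValueError("lifetime and decay_speed must be positive")
    if elapsed <= 0:
        return float(base_score)
    if elapsed >= lifetime:
        return 0.0
    return base_score * (1.0 - (elapsed / lifetime) ** (1.0 / decay_speed))

# An IP-address indicator with a 30-day lifetime, checked at several ages (days).
for age in (0, 7, 15, 29, 30):
    print(age, round(decayed_score(100, age, 30), 1))
```

In a setting like the one the abstract describes, different communities could pick different lifetimes and decay speeds per indicator type, and treat an indicator as expired once its score falls below a chosen threshold.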

A Survey on the Security of Pervasive Online Social Networks (POSNs)

Author: Choudhary Gaurav
Gupta Takshi
Sharma Vishal
Publication date: 19/06/2018

Pervasive Online Social Networks (POSNs) are extensions of Online Social Networks (OSNs) that facilitate connectivity irrespective of the domain and properties of users. POSNs have emerged from the convergence of a plethora of social networking platforms, motivated by bridging the gaps between them. Over the last decade, OSNs have seen tremendous advancement in terms of the number of users as well as technology enablers. A single OSN is the property of an organization, which ensures the smooth functioning of its services to provide a quality experience to its users. However, with POSNs, multiple OSNs have coalesced through communities, circles, or only properties, which makes service provisioning tedious and arduous to sustain. Challenges become especially severe when the focus is on the security perspective of cross-platform OSNs, which are an integral part of POSNs. Thus, it is of paramount importance to highlight such a requirement and understand the current situation while discussing the available state of the art. With the modernization of OSNs and convergence towards POSNs, it is essential to understand the impact and reach of current solutions for enhancing the security of users as well as associated services. This survey addresses this need, focusing on different sets of studies presented over the last few years and surveying them for their applicability to POSNs... Comment: 39 Pages, 10 Figure

arXiv.org e-Print Archive

Combating Fake News: A Survey on Identification and Mitigation Techniques

Author: Jiang He
Liu Yan
Qian Feng
Ruchansky Natali
Sharma Karishma
Zhang Ming
Publication date: 18/01/2019

The proliferation of fake news on social media has opened up new directions of research for timely identification and containment of fake news, and mitigation of its widespread impact on public opinion. While much of the earlier research was focused on identification of fake news based on its contents or by exploiting users' engagements with the news on social media, there has been a rising interest in proactive intervention strategies to counter the spread of misinformation and its impact on society. In this survey, we describe the modern-day problem of fake news and, in particular, highlight the technical challenges associated with it. We discuss existing methods and techniques applicable to both identification and mitigation, with a focus on the significant advances in each method and their advantages and limitations. In addition, research has often been limited by the quality of existing datasets and their specific application contexts. To alleviate this problem, we comprehensively compile and summarize characteristic features of available datasets. Furthermore, we outline new directions of research to facilitate future development of effective and interdisciplinary solutions

arXiv.org e-Print Archive

A Survey on Malicious Domains Detection through DNS Data Analysis

Author: Dacier Marc
Khalil Issa
Yu Ting
Zhauniarovich Yury
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/05/2018

Malicious domains are one of the major resources required for adversaries to run attacks over the Internet. Due to the important role of the Domain Name System (DNS), extensive research has been conducted to identify malicious domains based on their unique behavior reflected in different phases of the life cycle of DNS queries and responses. Existing approaches differ significantly in terms of intuitions, data analysis methods as well as evaluation methodologies. This warrants a thorough systematization of the approaches and a careful review of the advantages and limitations of every group. In this paper, we perform such an analysis. In order to achieve this goal, we present the necessary background knowledge on DNS and malicious activities leveraging DNS. We describe a general framework of malicious domain detection techniques using DNS data. Applying this framework, we categorize existing approaches using several orthogonal viewpoints, namely (1) sources of DNS data and their enrichment, (2) data analysis methods, and (3) evaluation strategies and metrics. In each aspect, we discuss the important challenges that the research community should address in order to fully realize the power of DNS data analysis to fight against attacks leveraging malicious domains. Comment: 35 pages, to appear in ACM CSU

arXiv.org e-Print Archive

ScreenAvoider: Protecting Computer Screens from Ubiquitous Cameras

Author: Chen Dennis
Crandall David
Kapadia Apu
Korayem Mohammed
Templeman Robert
Publication date: 27/11/2014

We live and work in environments that are inundated with cameras embedded in devices such as phones, tablets, laptops, and monitors. Newer wearable devices like Google Glass, Narrative Clip, and Autographer offer the ability to quietly log our lives with cameras from a 'first person' perspective. While capturing several meaningful and interesting moments, a significant number of images captured by these wearable cameras can contain computer screens. Given the potentially sensitive information that is visible on our displays, there is a need to guard computer screens from undesired photography. People need protection against photography of their screens, whether by other people's cameras or their own cameras. We present ScreenAvoider, a framework that controls the collection and disclosure of images with computer screens and their sensitive content. ScreenAvoider can detect images with computer screens with high accuracy and can even go so far as to discriminate amongst screen content. We also introduce a ScreenTag system that aids in the identification of screen content, flagging images with highly sensitive content such as messaging applications or email webpages. We evaluate our concept on realistic lifelogging datasets, showing that ScreenAvoider provides a practical and useful solution that can help users manage their privacy.

arXiv.org e-Print Archive