992 research outputs found

Machine Learning Aided Static Malware Analysis: A Survey and Tutorial

Author: Andrii Shalaginov
D Krishna Sandeep Reddy
Farid Daryabar
Igor Santos
Reinaldo Jose Mangialardo
Smita Naval
Steve Watson
Teuvo Kohonen
Yanfang Ye
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/08/2018

Malware analysis and detection techniques have evolved over the last decade in response to the development of malware techniques that evade network-based and host-based security protections. The fast growth in the variety and number of malware species has made it very difficult for forensic investigators to provide a timely response. Therefore, Machine Learning (ML) aided malware analysis has become a necessity for automating different aspects of static and dynamic malware investigation. We believe that machine learning aided static analysis can be used as a methodological approach in technical Cyber Threats Intelligence (CTI) rather than resource-consuming dynamic malware analysis, which has been thoroughly studied before. In this paper, we address this research gap by conducting an in-depth survey of different machine learning methods for classification of static characteristics of 32-bit malicious Portable Executable (PE32) Windows files and develop a taxonomy for better understanding of these techniques. Afterwards, we offer a tutorial on how different machine learning techniques can be utilized in the extraction and analysis of a variety of static characteristics of PE binaries and evaluate the accuracy and practical generalization of these techniques. Finally, the results of an experimental study of all the methods on common data are given to demonstrate their accuracy and complexity. This paper may serve as a stepping stone for future researchers in the cross-disciplinary field of machine learning aided malware forensics. Comment: 37 pages.
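
As an illustration of what a "static characteristic" of a PE file can look like, the sketch below computes two simple values from raw bytes without executing the file: a check of the PE signature and the Shannon entropy of the byte distribution. Both function names are hypothetical, and byte entropy is chosen here only as a commonly used example; the abstract does not say which features the survey actually covers.

```python
import math
from collections import Counter


def looks_like_pe(data: bytes) -> bool:
    """Return True if the bytes carry the DOS 'MZ' magic and a PE signature.

    The 4-byte little-endian value at offset 0x3C points to the PE header,
    which starts with the bytes 'P', 'E', 0, 0.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    pe_offset = int.from_bytes(data[0x3C:0x40], "little")
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())
```

Values such as these could be fed to any classifier as numeric features; packed or encrypted files tend to have entropy close to 8 bits per byte, which is one reason the measure is popular in static analysis.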

arXiv.org e-Print Archive

Crossref

PPS-ADS: A Framework for Privacy-Preserved and Secured Distributed System Architecture for Handling Big Data

Author: Ahad Mohd Abdul
Biswas Ranjit
Publication venue: 'Insight Society'
Publication date: 31/08/2018

The exponential expansion of Big Data in 7V’s (volume, velocity, variety, veracity, value, variability and visualization) brings forth new challenges to the security, reliability, availability and privacy of these data sets. Traditional security techniques and algorithms fail to keep pace with data at this scale. This paper aims to improve the recently proposed Atrain Distributed System (ADS) by incorporating new features that cater to the end-to-end availability and security of big data in the distributed system. The paper also integrates the concept of Software Defined Networking (SDN) into ADS to effectively control and manage the routing of data items in the ADS. Data items are stored in the ADS on the basis of the type of data (structured or unstructured), the capacity of the distributed system (or coach) and the distance of the coach from the pilot computer (PC). To maintain data consistency and to prevent possible loss of data, the concept of “forward positive” and “backward positive” acknowledgment is proposed. Furthermore, we have incorporated the “Twofish” cryptographic technique to encrypt the big data in the ADS. Issues like “data ownership”, “data security”, “data privacy” and “data reliability” are pivotal when handling big data. The paper presents a framework for a privacy-preserved architecture for handling big data in an effective manner.

International Journal on Advanced Science, Engineering and Information Technology

Framework and Algorithms for Operator-Managed Content Caching

Author: Pavlou G
Psaras I
Saino L
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2020

We propose a complete framework targeting operator-driven content caching that can be equally applied to both ISP-operated Content Delivery Networks (CDNs) and future Information-Centric Networks (ICNs). In contrast to previous proposals in this area, our solution leverages operators’ control over cache placement and content routing, considerably reducing network operating costs by minimizing the amount of transit traffic and balancing load among available network resources. In addition, our solution provides two key advantages over previous proposals. First, it allows for a simple computation of the optimal cache placement. Second, it provides knobs for operators to fine-tune performance. We validate our design through both analytical modeling and trace-driven simulations and show that our proposed solution achieves on average twice as many cache hits as previously proposed techniques, without increasing delivery latency. In addition, we show that the proposed framework achieves 19-33% better load balancing across links and caching nodes, while also being robust to traffic spikes.

UCL Discovery

Retrieving information from heterogeneous freight data sources to answer natural language queries

Author: Seedah Dan Paapanyin Kofi
Publication date: 09/02/2015

The ability to retrieve accurate information from databases without an extensive knowledge of the contents and organization of each database is extremely beneficial to the dissemination and utilization of freight data. The challenges, however, are: 1) correctly identifying only the relevant information and keywords from questions when dealing with multiple sentence structures, and 2) automatically retrieving, preprocessing, and understanding multiple data sources to determine the best answer to a user’s query. Current named entity recognition systems can identify entities but require an annotated corpus for training, which does not currently exist in the field of transportation planning. A hybrid approach that combines multiple models to classify specific named entities was therefore proposed as an alternative. The retrieval and classification of freight-related keywords facilitated the process of finding which databases are capable of answering a question. Values in data dictionaries can be queried by mapping keywords to data element fields in various freight databases using ontologies. A number of challenges still arise as a result of different entities sharing the same names, the same entity having multiple names, and differences in classification systems. Dealing with ambiguities is required to accurately determine which database provides the best answer from the list of applicable sources. This dissertation 1) develops an approach to identifying and classifying keywords from freight-related natural language queries, 2) develops a standardized knowledge representation of freight data sources using an ontology that both computer systems and domain experts can utilize to identify relevant freight data sources, and 3) provides recommendations for addressing ambiguities in freight-related named entities. 
Finally, the use of knowledge-based expert systems to intelligently sift through data sources to determine which ones provide the best answer to a user’s question is proposed.
Civil, Architectural, and Environmental Engineering

Texas ScholarWorks

Hadoop neural network for parallel and distributed feature selection

Author: Victoria J. Hodge
Simon O’Keefe
Jim Austin
Publication venue: 'Elsevier BV'
Publication date: 01/06/2016

In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature, all of which have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel), allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation, and the overall processing can be greatly reduced by processing the common aspects of the feature selectors only once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector, and the actual features to select, to be identified for large and high-dimensional data sets by exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop.

Elsevier - Publisher Connector

Crossref

White Rose Research Online

Container Handling Algorithms and Outbound Heavy Truck Movement Modeling for Seaport Container Transshipment Terminals

Author: Hussein Mazen I.
Publication venue: UWM Digital Commons
Publication date: 01/12/2012

This research is divided into four main parts. The first part considers the basic block relocation problem (BRP), in which a set of shipping containers is retrieved using the minimum number of moves by a single gantry crane that handles cargo in the storage area of a container terminal. For this purpose, a new algorithm called the look ahead algorithm has been created and tested. The look ahead algorithm is applicable under both limited and unlimited stacking height conditions. It is compared to the existing algorithms in the literature, and the experimental results show that it is more efficient than any of them. The second part of this research considers an extension of the BRP called the block relocation problem with weights (BRP-W). The main goal is to minimize the total fuel consumption of the crane when retrieving all the containers in a bay and to minimize the movements of the heavy containers. The trolleying, hoisting, and lowering movements of the containers are explicitly considered in this part. Twelve parameters that quantify various preferences when moving individual containers are defined. Near-optimal values of the twelve parameters for different bay configurations are found using a genetic algorithm. The third part introduces a shipping cost model that can estimate the cost of shipping specific commodity groups using one freight transportation mode, trucking, from any origin to any destination inside the United States. The model can also be used to estimate general shipping costs for different economic sectors, with significant ramifications for public policy. The last part mimics heavy truck movements for shipping different kinds of containerized commodities between a container terminal and different facilities. The highly detailed cost model from part three is used to evaluate the effect of public policies on truckers' route choices. 
In particular, the influence of time, distance, and tolls on truckers' route selection is investigated.
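
To make the block relocation problem concrete, the sketch below counts the relocation moves made by a simple greedy baseline. It is not the look ahead algorithm from the thesis, whose details the abstract does not give. The assumptions are ours: containers are numbered 1..N in retrieval order, each stack is listed bottom to top, and a blocking container is moved to the stack where it is least likely to block again. The function name `count_relocations` is hypothetical.

```python
from typing import List, Optional


def count_relocations(bay: List[List[int]], max_height: Optional[int] = None) -> int:
    """Retrieve containers 1..N in order and return the number of relocations.

    bay: list of stacks, each listed bottom to top, holding distinct
         priorities 1..N (1 is retrieved first).
    max_height: optional stacking limit; None means unlimited height.
    """
    stacks = [list(s) for s in bay]  # work on a copy
    total = sum(len(s) for s in stacks)
    moves = 0

    def lowest(i: int) -> int:
        # Most urgent container in stack i; an empty stack never blocks.
        return min(stacks[i]) if stacks[i] else total + 1

    for target in range(1, total + 1):
        src = next(i for i, s in enumerate(stacks) if target in s)
        while stacks[src][-1] != target:
            blocker = stacks[src].pop()
            candidates = [
                i for i, s in enumerate(stacks)
                if i != src and (max_height is None or len(s) < max_height)
            ]
            if not candidates:
                raise ValueError("no stack has room for a relocation")
            # Prefer stacks where the blocker will not block anything later,
            # picking the tightest fit; otherwise delay the conflict as long
            # as possible.
            safe = [i for i in candidates if lowest(i) > blocker]
            dest = min(safe, key=lowest) if safe else max(candidates, key=lowest)
            stacks[dest].append(blocker)
            moves += 1
        stacks[src].pop()  # retrieve the target container
    return moves
```

For example, `count_relocations([[1, 3, 2], [4], []])` moves containers 2 and 3 out of the way before container 1 can be retrieved. A look-ahead method would aim to reduce such counts by considering the effect of each relocation on later retrievals.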

University of Wisconsin-Milwaukee

Object detection, recognition and re-identification in video footage

Author: Martins Irhebhude
Publication date: 01/01/2015

There have been a significant number of security concerns in recent times; as a result, security cameras have been installed to monitor activities and to prevent crimes in most public places. This analysis is done either through video analytics or through forensic analysis based on human observation. To this end, within the research context of this thesis, a proactive machine vision based military recognition system has been developed to help monitor activities in the military environment. The proposed object detection, recognition and re-identification systems are presented in this thesis. A novel technique for military personnel recognition is presented. Initially, the detected camouflaged personnel are segmented using a GrabCut segmentation algorithm. Since the uniform of camouflaged personnel generally appears similar at the top and the bottom of the body, an image patch is initially extracted from the segmented foreground image and used as the region of interest. Subsequently, the colour and texture features are extracted from each patch and used for classification. A second approach for personnel recognition is proposed through the recognition of the badge on the cap of a military person. A feature matching metric based on the Speeded Up Robust Features (SURF) extracted from the badge on a person's cap enabled the recognition of that person's arm of service. A state-of-the-art technique for recognising vehicle types irrespective of their view angle is also presented in this thesis. Vehicles are initially detected and segmented using a Gaussian Mixture Model (GMM) based foreground/background segmentation algorithm. A Canny Edge Detection (CED) stage, followed by morphological operations, is used as a pre-processing stage to help enhance foreground vehicular object detection and segmentation. 
Subsequently, region, Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP) features are extracted from the refined foreground vehicle object and used as features for vehicle type recognition. Two different datasets with varying views (front/rear and angle) are used and combined for testing the proposed technique. For night-time video analytics and forensics, the thesis presents a novel approach to pedestrian detection and vehicle type recognition. A novel feature acquisition technique named CENTROG is proposed for pedestrian detection and vehicle type recognition in this thesis. Thermal images containing pedestrians and vehicular objects are used to analyse the performance of the proposed algorithms. The video is initially segmented using a GMM based foreground object segmentation algorithm. A CED based pre-processing step is used to enhance segmentation accuracy prior to using Census Transforms for initial feature extraction. HOG features are then extracted from the Census transformed images and used for detection and recognition of human and vehicular objects, respectively, in thermal images. Finally, a novel technique for people re-identification is proposed in this thesis based on low-level colour features and mid-level attributes. The low-level colour histogram bin values were normalised to the range 0 to 1. A publicly available dataset (VIPeR) and a self-constructed dataset have been used in the experiments conducted with 7 clothing attributes and low-level colour histogram features. These 7 attributes are detected using features extracted from 5 different regions of a detected human object using an SVM classifier. The low-level colour features were extracted from the regions of a detected human object. These 5 regions are obtained by human object segmentation and subsequent body part sub-division. People are re-identified by computing the Euclidean distance between a probe and the gallery image sets. 
The experiments conducted using the SVM classifier and Euclidean distance show that the proposed techniques attained all of the aforementioned goals. The colour and texture features proposed for camouflaged military personnel recognition surpass the state-of-the-art methods. Similarly, experiments show that combining features performed best when recognising vehicles in different views subsequent to initial training based on multiple views. In the same vein, the proposed CENTROG technique performed better than the state-of-the-art CENTRIST technique for both pedestrian detection and vehicle type recognition at night-time using thermal images. Finally, we show that the proposed 7 mid-level attributes and the low-level features result in improved accuracy for people re-identification.
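
The re-identification step described in this abstract (colour histograms scaled to the range 0 to 1, then matched by Euclidean distance) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: min-max scaling is an assumed normalisation, the mid-level clothing attributes are left out, and the function names and toy histograms are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def min_max_normalise(hist: Sequence[float]) -> List[float]:
    """Scale histogram bin values to the range 0 to 1 (assumed min-max scaling)."""
    lo, hi = min(hist), max(hist)
    if hi == lo:
        return [0.0 for _ in hist]  # flat histogram carries no contrast
    return [(v - lo) / (hi - lo) for v in hist]


def rank_gallery(
    probe: Sequence[float], gallery: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return gallery identities sorted by Euclidean distance to the probe."""
    p = min_max_normalise(probe)
    scored = [
        (identity, math.dist(p, min_max_normalise(hist)))
        for identity, hist in gallery.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Toy 3-bin histograms; real histograms would have many more bins per region.
ranking = rank_gallery([10, 0, 5], {"person_a": [20, 0, 10], "person_b": [0, 10, 5]})
```

The first entry of `ranking` is the gallery identity closest to the probe, so re-identification accuracy can be measured by checking how often the true match appears at or near the top of the list.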

Loughborough University Institutional Repository

Modeling Suspicious Email Detection using Enhanced Feature Selection

Author: Karampelas Panagiotis
Memon Nasrullah
Nizamani Sarwat
Wiil Uffe Kock
Publication date: 01/01/2012

The paper presents a suspicious email detection model which incorporates enhanced feature selection. We propose the use of feature selection strategies along with classification techniques for terrorist email detection. The presented model focuses on the evaluation of machine learning algorithms such as decision tree (ID3), logistic regression, Naïve Bayes (NB), and Support Vector Machine (SVM) for detecting emails containing suspicious content. In the literature, various algorithms have achieved good accuracy for this task. However, the results achieved by those algorithms can be further improved by using appropriate feature selection mechanisms. We have identified a specific feature selection scheme that improves the performance of the existing algorithms.
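
The abstract does not name the feature selection scheme it identifies. As a rough illustration of how feature selection can precede classification, the sketch below ranks terms by information gain (the same criterion ID3 uses to choose splits) and keeps the top k. Information gain is an assumed stand-in, and the function names and toy emails are hypothetical.

```python
import math
from typing import List, Sequence, Set


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = sum(1 for y in labels if y == cls) / n
        result -= p * math.log2(p)
    return result


def information_gain(docs: List[Set[str]], labels: List[int], term: str) -> float:
    """Reduction in label entropy from splitting emails on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    remainder = (len(with_term) / n) * entropy(with_term) \
        + (len(without) / n) * entropy(without)
    return entropy(labels) - remainder


def select_top_k(docs: List[Set[str]], labels: List[int], k: int) -> List[str]:
    """Return the k terms with the highest information gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return ranked[:k]


# Toy corpus: each email is a set of terms; label 1 = suspicious, 0 = normal.
emails = [{"attack", "meeting"}, {"attack", "plan"}, {"meeting", "lunch"}, {"lunch", "plan"}]
labels = [1, 1, 0, 0]
selected = select_top_k(emails, labels, 2)
```

Only the selected terms would then be passed to a classifier such as a decision tree, Naïve Bayes or an SVM, which is the general pattern the abstract describes.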

arXiv.org e-Print Archive

University of Southern Denmark Research Output