392 research outputs found

    Image-based malware classification hybrid framework based on space-filling curves

    Get PDF
    There exists a never-ending ā€œarms raceā€ between malware analysts and adversarial malicious code developers as malevolent programs evolve and countermeasures are developed to detect and eradicate them. Malware has become more complex in its intent and capabilities over time, which has prompted the need for constant improvement in detection and defence methods. Of particular concern are the anti-analysis obfuscation techniques, such as packing and encryption, that are employed by malware developers to evade detection and thwart the analysis process. In such cases, malware is generally impervious to basic analysis methods and so analysts must use more invasive techniques to extract signatures for classification, which are inevitably not scalable due to their complexity. In this article, we present a hybrid framework for malware classification designed to overcome the challenges incurred by current approaches. The framework incorporates novel static and dynamic malware analysis methods, where static malware executables and dynamic process memory dumps are converted to images mapped through space-filling curves, from which visual features are extracted for classification. The framework is less invasive than traditional analysis methods in that there is no reverse engineering required, nor does it suffer from the obfuscation limitations of static analysis. On a dataset of 13,599 obfuscated and non-obfuscated malware samples from 23 families, the framework outperformed both static and dynamic standalone methods with precision, recall and accuracy scores of 97.6%, 97.6% and 97.6% respectively

    A Graph-Based Semi-Supervised k Nearest-Neighbor Method for Nonlinear Manifold Distributed Data Classification

    Get PDF
    kk Nearest Neighbors (kkNN) is one of the most widely used supervised learning algorithms to classify Gaussian distributed data, but it does not achieve good results when it is applied to nonlinear manifold distributed data, especially when a very limited amount of labeled samples are available. In this paper, we propose a new graph-based kkNN algorithm which can effectively handle both Gaussian distributed data and nonlinear manifold distributed data. To achieve this goal, we first propose a constrained Tired Random Walk (TRW) by constructing an RR-level nearest-neighbor strengthened tree over the graph, and then compute a TRW matrix for similarity measurement purposes. After this, the nearest neighbors are identified according to the TRW matrix and the class label of a query point is determined by the sum of all the TRW weights of its nearest neighbors. To deal with online situations, we also propose a new algorithm to handle sequential samples based a local neighborhood reconstruction. Comparison experiments are conducted on both synthetic data sets and real-world data sets to demonstrate the validity of the proposed new kkNN algorithm and its improvements to other version of kkNN algorithms. Given the widespread appearance of manifold structures in real-world problems and the popularity of the traditional kkNN algorithm, the proposed manifold version kkNN shows promising potential for classifying manifold-distributed data.Comment: 32 pages, 12 figures, 7 table

    Secure and private fingerprint-based authentication

    Get PDF
    This thesis studies the requirements and processes involved in building an authentication system using the fingerprint biometric, where the fingerprint template is protected during storage and during comparison. The principles developed in this thesis can be easily extended to authentication systems using other biometric modalities. Most existing biometric authentication systems store their template securely using an encryption function. However, in order to perform matching, the enrolled template must be decrypted. It is at this point that the authentication system is most vulnerable as the entire enrolled template is exposed. A biometric is irreplaceable if compromised and can also reveal sensitive information about an individual. If biometric systems are taken up widely, the template could also be used as an individual's digital identifier. Compromise in that case, violates an individual's right to privacy as their transactions in all systems where they used that compromised biometric can be tracked. Therefore securing a biometric template during comparison as well as storage in an authentication system is imperative. Eight different fingerprint template representation techniques, where templates were treated as a set of elements derived from the locations and orientations of fingerprint minutiae, were studied. Four main steps to build any biometric based authentication system were identified and each of the eight fingerprint template representations was inducted through the four steps. Two distinct Error Tolerant Cryptographic Constructs based on the set difference metric, were studied for their ability to securely store and compare each of the template types in an authentication system. The first construct was found to be unsuitable for a fundamental reason that would apply to all the template types considered in the research. The second construct did not have the limitation of the first and three algorithms to build authentication systems using the second construct were proposed. It was determined that minutiae-based templates had significant intra sample variation as a result of which a very relaxed matching threshold had to be set in the authentication system. The relaxed threshold caused the authentication systems built using the first two algorithms to reveal enough information about the stored templates to render them insecure. It was found that in cases of such large intra-sample variation, a commonality based match decision was more appropriate. One solution to building a secure authentication system using minutiae-based templates was demonstrated by the third algorithm which used a two stage matching process involving the second cryptographic construct and a commonality based similarity measure in the two stages respectively. This implementation was successful in securing the fingerprint template during comparison as well as storage, with minimal reduction in accuracy when compared to the matching performance without the cryptographic construct. Another solution is to use an efficient commonality based error tolerant cryptographic construct. This thesis lists the desirable characteristics of such a construct as existence of any is unknown to date. This thesis concludes by presenting good guidelines to evaluate the suitability of different cryptographic constructs to protect biometric templates of other modalities in an authentication system

    Improving privacy preserving in modern applications

    Full text link
    The thesis studies the privacy problems in various modern applications, such as recommendation system, Internet of Things, location-based service and crowdsourcing system. The corresponding solutions are proposed, and the proposed solutions not only protect the data privacy with guaranteed privacy level, but also enhancing the data utility

    APIC: A method for automated pattern identification and classification

    Get PDF
    Machine Learning (ML) is a transformative technology at the forefront of many modern research endeavours. The technology is generating a tremendous amount of attention from researchers and practitioners, providing new approaches to solving complex classification and regression tasks. While concepts such as Deep Learning have existed for many years, the computational power for realising the utility of these algorithms in real-world applications has only recently become available. This dissertation investigated the efficacy of a novel, general method for deploying ML in a variety of complex tasks, where best feature selection, data-set labelling, model definition and training processes were determined automatically. Models were developed in an iterative fashion, evaluated using both training and validation data sets. The proposed method was evaluated using three distinct case studies, describing complex classification tasks often requiring significant input from human experts. The results achieved demonstrate that the proposed method compares with, and often outperforms, less general, comparable methods designed specifically for each task. Feature selection, data-set annotation, model design and training processes were optimised by the method, where less complex, comparatively accurate classifiers with lower dependency on computational power and human expert intervention were produced. In chapter 4, the proposed method demonstrated improved efficacy over comparable systems, automatically identifying and classifying complex application protocols traversing IP networks. In chapter 5, the proposed method was able to discriminate between normal and anomalous traffic, maintaining accuracy in excess of 99%, while reducing false alarms to a mere 0.08%. Finally, in chapter 6, the proposed method discovered more optimal classifiers than those implemented by comparable methods, with classification scores rivalling those achieved by state-of-the-art systems. The findings of this research concluded that developing a fully automated, general method, exhibiting efficacy in a wide variety of complex classification tasks with minimal expert intervention, was possible. The method and various artefacts produced in each case study of this dissertation are thus significant contributions to the field of ML

    Data Exfiltration:A Review of External Attack Vectors and Countermeasures

    Get PDF
    AbstractContext One of the main targets of cyber-attacks is data exfiltration, which is the leakage of sensitive or private data to an unauthorized entity. Data exfiltration can be perpetrated by an outsider or an insider of an organization. Given the increasing number of data exfiltration incidents, a large number of data exfiltration countermeasures have been developed. These countermeasures aim to detect, prevent, or investigate exfiltration of sensitive or private data. With the growing interest in data exfiltration, it is important to review data exfiltration attack vectors and countermeasures to support future research in this field. Objective This paper is aimed at identifying and critically analysing data exfiltration attack vectors and countermeasures for reporting the status of the art and determining gaps for future research. Method We have followed a structured process for selecting 108 papers from seven publication databases. Thematic analysis method has been applied to analyse the extracted data from the reviewed papers. Results We have developed a classification of (1) data exfiltration attack vectors used by external attackers and (2) the countermeasures in the face of external attacks. We have mapped the countermeasures to attack vectors. Furthermore, we have explored the applicability of various countermeasures for different states of data (i.e., in use, in transit, or at rest). Conclusion This review has revealed that (a) most of the state of the art is focussed on preventive and detective countermeasures and significant research is required on developing investigative countermeasures that are equally important; (b) Several data exfiltration countermeasures are not able to respond in real-time, which specifies that research efforts need to be invested to enable them to respond in real-time (c) A number of data exfiltration countermeasures do not take privacy and ethical concerns into consideration, which may become an obstacle in their full adoption (d) Existing research is primarily focussed on protecting data in ā€˜in useā€™ state, therefore, future research needs to be directed towards securing data in ā€˜in restā€™ and ā€˜in transitā€™ states (e) There is no standard or framework for evaluation of data exfiltration countermeasures. We assert the need for developing such an evaluation framework
    • ā€¦
    corecore