231 research outputs found

    Advanced Machine Learning Techniques and Meta-Heuristic Optimization for the Detection of Masquerading Attacks in Social Networks

    Get PDF
    According to the report published by the online protection firm Iovation in 2012, cyber fraud ranged from 1 percent of the Internet transactions in North America Africa to a 7 percent in Africa, most of them involving credit card fraud, identity theft, and account takeover or hĀ¼acking attempts. This kind of crime is still growing due to the advantages offered by a non face-to-face channel where a increasing number of unsuspecting victims divulges sensitive information. Interpol classifies these illegal activities into 3 types: ā€¢ Attacks against computer hardware and software. ā€¢ Financial crimes and corruption. ā€¢ Abuse, in the form of grooming or ā€œsexploitationā€. Most research efforts have been focused on the target of the crime developing different strategies depending on the casuistic. Thus, for the well-known phising, stored blacklist or crime signals through the text are employed eventually designing adhoc detectors hardly conveyed to other scenarios even if the background is widely shared. Identity theft or masquerading can be described as a criminal activity oriented towards the misuse of those stolen credentials to obtain goods or services by deception. On March 4, 2005, a million of personal and sensitive information such as credit card and social security numbers was collected by White Hat hackers at Seattle University who just surfed the Web for less than 60 minutes by means of the Google search engine. As a consequence they proved the vulnerability and lack of protection with a mere group of sophisticated search terms typed in the engine whose large data warehouse still allowed showing company or government websites data temporarily cached. As aforementioned, platforms to connect distant people in which the interaction is undirected pose a forcible entry for unauthorized thirds who impersonate the licit user in a attempt to go unnoticed with some malicious, not necessarily economic, interests. In fact, the last point in the list above regarding abuses has become a major and a terrible risk along with the bullying being both by means of threats, harassment or even self-incrimination likely to drive someone to suicide, depression or helplessness. California Penal Code Section 528.5 states: ā€œNotwithstanding any other provision of law, any person who knowingly and without consent credibly impersonates another actual person through or on an Internet Web site or by other electronic means for purposes of harming, intimidating, threatening, or defrauding another person is guilty of a public offense punishable pursuant to subdivision [...]ā€. IV Therefore, impersonation consists of any criminal activity in which someone assumes a false identity and acts as his or her assumed character with intent to get a pecuniary benefit or cause some harm. User profiling, in turn, is the process of harvesting user information in order to construct a rich template with all the advantageous attributes in the field at hand and with specific purposes. User profiling is often employed as a mechanism for recommendation of items or useful information which has not yet considered by the client. Nevertheless, deriving user tendency or preferences can be also exploited to define the inherent behavior and address the problem of impersonation by detecting outliers or strange deviations prone to entail a potential attack. This dissertation is meant to elaborate on impersonation attacks from a profiling perspective, eventually developing a 2-stage environment which consequently embraces 2 levels of privacy intrusion, thus providing the following contributions: ā€¢ The inference of behavioral patterns from the connection time traces aiming at avoiding the usurpation of more confidential information. When compared to previous approaches, this procedure abstains from impinging on the user privacy by taking over the messages content, since it only relies on time statistics of the user sessions rather than on their content. ā€¢ The application and subsequent discussion of two selected algorithms for the previous point resolution: ā€“ A commonly employed supervised algorithm executed as a binary classifier which thereafter has forced us to figure out a method to deal with the absence of labeled instances representing an identity theft. ā€“ And a meta-heuristic algorithm in the search for the most convenient parameters to array the instances within a high dimensional space into properly delimited clusters so as to finally apply an unsupervised clustering algorithm. ā€¢ The analysis of message content encroaching on more private information but easing the user identification by mining discriminative features by Natural Language Processing (NLP) techniques. As a consequence, the development of a new feature extraction algorithm based on linguistic theories motivated by the massive quantity of features often gathered when it comes to texts. In summary, this dissertation means to go beyond typical, ad-hoc approaches adopted by previous identity theft and authorship attribution research. Specifically it proposes tailored solutions to this particular and extensively studied paradigm with the aim at introducing a generic approach from a profiling view, not tightly bound to a unique application field. In addition technical contributions have been made in the course of the solution formulation intending to optimize familiar methods for a better versatility towards the problem at hand. In summary: this Thesis establishes an encouraging research basis towards unveiling subtle impersonation attacks in Social Networks by means of intelligent learning techniques

    TOWARDS A HOLISTIC EFFICIENT STACKING ENSEMBLE INTRUSION DETECTION SYSTEM USING NEWLY GENERATED HETEROGENEOUS DATASETS

    Get PDF
    With the exponential growth of network-based applications globally, there has been a transformation in organizations\u27 business models. Furthermore, cost reduction of both computational devices and the internet have led people to become more technology dependent. Consequently, due to inordinate use of computer networks, new risks have emerged. Therefore, the process of improving the speed and accuracy of security mechanisms has become crucial.Although abundant new security tools have been developed, the rapid-growth of malicious activities continues to be a pressing issue, as their ever-evolving attacks continue to create severe threats to network security. Classical security techniquesfor instance, firewallsare used as a first line of defense against security problems but remain unable to detect internal intrusions or adequately provide security countermeasures. Thus, network administrators tend to rely predominantly on Intrusion Detection Systems to detect such network intrusive activities. Machine Learning is one of the practical approaches to intrusion detection that learns from data to differentiate between normal and malicious traffic. Although Machine Learning approaches are used frequently, an in-depth analysis of Machine Learning algorithms in the context of intrusion detection has received less attention in the literature.Moreover, adequate datasets are necessary to train and evaluate anomaly-based network intrusion detection systems. There exist a number of such datasetsas DARPA, KDDCUP, and NSL-KDDthat have been widely adopted by researchers to train and evaluate the performance of their proposed intrusion detection approaches. Based on several studies, many such datasets are outworn and unreliable to use. Furthermore, some of these datasets suffer from a lack of traffic diversity and volumes, do not cover the variety of attacks, have anonymized packet information and payload that cannot reflect the current trends, or lack feature set and metadata.This thesis provides a comprehensive analysis of some of the existing Machine Learning approaches for identifying network intrusions. Specifically, it analyzes the algorithms along various dimensionsnamely, feature selection, sensitivity to the hyper-parameter selection, and class imbalance problemsthat are inherent to intrusion detection. It also produces a new reliable dataset labeled Game Theory and Cyber Security (GTCS) that matches real-world criteria, contains normal and different classes of attacks, and reflects the current network traffic trends. The GTCS dataset is used to evaluate the performance of the different approaches, and a detailed experimental evaluation to summarize the effectiveness of each approach is presented. Finally, the thesis proposes an ensemble classifier model composed of multiple classifiers with different learning paradigms to address the issue of detection accuracy and false alarm rate in intrusion detection systems

    Tracking and Mitigation of Malicious Remote Control Networks

    Full text link
    Attacks against end-users are one of the negative side effects of todayā€™s networks. The goal of the attacker is to compromise the victimā€™s machine and obtain control over it. This machine is then used to carry out denial-of-service attacks, to send out spam mails, or for other nefarious purposes. From an attackerā€™s point of view, this kind of attack is even more efficient if she manages to compromise a large number of machines in parallel. In order to control all these machines, she establishes a "malicious remote control network", i.e., a mechanism that enables an attacker the control over a large number of compromised machines for illicit activities. The most common type of these networks observed so far are so called "botnets". Since these networks are one of the main factors behind current abuses on the Internet, we need to find novel approaches to stop them in an automated and efficient way. In this thesis we focus on this open problem and propose a general root cause methodology to stop malicious remote control networks. The basic idea of our method consists of three steps. In the first step, we use "honeypots" to collect information. A honeypot is an information system resource whose value lies in unauthorized or illicit use of that resource. This technique enables us to study current attacks on the Internet and we can for example capture samples of autonomous spreading malware ("malicious software") in an automated way. We analyze the collected data to extract information about the remote control mechanism in an automated fashion. For example, we utilize an automated binary analysis tool to find the Command & Control (C&C) server that is used to send commands to the infected machines. In the second step, we use the extracted information to infiltrate the malicious remote control networks. This can for example be implemented by impersonating as a bot and infiltrating the remote control channel. Finally, in the third step we use the information collected during the infiltration phase to mitigate the network, e.g., by shutting down the remote control channel such that the attacker cannot send commands to the compromised machines. In this thesis we show the practical feasibility of this method. We examine different kinds of malicious remote control networks and discuss how we can track all of them in an automated way. As a first example, we study botnets that use a central C&C server: We illustrate how the three steps can be implemented in practice and present empirical measurement results obtained on the Internet. Second, we investigate botnets that use a peer-to-peer based communication channel. Mitigating these botnets is harder since no central C&C server exists which could be taken offline. Nevertheless, our methodology can also be applied to this kind of networks and we present empirical measurement results substantiating our method. Third, we study fast-flux service networks. The idea behind these networks is that the attacker does not directly abuse the compromised machines, but uses them to establish a proxy network on top of these machines to enable a robust hosting infrastructure. Our method can be applied to this novel kind of malicious remote control networks and we present empirical results supporting this claim. We anticipate that the methodology proposed in this thesis can also be used to track and mitigate other kinds of malicious remote control networks

    Command & Control: Understanding, Denying and Detecting - A review of malware C2 techniques, detection and defences

    Full text link
    In this survey, we first briefly review the current state of cyber attacks, highlighting significant recent changes in how and why such attacks are performed. We then investigate the mechanics of malware command and control (C2) establishment: we provide a comprehensive review of the techniques used by attackers to set up such a channel and to hide its presence from the attacked parties and the security tools they use. We then switch to the defensive side of the problem, and review approaches that have been proposed for the detection and disruption of C2 channels. We also map such techniques to widely-adopted security controls, emphasizing gaps or limitations (and success stories) in current best practices.Comment: Work commissioned by CPNI, available at c2report.org. 38 pages. Listing abstract compressed from version appearing in repor

    Techniques for the reverse engineering of banking malware

    Get PDF
    Malware attacks are a signiļ¬cant and frequently reported problem, adversely aļ¬€ecting the productivity of organisations and governments worldwide. The well-documented consequences of malware attacks include ļ¬nancial loss, data loss, reputation damage, infrastructure damage, theft of intellectual property, compromise of commercial negotiations, and national security risks. Mitiga-tion activities involve a signiļ¬cant amount of manual analysis. Therefore, there is a need for automated techniques for malware analysis to identify malicious behaviours. Research into automated techniques for malware analysis covers a wide range of activities. This thesis consists of a series of studies: an anal-ysis of banking malware families and their common behaviours, an emulated command and control environment for dynamic malware analysis, a technique to identify similar malware functions, and a technique for the detection of ransomware. An analysis of the nature of banking malware, its major malware families, behaviours, variants, and inter-relationships are provided in this thesis. In doing this, this research takes a broad view of malware analysis, starting with the implementation of the malicious behaviours through to detailed analysis using machine learning. The broad approach taken in this thesis diļ¬€ers from some other studies that approach malware research in a more abstract sense. A disadvantage of approaching malware research without domain knowledge, is that important methodology questions may not be considered. Large datasets of historical malware samples are available for countermea-sures research. However, due to the age of these samples, the original malware infrastructure is no longer available, often restricting malware operations to initialisation functions only. To address this absence, an emulated command and control environment is provided. This emulated environment provides full control of the malware, enabling the capabilities of the original in-the-wild operation, while enabling feature extraction for research purposes. A major focus of this thesis has been the development of a machine learn-ing function similarity method with a novel feature encoding that increases feature strength. This research develops techniques to demonstrate that the machine learning model trained on similarity features from one program can ļ¬nd similar functions in another, unrelated program. This ļ¬nding can lead to the development of generic similar function classiļ¬ers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra. Further, this research examines the use of API call features for the identi-ļ¬cation of ransomware and shows that a failure to consider malware analysis domain knowledge can lead to weaknesses in experimental design. In this case, we show that existing research has diļ¬ƒculty in discriminating between ransomware and benign cryptographic software. This thesis by publication, has developed techniques to advance the disci-pline of malware reverse engineering, in order to minimize harm due to cyber-attacks on critical infrastructure, government institutions, and industry.Doctor of Philosoph

    Establishing trusted Machine-to-Machine communications in the Internet of Things through the use of behavioural tests

    Get PDF
    Today, the Internet of Things (IoT) is one of the most important emerging technologies. Applicable to several fields, it has the potential to strongly influence peopleā€™s lives. ā€œThingsā€ are mostly embedded machines, and Machine-to-Machine (M2M) communications are used to exchange information. The main aspect of this type of communication is that a ā€œthingā€ needs a mechanism to uniquely identify other ā€œthingsā€ without human intervention. For this purpose, trust plays a key role. Trust can be incorporated in the smartness of ā€œthingsā€ by using mobile ā€œagentsā€. From the study of the IoT ecosystem, a new threat against M2M communications has been identified. This relates to the opportunity for an attacker to employ several forged IoT-embedded machines that can be used to launch attacks. Two ā€œthings-awareā€ detection mechanisms have been proposed and evaluated in this work for incorporation into IoT mobile trust agents. These new mechanisms are based on observing specific thing-related behaviour obtained by using a characterisation algorithm. The first mechanism uses a range of behaviours obtained from real embedded machines, such as threshold values, to detect whether a target machine is forged. This detection mechanism is called machine emulation detection algorithm (MEDA). MEDA takes around 3 minutes to achieve a detection accuracy of 79.21%, with 44.55% of real embedded machines labelled as belonging to forged embedded machines. These results indicated a need to develop a more accurate and faster detection method. Therefore, a second mechanism was created and evaluated. A dataset composed of behaviours from real, virtual and emulated embedded systems that can be part of the IoT was created. This was used for both training and testing classification methods. The results identified Random Forest (RF) as the most efficient method, recognising forged embedded machines in only 5 seconds with a detection rate of around 99.5%. It follows that this solution can be applied in real IoT scenarios with critical conditions. In the final part of this thesis, an attack against these new mechanisms has been proposed. This consists of using a modified kernel of a powerful machine to mimic the behaviour of a real IoT-embedded machine, referred to as a fake timing attack (FTA). Two metrics, mode and median from ping response time, have been found to effectively detect this attack. The final detection method involves combining RF and k-Nearest Neighbour to successfully detect forged embedded machines and FTA in only 40 seconds, with an overall detection performance (ODP) of 99.9% and 93.70% respectively. This method also was evaluated using behaviours from embedded machines that were not present in the training set. The results from that evaluation demonstrate that the proposed solution can detect embedded machines unknown to the method, both real and virtual, with an ODP of 99.96% and 99.92% respectively. In summary, a new algorithm able to detect forged embedded machines easily, quickly and with very high accuracy has been developed. The proposed method addresses the challenge of securing present and future M2M-embedded machines with power-constrained resources and can be applied to real IoT scenarios

    Framework For Modeling Attacker Capabilities with Deception

    Get PDF
    In this research we built a custom experimental range using opensource emulated and custom pure honeypots designed to detect or capture attacker activity. The focus is to test the effectiveness of a deception in its ability to evade detection coupled with attacker skill levels. The range consists of three zones accessible via virtual private networking. The first zone houses varying configurations of opensource emulated honeypots, custom built pure honeypots, and real SSH servers. The second zone acts as a point of presence for attackers. The third zone is for administration and monitoring. Using the range, both a control and participant-based experiment were conducted. We conducted control experiments to baseline and empirically explore honeypot detectability amongst other systems through adversarial testing. We executed a series of tests such as network service sweep, enumeration scanning, and finally manual execution. We also selected participants to serve as cyber attackers against the experiment range of varying skills having unique tactics, techniques and procedures in attempting to detect the honeypots. We have concluded the experiments and performed data analysis. We measure the anticipated threat by presenting the Attacker Bias Perception Profile model. Using this model, each participant is ranked based on their overall threat classification and impact. This model is applied to the results of the participants which helps align the threat to likelihood and impact of a honeypot being detected. The results indicate the pure honeypots are significantly difficult to detect. Emulated honeypots are grouped in different categories based on the detection and skills of the attackers. We developed a framework abstracting the deceptive process, the interaction with system elements, the use of intelligence, and the relationship with attackers. The framework is illustrated by our experiment case studies and the attacker actions, the effects on the system, and impact to the success

    Studying a Virtual Testbed for Unverified Data

    Get PDF
    It is difficult to fully know the effects a piece of software will have on your computer, particularly when the software is distributed by an unknown source. The research in this paper focuses on malware detection, virtualization, and sandbox/honeypot techniques with the goal of improving the security of installing useful, but unverifiable, software. With a combination of these techniques, it should be possible to install software in an environment where it cannot harm a machine, but can be tested to determine its safety. Testing for malware, performance, network connectivity, memory usage, and interoperability can be accomplished without allowing the program to access the base operating system of a machine. After the full effects of the software are understood and it is determined to be safe, it could then be run from, and given access to, the base operating system. This thesis investigates the feasibility of creating a system to verify the security of unknown software while ensuring it will have no negative impact on the host machine
    • ā€¦
    corecore