74 research outputs found
Automatic generation of meta classifiers with large levels for distributed computing and networking
This paper is devoted to a case study of a new construction of classifiers, called automatically generated multi-level meta classifiers (AGMLMC). The construction combines diverse meta classifiers in a new way to create a unified system. This original construction can be generated automatically, producing classifiers with large levels. Different meta classifiers are incorporated as low-level integral parts of another meta classifier at the top level. The construction is intended for distributed computing and networking. The AGMLMC classifiers are unified classifiers with many parts that can operate in parallel, which makes it easy to adopt them in distributed applications. This paper introduces the new construction of classifiers and undertakes an experimental study of their performance. We look at a case study of their effectiveness in the special case of the detection and filtering of phishing emails, a potentially important application area for such large and distributed classification systems. Our experiments investigate the effectiveness of combining diverse meta classifiers into one AGMLMC classifier in this case study. The results show that the new classifiers with large levels achieved better performance than the base classifiers and simple meta classifiers. This demonstrates that the new technique can be applied to increase performance if diverse meta classifiers are included in the system
Performance evaluation of multi-tier ensemble classifiers for phishing websites
This article is devoted to large multi-tier ensemble classifiers generated as ensembles of ensembles and applied to phishing websites. Our new ensemble construction is a special case of the general and productive multi-tier approach well known in information security. Many efficient multi-tier classifiers have been considered in the literature. Our new contribution is in generating new large systems as ensembles of ensembles by linking a top-tier ensemble to another middle-tier ensemble instead of a base classifier, so that the top-tier ensemble can generate the whole system. This automatic generation capability includes many large ensemble classifiers in two tiers simultaneously and automatically combines them into one hierarchical unified system, so that one ensemble is an integral part of another one. This new construction makes it easy to set up and run such large systems. The present article concentrates on the performance of these new multi-tier ensembles for the example of detection of phishing websites. We carried out systematic experiments evaluating several essential ensemble techniques as well as more recent approaches and studying their performance as parts of multi-level ensembles with three tiers. The results presented here demonstrate that the new three-tier ensemble classifiers performed better than the base classifiers and standard ensembles included in the system. This example of application to the classification of phishing websites shows that the new method of combining diverse ensemble techniques into a unified hierarchical three-tier ensemble can be applied to increase the performance of classifiers in situations where data can be processed on a large computer
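The linking of a top-tier ensemble to a middle-tier ensemble instead of a base classifier can be sketched as follows. This is a minimal illustration only: the abstract does not name the ensemble techniques used, so bagging with majority voting and decision stumps are assumed here, and the names `train_stump` and `bagging` as well as the toy data are hypothetical.

```python
import random
from collections import Counter

def train_stump(data):
    """Base tier: pick the feature/threshold/sign with the fewest training errors."""
    best = None
    for f in range(len(data[0][0])):
        for x, _ in data:
            t = x[f]
            for sign in (1, -1):
                errors = sum(1 for xi, yi in data
                             if (1 if sign * (xi[f] - t) >= 0 else 0) != yi)
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    _, f, t, sign = best
    return lambda x: 1 if sign * (x[f] - t) >= 0 else 0

def bagging(base_trainer, n_members, rng):
    """Ensemble trainer: fit each member on a bootstrap sample, then majority-vote."""
    def trainer(data):
        members = [base_trainer([rng.choice(data) for _ in data])
                   for _ in range(n_members)]
        return lambda x: Counter(m(x) for m in members).most_common(1)[0][0]
    return trainer

rng = random.Random(0)
# Top tier -> middle tier -> base classifier: the top-tier ensemble receives
# another ensemble trainer where a base trainer would normally go.
three_tier = bagging(bagging(train_stump, 5, rng), 3, rng)

# Toy data (an assumption): the label is 1 when the first feature is >= 1.0.
data = [((i / 10, (i * 7 % 10) / 10), 1 if i >= 10 else 0) for i in range(20)]
model = three_tier(data)
accuracy = sum(model(x) == y for x, y in data) / len(data)
```

Because `bagging` returns a trainer with the same interface as `train_stump`, tiers can be nested to any depth without changing the top-level call.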
A hybrid semantic similarity feature-based to support multiple ontologies
Work-Based Learning (Pembelajaran Berasaskan Kerja, PBK) is a learning method that combines theoretical and practical learning simultaneously in a real workplace, with the aim of producing employable graduates. Although this method has long been practised in developed countries such as the United States and the United Kingdom, in Malaysia it was only introduced in 2007 and initially involved only a few community colleges. In 2010, however, PBK was discontinued at community colleges and transferred to polytechnics. Among the issues arising in the current implementation of polytechnic PBK in industry are the concept of PBK implementation, teaching and learning styles, assessment methods, the relationship between polytechnics and industry, uniformity of the PBK implementation concept, issues and challenges in implementing PBK, and differences in implementation methods between polytechnics and community colleges. The purpose of this study is therefore to explore, understand and explain the implementation of polytechnic PBK with industry. The study used a qualitative case study methodology. Field data collection was carried out over one year using interviews, observation and document analysis. A maximum variation sampling strategy, the snowball sampling technique and purposive sampling were used. The study participants were drawn from management and PBK coordinating lecturers, industry supervisors and students involved in PBK. The findings show that the implementation of polytechnic PBK with industry has seen many improvements compared with the earlier implementation at community colleges; however, several issues remain, namely a PBK curriculum that is not aligned with industry policy and weaknesses among industry supervisors in teaching and learning
Dynamic adversarial mining - effectively applying machine learning in adversarial non-stationary environments.
While understanding of machine learning and data mining is still in its budding stages, their engineering applications have found immense acceptance and success. Cybersecurity applications such as intrusion detection systems, spam filtering, and CAPTCHA authentication have all begun adopting machine learning as a viable technique to deal with large-scale adversarial activity. However, the naive usage of machine learning in an adversarial setting is prone to reverse engineering and evasion attacks, as most of these techniques were designed primarily for a static setting. The security domain is a dynamic landscape, with an ongoing, never-ending arms race between the system designer and the attackers. Any solution designed for such a domain needs to take into account an active adversary and needs to evolve over time in the face of emerging threats. We term this the ‘Dynamic Adversarial Mining’ problem, and the presented work provides the foundation for this new interdisciplinary area of research, at the crossroads of Machine Learning, Cybersecurity, and Streaming Data Mining. We start with a white hat analysis of the vulnerabilities of classification systems to exploratory attack. The proposed ‘Seed-Explore-Exploit’ framework provides characterization and modeling of attacks, ranging from simple random evasion attacks to sophisticated reverse engineering. It is observed that even systems with prediction accuracy close to 100% can be easily evaded with more than 90% precision. This evasion can be performed without any information about the underlying classifier, training dataset, or the domain of application. Attacks on machine learning systems cause the data to exhibit non-stationarity (i.e., the training and the testing data have different distributions). It is necessary to detect these changes in distribution, called concept drift, as they could cause the prediction performance of the model to degrade over time.
However, the detection cannot rely heavily on labeled data to compute performance explicitly and monitor a drop, as labeling is expensive and time consuming, and at times may not be possible at all. As such, we propose the ‘Margin Density Drift Detection (MD3)’ algorithm, which can reliably detect concept drift from unlabeled data only. MD3 provides high detection accuracy with a low false alarm rate, making it suitable for cybersecurity applications, where excessive false alarms are expensive and can lead to loss of trust in the warning system. Additionally, MD3 is designed as a classifier-independent and streaming algorithm for usage in a variety of continuous never-ending learning systems. We then propose a ‘Dynamic Adversarial Mining’ based learning framework, for learning in non-stationary and adversarial environments, which provides ‘security by design’. The proposed ‘Predict-Detect’ classifier framework aims to provide robustness against attacks, ease of attack detection using unlabeled data, and swift recovery from attacks. Ideas of feature hiding and obfuscation of feature importance are proposed as strategies to enhance the learning framework's security. Metrics for evaluating the dynamic security of a system and its recoverability after an attack are introduced to provide a practical way of measuring the efficacy of dynamic security strategies. The framework is developed as a streaming data methodology, capable of continually functioning with limited supervision and effectively responding to adversarial dynamics. The developed ideas, methodology, algorithms, and experimental analysis aim to provide a foundation for future work in the area of ‘Dynamic Adversarial Mining’, wherein a holistic approach to machine learning based security is motivated
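The idea of detecting drift from unlabeled data by watching margin density can be illustrated with a small sketch. It is not the MD3 algorithm as published: the abstract gives no formulas, so the linear scoring function, the margin width of 1.0, the fixed `threshold`, and all function names below are assumptions made for illustration.

```python
def margin_density(weights, bias, batch, margin=1.0):
    """Fraction of unlabeled samples whose classifier score lies inside the margin."""
    inside = sum(1 for x in batch
                 if abs(sum(w * xi for w, xi in zip(weights, x)) + bias) <= margin)
    return inside / len(batch)

def drift_suspected(reference_density, current_density, threshold=0.2):
    """Flag drift when margin density moves away from its reference value."""
    return abs(current_density - reference_density) > threshold

# Hypothetical fixed linear classifier: score = x0 - x1.
weights, bias = (1.0, -1.0), 0.0

# Reference data sits mostly far from the decision boundary ...
reference_batch = [(3.0, 0.0), (0.0, 3.0), (4.0, 1.0), (1.0, 4.0), (2.0, 1.5)]
# ... while the later batch crowds into the margin; no labels are needed to see it.
drifted_batch = [(2.0, 1.5), (1.5, 2.0), (2.0, 2.0), (3.0, 0.0), (1.0, 1.5)]

ref_md = margin_density(weights, bias, reference_batch)   # 1 of 5 samples in margin
new_md = margin_density(weights, bias, drifted_batch)     # 4 of 5 samples in margin
alarm = drift_suspected(ref_md, new_md)
```

Only classifier scores are used, so the check needs no labels; in a streaming setting the density would be updated incrementally rather than recomputed per batch.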
Phishing detection and traceback mechanism
Isredza Rahmi A Hamid’s thesis, entitled Phishing Detection and Traceback Mechanism, investigates the detection of phishing attacks through email, a novel method to profile the attacker, and tracking the attack back to its origin
Machine Learning
Machine learning can be defined in various ways, all related to a scientific domain concerned with the design and development of theoretical and implementation tools for building systems with some human-like intelligent behavior. More specifically, machine learning addresses the ability to improve automatically through experience
Robust behavioral malware detection
Computer security attacks evolve to evade deployed defenses. Recent attacks have ranged from exploiting generic software vulnerabilities in memory-unsafe languages such as buffer overflows and format string vulnerabilities to exploiting logic errors in web applications, through means such as SQL injection and cross-site scripting. Furthermore, recent attacks have focused on escalating privileges and stealing sensitive information by exploiting new hardware or operating system (OS) interfaces. Computer security attacks are also now relying on social engineering techniques to run malicious programs on victims' machines; instances of such abuse include phishing and watering hole attacks, both of which trick people into running malicious code or divulging confidential information. Thus, traditional computer security methods, such as OS confinement and program analysis, will not prevent new attacks that do not violate OS confinement or present illegal program behaviors.
Another challenge is that traditional security approaches have large trusted code bases (TCBs), which include hardware, OSs, and other software components that implement authentication and authorization logic across a distributed system. This is a vulnerable area because these components are complex and often contain vulnerabilities that undermine the overall system's integrity or confidentiality.
Evasive attacks on vulnerable systems -- especially in instances where trusted components turn malicious -- inspire the creation of defenses that can augment formally specified mechanisms against known threats. Specifically, this thesis advances the state of the art in behavioral malware detection -- detecting previously unknown malware in the very early stages of infection within an enterprise network.
Here we assess three fundamental insights of modern-day attacks and then describe a cross-layer defense against such attacks. First, we make a low-level machine state visible to behavioral analysis, significantly minimizing the TCB and its associated vulnerabilities. Specifically, our behavioral detector utilizes an executable code's dynamic properties, with architectural and micro-architectural states as input. Second, we evaluate behavioral detectors against adaptive adversaries. For this purpose, we introduce a new metric to determine a detector's robustness against malware modifications, which serves as a step toward explainability of machine learning-based malware detectors. Finally, we exploit the fact that attacks spread through only a limited number of vectors and propose new techniques to analyze the resulting dynamic correlations created among machines. These insights show that behavioral detectors can efficiently protect both individual devices and end hosts within enterprise networks. We present three types of such behavioral detectors.
Sherlock protects resource-constrained devices, such as mobile phones and Internet-of-things (IoT) devices, without modifying the software/hardware stack. Sherlock's supervised and unsupervised versions outperform prior work by 24.7% and 12.5% (area under the curve (AUC) metric), respectively, and detect stealthy malware that often evades static analysis tools.
The second behavioral detector, Shape-GD, protects devices within an enterprise network. It monitors devices on the network, aggregates data from weak local detectors, overlays that with network-level information, and then makes early, robust predictions regarding malicious activity. Shape-GD achieves its goals by exploiting latent attack semantics. Specifically, it analyzes communication patterns across multiple devices, partitioning them into neighborhoods. Devices within the same neighborhood are likely to be exposed to the same attack vector. Furthermore, we hypothesize that the conditional distribution of false positives is different from that of true positives; i.e., given a neighborhood of nodes, we can compute the aggregate distributional shape of alert feature vectors from the neighborhood itself and provide robust labels.
We evaluate Shape-GD by emulating a large community of Windows systems using the system call traces from a few thousand malicious and benign applications; we simulate both a phishing attack in a corporate email network as well as a watering hole attack through a popular website. In both scenarios, Shape-GD identifies malware early on (~100 infected nodes in a ~100K-node system for watering hole attacks, and ~10 of ~1,000 for phishing attacks) and robustly (with ~100% global true-positive and ~1% global false-positive rates).
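The neighborhood-level aggregation described for Shape-GD can be illustrated roughly as below. This is a simplified sketch and not the thesis's implementation: alert feature vectors are reduced to scalar alert scores in [0, 1], and the histogram binning, the one-dimensional earth mover's distance, and all names and numbers are assumptions chosen for illustration.

```python
def histogram(values, bins=5):
    """Normalized histogram of values assumed to lie in [0, 1]."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return [c / len(values) for c in counts]

def shape_distance(hist_a, hist_b):
    """1-D earth mover's distance between two histograms over the same bins."""
    distance, cumulative = 0.0, 0.0
    for a, b in zip(hist_a, hist_b):
        cumulative += a - b
        distance += abs(cumulative)
    return distance

# Hypothetical reference shape: alert scores raised by local detectors on
# machines known to be clean (i.e. false positives).
reference = histogram([0.1, 0.1, 0.3, 0.3])

# Alerts pooled from two hypothetical neighborhoods of devices.
benign_neighborhood = histogram([0.1, 0.3, 0.1, 0.3])
infected_neighborhood = histogram([0.9, 0.9, 0.7, 0.7])

benign_gap = shape_distance(reference, benign_neighborhood)
infected_gap = shape_distance(reference, infected_neighborhood)
```

A neighborhood whose pooled alert shape sits far from the false-positive reference would be labeled suspicious as a whole, which is how weak per-device detectors can yield a more robust group-level decision.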
The third behavioral detector, Centurion, detects malware across machines monitored by an anti-virus company. It analyzes behavior from 5 million Symantec client machines in real time and discovers malware by correlating file downloads across multiple machines. Compared with a recent local detector that analyzes metadata from file downloads, Centurion reduced the number of false positives from ~1M to ~110K and increased the true-positive rate by a factor of ~2.5. In addition, on average, Centurion detects malware 345 days earlier than commercial anti-virus products.
Ideal bases in constructions defined by directed graphs
The present article continues the investigation of visible ideal bases in constructions defined using directed graphs. Our main theorem establishes that, for every balanced digraph D and each idempotent semiring R with 1, the incidence semiring I_D(R) of the digraph D has a convenient visible ideal basis B_D(R). It also shows that the elements of B_D(R) can always be used to generate two-sided ideals with the largest possible weight among the weights of all two-sided ideals in the incidence semiring
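For readers unfamiliar with incidence semirings, the sketch below shows how addition and multiplication behave in a small case. It assumes the usual incidence construction, which the abstract does not spell out: elements are sums of edges, and the product of edges (x, y) and (z, w) is the edge (x, w) when y = z and (x, w) is itself an edge of the digraph, and zero otherwise. The coefficient semiring is taken to be the Boolean semiring, so an element is just a set of edges; the function names and the example digraph are hypothetical.

```python
def add(a, b):
    """Sum over the Boolean semiring: union of edge sets (idempotent)."""
    return frozenset(a) | frozenset(b)

def multiply(a, b, edges):
    """Product: compose edges end to start, keeping only composites that are edges."""
    return frozenset((x, w) for (x, y) in a for (z, w) in b
                     if y == z and (x, w) in edges)

# Hypothetical digraph on vertices 1, 2, 3.
edges = frozenset({(1, 2), (2, 3), (1, 3)})

a = frozenset({(1, 2)})
b = frozenset({(2, 3)})

ab = multiply(a, b, edges)   # (1,2)(2,3) composes to the edge (1,3)
ba = multiply(b, a, edges)   # (2,3)(1,2) does not compose, giving zero
aa = add(a, a)               # idempotent addition: a + a = a
```

The empty set plays the role of zero, and the example shows that multiplication is not commutative.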
Ideal Basis in Constructions Defined by Directed Graphs
The present article continues the investigation of visible ideal bases in constructions defined using directed graphs. This notion is motivated by its applications for the design of classification systems. Our main theorem establishes that, for every balanced digraph and each idempotent semiring with identity element, the incidence semiring of the digraph has a convenient visible ideal basis. It also shows that the elements of the basis can always be used to generate ideals with the largest possible weight among the weights of all ideals in the incidence semiring