Search CORE

2,600 research outputs found

Machine Learning Aided Static Malware Analysis: A Survey and Tutorial

Author: Andrii Shalaginov
D Krishna Sandeep Reddy
Farid Daryabar
Igor Santos
Reinaldo Jose Mangialardo
Smita Naval
Steve Watson
Teuvo Kohonen
Yanfang Ye
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/08/2018
Field of study

Malware analysis and detection techniques have been evolving during the last decade as a reflection to development of different malware techniques to evade network-based and host-based security protections. The fast growth in variety and number of malware species made it very difficult for forensics investigators to provide an on time response. Therefore, Machine Learning (ML) aided malware analysis became a necessity to automate different aspects of static and dynamic malware investigation. We believe that machine learning aided static analysis can be used as a methodological approach in technical Cyber Threats Intelligence (CTI) rather than resource-consuming dynamic malware analysis that has been thoroughly studied before. In this paper, we address this research gap by conducting an in-depth survey of different machine learning methods for classification of static characteristics of 32-bit malicious Portable Executable (PE32) Windows files and develop taxonomy for better understanding of these techniques. Afterwards, we offer a tutorial on how different machine learning techniques can be utilized in extraction and analysis of a variety of static characteristic of PE binaries and evaluate accuracy and practical generalization of these techniques. Finally, the results of experimental study of all the method using common data was given to demonstrate the accuracy and complexity. This paper may serve as a stepping stone for future researchers in cross-disciplinary field of machine learning aided malware forensics.Comment: 37 Page

arXiv.org e-Print Archive

Crossref

Fiber Optic Attenuation Analysis Based on Mamdani Fuzzy Logic in Gambir Area, Central Jakarta

Author: Lenni Lenni
Muwardi Rachmat
Rahmawati Yosy
Sari Ninda
Yuliza Yuliza
Publication venue: 'Universitas Ahmad Dahlan, Kampus 3'
Publication date: 23/12/2022
Field of study

In this study, the authors conducted an analysis of the quality of fiber optic network maintenance based on attenuation value and maintenance time using fuzzy Mamdani logic and simulated using Matlab software, to improve accuracy in drawing conclusions on maintaining quality. This study uses a quantitative method, in which the author obtains a summary of customer data from PT. Telkom Indonesia in a period of 4 months of observation from August to November 2021. In August there were 776 customers, in September there were 362 customers, in October there were 359 customers, and in November 445 customers who underwent Indihome fiber optic cable maintenance. The test results with the centroid method with an input Handling Time of 1.5 hours and an Attenuation of 15 dB, then the output Repair Quality is 5.5 or categorized as Good. The greater the attenuation value generated, the more time it takes to maintain the IndiHome internet network disturbance. This is due to the many technical maintenance of fiber optic cables carried out by technicians to adjust for damage/trouble in the field. It is expected that maintenance can be carried out routinely in order to avoid fatal internet disturbances on the customer's side, and maximize maintenance time according to the dosage determined by the company, which is less than 3 hours, taking into account the work performance of technicians and also the quality of maintenance

Journal of Education and Learning (EduLearn)

Local feature weighting in nearest prototype classification

Author: Fernández Fernando
Isasi Pedro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

The distance metric is the corner stone of nearest neighbor (NN)-based methods, and therefore, of nearest prototype (NP) algorithms. That is because they classify depending on the similarity of the data. When the data is characterized by a set of features which may contribute to the classification task in different levels, feature weighting or selection is required, sometimes in a local sense. However, local weighting is typically restricted to NN approaches. In this paper, we introduce local feature weighting (LFW) in NP classification. LFW provides each prototype its own weight vector, opposite to typical global weighting methods found in the NP literature, where all the prototypes share the same one. Providing each prototype its own weight vector has a novel effect in the borders of the Voronoi regions generated: They become nonlinear. We have integrated LFW with a previously developed evolutionary nearest prototype classifier (ENPC). The experiments performed both in artificial and real data sets demonstrate that the resulting algorithm that we call LFW in nearest prototype classification (LFW-NPC) avoids overfitting on training data in domains where the features may have different contribution to the classification task in different areas of the feature space. This generalization capability is also reflected in automatically obtaining an accurate and reduced set of prototypes.Publicad

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Sampling imbalance dataset for software defect prediction using hybrid neuro-fuzzy systems with Naive Bayes classifier

Author: B. Latha
K. Punitha
Publication venue: 'Mechanical Engineering Faculty in Slavonski Brod'
Publication date: 01/01/2016
Field of study

Predviđanje grešaka u računalnom programu (SDP-software defect prediction) je težak zadatak kad se radi o projektima računalnog programa. Taj je postupak koristan za identifikaciju i lokaciju neispravnosti iz modula. Taj će zadatak postati skuplji uz dodatak složenih mehanizama za ispitivanje i ocjenjivanje kad se poveća veličina modula programa. Daljnje konsistentne i disciplinirane provjere programa nude nekoliko prednosti, na pr. točnost u procjeni troškova i programiranja projekta, povećanje kvalitete postupka i proizvoda. Detaljna analiza metričkih podataka programa također može značajno pomoći u lociranju mogućih grešaka u programskom kodiranju. Osnovni je cilj ovoga rada predstaviti metode za detekciju i otkrivanje grešaka u programu primjenom postupaka strojnog učenja. U radu su korišteni nebalansirani nizovi podaka iz NASA-inog Metrics Data Programa (MDP) i programska metrika niza podataka izabrana je primjenom Genetičkog algoritma metodom Optimizacije kolonije mrava (Ant Colony Optimization -GACO). Postupak uzorkovanja metodom Modified Co Forest - polu-nadgledanog učenja, generira balansirano označene nizove podataka koristeći nebalansirane nizove, a primjenjuje se za učinkoviti postupak otkrivanja greške u programu s Hibridnim Neuro-Fuzzy sustavima za strojno učenje po Naive Bayes metodama. Eksperimentalni rezultati predložene metode dokazuju da je ova metoda za otkrivanje greške u računalnom program učinkovitija od drugih postojećih metoda, s boljim rezultatima u predviđanju greške.Software defect prediction (SDP) is a process with difficult tasks in the case of software projects. The SDP process is useful for the identification and location of defects from the modules. This task will tend to become more costly with the addition of complex testing and evaluation mechanisms, when the software project modules size increases. Further measurement of software in a consistent and disciplined manner offers several advantages like accuracy in the estimation of project costs and schedules, and improving product and process qualities. Detailed analysis of software metric data also gives significant clues about the locations of possible defects in a programming code. The main goal of this proposed work is to introduce software defects detection and prevention methods for identifying defects from software using machine learning approaches. This proposed work used imbalanced datasets from NASA’s Metrics Data Program (MDP) and software metrics of datasets are selected by using Genetic algorithm with Ant Colony Optimization (GACO) method. The sampling process with semi supervised learning Modified Co Forest method generates the balanced labelled using imbalanced datasets, which is used for efficient software defect detection process with machine learning Hybrid Neuro-Fuzzy Systems with Naive Bayes methods. The experimental results of this proposed method proves that this defect detecting machine learning method yields more efficiency and better performance in defect prediction result of software in comparison with the other available methods

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Development of Computational Intelligence Methods to Deal with Classi cation Problems

Author: Ekong Udeme Essien
Publication venue
Publication date: 01/01/2016
Field of study

King's Research Portal

SPOCC: Scalable POssibilistic Classifier Combination -- toward robust aggregation of classifiers

Author: Albardan Mahmoud
Colot Olivier
Klein John
Publication venue: 'Elsevier BV'
Publication date: 20/02/2020
Field of study

We investigate a problem in which each member of a group of learners is trained separately to solve the same classification task. Each learner has access to a training dataset (possibly with overlap across learners) but each trained classifier can be evaluated on a validation dataset. We propose a new approach to aggregate the learner predictions in the possibility theory framework. For each classifier prediction, we build a possibility distribution assessing how likely the classifier prediction is correct using frequentist probabilities estimated on the validation set. The possibility distributions are aggregated using an adaptive t-norm that can accommodate dependency and poor accuracy of the classifier predictions. We prove that the proposed approach possesses a number of desirable classifier combination robustness properties

arXiv.org e-Print Archive

Hal-Diderot

A Max-relevance-min-divergence Criterion for Data Discretization with Applications on Naive Bayes

Author: Bai Ruibin
Jiang Xudong
Ren Jianfeng
Wang Shihe
Yao Yuan
Publication venue
Publication date: 04/04/2023
Field of study

In many classification models, data is discretized to better estimate its distribution. Existing discretization methods often target at maximizing the discriminant power of discretized data, while overlooking the fact that the primary target of data discretization in classification is to improve the generalization performance. As a result, the data tend to be over-split into many small bins since the data without discretization retain the maximal discriminant information. Thus, we propose a Max-Dependency-Min-Divergence (MDmD) criterion that maximizes both the discriminant information and generalization ability of the discretized data. More specifically, the Max-Dependency criterion maximizes the statistical dependency between the discretized data and the classification variable while the Min-Divergence criterion explicitly minimizes the JS-divergence between the training data and the validation data for a given discretization scheme. The proposed MDmD criterion is technically appealing, but it is difficult to reliably estimate the high-order joint distributions of attributes and the classification variable. We hence further propose a more practical solution, Max-Relevance-Min-Divergence (MRmD) discretization scheme, where each attribute is discretized separately, by simultaneously maximizing the discriminant information and the generalization ability of the discretized data. The proposed MRmD is compared with the state-of-the-art discretization algorithms under the naive Bayes classification framework on 45 machine-learning benchmark datasets. It significantly outperforms all the compared methods on most of the datasets.Comment: Under major revision of Pattern Recognitio

arXiv.org e-Print Archive