    Falcon Optimization Algorithm for Bayesian Networks Structure Learning

    In machine learning, the Bayesian network is a useful model for representing the structure of knowledge, since it captures probabilistic dependency relationships between variables. Score-and-search is one method for learning the structure of a Bayesian network. The authors apply the Falcon Optimization Algorithm (FOA) as a new approach to learning the structure of Bayesian networks. The paper uses Reversing, Deleting, Moving and Inserting operations to adapt the FOA so that it approaches the optimal Bayesian network structure. Essentially, the FOA is built on the falcon's prey search strategy. The proposed technique is compared with Pigeon Inspired Optimization, Greedy Search, and Simulated Annealing using the BDeu score function. The authors also examine the confusion-matrix performance of these techniques on several benchmark data sets. The evaluations show that the proposed method performs more reliably than the other algorithms, producing better scores and accuracy values.
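
    The score-and-search idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's method: plain first-improvement hill climbing stands in for the FOA, a BIC score stands in for BDeu, only the inserting, deleting and reversing moves are covered (not moving), and all variables are assumed binary. Every function name here is hypothetical.

```python
import itertools
import math
from collections import Counter


def bic_score(data, parents):
    """BIC score of a DAG given as {node: set_of_parents}; rows are tuples of 0/1."""
    n = len(data)
    total = 0.0
    for node, pa in parents.items():
        pa = sorted(pa)
        joint = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        n_params = 2 ** len(pa)  # one free parameter per parent configuration (binary assumption)
        total += loglik - 0.5 * math.log(n) * n_params
    return total


def is_acyclic(parents):
    """Repeatedly strip nodes that have no remaining parents; failure means a cycle."""
    remaining = {v: set(pa) for v, pa in parents.items()}
    while remaining:
        roots = [v for v, pa in remaining.items() if not pa]
        if not roots:
            return False
        for r in roots:
            del remaining[r]
        for pa in remaining.values():
            pa.difference_update(roots)
    return True


def neighbours(parents):
    """Yield every DAG one edge insertion, deletion, or reversal away."""
    for u, v in itertools.permutations(list(parents), 2):
        cand = {k: set(pa) for k, pa in parents.items()}
        if u in parents[v]:
            cand[v].discard(u)  # delete u -> v
            yield cand
            rev = {k: set(pa) for k, pa in cand.items()}
            rev[u].add(v)  # reverse to v -> u
            if is_acyclic(rev):
                yield rev
        elif v not in parents[u]:
            cand[v].add(u)  # insert u -> v
            if is_acyclic(cand):
                yield cand


def hill_climb(data, n_vars):
    """Start from the empty graph and accept the first strictly improving move."""
    current = {i: set() for i in range(n_vars)}
    best = bic_score(data, current)
    improved = True
    while improved:
        improved = False
        for cand in neighbours(current):
            score = bic_score(data, cand)
            if score > best + 1e-9:
                current, best, improved = cand, score, True
                break
    return current, best
```

    On data where one variable copies another and a third is independent, the search should keep a single edge between the two dependent variables. Swapping the search loop for a population-based metaheuristic such as the FOA would leave the scoring and move-generation parts unchanged.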

    Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks

    Malware still constitutes a major threat in the cybersecurity landscape, partly because of the widespread use of infection vectors such as documents. These infection vectors hide embedded malicious code from the victim users, facilitating the use of social engineering techniques to infect their machines. Research has shown that machine-learning algorithms provide effective detection mechanisms against such threats, but the existence of an arms race in adversarial settings has recently challenged such systems. In this work, we focus on malware embedded in PDF files as a representative case of such an arms race. We start by providing a comprehensive taxonomy of the different approaches used to generate PDF malware, and of the corresponding learning-based detection systems. We then categorize threats specifically targeted against learning-based PDF malware detectors, using a well-established framework in the field of adversarial machine learning. This framework allows us to categorize known vulnerabilities of learning-based PDF malware detectors and to identify novel attacks that may threaten such systems, along with the potential defense mechanisms that can mitigate the impact of such threats. We conclude the paper by discussing how such findings highlight promising research directions towards tackling the more general challenge of designing robust malware detectors in adversarial settings.

    Development and benchmarking a novel scatter search algorithm for learning probabilistic graphical models in healthcare

    Small healthcare data sets are widespread, and building accurate inference models from them is difficult. Many machine learning algorithms exist, but many are black boxes. Explainable models in healthcare are essential, so healthcare practitioners can understand the developed model and incorporate domain knowledge into the model. Probabilistic graphical models offer a visual way to represent relationships between data. Here we develop a new scatter search algorithm to learn Bayesian networks. This machine learning approach is applied to three case studies to understand the effectiveness in comparison with traditional machine learning techniques. First, a new scatter search approach is presented to construct the structure of a Bayesian network. Statistical tests are used to build small directed acyclic graphs, which are combined in an iterative process to build up multiple larger graphs. Probability distributions are fitted as the graphs are built up. These graphs are then scored based on classification performance. Once no new solutions can be found, the algorithm finishes. The first study looks at the effectiveness of the scatter search constructed Bayesian network against other machine learning algorithms in the same class. These algorithms are benchmarked against standard datasets from the UCI Machine Learning Repository, which has many published studies. The second study assesses the effectiveness of the scatter search Bayesian network for classifying ovarian cancer patients. Multiple other machine learning algorithms were applied alongside the Bayesian network. All data from this study were collected by clinicians from the Aneurin Bevan University Health Board. The study concluded that machine-learning techniques could be applied to classify patients based on early indicators. The third and final study looked into applying machine learning techniques to no-show breast cancer follow-up patients. Once again, the scatter search Bayesian network was used alongside other machine learning approaches. Socio-demographic and socio-economic factors involving low to middle-income families were used in this study with feature selection techniques to improve machine learning performance. It was found that machine learning, when used with feature selection, could classify no-show patients with reasonable accuracy.

    Text mining with the WEBSOM

    The emerging field of text mining applies methods from data mining and exploratory data analysis to analyzing text collections and to conveying information to the user in an intuitive manner. Visual, map-like displays provide a powerful and fast medium for portraying information about large collections of text. Relationships between text items and collections, such as similarity, clusters, gaps and outliers, can be communicated naturally using spatial relationships, shading, and colors. In the WEBSOM method, the self-organizing map (SOM) algorithm is used to automatically organize very large and high-dimensional collections of text documents onto two-dimensional map displays. The map forms a document landscape where similar documents appear close to each other at points of the regular map grid. The landscape can be labeled with automatically identified descriptive words that convey properties of each area and also act as landmarks during exploration. With the help of an HTML-based interactive tool, the ordered landscape can be used in browsing the document collection and in performing searches on the map. An organized map offers an overview of an unknown document collection, helping the user in familiarizing herself with the domain. Map displays that are already familiar can be used as visual frames of reference for conveying properties of unknown text items. Static, thematically arranged document landscapes provide meaningful backgrounds for dynamic visualizations of, for example, time-related properties of the data. Search results can be visualized in the context of related documents. Experiments on document collections of various sizes, text types, and languages show that the WEBSOM method is scalable and generally applicable. Preliminary results in a text retrieval experiment indicate that even when the additional value provided by the visualization is disregarded, the document maps perform at least comparably with more conventional retrieval methods.
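
    As a rough illustration of how a self-organizing map places similar documents near each other on a 2-D grid, here is a minimal pure-Python sketch. It is a generic textbook SOM, not the WEBSOM implementation: the function names, the linear decay schedules, the Gaussian neighbourhood, and the assumption that documents arrive as small term-weight vectors with components in [0, 1] are all illustrative choices.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid cell whose weight vector is closest to x (squared Euclidean)."""
    return min(
        weights,
        key=lambda cell: sum((wi - xi) ** 2 for wi, xi in zip(weights[cell], x)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks, floored at 0.5
        for x in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for cell, w in weights.items():
                grid_d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull the cell toward the input
    return weights


# Hypothetical usage: four tiny "documents" as 3-term weight vectors.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
som = train_som(docs, rows=3, cols=3)
positions = [best_matching_unit(som, d) for d in docs]  # grid cell assigned to each document
```

    Each document's best-matching cell is its position on the map. Because neighbouring cells are updated together, documents with similar vectors tend to end up on the same or adjacent cells, which is the property the document landscape relies on.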

    Automated subject classification of textual web documents
