
    Bayesian Networks for Interpretable Cyberattack Detection

    The challenge of cyberattack detection can be illustrated by the complexity of the MITRE ATT&CK™ matrix, which catalogues more than 200 attack techniques, most with multiple sub-techniques. To reliably detect cyberattacks, we propose an evidence-based approach that fuses multiple cyber events over varying time periods to help differentiate normal from malicious behavior. We use Bayesian Networks (BNs), probabilistic graphical models consisting of a set of variables and their conditional dependencies, for fusion and classification because of their interpretable nature, their ability to tolerate sparse or imbalanced data, and their resistance to overfitting. Our technique uses a small collection of expert-informed cyber intrusion indicators to create a hybrid detection system that combines data-driven training with expert knowledge to form a host-based intrusion detection system (HIDS). We demonstrate a software pipeline for efficiently generating and evaluating various BN classifier architectures for specific datasets and discuss their explainability benefits.
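
    As a concrete illustration of the fusion step, the sketch below builds a two-indicator BN whose hidden node represents malicious activity and queries its posterior given observed events. It assumes the pgmpy library; the indicator names and all conditional probabilities are hypothetical placeholders rather than the paper's expert-informed values.

```python
# Minimal sketch of evidence fusion with a small Bayesian network (pgmpy assumed).
# The two indicators and all probabilities below are illustrative placeholders.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Naive-Bayes-shaped structure: a hidden "Malicious" state drives two observable indicators.
model = BayesianNetwork([("Malicious", "FailedLogins"), ("Malicious", "NewService")])

cpd_mal = TabularCPD("Malicious", 2, [[0.99], [0.01]])   # P(benign)=0.99, P(malicious)=0.01
cpd_fl = TabularCPD("FailedLogins", 2,
                    [[0.95, 0.30],                       # P(indicator=0 | Malicious=0/1)
                     [0.05, 0.70]],                      # P(indicator=1 | Malicious=0/1)
                    evidence=["Malicious"], evidence_card=[2])
cpd_ns = TabularCPD("NewService", 2,
                    [[0.98, 0.40],
                     [0.02, 0.60]],
                    evidence=["Malicious"], evidence_card=[2])
model.add_cpds(cpd_mal, cpd_fl, cpd_ns)
assert model.check_model()

# Fuse the observed events into a posterior over the hidden malicious state.
posterior = VariableElimination(model).query(
    variables=["Malicious"], evidence={"FailedLogins": 1, "NewService": 1})
print(posterior)
```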

    Does the Electronic Medical Record (EMR) Adoption Matter? Exploring Patterns of EMR Implementation and its Impact on Hospital Performance

    We aimed to explore the patterns of electronic medical record (EMR) adoption and their effects on hospital performance. We analyzed hospital-level panel data from 2008 to 2013 using Bayesian regression and the Naïve Bayes model. Our analysis revealed 38 distinct adoption patterns for the 1,919 hospitals that completed EMR implementation (adopting all four components) and 42 distinct investment patterns for the 1,341 hospitals that did not complete EMR implementation. We then examined the hospitals whose EMR adoption was not completed but was predicted as completed by the Naïve Bayes model. Our results revealed that hospitals that completed EMR adoption showed higher performance in terms of patient recommendation and net patient revenue than those that did not. More importantly, most of the hospitals observed as “not completed” but predicted as “completed” showed lower performance in terms of both patient recommendation and net patient revenue.
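
    A minimal sketch of the Naïve Bayes prediction step is given below, assuming scikit-learn; the yearly investment indicators and the toy data are illustrative stand-ins for the study's hospital panel, not its actual variables.

```python
# Sketch: predict EMR completion from a hospital's yearly investment pattern with a
# Bernoulli Naive Bayes model (scikit-learn assumed; all data below are toy placeholders).
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Each row: whether a hospital invested in EMR components in 2008..2013 (1 = yes).
X = np.array([
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
])
# Observed outcome: 1 = completed implementation of all four EMR components.
y = np.array([1, 1, 0, 0, 0, 1])

clf = BernoulliNB().fit(X, y)

# The study's key contrast group: hospitals observed as "not completed"
# whose investment pattern is nevertheless predicted as "completed".
not_completed = X[y == 0]
flagged = not_completed[clf.predict(not_completed) == 1]
print(flagged)
```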

    Proceedings of the 2nd Computer Science Student Workshop: Microsoft Istanbul, Turkey, April 9, 2011


    What Do Customers Say About My Products? Benchmarking Machine Learning Models for Need Identification

    Needmining is the process of extracting customer needs from user-generated content by classifying it as either informative or uninformative with regard to need content. Contemporary studies achieve this by applying machine learning. However, the models found in the literature cannot be compared with each other because they use private data for training and testing. This study benchmarks all previously suggested needmining models, including CNN, SVM, RNN, and RoBERTa. To ensure an unbiased comparison, we sample and annotate a dataset of Amazon customer reviews for products from four different categories. The dataset is made publicly available and serves as a gold standard for future needmining benchmarks. RoBERTa outperformed the other classifiers and appears to be best suited for needmining. The relevance of this study is reinforced by the fact that the benchmark yields a different ranking of models than would be suggested by comparing the results of previous studies.
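
    The benchmarking protocol can be sketched as follows, assuming scikit-learn; the review snippets and labels are placeholders for the annotated dataset, and a linear SVM stands in for the full set of benchmarked models (the RoBERTa fine-tuning run is omitted for brevity).

```python
# Sketch of the benchmarking protocol on a tiny placeholder sample: classify reviews
# as informative (expresses a need) vs. uninformative, and score with cross-validated F1.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

reviews = [
    "I wish the battery lasted a full day of heavy use",    # informative: expresses a need
    "Great product, five stars!",                           # uninformative
    "Needs a quieter motor, it wakes up the whole house",   # informative
    "Arrived quickly and was well packaged",                # uninformative
    "Please add a setting to disable the blinking LED",     # informative
    "My second purchase from this brand",                   # uninformative
]
labels = [1, 0, 1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())

# With the real annotated dataset, each candidate model would be scored the same way,
# yielding the ranking reported in the benchmark.
scores = cross_val_score(model, reviews, labels, cv=3, scoring="f1")
print(scores.mean())
```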

    User Modeling via Machine Learning and Rule-Based Reasoning to Understand and Predict Errors in Survey Systems

    User modeling is traditionally applied to systems where users have a large degree of control over their goals, the content they view, and the manner in which they navigate through the system. Such systems aim both to recommend useful goals to users and to assist them in achieving perceived goals. Systems such as online or telephone surveys differ in that users have only the singular goal of survey completion, extremely limited control over navigation, and content restricted to a prescribed set of survey tasks. This changes the user modeling problem to one in which the best means of assisting users is to identify rare actions hazardous to their singular goal by observing their interactions with common contexts. With this goal in mind, we develop predictive mechanisms based on a combination of machine learning classifiers and survey domain knowledge encapsulated in rule sets, which use behavioral, demographic, and survey state data to predict when a user will take an action that irreparably harms their singular goal of successful survey completion. We show that, despite the large class imbalance associated with detecting these actions and the users who take them, we can predict such actions at a rate better than random guessing, and that applying domain knowledge via rule sets improves performance further. We also identify traits of surveys and users that are associated with rare-action incidence. For future work, we recommend exploring potential sub-concepts related to users who perform these rare actions, exploring alternative means of identifying such users, and developing system adaptations that can prevent users from performing these rare and harmful actions. Advisor: LeenKiat So
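
    The hybrid classifier-plus-rules idea can be sketched as follows, assuming scikit-learn; the session features, the rule threshold, and the synthetic data are hypothetical illustrations, not the thesis's actual feature set or rule base.

```python
# Sketch of the hybrid idea: an imbalance-aware classifier whose prediction is
# combined with a hand-written domain rule. All features, thresholds, and data
# below are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy session features: [idle_seconds, validation_errors, fraction_of_survey_done].
X = rng.random((500, 3)) * [600, 10, 1.0]
# Rare harmful action (e.g., abandoning the survey): roughly 5% positive rate.
y = (rng.random(500) < 0.05).astype(int)

# class_weight="balanced" counteracts the heavy class imbalance.
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
clf.fit(X, y)

def rule_flags(features):
    """Domain rule (illustrative): long idle time plus repeated validation errors
    late in the survey is treated as a warning sign regardless of the classifier."""
    idle, errors, progress = features
    return idle > 300 and errors >= 3 and progress > 0.5

def predict_harmful(features):
    # Union of the classifier's prediction and the rule-based flag.
    ml_flag = bool(clf.predict([features])[0])
    return ml_flag or rule_flags(features)

print(predict_harmful([420.0, 4, 0.8]))
```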

    Comparing classification algorithms for prediction on CROBEX data

    The main objective of this analysis is to evaluate and compare various classification algorithms for the automatic identification of favourable days for intraday trading using data on the Croatian stock index CROBEX. Intraday trading refers to the acquisition and sale of financial instruments on the same trading day. If the increase from the opening price to the closing price of the same day is large enough to earn a profit by purchasing at the opening price and selling at the closing price, the day is considered favourable for intraday trading. The goal is to discover the relation between selected financial indicators on a given day and the market situation on the following day, i.e. to determine whether that day is favourable for intraday trading or not. The problem is modelled as a binary classification problem. The idea is to test different algorithms and to give greater attention to those that are used more rarely than traditional statistical methods. Thus, the following algorithms are used: neural network, support vector machine, and random forest, as well as the more common k-nearest neighbours and naïve Bayes classifiers. The work is an extension of the authors' previous work, in which the algorithms were compared on resamples resulting from tuning; here, each derived model is used to make predictions on new data. The results should add to the growing corpus of stock market prediction research and help fill gaps in this field for the Croatian market, in particular through the use of machine learning algorithms.
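
    A minimal sketch of this setup follows, assuming pandas and scikit-learn; the CSV file name, the indicator columns, and the profit threshold are placeholder assumptions rather than the paper's actual feature set, and only three of the compared classifiers are shown.

```python
# Sketch of the setup: label each day as favourable if the intraday gain covers a
# cost threshold, use the previous day's indicators as features, and compare classifiers.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("crobex.csv")           # assumed columns: open, close, plus indicators

THRESHOLD = 0.005                         # assumed minimal gain to cover trading costs
df["favourable"] = (df["close"] / df["open"] - 1) > THRESHOLD

# Features observed on day t predict whether day t+1 is favourable.
feature_cols = ["open", "close", "volume", "rsi", "macd"]   # placeholder indicator set
X = df[feature_cols].iloc[:-1].values
y = df["favourable"].shift(-1).iloc[:-1].astype(int).values

# Chronological split: the last 20% of days serves as the unseen "new data".
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)

for name, clf in [("kNN", KNeighborsClassifier()),
                  ("naive Bayes", GaussianNB()),
                  ("random forest", RandomForestClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```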