On Influence of Representations of Discretized Data on Performance of a Decision System
When discretization is used to preprocess datasets in a decision system, different representations of the data can be considered. The typical approach is to use the data as returned by the discretizer, namely as nominal values. In specific cases, however, this form of data cannot be used by subsequent modules of the decision system. A possible solution is then to convert the nominal data back into numerical form. The paper presents a comparison of these approaches applied to different classifiers in the stylometry domain.
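The two representations mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not name a discretizer or a conversion scheme, so everything here is an assumption: equal-width binning stands in for the discretizer, the `bin_<index>` labels are hypothetical, and the numeric form is simply the ordinal bin index.

```python
from bisect import bisect_right
from typing import List


def equal_width_cut_points(values: List[float], bins: int) -> List[float]:
    """Return the bins - 1 interior cut points of an equal-width discretization."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + width * i for i in range(1, bins)]


def to_nominal(value: float, cut_points: List[float]) -> str:
    """Map a real value to a nominal label such as 'bin_1' (hypothetical naming)."""
    return f"bin_{bisect_right(cut_points, value)}"


def to_numeric(label: str) -> int:
    """Convert a nominal label back to a number (assumed here: the ordinal bin index)."""
    return int(label.split("_")[1])


values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cuts = equal_width_cut_points(values, bins=3)
nominal = [to_nominal(v, cuts) for v in values]      # representation 1: nominal labels
numeric = [to_numeric(label) for label in nominal]   # representation 2: numbers again
```

A classifier that accepts only numeric input would consume `numeric`, while one that handles categorical attributes could consume `nominal` directly. The numeric form preserves bin order but not the original values.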
A baseline for unsupervised advanced persistent threat detection in system-level provenance
Advanced persistent threats (APT) are stealthy, sophisticated, and
unpredictable cyberattacks that can steal intellectual property, damage
critical infrastructure, or cause millions of dollars in damage. Detecting APTs
by monitoring system-level activity is difficult because manually inspecting
the high volume of normal system activity is overwhelming for security
analysts. We evaluate the effectiveness of unsupervised batch and streaming
anomaly detection algorithms over multiple gigabytes of provenance traces
recorded on four different operating systems to determine whether they can
detect realistic APT-like attacks reliably and efficiently. This report is the
first detailed study of the effectiveness of generic unsupervised anomaly
detection techniques in this setting.
Discretisation of conditions in decision rules induced for continuous
Typically, discretisation procedures are implemented as part of the initial pre-processing of data, before knowledge mining is employed. This means that conclusions and observations are based on reduced data, since discretisation usually discards some information. The paper presents a different approach that takes advantage of discretisation executed after data mining. In the described study, decision rules were first induced from real-valued features. Secondly, the data sets were discretised. In the third step, using the categories found for the attributes, the conditions included in the inferred rules were translated into the discrete domain. The properties and performance of the rule classifiers were tested in the domain of stylometric analysis of texts, where writing styles were defined through quantitative attributes of a continuous nature. The performed experiments show that the proposed processing leads to sets of rules with significantly reduced sizes while maintaining the quality of predictions, and allows many data discretisation methods to be tested at acceptable computational cost.
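The third step described in this abstract, translating rule conditions into the discrete domain, might be sketched as follows. The abstract does not specify the translation procedure, so this is only one plausible reading: a condition on a continuous attribute is replaced by the set of bins its interval overlaps. The condition format, the attribute names, and the convention that bin `i` covers `(cuts[i-1], cuts[i]]` are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Set, Tuple

# Hypothetical condition format: (attribute, operator, threshold), operator "<=" or ">".
Condition = Tuple[str, str, float]


def bins_for_condition(cond: Condition, cuts: List[float]) -> Set[int]:
    """Return indices of bins overlapping the condition's interval.

    Assumed convention: bin i covers (cuts[i-1], cuts[i]], with open outer ends.
    """
    _, op, threshold = cond
    n_bins = len(cuts) + 1
    if op == "<=":
        # Every bin up to and including the one that contains the threshold.
        return set(range(0, bisect_left(cuts, threshold) + 1))
    if op == ">":
        # Every bin whose upper bound lies strictly above the threshold.
        return set(range(bisect_right(cuts, threshold), n_bins))
    raise ValueError(f"unsupported operator: {op}")


def translate_rule(rule: List[Condition],
                   cut_points: Dict[str, List[float]]) -> Dict[str, Set[int]]:
    """Translate a conjunction of continuous conditions into allowed bins per attribute."""
    allowed: Dict[str, Set[int]] = {}
    for cond in rule:
        attr = cond[0]
        bins = bins_for_condition(cond, cut_points[attr])
        # Several conditions on one attribute are combined by intersection.
        allowed[attr] = allowed.get(attr, bins) & bins
    return allowed


# Hypothetical stylometric attributes and cut points.
cut_points = {"avg_sentence_length": [2.0, 4.0], "comma_freq": [0.25, 0.5, 0.75]}
rule = [("avg_sentence_length", "<=", 3.7), ("comma_freq", ">", 0.5)]
discrete_rule = translate_rule(rule, cut_points)
```

Under this reading the translation is an over-approximation: a threshold that falls inside a bin admits the whole bin, so distinct continuous rules can collapse into the same discrete rule. That collapse is one way a rule set could shrink, though the abstract does not say this is the mechanism behind the reported size reduction.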
Graph based Anomaly Detection and Description: A Survey
Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, graph data is becoming ubiquitous, and techniques for structured graph data have recently come into focus. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, static vs. dynamic graphs, and attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. Moreover, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion of open theoretical and practical challenges in the field.
- …