
571 research outputs found

Schema-agnostic entity retrieval in highly heterogeneous semi-structured environments

Author: Gaugaz Julien
Publication venue: Hannover : Gottfried Wilhelm Leibniz Universität Hannover
Publication date: 01/01/2015

[no abstract]

Institutionelles Repositorium der Leibniz Universität Hannover

On Influence of Representations of Discretized Data on Performance of a Decision System

Author: Baron Grzegorz
Publication venue: The Author(s). Published by Elsevier B.V.
Publication date: 31/12/2016

When discretization is used for preprocessing datasets in a decision system, different representations of the data can be taken into consideration. The typical approach is to use the data as returned by the discretizer, namely as nominal values. In specific cases, however, data in this form cannot be used by the subsequent modules of the decision system. A possible solution is then to convert the nominal data back into numerical form. The paper compares these approaches as applied to different classifiers in the stylometry domain.

Elsevier - Publisher Connector

A baseline for unsupervised advanced persistent threat detection in system-level provenance

Author: Benabderrahmane Sidahmed
Berrada Ghita
Cheney James
Maxwell William
Mookherjee Himan
Theriault Alec
Wright Ryan
Publication venue: 'Elsevier BV'
Publication date: 18/11/2019

Advanced persistent threats (APT) are stealthy, sophisticated, and unpredictable cyberattacks that can steal intellectual property, damage critical infrastructure, or cause millions of dollars in damage. Detecting APTs by monitoring system-level activity is difficult because manually inspecting the high volume of normal system activity is overwhelming for security analysts. We evaluate the effectiveness of unsupervised batch and streaming anomaly detection algorithms over multiple gigabytes of provenance traces recorded on four different operating systems to determine whether they can detect realistic APT-like attacks reliably and efficiently. This report is the first detailed study of the effectiveness of generic unsupervised anomaly detection techniques in this setting.

arXiv.org e-Print Archive

Edinburgh Research Explorer

Discretisation of conditions in decision rules induced for continuous

Author: Baron Grzegorz
Stańczyk Urszula
Zielosko Beata
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2020

Typically, discretisation procedures are implemented as part of the initial pre-processing of data, before knowledge mining is employed. This means that conclusions and observations are based on reduced data, since discretisation usually discards some information. The paper presents a different approach that takes advantage of discretisation executed after data mining. In the described study, decision rules were first induced from real-valued features. Second, the data sets were discretised. In the third step, using the categories found for the attributes, the conditions included in the inferred rules were translated into the discrete domain. The properties and performance of the rule classifiers were tested in the domain of stylometric analysis of texts, where writing styles were defined through quantitative attributes of a continuous nature. The experiments show that the proposed processing leads to sets of rules with significantly reduced sizes while maintaining the quality of predictions, and allows many data discretisation methods to be tested at acceptable computational cost.

Directory of Open Access Journals

Repozytorium Uniwersytetu Śląskiego RE-BUŚ

A Rule Mining-Based Advanced Persistent Threats Detection System

Author: Benabderrahmane Sidahmed
Berrada Ghita
Cheney James
Valtchev Petko
Publication venue: 'International Joint Conferences on Artificial Intelligence'
Publication date: 19/08/2021

Edinburgh Research Explorer

Graph based Anomaly Detection and Description: A Survey

Author: Danai Koutra
Hanghang Tong
Leman Akoglu
Publication date: 28/04/2014

Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have been of focus recently. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.

arXiv.org e-Print Archive

CiteSeerX