11 research outputs found

    Efficient Distributed Outlier Detection in Data Streams

    Anomaly detection is one of the major data mining tasks in modern applications. An element that deviates significantly from the "usual" behavior is marked as an outlier. Such an element either corresponds to noise or requires more careful examination because it may be important. In addition, many clustering algorithms are very sensitive to outliers. In either case, outliers must be identified and explored further, so efficient outlier mining techniques are required. In this paper, we focus on distributed density-based outlier detection over multi-dimensional data streams. In particular, we focus on the approximation method for computing the Local Correlation Integral (LOCI) of multi-dimensional points. Each object p is assigned an outlier score, score(p), so one can select the top-k elements of the dataset with the highest outlier scores. Our proposal has been implemented in Apache Spark using Scala, and experiments have been conducted on a physical cluster running Apache Hadoop 2.7 and Apache Spark 2.4.0. Performance evaluation results demonstrate that the proposed algorithm is efficient and scalable, and can therefore be used to mine outliers in large distributed datasets.

    Optimizing the Execution of Product Data Models

    The Product Data Model (PDM) is an example of a declarative data-centric approach to modelling information-intensive business processes, which offers flexibility and facilitates process optimization. Declarative approaches are the de facto choice in all modern data-oriented workflows, but they require an optimizer to choose among multiple alternative execution plans that can produce the desired end product. In PDM business processes, current optimization heuristics suffer from severe limitations in both their efficiency and their applicability to realistic scenarios. These limitations stem from a lack of consideration for the resource perspective of the processes being modelled and for the advances in modern data flow optimizers. This work tackles both limitations by proposing rank-based operation ordering optimizations tailored to the specificities of PDM, combined with consideration of the resources available to execute the process operations and of parallelism options. An extensive evaluation of the proposed solutions shows significant performance gains from the advanced rank-based operation ordering techniques with the added support of parallel execution. The speedups observed were up to 5.5X compared to the state-of-the-art optimization heuristics.

    A Cybersecurity Culture Survey Targeting Healthcare Critical Infrastructures

    Recent studies report that cybersecurity breaches in hospitals are associated with low levels of cybersecurity awareness among personnel. This work aims to assess the cybersecurity culture in healthcare institutions from middle- to low-income EU countries. The evaluation process was designed and performed via anonymous online surveys targeting ICT (information and communication technology) departments and healthcare professionals separately. The study was conducted in 2019 for a health region in Greece with a significant number of hospitals and health centers, a large hospital in Portugal, and a medical clinic in Romania, with 53.6% and 6.71% response rates for the ICT and healthcare professionals, respectively. Its findings indicate the necessity of establishing individual cybersecurity departments to monitor assets and attitudes, while underlining the importance of continuous security awareness training programs. The analysis of our results helps in understanding the countermeasures that have been implemented in the healthcare institutions, and consequently in enhancing cybersecurity defense while reducing the risk surface.