
    A Survey on Forensics and Compliance Auditing for Critical Infrastructure Protection

    The growing dependency of modern societies on essential services provided by Critical Infrastructures increases the relevance of their trustworthiness. However, Critical Infrastructures are attractive targets for cyberattacks, due to the potential for considerable impact, not just at the economic level but also in terms of physical damage and even loss of human life. Complementing traditional security mechanisms, forensics and compliance audit processes play an important role in ensuring Critical Infrastructure trustworthiness. Compliance auditing checks whether security measures are in place and compliant with standards and internal policies, while forensics assists the investigation of past security incidents. Since these two areas significantly overlap in terms of data sources, tools, and techniques, they can be merged into unified Forensics and Compliance Auditing (FCA) frameworks. In this paper, we survey the latest developments, methodologies, challenges, and solutions addressing forensics and compliance auditing in the scope of Critical Infrastructure Protection. The survey focuses on contributions capable of tackling the requirements imposed by massively distributed and complex Industrial Automation and Control Systems, in terms of handling large volumes of heterogeneous data (which can be noisy, ambiguous, and redundant) for analytic purposes, with adequate performance and reliability. The results produced a taxonomy of the FCA field whose key categories denote the relevant topics in the literature, and the collected knowledge led to a reference FCA architecture, proposed as a generic template for a converged platform. These results are intended to guide future research on forensics and compliance auditing for Critical Infrastructure Protection.

    Colossal Trajectory Mining: A unifying approach to mine behavioral mobility patterns

    Spatio-temporal mobility patterns are at the core of strategic applications such as urban planning and monitoring. Depending on the strength of spatio-temporal constraints, different mobility patterns can be defined. While existing approaches work well in the extraction of groups of objects sharing fine-grained paths, the huge volume of large-scale data calls for coarse-grained solutions. In this paper, we introduce Colossal Trajectory Mining (CTM) to efficiently extract heterogeneous mobility patterns out of a multidimensional space that, along with space and time dimensions, can consider additional trajectory features (e.g., means of transport or activity) to characterize behavioral mobility patterns. The algorithm is natively designed in a distributed fashion, and the experimental evaluation shows its scalability with respect to the involved features and the cardinality of the trajectory dataset.
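
    As a loose, self-contained illustration of the kind of coarse-grained grouping described above (not the CTM algorithm itself, and using made-up object ids, coordinates, and thresholds), the sketch below discretizes trajectory points into spatio-temporal-behavioral cells and reports pairs of objects that share enough cells:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical trajectory records: (object_id, x, y, t, mode)
points = [
    ("u1", 10.2, 4.1, 100, "bike"),
    ("u2", 10.4, 4.0, 104, "bike"),
    ("u1", 11.9, 4.8, 160, "bike"),
    ("u2", 11.8, 4.7, 162, "bike"),
    ("u3", 30.0, 9.0, 100, "car"),
]

def to_cell(x, y, t, mode, dx=1.0, dt=60):
    """Coarse-grain a point into a multidimensional cell (space, time, behavior)."""
    return (int(x // dx), int(y // dx), int(t // dt), mode)

# Map each cell to the set of objects observed in it.
cell_objects = defaultdict(set)
for oid, x, y, t, mode in points:
    cell_objects[to_cell(x, y, t, mode)].add(oid)

# Count, for every pair of objects, how many cells they share.
shared = defaultdict(int)
for objs in cell_objects.values():
    for a, b in combinations(sorted(objs), 2):
        shared[(a, b)] += 1

# Pairs sharing at least `min_sup` cells form a coarse-grained mobility pattern.
min_sup = 2
patterns = {pair: n for pair, n in shared.items() if n >= min_sup}
print(patterns)  # {('u1', 'u2'): 2}
```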

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper proposes a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while many programming languages have no support at all. A noteworthy point is the existence of 12 datasets related to source code similarity measurement and duplicate code, of which only eight are publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance phase.
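
    For readers unfamiliar with the techniques being surveyed, the following sketch shows one common token-based baseline (Jaccard similarity over identifier, keyword, and number tokens); it is a generic illustration, not any specific tool or dataset from the review:

```python
import re

def tokens(source: str) -> set:
    """Crude lexer: identifiers, keywords, and numbers as a token set."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+", source))

def jaccard_similarity(a: str, b: str) -> float:
    """Token-set Jaccard similarity, a common baseline for clone detection."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

snippet1 = "int sum(int a, int b) { return a + b; }"
snippet2 = "int add(int x, int y) { return x + y; }"
print(round(jaccard_similarity(snippet1, snippet2), 2))  # 0.25
```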

    Transparent Forecasting Strategies in Database Management Systems

    Whereas traditional data warehouse systems assume that data is complete or has been carefully preprocessed, increasingly more data is imprecise, incomplete, and inconsistent. This is especially true in the context of big data, where massive amounts of data arrive continuously in real-time from vast data sources. Moreover, modern data analysis involves sophisticated statistical algorithms that go well beyond traditional BI and, additionally, is increasingly performed by non-expert users. Both trends require transparent data mining techniques that efficiently handle missing data and present a complete view of the database to the user. Time series forecasting estimates future, not yet available, data of a time series and represents one way of dealing with missing data. Moreover, it enables queries that retrieve a view of the database at any point in time - past, present, and future. This article presents an overview of forecasting techniques in database management systems. After discussing possible application areas for time series forecasting, we give a short mathematical background of the main forecasting concepts. We then outline various general strategies of integrating time series forecasting inside a database and discuss some individual techniques from the database community. We conclude this article by introducing a novel forecasting-enabled database management architecture that natively and transparently integrates forecast models.
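
    A minimal sketch of the transparency idea, assuming a toy key-value "table" and simple exponential smoothing rather than the article's actual architecture: queries over missing future time points are answered from a forecast model instead of failing.

```python
# Hypothetical monthly measurements stored in the database (time -> value).
stored = {1: 100.0, 2: 104.0, 3: 103.0, 4: 108.0}

def ses_forecast(values, alpha=0.5):
    """Simple exponential smoothing; returns the one-step-ahead level."""
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level

def query(time_point):
    """Return stored data if present, otherwise a forecast (flat beyond the horizon)."""
    if time_point in stored:
        return stored[time_point], "stored"
    history = [stored[t] for t in sorted(stored)]
    return ses_forecast(history), "forecast"

for t in (3, 5, 6):
    value, origin = query(t)
    print(t, round(value, 2), origin)
```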

    Fraction-score: a generalized support measure for weighted and maximal co-location pattern mining

    Co-location patterns, which capture the phenomenon that objects with certain labels are often located in close geographic proximity, are defined based on a support measure which quantifies the prevalence of a pattern candidate in the form of a label set. Existing support measures share the idea of counting the number of instances of a given label set C as its support, where an instance of C is an object set whose objects collectively carry all labels in C and are located close to one another. However, they suffer from various weaknesses, e.g., they fail to capture all possible instances, or overlook the cases where multiple instances overlap. In this paper, we propose a new measure called Fraction-Score which counts instances fractionally if they overlap. Fraction-Score captures all possible instances, and handles the cases where instances overlap appropriately (so that the supports defined are more meaningful and anti-monotonic). We develop efficient algorithms to solve the co-location pattern mining problem defined with Fraction-Score. Furthermore, to obtain representative patterns, we develop an efficient algorithm for mining the maximal co-location patterns, which are those patterns without proper superset patterns. We conduct extensive experiments using real and synthetic datasets, which verify the superiority of our proposals.
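
    The following toy sketch illustrates the general idea of fractional counting when instances overlap; it is a simplified stand-in for intuition only, not the paper's exact Fraction-Score definition, and the instances and object ids are hypothetical:

```python
from collections import Counter

# Hypothetical instances of label set C = {A, B}: pairs of object ids,
# where object "a1" and "b2" each participate in two overlapping instances.
instances = [("a1", "b1"), ("a1", "b2"), ("a2", "b2")]

# How many instances each object participates in.
participation = Counter(obj for inst in instances for obj in inst)

def fractional_support(instances):
    """Each instance counts with the smallest per-object share 1/participation."""
    support = 0.0
    for inst in instances:
        support += min(1.0 / participation[obj] for obj in inst)
    return support

print(fractional_support(instances))  # 1.5: overlap is discounted
print(len(instances))                 # 3: the naive count, inflated by overlap
```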

    An evaluation of time series forecasting models on water consumption data: A case study of Greece

    In recent years, increased urbanization and industrialization have led to rising water demand and pressure on water resources, widening the gap between demand and supply. Proper water distribution and forecasting of water consumption are key factors in mitigating this imbalance by improving operations, planning, and management of water resources. To this end, in this paper, several well-known forecasting algorithms are evaluated on time series of water consumption data from Greece, a country with diverse socio-economic and urbanization issues. The forecasting algorithms are evaluated on a real-world dataset provided by the Water Supply and Sewerage Company of Greece, revealing key insights about each algorithm and its use.
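
    A minimal sketch of such an evaluation protocol on hypothetical consumption figures (the utility dataset itself is not reproduced here): simple baselines are fit on a training prefix and compared by mean absolute error on a held-out suffix.

```python
# Hypothetical daily water consumption figures.
consumption = [120, 118, 125, 130, 128, 135, 140, 138, 145, 150, 148, 155]
train, test = consumption[:8], consumption[8:]

def naive_forecast(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon

def drift_forecast(history, horizon):
    """Extrapolate the average historical increment."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

for name, model in [("naive", naive_forecast), ("drift", drift_forecast)]:
    preds = model(train, len(test))
    print(name, round(mae(test, preds), 2))
```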

    A Novel Frequent Pattern Mining Algorithm for Evaluating Applicability of a Mobile Learning Framework

    The applicability of a mobile learning system reflects how it works in an actual situation under diverse conditions. In previous studies, research evaluating the applicability of learning systems using data mining approaches is challenging to find. The main objective of this study is to evaluate the applicability of the proposed mobile learning framework, which consists of seven independent variables and their influencing factors. Initially, 1000 students and teachers were allowed to use the mobile learning system developed based on the proposed framework. The authors implemented the system using the Moodle mobile learning environment and used its transaction log file for evaluation. Transactional records generated by various user activities with the facilities integrated into the system were extracted. These activities were classified under eight different features, i.e., chat, forum, quiz, assignment, book, video, game, and app usage, across a thousand transactional rows. A novel pattern mining algorithm, namely Binary Total for Pattern Mining (BTPM), was developed using the binary incidence matrix format of the above transactional dataset to test the system's applicability. Similarly, Apriori frequent itemset mining and Frequent Pattern (FP) Growth mining algorithms were applied to the same dataset to predict system applicability.
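
    As a generic illustration of frequent pattern mining over a binary incidence matrix of the eight activity features (plain brute-force Apriori-style counting on made-up rows, not the authors' BTPM algorithm):

```python
from itertools import combinations

features = ["chat", "forum", "quiz", "assignment", "book", "video", "game", "app"]
# Hypothetical binary incidence matrix: one row per transaction, one column per feature.
matrix = [
    [1, 1, 1, 0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 1],
]

def frequent_itemsets(matrix, features, min_support=0.5, max_size=3):
    """Return itemsets whose support (fraction of rows containing them) >= min_support."""
    n = len(matrix)
    rows = [{f for f, bit in zip(features, row) if bit} for row in matrix]
    result = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(features, size):
            count = sum(1 for r in rows if set(itemset) <= r)
            if count / n >= min_support:
                result[itemset] = count / n
    return result

for itemset, support in sorted(frequent_itemsets(matrix, features).items()):
    print(itemset, support)
```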

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    A Federated Computational Workflow for Analysis of DISKOS Digital Palynological Slides.

    A novel federated computational workflow for analyzing digital palynological slide images is implemented in this thesis. The slide data files, typically exceeding 3 GB, present significant data mobility and computation challenges. The distributed computational framework is implemented to address privacy concerns and the challenges associated with moving large data. The idea is to move computation to the data location, optimally utilizing local computational capacity and reducing data movement. Trained deep-learning models, deployed in a containerized environment leveraging Docker, are integrated into the workflow with a user-friendly interface, so users can run processes with the trained models. The workflow includes reading slide image files, generating tiled images, and identifying and removing undesirable tiles such as blank tiles. Object detection with the watershed segmentation algorithm identifies tiles with potential microfossils. The identified dinoflagellates are classified with a trained convolutional neural network (CNN) model. The classification results are sent to the host and shared with the users. The federated computational approach effectively addresses the challenges related to moving and handling large palynological slide images, creating a more efficient, scalable, and distributed pipeline. Collaborative efforts involving domain experts to train models with more annotated slide images will improve the effectiveness of the workflow.
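
    A minimal sketch of one stage of such a workflow, the tiling and blank-tile filtering step, using a synthetic image array and hypothetical thresholds rather than the thesis code:

```python
import numpy as np

def tile_image(image: np.ndarray, tile_size: int = 256):
    """Yield (row, col, tile) for non-overlapping tiles of a grayscale slide."""
    h, w = image.shape
    for r in range(0, h - tile_size + 1, tile_size):
        for c in range(0, w - tile_size + 1, tile_size):
            yield r, c, image[r:r + tile_size, c:c + tile_size]

def keep_tile(tile: np.ndarray, blank_level: int = 240, max_blank_frac: float = 0.95):
    """Discard tiles that are almost entirely background (bright pixels)."""
    blank_frac = np.mean(tile > blank_level)
    return blank_frac < max_blank_frac

# Synthetic mostly-blank slide with one dark region mimicking a microfossil.
slide = np.full((1024, 1024), 250, dtype=np.uint8)
slide[300:400, 300:400] = 80

informative = [(r, c) for r, c, t in tile_image(slide) if keep_tile(t)]
print(informative)  # only the tile overlapping the dark region survives: [(256, 256)]
```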