Extracting Patterns from Educational Traces via Clustering and Associated Quality Metrics

Author: A Patrikainen
AP Dempster
B Mirkin
BFA Hompes
DA Jackson
DL Wallace
EB Fowlkes
I Jugo
KR Koedinger
L Kaufman
M Hall
M Meilă
PHA Sneath
PJ Rousseeuw
S Dasgupta
WM Rand
Publication venue: 'Springer Science and Business Media LLC'

Data-driven design of intelligent wireless networks: an overview and tutorial

Author: De Poorter Eli
Deschrijver Dirk
Fortuna Carolina
Kulin Merima
Moerman Ingrid
Publication venue: 'MDPI AG'
Publication date: 01/01/2016

Data science or "data-driven research" is a research approach that uses real-life data to gain insight about the behavior of systems. It enables the analysis of small, simple as well as large and more complex systems in order to assess whether they function according to the intended design and as seen in simulation. Data science approaches have been successfully applied to analyze networked interactions in several research areas such as large-scale social networks, advanced business and healthcare processes. Wireless networks can exhibit unpredictable interactions between algorithms from multiple protocol layers, interactions between multiple devices, and hardware-specific influences. These interactions can lead to a difference between real-world functioning and design-time functioning. Data science methods can help to detect the actual behavior and possibly help to correct it. Data science is increasingly used in wireless research. To support data-driven research in wireless networks, this paper illustrates the step-by-step methodology that has to be applied to extract knowledge from raw data traces. To this end, the paper (i) clarifies when, why and how to use data science in wireless network research; (ii) provides a generic framework for applying data science in wireless networks; (iii) gives an overview of existing research papers that utilized data science approaches in wireless networks; (iv) illustrates the overall knowledge discovery process through an extensive example in which device types are identified based on their traffic patterns; (v) provides the reader with the necessary datasets and scripts to go through the tutorial steps themselves.

Multidisciplinary Digital Publishing Institute

Ghent University Academic Bibliography

Directory of Open Access Journals

PubMed Central

SENATUS: An Approach to Joint Traffic Anomaly Detection and Root Cause Analysis

Author: Abdelkefi Atef
Jiang Yuming
Sharma Sachin
Publication date: 24/11/2017

In this paper, we propose a novel approach, called SENATUS, for joint traffic anomaly detection and root-cause analysis. Inspired by the concept of a senate, the proposed approach is divided into three stages: election, voting and decision. At the election stage, a small number of senator flows are chosen to represent approximately the total (usually huge) set of traffic flows. In the voting stage, anomaly detection is applied on the senator flows and the detected anomalies are correlated to identify the most probable anomalous time bins. Finally, in the decision stage, a machine learning technique is applied to the senator flows of each anomalous time bin to find the root cause of the anomalies. We evaluate SENATUS using traffic traces collected from the Pan European network, GEANT, and compare against another approach which detects anomalies using lossless compression of traffic histograms. We show the effectiveness of SENATUS in diagnosing anomaly types: network scans and DoS/DDoS attacks.

arXiv.org e-Print Archive

Crossref

TRAP

NORA - Norwegian Open Research Archives

Improving data preparation for the application of process mining

Author: Ramos Gutiérrez Belén
Publication date: 07/02/2023

Immersed in what is already known as the fourth industrial revolution, automation and data exchange are taking on a particularly relevant role in complex environments, such as industrial manufacturing environments or logistics. This digitisation and transition to the Industry 4.0 paradigm is causing experts to start analysing business processes from other perspectives. Consequently, where management and business intelligence used to dominate, process mining appears as a link, trying to build a bridge between both disciplines to unite and improve them. This new perspective on process analysis helps to improve strategic decision making and competitive capabilities. Process mining brings together data and process perspectives in a single discipline that covers the entire spectrum of process management. Through process mining, and based on observations of their actual operations, organisations can understand the state of their operations, detect deviations, and improve their performance based on what they observe. In this way, process mining is an ally, occupying a large part of current academic and industrial research. However, although this discipline is receiving more and more attention, it presents severe application problems when it is implemented in real environments. The variety of input data in terms of form, content, semantics, and levels of abstraction makes the execution of process mining tasks in industry an iterative, tedious, and manual process, requiring multidisciplinary experts with extensive knowledge of the domain, process management, and data processing. Currently, although there are numerous academic proposals, there are no industrial solutions capable of automating these tasks. 
For this reason, in this thesis by compendium we address the problem of improving business processes in complex environments thanks to the study of the state-of-the-art and a set of proposals that improve relevant aspects in the life cycle of processes, from the creation of logs, through log preparation and process quality assessment, to the improvement of business processes. Firstly, for this thesis, a systematic study of the literature was carried out in order to gain an in-depth knowledge of the state-of-the-art in this field, as well as the different challenges faced by this discipline. This in-depth analysis has allowed us to detect a number of challenges that have not been addressed or received insufficient attention, of which three have been selected and presented as the objectives of this thesis. The first challenge is related to the assessment of the quality of input data, known as event logs, since the requirement to apply techniques for improving the event log must be based on the level of quality of the initial data. For this reason, this thesis presents a methodology and a set of metrics that support the expert in selecting which technique to apply to the data according to the quality estimation at each moment, another challenge obtained as a result of our analysis of the literature. Likewise, the use of a set of metrics to evaluate the quality of the resulting process models is also proposed, with the aim of assessing whether improvement in the quality of the input data has a direct impact on the final results. The second challenge identified is the need to improve the input data used in the analysis of business processes. As in any data-driven discipline, the quality of the results strongly depends on the quality of the input data, so the second challenge to be addressed is the improvement of the preparation of event logs. 
The contribution in this area is the application of natural language processing techniques to relabel activities from textual descriptions of process activities, as well as the application of clustering techniques to help simplify the results, generating more understandable models from a human point of view. Finally, the third challenge detected is related to process optimisation, so we contribute with an approach for the optimisation of resources associated with business processes, which, through the inclusion of decision-making in the creation of flexible processes, enables significant cost reductions. Furthermore, all the proposals made in this thesis are validated and designed in collaboration with experts from different fields of industry and have been evaluated through real case studies in public and private projects in collaboration with the aeronautical industry and the logistics sector.

idUS. Depósito de Investigación Universidad de Sevilla

Analysis of Digital Footprints Associated with Cybersecurity Behavior Patterns of Users of University Information and Education Systems

Author: Desyatko Alona
Kryvoruchko Olena
Kurbaiyazov Nurgazy
Lakhno Miroslav
Lakhno Valerii
Tsiutsiura Mykola
Tsiutsiura Svitlana
Publication venue: Electronics and Telecommunications Committee
Publication date: 18/07/2024

The analysis of digital footprints (DF) related to the cybersecurity (cyber risk) behavior of users of university information and education systems (UIES) involves the study and evaluation of various aspects of activity in the systems. In particular, such analysis includes the study of typical patterns of access to UIES, password usage, network activity, compliance with security policies, identification of anomalous behavior, and more. It is shown that user behavior in UIES is represented by sequences of actions and can be analyzed using the sequential analysis method. Such analysis will allow information security (IS) systems of UIES to efficiently process categorical data associated with sequential patterns of user actions. It is shown that analyzing sequential patterns of cyberthreatening user behavior will allow UIES IS systems to identify more complex threats that may be hidden in chains of actions, not just individual events. This will allow for more effective identification of potential threats and prevention of security incidents in the UIES.

International Journal of Electronics and Telecommunications (Warsaw University of Technology)

LUNA: A Model-Based Universal Analysis Framework for Large Language Models

Author: Huang Yuheng
Juefei-Xu Felix
Ma Lei
Song Da
Song Jiayang
Xie Xuan
Zhu Derui
Publication date: 22/10/2023

Over the past decade, Artificial Intelligence (AI) has had great success and is being used in a wide range of academic and industrial fields. More recently, LLMs have made rapid advancements that have propelled AI to a new level, enabling even more diverse applications and industrial domains with intelligence, particularly in areas like software engineering and natural language processing. Nevertheless, a number of emerging trustworthiness concerns and issues exhibited in LLMs have already received much attention; unless they are properly addressed, the widespread adoption of LLMs could be greatly hindered in practice. The distinctive characteristics of LLMs, such as the self-attention mechanism, extremely large model scale, and autoregressive generation schema, differ from classic AI software based on CNNs and RNNs and present new challenges for quality analysis. To date, universal and systematic analysis techniques for LLMs are still lacking despite the urgent industrial demand. Towards bridging this gap, we initiate an early exploratory study and propose a universal analysis framework for LLMs, LUNA, designed to be general and extensible, to enable versatile analysis of LLMs from multiple quality perspectives in a human-interpretable manner. In particular, we first leverage the data from desired trustworthiness perspectives to construct an abstract model as an auxiliary analysis asset, which is empowered by various abstract model construction methods. To assess the quality of the abstract model, we collect and define a number of evaluation metrics, aiming at both the abstract model level and the semantics level. Then, the semantics, which is the degree of satisfaction of the LLM w.r.t. the trustworthiness perspective, is bound to the abstract model and enriches it, which enables more detailed analysis applications for diverse purposes. Comment: 44 pages, 9 figures

arXiv.org e-Print Archive

From Monoliths to Microservices: automating service boundary detection

Author: Rúben Xavier Cruz de Jesus
Publication date: 25/10/2021

Repositório Aberto da Universidade do Porto

The application of process mining to care pathway analysis in the NHS

Author: Siddiqi Bushra
Publication venue: School of Public Health, Imperial College London
Publication date: 01/12/2017

Background: Prostate cancer is the most common cancer in men in the UK and the sixth-fastest increasing cancer in males. Within England survival rates are improving; however, these are comparatively poorer than in other countries. Currently, information available on outcomes of care is scant and there is an urgent need for techniques to improve healthcare systems and processes. Aims: To provide prostate cancer pathway analysis, by applying concepts of process mining and visualisation and comparing the performance metrics against the standard pathway laid out by national guidelines. Methods: A systematic review was conducted to see how process mining has been used in healthcare. Appropriate datasets for prostate cancer were identified within Imperial College Healthcare NHS Trust London. A process model was constructed by linking and transforming cohort data from six distinct database sources. The cohort dataset was filtered to include patients who had a PSA from 2010-2015, and validated by comparing the medical patient records against a case-note audit. Process mining techniques were applied to the data to analyse performance and conformance of the prostate cancer pathway metrics to national guideline metrics. These techniques were evaluated with stakeholders to ascertain their impact on user experience. Results: The case-note audit revealed a 90% match against patients found in medical records. Application of process mining techniques showed massive heterogeneity as compared to the homogeneous path laid out by national guidelines. This also gave insight into bottlenecks and deviations in the pathway. Evaluation with stakeholders showed that the visualisation and technology were well accepted, of high quality and recommended for use in healthcare decision making. Conclusion: Process mining is a promising technique used to give insight into complex and flexible healthcare processes. 
It can map the patient journey at a local level and audit it against explicit standards of good clinical practice, which will enable us to intervene at the individual and system level to improve care. Open Access

Spiral - Imperial College Digital Repository