
6 research outputs found

Unleashing textual descriptions of business processes

Author: Burattin Andrea
Carmona Vargas Josep
Montali Marco
Padró Lluís
Quishpi Betún Luis Hernán
Sànchez-Ferreres Josep
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

Textual descriptions of processes are ubiquitous in organizations, so that documentation of the important processes is accessible to anyone involved. Unfortunately, the value of this rich data source is hampered by the challenge of analyzing unstructured information. In this paper we propose a framework to overcome the current limitations in dealing with textual descriptions of processes. This framework considers extraction and analysis and connects to process mining via simulation. The framework is grounded in the notion of annotated textual descriptions of processes, which represents a middle ground between formalization and accessibility, and which accounts for different modeling styles, ranging from purely imperative to purely declarative. The contributions of this paper are implemented in several tools, and case studies are highlighted. This work has been supported by MINECO and FEDER funds under grant TIN2017-86727-C2-1-R. Peer Reviewed. Postprint (author's final draft)

Formal Reasoning on Natural Language Descriptions of Processes

Author: A Cimatti
B Maqbool
H Leopold
J Carmona
J Claes
J Mendling
J Sànchez-Ferreres
M Dumas
R Dijkman
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

The existence of unstructured information that describes processes represents a challenge in organizations, mainly because this data cannot be directly incorporated into process-aware ecosystems due to ambiguities. Still, this information is important, since it encompasses aspects of a process that are left out when formalizing it in a particular modelling notation. This paper picks up this challenge and faces the problem of ambiguities by acknowledging their existence and mitigating them. Specifically, we propose a framework to partially automate the elicitation of a formal representation of a textual process description, via text annotation techniques on top of natural language processing. The result is the ATDP language, whose syntax and semantics are described in this paper. ATDP makes it possible to explicitly cope with several interpretations of the same textual description of a process model. Moreover, we link the ATDP language to a formal reasoning engine and show several use cases. A prototype tool enabling the complete methodology has been implemented, and several examples using the tool are provided. Peer Reviewed. Postprint (author's final draft)

Fast Detection of Zero-Day Phishing Websites Using Machine Learning

Author: Nagunwa Thomas
Publication date: 01/06/2022

The recent global growth in the number of internet users and online applications has led to a massive volume of personal data transactions taking place over the internet. To gain access to this valuable data and these services for various malicious activities, attackers lure users to phishing websites that steal user credentials and other personal data needed to impersonate their victims. Attackers increasingly use sophisticated phishing toolkits to create phishing websites and flux networks to host them, both to increase the number of phishing attacks and to evade detection. This has resulted in an increase in the number of new (zero-day) phishing websites. Anti-malware software and web browsers’ anti-phishing filters are widely used to detect phishing websites and thus prevent users from falling victim to phishing. However, these solutions mostly rely on blacklists of known phishing websites. With these techniques, the time lag between the creation of a new phishing website and its reporting as malicious leaves a window during which users are exposed to zero-day phishing websites. This has contributed to a global increase in the number of successful phishing attacks in recent years. To address this shortcoming, this research proposes three Machine Learning (ML)-based approaches for fast and highly accurate prediction of zero-day phishing websites using novel sets of prediction features. The first approach uses a novel set of 26 features based on URL structure and on webpage structure and contents to predict zero-day phishing webpages that collect users’ personal data. The other two approaches detect, through their hostnames, zero-day phishing webpages hosted in Fast Flux Service Networks (FFSNs) and Name Server IP Flux Networks (NSIFNs). These networks consist of frequently changing machines that host the malicious websites (FFSNs) and their authoritative name servers (NSIFNs), respectively.
The machines provide a layer of protection to the actual service hosts against blacklisting in order to prolong the active life span of the services. Consequently, websites in these networks become more harmful than those hosted in normal networks. To address them, our second approach predicts zero-day phishing hostnames hosted in FFSNs using a novel set of 56 features based on DNS, network and host characteristics of the hosting networks. Our last approach predicts zero-day phishing hostnames hosted in NSIFNs using a novel set of 11 features based on DNS and host characteristics of the hosting networks. The feature set in each approach is evaluated using 11 ML algorithms, achieving high prediction performance with most of the algorithms. This indicates the relevance and robustness of the feature sets for their respective detection tasks. The feature sets also perform well on data collected over a later time period without retraining the models, indicating their long-term effectiveness in detecting the websites. The approaches use highly diversified feature sets, which are expected to enhance resistance to various detection evasion tactics. The measured prediction times of the first and third approaches are low enough for potential use in real-time protection of users. This thesis also introduces a multi-class classification technique for evaluating the feature sets in the second and third approaches. The technique predicts each of the hostname types as an independent outcome, thus enabling experts to use type-specific measures in taking down the phishing websites. Lastly, highly accurate methods are proposed for labelling hostnames based on the number of changes in the IP addresses of their authoritative name servers, monitored over a specific period of time.
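The labelling idea in the last sentence (counting how often the IP addresses of a hostname's authoritative name servers change over a monitoring period) can be illustrated with a small Python sketch. This is a hypothetical illustration, not the method from the thesis: the function names, the two labels, and the change threshold of 3 are assumptions, and each observation is assumed to be the set of IP addresses returned for one name server at one polling time.

```python
# Hypothetical sketch: label a hostname by how often the IP addresses of its
# authoritative name servers change during a monitoring window.

# Assumed threshold; the abstract does not state the value used in the thesis.
FLUX_CHANGE_THRESHOLD = 3


def count_ip_changes(snapshots: list[frozenset[str]]) -> int:
    """Count polls where a name server's IP set differs from the previous poll."""
    return sum(1 for prev, cur in zip(snapshots, snapshots[1:]) if cur != prev)


def label_hostname(
    ns_snapshots: dict[str, list[frozenset[str]]],
    threshold: int = FLUX_CHANGE_THRESHOLD,
) -> str:
    """Return a hypothetical label from the total IP changes across all name servers."""
    total_changes = sum(count_ip_changes(s) for s in ns_snapshots.values())
    return "ns_ip_flux" if total_changes >= threshold else "non_flux"


if __name__ == "__main__":
    observations = {
        # ns1 resolves to a different address at every poll: 3 changes.
        "ns1.example.net": [
            frozenset({"192.0.2.1"}),
            frozenset({"192.0.2.7"}),
            frozenset({"198.51.100.3"}),
            frozenset({"203.0.113.9"}),
        ],
        # ns2 is stable: 0 changes.
        "ns2.example.net": [frozenset({"203.0.113.5"})] * 4,
    }
    print(label_hostname(observations))  # ns_ip_flux
```

Summing changes across all name servers is just one simple design choice; a real labelling scheme might instead score each name server separately or normalise the count by the length of the monitoring window.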