Business process variant analysis based on mutual fingerprints of event logs
Comparing business process variants using event logs is a common use case in process mining. Existing techniques for process variant analysis detect statistically-significant differences between variants at the level of individual entities (such as process activities) and their relationships (e.g. directly-follows relations between activities). This may lead to a proliferation of differences due to the low level of granularity in which such differences are captured. This paper presents a novel approach to detect statistically-significant differences between variants at the level of entire process traces (i.e. sequences of directly-follows relations). The cornerstone of this approach is a technique to learn a directly-follows graph called mutual fingerprint from the event logs of the two variants. A mutual fingerprint is a lossless encoding of a set of traces and their duration using discrete wavelet transformation. This structure facilitates the understanding of statistical differences along the control-flow and performance dimensions. The approach has been evaluated using real-life event logs against two baselines. The results show that at a trace level, the baselines cannot always reveal the differences discovered by our approach, or can detect spurious differences. This research is partly funded by the Australian Research Council (DP180102839) and Spanish funds MINECO and FEDER (TIN2017-86727-C2-1-R).
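The lossless wavelet encoding at the heart of a mutual fingerprint can be illustrated with a single-level Haar transform. This is only a minimal sketch: the function names and the representation of a trace as a vector of durations are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a single-level Haar wavelet transform, illustrating how
# a numeric trace signal (e.g. per-activity durations) can be encoded
# losslessly: the averages compress, the differences preserve exactness.

def haar_forward(signal):
    """One Haar level: pairwise averages (approximation) and differences (detail)."""
    assert len(signal) % 2 == 0, "pad the signal to even length first"
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Exact reconstruction: the transform is lossless."""
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([a + d, a - d])
    return signal

durations = [4.0, 2.0, 5.0, 7.0]          # toy per-activity durations of one trace
approx, detail = haar_forward(durations)
assert haar_inverse(approx, detail) == durations
```

In a full discrete wavelet transform the approximation coefficients are recursively transformed again, but one level already shows the lossless round trip the abstract relies on.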
Software development process mining: discovery, conformance checking and enhancement
Context. Modern software projects require the proper allocation of human, technical and
financial resources. Very often, project managers make decisions supported only by their personal
experience, intuition or simply by mirroring activities performed by others in similar
contexts. Most attempts to avoid such practices use models based on lines of code, cyclomatic
complexity or effort estimators, commonly supported by software repositories which are
known to contain several flaws.
Objective. Demonstrate the usefulness of process data and mining methods to enhance
software development practices by assessing efficiency and unveiling unknown process insights,
thus contributing to the creation of novel models within the software development analytics
realm.
Method. We mined the development process fragments of multiple developers in three
different scenarios by collecting Integrated Development Environment (IDE) events during their
development sessions. Furthermore, we used process and text mining to discover developers’
workflows and their fingerprints, respectively.
Results. We discovered and modeled developers’ processes during programming sessions
with good quality, based on events extracted from their IDEs. We unveiled insights from
coding practices in distinct refactoring tasks, built accurate software complexity forecast models
based only on process metrics, and set up a method for coherently characterizing developers’
behaviors. The latter may ultimately lead to the creation of a catalog of software development
process smells.
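The workflow-discovery step described above starts from directly-follows relations between logged IDE events. A minimal sketch, with illustrative event names and a hypothetical log format (one list of events per development session):

```python
# Sketch: derive a directly-follows graph (DFG) from IDE event logs,
# the basic building block of the process-discovery step described above.
from collections import Counter

def directly_follows(sessions):
    """Count how often event b directly follows event a across all sessions."""
    dfg = Counter()
    for events in sessions:
        for a, b in zip(events, events[1:]):
            dfg[(a, b)] += 1
    return dfg

# Two toy development sessions (event names are illustrative).
sessions = [
    ["edit", "compile", "test", "edit"],
    ["edit", "compile", "compile", "test"],
]
dfg = directly_follows(sessions)
# ("edit", "compile") occurs twice, ("compile", "test") occurs twice.
```

Real discovery algorithms then filter and structure this counter into a process model, but the directly-follows counts are where they start.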
Conclusions. Our approach is agnostic to programming languages, geographic location or
development practices, making it suitable for challenging contexts such as in modern global
software development projects using either traditional IDEs or sophisticated low/no code platforms.Contexto. Projetos de software modernos requerem a correta alocação de recursos humanos,
técnicos e financeiros. Frequentemente, os gestores de projeto tomam decisões suportadas
apenas na sua própria experiência, intuição ou simplesmente espelhando atividades executadas
por terceiros em contextos similares. As tentativas para evitar tais práticas baseiam-se em
modelos que usam linhas de código, a complexidade ciclomática ou em estimativas de esforço,
sendo estes tradicionalmente suportados por repositórios de software conhecidos por conterem
várias limitações.
Objetivo. Demonstrar a utilidade dos dados de processo e respetivos métodos de análise na
melhoria das práticas de desenvolvimento de software, colocando o foco na análise da eficiência
e revelando aspetos dos processos até então desconhecidos, contribuindo para a criação de
novos modelos no contexto de análises avançadas para o desenvolvimento de software.
Método. Explorámos os fragmentos de processo de vários programadores em três cenários
diferentes, recolhendo eventos durante as suas sessões de desenvolvimento no IDE. Adicionalmente,
usámos métodos de descoberta e análise de processos e texto no sentido de modelar o
fluxo de trabalho dos programadores e as suas caracterÃsticas individuais, respetivamente.
Resultados. Descobrimos e modelámos com boa qualidade os processos dos programadores
durante as suas sessões de trabalho, usando eventos provenientes dos seus IDEs. Revelámos factos
desconhecidos sobre práticas de refabricação, construÃmos modelos de previsão da complexidade
ciclomática usando apenas métricas de processo e criámos um método para caracterizar
coerentemente os comportamentos dos programadores. Este último, pode levar à criação de um
catálogo de boas/más práticas no processo de desenvolvimento de software.
Conclusões. A nossa abordagem é agnóstica em termos de linguagens de programação,
localização geográfica ou prática de desenvolvimento, tornando-a aplicável em contextos complexos
tal como em projetos modernos de desenvolvimento global que utilizam tanto os IDEs
tradicionais como as atuais e sofisticadas plataformas "low/no code"
Opportunistic timing signals for pervasive mobile localization
The proliferation of handheld devices and the pressing need for location-based services call for
precise and accurate ubiquitous geographic mobile positioning that can serve a vast set of devices.
Despite the large investments and efforts in academic and industrial communities, a pin-point solution
is however still far from reality. Mobile devices mainly rely on Global Navigation Satellite
System (GNSS) to position themselves. GNSS systems are known to perform poorly in dense urban
areas and indoor environments, where the visibility of GNSS satellites is reduced drastically.
In order to ensure interoperability between the technologies used indoor and outdoor, a pervasive
positioning system should still rely on GNSS, yet complemented with technologies that can
guarantee reliable radio signals in indoor scenarios. The key fact that we exploit is that GNSS signals
are made of data with timing information. We then investigate solutions where opportunistic
timing signals can be extracted out of terrestrial technologies. These signals can then be used as
additional inputs of the multi-lateration problem. Thus, we design and investigate a hybrid system
that combines range measurements from the Global Positioning System (GPS), the world’s
most utilized GNSS system, and terrestrial technologies; the most suitable one to consider in our
investigation is WiFi, thanks to its large deployment in indoor areas. In this context, we first start
investigating standalone WiFi Time-of-flight (ToF)-based localization. Time-of-flight echo techniques
have been recently suggested for ranging mobile devices over WiFi radios. However, these
techniques have yielded only moderate accuracy in indoor environments because WiFi ToF measurements
suffer from extensive device-related noise which makes it challenging to differentiate
direct-path from non-direct-path signal components when estimating the ranges. Existing
multipath mitigation techniques tend to fail at identifying the direct path when the device-related
Gaussian noise is of the same order of magnitude as, or larger than, the multipath noise. In order to
address this challenge, we propose a new method for filtering ranging measurements that is better
suited for the inherent large noise as found in WiFi radios. Our technique combines statistical
learning and robust statistics in a single filter. The filter is lightweight in the sense that it does not
require specialized hardware, the intervention of the user, or cumbersome on-site manual calibration.
This makes the method we propose, as the first contribution of the present work, particularly
suitable for indoor localization in large-scale deployments using existing legacy WiFi infrastructures.
We evaluate our technique for indoor mobile tracking scenarios in multipath environments,
and, through extensive evaluations across four different testbeds covering areas up to 1000 m², the filter is able to achieve a median ranging error between 1.7 and 2.4 meters.
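The thesis combines statistical learning with robust statistics in its ranging filter. As a hedged illustration of the robust-statistics half only, here is a median/MAD outlier-rejection pass over noisy ToF range samples; the threshold and sample values are illustrative stand-ins, not the actual filter design:

```python
# Sketch: reject ToF range samples that deviate too far from the median,
# using the median absolute deviation (MAD) as a robust noise scale.
import statistics

def filter_ranges(ranges, k=3.0):
    """Drop range samples more than k scaled-MADs from the median."""
    med = statistics.median(ranges)
    mad = statistics.median(abs(r - med) for r in ranges)
    scale = 1.4826 * mad or 1e-9          # MAD-to-sigma factor for Gaussian noise
    return [r for r in ranges if abs(r - med) <= k * scale]

samples = [11.8, 12.1, 12.0, 11.9, 25.3, 12.2]   # metres; 25.3 is a multipath echo
print(filter_ranges(samples))   # the 25.3 m outlier is removed
```

Unlike a mean/standard-deviation filter, the median and MAD are barely perturbed by the outlier itself, which is why robust statistics suit the heavy-tailed noise described above.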
The next step we envisioned towards preparing theoretical and practical basis for the aforementioned
hybrid positioning system is a deep inspection and investigation of WiFi and GPS ToF
ranges, and initial foundations of single-technology self-localization. Self-localization systems
based on the Time-of-Flight of radio signals are highly susceptible to noise and their performance
therefore heavily rely on the design and parametrization of robust algorithms. We study the noise
sources of GPS and WiFi ToF ranging techniques and compare the performance of different self-positioning
algorithms at a mobile node using those ranges. Our results show that the localization
error varies greatly depending on the ranging technology, algorithm selection, and appropriate
tuning of the algorithms. We characterize the localization error using real-world measurements
and different parameter settings to provide guidance for the design of robust location estimators
in realistic settings.
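The self-localization step from ToF ranges can be sketched as an iterative least-squares (Gauss-Newton) estimator. The anchor layout and noise-free ranges below are toy assumptions for illustration; the thesis compares several such estimators under realistic GPS/WiFi noise:

```python
# Sketch: estimate a 2-D position from ranges to known anchors by
# Gauss-Newton iteration on the range residuals.
import numpy as np

def multilaterate(anchors, ranges, x0=None, iters=20):
    """Iteratively refine a position estimate from anchor ranges."""
    x = np.mean(anchors, axis=0) if x0 is None else np.asarray(x0, float)
    for _ in range(iters):
        diffs = x - anchors                      # (n, 2) vectors to anchors
        dists = np.linalg.norm(diffs, axis=1)    # predicted ranges
        J = diffs / dists[:, None]               # Jacobian of ranges w.r.t. x
        residual = ranges - dists
        step, *_ = np.linalg.lstsq(J, residual, rcond=None)
        x = x + step
    return x

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true_pos = np.array([3.0, 4.0])
ranges = np.linalg.norm(anchors - true_pos, axis=1)  # noise-free for clarity
print(multilaterate(anchors, ranges))  # ≈ [3. 4.]
```

With noisy ranges the same loop still converges, but the estimate's error then depends on geometry and noise, which is exactly the sensitivity the paragraph above characterizes.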
These tools and foundations are necessary to tackle the problem of hybrid positioning system
providing high localization capabilities across indoor and outdoor environments. In this context,
the lack of a single positioning system that is able to fulfill the specific requirements of
diverse indoor and outdoor application settings has led to the development of a multitude of localization
technologies. Existing mobile devices such as smartphones therefore commonly rely on
a multi-RAT (Radio Access Technology) architecture to provide pervasive location information
in various environmental contexts as the user is moving. Yet, existing multi-RAT architectures
consider the different localization technologies as monolithic entities and choose the final navigation
position from the RAT that is foreseen to provide the highest accuracy in the particular
context. In contrast, we propose in this work to fuse timing range (Time-of-Flight) measurements
of diverse radio technologies in order to circumvent the limitations of the individual radio access
technologies and improve the overall localization accuracy in different contexts. We introduce
an Extended Kalman filter, modeling the unique noise sources of each ranging technology. As a
rich set of multiple ranges can be available across different RATs, the intelligent selection of the
subset of ranges with accurate timing information is critical to achieve the best positioning accuracy.
We introduce a novel geometrical-statistical approach to best fuse the set of timing ranging
measurements. We also address practical problems of the design space, such as removing the need for WiFi
chipset and environmental calibration, to make the positioning system as autonomous as possible.
Experimental results show that our solution considerably outperforms the use of monolithic
technologies and methods based on classical fault detection and identification typically applied in
standalone GPS technology.
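A minimal sketch of the kind of EKF range-fusion update described above, assuming a 2-D position state and illustrative per-technology noise variances; the thesis's actual state model and noise characterization are considerably richer:

```python
# Sketch: one Extended Kalman filter measurement update that fuses ToF
# ranges from multiple radio technologies, each with its own noise variance.
import numpy as np

def ekf_update(x, P, anchors, ranges, variances):
    """Fuse range measurements into position estimate x (covariance P)."""
    diffs = x - anchors
    dists = np.linalg.norm(diffs, axis=1)        # predicted ranges at x
    H = diffs / dists[:, None]                   # measurement Jacobian
    R = np.diag(variances)                       # per-technology range noise
    S = H @ P @ H.T + R                          # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    x = x + K @ (ranges - dists)                 # corrected state
    P = (np.eye(len(x)) - K @ H) @ P             # corrected covariance
    return x, P

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
x = np.array([4.0, 4.0])                         # prior position guess
P = np.eye(2) * 4.0                              # loose prior covariance
true_pos = np.array([3.0, 4.0])
ranges = np.linalg.norm(anchors - true_pos, axis=1)
variances = [0.5, 0.5, 2.0]                      # e.g. cleaner GPS vs noisier WiFi
x, P = ekf_update(x, P, anchors, ranges, variances)
```

The per-technology entries of `R` are where the "unique noise sources of each ranging technology" enter: a range from a noisier RAT is automatically down-weighted in the fused estimate.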
All the contributions and research questions described previously in localization and positioning
related topics assume full knowledge of the anchors’ positions. In the last part of this work, we
study the problem of deriving proximity metrics without any prior knowledge of the positions of
the WiFi access points based on WiFi fingerprints, that is, tuples of WiFi Access Points (AP) and
respective received signal strength indicator (RSSI) values. Applications that benefit from proximity
metrics are movement estimation of a single node over time, WiFi fingerprint matching for localization systems and attacks on privacy. Using a large-scale, real-world WiFi fingerprint data
set consisting of 200,000 fingerprints resulting from a large deployment of wearable WiFi sensors,
we show that metrics from related work perform poorly on real-world data. We analyze the
cause for this poor performance, and show that imperfect observations of APs with commodity
WiFi clients in the neighborhood are the root cause. We then propose improved metrics to provide
such proximity estimates, without requiring knowledge of location for the observed AP. We
address the challenge of imperfect observations of APs in the design of these improved metrics.
Our metrics allow us to derive a relative distance estimate based on two observed WiFi fingerprints.
We demonstrate that their performance is superior to the related-work metrics. This work has been supported by IMDEA Networks Institute.
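A proximity metric of the kind described above compares two WiFi fingerprints (AP-to-RSSI maps) while tolerating APs that one client failed to observe. The weighting scheme below is an illustrative sketch, not the thesis's actual formulation:

```python
# Sketch: similarity between two WiFi fingerprints that down-weights weak,
# flaky AP sightings, so an AP missed by one commodity client hurts less.

def proximity(fp_a, fp_b):
    """Similarity in [0, 1]; higher suggests the fingerprints are closer."""
    shared = set(fp_a) & set(fp_b)
    if not shared:
        return 0.0
    # Shift RSSI (negative dBm) by +100 so stronger signals weigh more.
    weight = sum(min(fp_a[ap], fp_b[ap]) + 100 for ap in shared)
    total = sum(max(rssi + 100, 0) for fp in (fp_a, fp_b) for rssi in fp.values())
    return 2.0 * weight / total if total else 0.0

fp1 = {"ap1": -40, "ap2": -60, "ap3": -85}   # AP ids and RSSI values are toy data
fp2 = {"ap1": -45, "ap2": -65}               # "ap3" was not observed here
print(round(proximity(fp1, fp2), 3))         # → 0.878
```

Because the weak `ap3` sighting contributes little to the denominator, its absence from `fp2` only mildly lowers the score, which is the behavior needed under imperfect observations.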
Data Quality Challenges in Network Automation Systems: Case Study of a Multinational Financial Services Corporation
With the emerging trends of IPv6 rollout, Bring Your Own Device, virtualization, cloud computing and the Internet of Things, corporations continuously face challenges regarding data collection and analysis processes for multiple purposes. These challenges also apply to network monitoring practices: available data is used not only to assess network capacity and latency, but also to identify possible security breaches and bottlenecks in network performance.
This study focuses on assessing the quality of the network data collected from a multinational financial services corporation, and attempts to link the concept of network data quality with process automation of network management and monitoring. Information Technology (IT) can be perceived as the lifeblood of the financial services industry, yet within the discussed case study the corporation strives to cut operational expenditures on IT by 2.5 to 5 percent.
This study combines both theoretical and practical approaches by conducting a literature review followed by a case study of the abovementioned financial organization. The literature review focuses on (a) the importance of data quality, (b) IP Address Management (IPAM), and (c) network monitoring practices. The case study discusses the implementation of a network automation solution powered by Infoblox hardware and software, which should be capable of scanning all devices in the network along with DHCP lease history, while having the convenience of easy IP address management mapping. The corporation’s own defined monitoring maturity levels are also taken into consideration. Twelve data quality issues which potentially hinder the network management lifecycle of monitoring, configuration, and deployment have been identified using the network data management platform during the timeline of the research.
While network management systems are not designed to identify, document, and repair data quality issues, representing the network’s performance in terms of capability, latency and behavior is dependent on data quality along the dimensions of completeness, timeliness and accuracy. The conclusion of the research is that the newly implemented network automation system has the potential to achieve better decision-making for relevant stakeholders, and to eliminate business silos by centralizing network data on one platform, supporting business strategy on an operational, tactical, and strategic level; however, data quality is one of the biggest hurdles to overcome to achieve process automation and ultimately a passive network appliance monitoring system.
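The three data quality dimensions named above can be made concrete with simple per-record checks on network inventory data. Field names, thresholds, and the record format below are illustrative assumptions, not the case study's actual platform rules:

```python
# Sketch: flag completeness, timeliness, and accuracy issues in a network
# inventory record, the dimensions the study measures quality along.
from datetime import datetime, timedelta
import ipaddress

REQUIRED = ("ip", "hostname", "last_seen")

def quality_issues(record, now, max_age=timedelta(days=7)):
    issues = []
    if any(not record.get(f) for f in REQUIRED):
        issues.append("incomplete")               # completeness
    if record.get("last_seen") and now - record["last_seen"] > max_age:
        issues.append("stale")                    # timeliness
    try:
        ipaddress.ip_address(record.get("ip", ""))
    except ValueError:
        issues.append("invalid_ip")               # accuracy
    return issues

now = datetime(2024, 1, 15)
rec = {"ip": "10.0.0.300", "hostname": "sw-core-1", "last_seen": datetime(2024, 1, 1)}
print(quality_issues(rec, now))   # ['stale', 'invalid_ip']
```

Checks like these are the kind of rule a network data management platform would run continuously to surface the data quality issues the study identified.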
A multifaceted formal analysis of end-to-end encrypted email protocols and cryptographic authentication enhancements
Largely owing to cryptography, modern messaging tools (e.g., Signal) have reached a considerable degree of sophistication, balancing advanced security features with high usability. This has not been the case for email, which, however, remains the most pervasive and interoperable form of digital communication. As sensitive information (e.g., identification documents, bank statements, or the message in the email itself) is frequently exchanged by this means, protecting the privacy of email communications is a justified concern which has been emphasized in recent years.
A great deal of effort has gone into the development of tools and techniques for providing email communications with privacy and security, requirements that were not originally considered. Yet, drawbacks across several dimensions hinder the development of a global solution that would strengthen security while maintaining the standard features that we expect from email clients.
In this thesis, we present improvements to security in email communications. Relying on formal methods and cryptography, we design and assess security protocols and analysis techniques, and propose enhancements to implemented approaches for end-to-end secure email communication.
In the first part, we propose a methodical process relying on code reverse engineering, which we use to abstract the specifications of two end-to-end security protocols from a secure email solution (called pEp); then, we apply symbolic verification techniques to analyze such protocols with respect to privacy and authentication properties. We also introduce a novel formal framework that enables a system's security analysis aimed at detecting flaws caused by possible discrepancies between the user's and the system's assessment of security. Security protocols, along with user perceptions and interaction traces, are modeled as transition systems; socio-technical security properties are defined as formulas in computation tree logic (CTL), which can then be verified by model checking.
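As an illustration of the model-checking style described above, here is a minimal explicit-state check of the CTL reachability operator EF over a toy transition system. The state names and the "leaked" label are hypothetical, and real checkers handle the full CTL grammar, not a single operator:

```python
# Sketch: compute the states satisfying "EF prop" (prop is reachable on
# some path) by a backward fixed-point over an explicit transition system.

def check_EF(transitions, labels, prop):
    """Return the set of states satisfying EF prop."""
    sat = {s for s, props in labels.items() if prop in props}
    changed = True
    while changed:                  # grow sat with predecessors until stable
        changed = False
        for s, succs in transitions.items():
            if s not in sat and any(t in sat for t in succs):
                sat.add(s)
                changed = True
    return sat

# A 3-state toy system: s0 -> s1 -> s2, where only s2 leaks a secret.
transitions = {"s0": ["s1"], "s1": ["s2"], "s2": ["s2"]}
labels = {"s0": set(), "s1": set(), "s2": {"leaked"}}
print(check_EF(transitions, labels, "leaked"))   # all three states satisfy it
```

A socio-technical security property such as "the user can never reach a state where the secret is leaked" would then be the negation, AG ¬leaked, which holds exactly when the initial state is outside the EF set.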
Finally, we propose a protocol that aims at securing a password-based authentication system designed to detect the leakage of a password database, from a code-corruption attack.
In the second part, the insights gained by the analysis in Part I allow us to propose both theoretical and practical solutions for improving security and usability aspects, primarily of email communication, but from which secure messaging solutions can benefit too. The first enhancement concerns the use of password-authenticated key exchange (PAKE) protocols for entity authentication in peer-to-peer decentralized settings, as a replacement for out-of-band channels; this brings provable security to the so far empirical process, and enables the implementation of further security and usability properties (e.g., forward secrecy, secure secret retrieval). A second idea refers to the protection of weak passwords at rest and in transit, for which we propose a scheme based on the use of a one-time password; furthermore, we consider potential approaches for improving this scheme.
The research presented here was conducted as part of an industrial partnership between SnT/University of Luxembourg and pEp Security S.A.