
    Business process variant analysis based on mutual fingerprints of event logs

    Comparing business process variants using event logs is a common use case in process mining. Existing techniques for process variant analysis detect statistically-significant differences between variants at the level of individual entities (such as process activities) and their relationships (e.g. directly-follows relations between activities). This may lead to a proliferation of differences due to the low level of granularity in which such differences are captured. This paper presents a novel approach to detect statistically-significant differences between variants at the level of entire process traces (i.e. sequences of directly-follows relations). The cornerstone of this approach is a technique to learn a directly-follows graph called mutual fingerprint from the event logs of the two variants. A mutual fingerprint is a lossless encoding of a set of traces and their durations using discrete wavelet transformation. This structure facilitates the understanding of statistical differences along the control-flow and performance dimensions. The approach has been evaluated using real-life event logs against two baselines. The results show that at a trace level, the baselines cannot always reveal the differences discovered by our approach, or can detect spurious differences. This research is partly funded by the Australian Research Council (DP180102839) and Spanish funds MINECO and FEDER (TIN2017-86727-C2-1-R). Peer Reviewed. Postprint (author's final draft).
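    As a minimal sketch of the encoding idea, the following Python snippet applies a discrete wavelet transform to one trace's activity durations and reconstructs them; it assumes the PyWavelets (pywt) package and uses a Haar wavelet and made-up duration values, so it only illustrates the kind of lossless wavelet encoding described above, not the paper's exact mutual-fingerprint construction.

        # Illustrative only: DWT encoding/decoding of a trace's step durations.
        # Requires PyWavelets (pip install PyWavelets); values are hypothetical.
        import numpy as np
        import pywt

        def encode_durations(durations):
            """Return the Haar wavelet coefficients of a sequence of durations (seconds)."""
            return pywt.wavedec(np.asarray(durations, dtype=float), "haar")

        def decode_durations(coeffs, length):
            """Invert the transform; trim padding if the length is not a power of two."""
            return pywt.waverec(coeffs, "haar")[:length]

        trace_durations = [12.0, 35.5, 4.2, 60.1, 7.7, 18.3, 9.0, 22.4]
        coeffs = encode_durations(trace_durations)
        restored = decode_durations(coeffs, len(trace_durations))
        assert np.allclose(restored, trace_durations)  # lossless up to float error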

    Software development process mining: discovery, conformance checking and enhancement

    Context. Modern software projects require the proper allocation of human, technical and financial resources. Very often, project managers make decisions supported only by their personal experience, intuition or simply by mirroring activities performed by others in similar contexts. Most attempts to avoid such practices use models based on lines of code, cyclomatic complexity or effort estimators, commonly supported by software repositories which are known to contain several flaws. Objective. Demonstrate the usefulness of process data and mining methods to enhance software development practices by assessing efficiency and unveiling unknown process insights, thus contributing to the creation of novel models within the software development analytics realm. Method. We mined the development process fragments of multiple developers in three different scenarios by collecting Integrated Development Environment (IDE) events during their development sessions. Furthermore, we used process and text mining to discover developers' workflows and their fingerprints, respectively. Results. We discovered and modeled with good quality developers' processes during programming sessions based on events extracted from their IDEs. We unveiled insights from coding practices in distinct refactoring tasks, built accurate software complexity forecast models based only on process metrics, and set up a method for coherently characterizing developers' behaviors. The latter may ultimately lead to the creation of a catalog of software development process smells. Conclusions. Our approach is agnostic to programming languages, geographic location or development practices, making it suitable for challenging contexts such as modern global software development projects using either traditional IDEs or sophisticated low/no-code platforms.
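    To make the process-discovery step concrete, here is a small, dependency-free Python sketch that counts directly-follows relations between IDE events grouped by development session; the session and event names are hypothetical examples, and a real analysis would typically rely on a process-mining library rather than this minimal counter.

        # Illustrative only: directly-follows counts from per-session IDE event sequences.
        from collections import Counter

        def directly_follows(event_log):
            """event_log maps a session id to its ordered list of IDE event names."""
            dfg = Counter()
            for events in event_log.values():
                for a, b in zip(events, events[1:]):
                    dfg[(a, b)] += 1  # a was directly followed by b
            return dfg

        sessions = {  # hypothetical sessions, not the thesis dataset
            "dev1-s1": ["EditFile", "RunTests", "EditFile", "Refactor", "Commit"],
            "dev2-s1": ["EditFile", "Refactor", "RunTests", "Commit"],
        }
        for (a, b), count in sorted(directly_follows(sessions).items()):
            print(f"{a} -> {b}: {count}")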

    Comparative process mining: analyzing variability in process data


    Logs and Models in Engineering Complex Embedded Production Software Systems


    Opportunistic timing signals for pervasive mobile localization

    The proliferation of handheld devices and the pressing need for location-based services call for precise and accurate ubiquitous geographic mobile positioning that can serve a vast set of devices. Despite the large investments and efforts in academic and industrial communities, a pinpoint solution is, however, still far from reality. Mobile devices mainly rely on Global Navigation Satellite Systems (GNSS) to position themselves. GNSS systems are known to perform poorly in dense urban areas and indoor environments, where the visibility of GNSS satellites is reduced drastically. In order to ensure interoperability between the technologies used indoors and outdoors, a pervasive positioning system should still rely on GNSS, yet complemented with technologies that can guarantee reliable radio signals in indoor scenarios. The key fact that we exploit is that GNSS signals are made of data with timing information. We then investigate solutions where opportunistic timing signals can be extracted out of terrestrial technologies. These signals can then be used as additional inputs to the multi-lateration problem. Thus, we design and investigate a hybrid system that combines range measurements from the Global Positioning System (GPS), the world's most utilized GNSS system, and terrestrial technologies; the most suitable one to consider in our investigation is WiFi, thanks to its large deployment in indoor areas. In this context, we first investigate standalone WiFi Time-of-Flight (ToF)-based localization. Time-of-flight echo techniques have recently been suggested for ranging mobile devices over WiFi radios. However, these techniques have yielded only moderate accuracy in indoor environments because WiFi ToF measurements suffer from extensive device-related noise, which makes it challenging to differentiate direct-path from non-direct-path signal components when estimating the ranges. Existing multipath mitigation techniques tend to fail at identifying the direct path when the device-related Gaussian noise is of the same order of magnitude as, or larger than, the multipath noise. In order to address this challenge, we propose a new method for filtering ranging measurements that is better suited for the inherently large noise found in WiFi radios. Our technique combines statistical learning and robust statistics in a single filter. The filter is lightweight in the sense that it does not require specialized hardware, the intervention of the user, or cumbersome on-site manual calibration. This makes the method we propose as the first contribution of the present work particularly suitable for indoor localization in large-scale deployments using existing legacy WiFi infrastructures. We evaluate our technique for indoor mobile tracking scenarios in multipath environments and, through extensive evaluations across four different testbeds covering areas up to 1000 m², the filter is able to achieve a median ranging error between 1.7 and 2.4 meters. The next step we envisioned towards preparing the theoretical and practical basis for the aforementioned hybrid positioning system is a deep inspection and investigation of WiFi and GPS ToF ranges, and initial foundations of single-technology self-localization. Self-localization systems based on the Time-of-Flight of radio signals are highly susceptible to noise, and their performance therefore relies heavily on the design and parametrization of robust algorithms.
We study the noise sources of GPS and WiFi ToF ranging techniques and compare the performance of different self-positioning algorithms at a mobile node using those ranges. Our results show that the localization error varies greatly depending on the ranging technology, algorithm selection, and appropriate tuning of the algorithms. We characterize the localization error using real-world measurements and different parameter settings to provide guidance for the design of robust location estimators in realistic settings. These tools and foundations are necessary to tackle the problem of a hybrid positioning system providing high localization capabilities across indoor and outdoor environments. In this context, the lack of a single positioning system able to fulfill the specific requirements of diverse indoor and outdoor application settings has led to the development of a multitude of localization technologies. Existing mobile devices such as smartphones therefore commonly rely on a multi-RAT (Radio Access Technology) architecture to provide pervasive location information in various environmental contexts as the user is moving. Yet, existing multi-RAT architectures consider the different localization technologies as monolithic entities and choose the final navigation position from the RAT that is foreseen to provide the highest accuracy in the particular context. In contrast, we propose in this work to fuse timing range (Time-of-Flight) measurements of diverse radio technologies in order to circumvent the limitations of the individual radio access technologies and improve the overall localization accuracy in different contexts. We introduce an Extended Kalman filter modeling the unique noise sources of each ranging technology. As a rich set of multiple ranges can be available across different RATs, the intelligent selection of the subset of ranges with accurate timing information is critical to achieve the best positioning accuracy. We introduce a novel geometrical-statistical approach to best fuse the set of timing ranging measurements. We also address practical problems of the design space, such as the removal of WiFi chipset and environmental calibration, to make the positioning system as autonomous as possible. Experimental results show that our solution considerably outperforms the use of monolithic technologies and methods based on classical fault detection and identification typically applied in standalone GPS technology. All the contributions and research questions described previously in localization- and positioning-related topics assume full knowledge of the anchor positions. In the last part of this work, we study the problem of deriving proximity metrics without any prior knowledge of the positions of the WiFi access points, based on WiFi fingerprints, that is, tuples of WiFi Access Points (APs) and respective received signal strength indicator (RSSI) values. Applications that benefit from proximity metrics include movement estimation of a single node over time, WiFi fingerprint matching for localization systems, and attacks on privacy. Using a large-scale, real-world WiFi fingerprint data set consisting of 200,000 fingerprints resulting from a large deployment of wearable WiFi sensors, we show that metrics from related work perform poorly on real-world data. We analyze the cause of this poor performance and show that imperfect observations of APs with commodity WiFi clients in the neighborhood are the root cause.
We then propose improved metrics to provide such proximity estimates without requiring knowledge of the locations of the observed APs. We address the challenge of imperfect observations of APs in the design of these improved metrics. Our metrics allow deriving a relative distance estimate based on two observed WiFi fingerprints. We demonstrate that their performance is superior to that of the related-work metrics. This work has been supported by IMDEA Networks Institute. Programa Oficial de Doctorado en Ingeniería Telemática. Thesis committee: President, Francisco Barceló Arroyo; Secretary, Paolo Casari; Member, Marco Fior
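    To illustrate the multilateration step that such timing ranges feed into, here is a small Gauss-Newton least-squares position estimator in Python with NumPy; the anchor coordinates, true position and noise level are made-up test values, and the thesis's robust filtering and EKF-based fusion are not reproduced here.

        # Illustrative only: 2D position estimate from noisy range measurements.
        import numpy as np

        def multilaterate(anchors, ranges, iters=20):
            """Gauss-Newton fit of a 2D position to anchor coordinates (N x 2) and ranges (N,)."""
            anchors = np.asarray(anchors, dtype=float)
            ranges = np.asarray(ranges, dtype=float)
            x = anchors.mean(axis=0)                      # initial guess: centroid of anchors
            for _ in range(iters):
                diffs = x - anchors                        # (N, 2)
                dists = np.linalg.norm(diffs, axis=1)      # predicted ranges
                J = diffs / dists[:, None]                 # Jacobian of the range model
                step, *_ = np.linalg.lstsq(J, dists - ranges, rcond=None)
                x = x - step
            return x

        anchors = [(0, 0), (30, 0), (0, 30), (30, 30)]     # hypothetical anchor positions (meters)
        true_pos = np.array([12.0, 7.0])
        rng = np.random.default_rng(0)
        noisy = np.linalg.norm(np.asarray(anchors, float) - true_pos, axis=1) + rng.normal(0, 1.5, 4)
        print(multilaterate(anchors, noisy))               # close to (12, 7) despite the noise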

    Data Quality Challenges in Network Automation Systems: Case Study of a Multinational Financial Services Corporation

    With the emerging trends of IPv6 rollout, Bring Your Own Device, virtualization, cloud computing and the Internet of Things, corporations continuously face challenges regarding data collection and analysis processes for multiple purposes. These challenges also apply to network monitoring practices: available data is used not only to assess network capacity and latency, but also to identify possible security breaches and bottlenecks in network performance. This study assesses the quality of network data collected from a multinational financial services corporation and attempts to link the concept of network data quality with the process automation of network management and monitoring. Information Technology (IT) can be perceived as the lifeblood of the financial services industry, yet in the discussed case study the corporation strives to cut operational expenditures on IT by 2.5 to 5 percent. This study combines theoretical and practical approaches by conducting a literature review followed by a case study of the abovementioned financial organization. The literature review focuses on (a) the importance of data quality, (b) IP Address Management (IPAM), and (c) network monitoring practices. The case study discusses the implementation of a network automation solution powered by Infoblox hardware and software, which should be capable of scanning all devices in the network along with DHCP lease history, while offering convenient IP address management mapping. The corporation's own monitoring maturity levels are also taken into consideration. Twelve data quality issues that potentially hinder the network management lifecycle of monitoring, configuration, and deployment were identified using the network data management platform during the research. While network management systems are not designed to identify, document, and repair data quality issues, representing the network's performance in terms of capability, latency and behavior depends on data quality along the dimensions of completeness, timeliness and accuracy. The conclusion of the research is that the newly implemented network automation system has the potential to enable better decision-making for relevant stakeholders and to eliminate business silos by centralizing network data in one platform, supporting business strategy at the operational, tactical, and strategic levels; however, data quality is one of the biggest hurdles to overcome to achieve process automation and, ultimately, a passive network appliance monitoring system.
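    As a small, hedged illustration of the three data quality dimensions the study highlights, the following Python sketch scores a network-inventory record for completeness, timeliness and accuracy; the field names, the 24-hour freshness threshold and the IP validity check are illustrative assumptions, not the Infoblox data model used in the case study.

        # Illustrative only: per-record checks along completeness, timeliness, accuracy.
        from datetime import datetime, timedelta, timezone
        import ipaddress

        REQUIRED_FIELDS = ("ip", "mac", "hostname", "last_seen")  # hypothetical schema

        def completeness(record):
            """Fraction of required fields that are present and non-empty."""
            return sum(record.get(f) not in (None, "") for f in REQUIRED_FIELDS) / len(REQUIRED_FIELDS)

        def timeliness(record, max_age=timedelta(hours=24)):
            """True if the device was seen within the freshness window."""
            last_seen = record.get("last_seen")
            return last_seen is not None and datetime.now(timezone.utc) - last_seen <= max_age

        def accuracy(record):
            """True if the stored IP address is syntactically valid."""
            try:
                ipaddress.ip_address(record.get("ip", ""))
                return True
            except ValueError:
                return False

        record = {"ip": "10.1.2.3", "mac": "aa:bb:cc:dd:ee:ff", "hostname": "",
                  "last_seen": datetime.now(timezone.utc) - timedelta(hours=3)}
        print(completeness(record), timeliness(record), accuracy(record))  # 0.75 True True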

    A multifaceted formal analysis of end-to-end encrypted email protocols and cryptographic authentication enhancements

    Largely owing to cryptography, modern messaging tools (e.g., Signal) have reached a considerable degree of sophistication, balancing advanced security features with high usability. This has not been the case for email, which, however, remains the most pervasive and interoperable form of digital communication. As sensitive information (e.g., identification documents, bank statements, or the message in the email itself) is frequently exchanged by this means, protecting the privacy of email communications is a justified concern that has been emphasized in recent years. A great deal of effort has gone into the development of tools and techniques for providing email communications with privacy and security, requirements that were not originally considered. Yet, drawbacks across several dimensions hinder the development of a global solution that would strengthen security while maintaining the standard features that we expect from email clients. In this thesis, we present improvements to security in email communications. Relying on formal methods and cryptography, we design and assess security protocols and analysis techniques, and propose enhancements to implemented approaches for end-to-end secure email communication. In the first part, we propose a methodical process relying on code reverse engineering, which we use to abstract the specifications of two end-to-end security protocols from a secure email solution (called pEp); then, we apply symbolic verification techniques to analyze such protocols with respect to privacy and authentication properties. We also introduce a novel formal framework that enables a system's security analysis aimed at detecting flaws caused by possible discrepancies between the user's and the system's assessment of security. Security protocols, along with user perceptions and interaction traces, are modeled as transition systems; socio-technical security properties are defined as formulas in computation tree logic (CTL), which can then be verified by model checking. Finally, we propose a protocol that aims at protecting, from a code-corruption attack, a password-based authentication system designed to detect the leakage of a password database. In the second part, the insights gained from the analysis in Part I allow us to propose both theoretical and practical solutions for improving security and usability aspects, primarily of email communication, but from which secure messaging solutions can benefit too. The first enhancement concerns the use of password-authenticated key exchange (PAKE) protocols for entity authentication in peer-to-peer decentralized settings, as a replacement for out-of-band channels; this brings provable security to the so far empirical process, and enables the implementation of further security and usability properties (e.g., forward secrecy, secure secret retrieval). A second idea refers to the protection of weak passwords at rest and in transit, for which we propose a scheme based on the use of a one-time password; furthermore, we consider potential approaches for improving this scheme. The research presented here was conducted as part of an industrial partnership between SnT/University of Luxembourg and pEp Security S.A.
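    To illustrate the flavor of the model-checking step, here is a minimal explicit-state Python sketch that verifies a safety property of the form AG(safe), i.e. that a predicate holds in every reachable state of a transition system; the toy handshake states and the property are invented for illustration and are not the pEp protocol models analyzed in the thesis.

        # Illustrative only: breadth-first check that a predicate holds in all reachable states.
        from collections import deque

        def check_ag(initial, transitions, predicate):
            """Return (True, None) if predicate holds on every reachable state, else (False, counterexample)."""
            seen, queue = {initial}, deque([initial])
            while queue:
                state = queue.popleft()
                if not predicate(state):
                    return False, state
                for nxt in transitions.get(state, ()):
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
            return True, None

        # Toy handshake model: a state is (step, peer_authenticated).
        transitions = {
            ("start", False): [("keys_exchanged", False)],
            ("keys_exchanged", False): [("verified", True), ("aborted", False)],
            ("verified", True): [("messaging", True)],
        }
        safe = lambda s: not (s[0] == "messaging" and not s[1])  # never message an unauthenticated peer
        print(check_ag(("start", False), transitions, safe))      # (True, None)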

    8th SC@RUG 2011 proceedings: Student Colloquium 2010-2011
