
    Mobile Device Background Sensors: Authentication vs Privacy

    The increasing number of mobile devices in recent years has led to the collection of large amounts of personal information that need to be protected. To this aim, behavioural biometrics has become very popular. But what is the discriminative power of mobile behavioural biometrics in real scenarios? With the success of Deep Learning (DL), architectures based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM), have shown improvements over traditional machine learning methods. However, these DL architectures still have limitations that need to be addressed. In response, new DL architectures like Transformers have emerged. The question is, can these new Transformers outperform previous biometric approaches? To answer these questions, this thesis focuses on behavioural biometric authentication with data acquired from mobile background sensors (i.e., accelerometers and gyroscopes). In addition, to the best of our knowledge, this is the first thesis that explores and proposes novel behavioural biometric systems based on Transformers, achieving state-of-the-art results in gait, swipe, and keystroke biometrics. The adoption of biometrics requires a balance between security and privacy. Biometric modalities provide a unique and inherently personal approach to authentication. Nevertheless, biometrics also give rise to concerns regarding the invasion of personal privacy. According to the General Data Protection Regulation (GDPR) introduced by the European Union, personal data such as biometric data are sensitive and must be used and protected properly. This thesis analyses the impact of sensitive data on the performance of biometric systems and proposes a novel unsupervised privacy-preserving approach. The research conducted in this thesis makes significant contributions, including: i) a comprehensive review of the privacy vulnerabilities of mobile device sensors, covering metrics for quantifying privacy in relation to sensitive data, along with protection methods for safeguarding sensitive information; ii) an analysis of authentication systems for behavioural biometrics on mobile devices (i.e., gait, swipe, and keystroke), this being the first thesis to explore the potential of Transformers for behavioural biometrics and to introduce novel architectures that outperform the state of the art; and iii) a novel privacy-preserving approach for mobile biometric gait verification using unsupervised learning techniques, ensuring the protection of sensitive data during the verification process.
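
    To make the kind of system studied in this thesis concrete, the sketch below shows a minimal Transformer encoder over fixed-length windows of accelerometer and gyroscope readings, producing an embedding that can be compared between enrolment and probe samples. The architecture, dimensions, and similarity-based verification step are illustrative assumptions, not the thesis' actual models.

```python
# Minimal sketch (not the thesis architecture): a Transformer encoder over
# fixed-length windows of 6-channel accelerometer + gyroscope readings,
# producing an embedding for enrolment/probe comparison.
import torch
import torch.nn as nn

class GaitTransformer(nn.Module):
    def __init__(self, channels=6, d_model=64, nhead=4, layers=2, seq_len=100):
        super().__init__()
        self.proj = nn.Linear(channels, d_model)           # per-timestep projection
        self.pos = nn.Parameter(torch.randn(1, seq_len, d_model) * 0.02)  # learned positions
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(d_model, 64)                 # embedding used for verification

    def forward(self, x):                                  # x: (batch, seq_len, channels)
        h = self.encoder(self.proj(x) + self.pos)
        return self.head(h.mean(dim=1))                    # mean-pool over time

# Verification: cosine similarity between enrolment and probe embeddings.
model = GaitTransformer()
enrol = model(torch.randn(1, 100, 6))
probe = model(torch.randn(1, 100, 6))
score = torch.cosine_similarity(enrol, probe).item()
print(score)
```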

    MBA: Market Basket Analysis Using Frequent Pattern Mining Techniques

    Market Basket Analysis (MBA) is a data mining technique that uses frequent pattern mining algorithms to discover patterns of co-occurrence among items that are frequently purchased together. It is commonly used in retail and e-commerce businesses to generate association rules that describe the relationships between different items, and to make recommendations to customers based on their previous purchases. MBA is a powerful tool for identifying patterns of co-occurrence and generating insights that can improve sales and marketing strategies. Although numerous works have been carried out to handle the computational cost of discovering frequent itemsets, the problem still needs further exploration and development. In this paper, we introduce an efficient bitwise data structure technique for mining frequent patterns in large-scale databases. In contrast to the well-known Apriori and FP-Growth algorithms, our algorithm scans the original database only once, using bitwise data representations together with a vertical database layout. The bitwise technique avoids multiple passes over the original database and hence minimizes the execution time. Extensive experiments have been carried out to validate our technique, which outperforms Apriori, Eclat, FP-Growth, and H-mine in terms of execution time for Market Basket Analysis.
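
    The following sketch illustrates the general idea behind a bitwise, vertical layout: each item keeps one bitmap over transaction identifiers, so the database is scanned once to build the bitmaps and the support of any itemset reduces to a bitwise AND followed by a population count. The code is a hypothetical illustration, not the paper's implementation.

```python
# Hypothetical illustration of a bitwise, vertical database layout
# (not the paper's implementation).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk"},
]

# Single scan: build one integer bitmap per item (bit i set <=> item in transaction i).
bitmaps = {}
for tid, basket in enumerate(transactions):
    for item in basket:
        bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)

def support(itemset):
    """Support of an itemset = popcount of the AND of its item bitmaps."""
    bm = ~0
    for item in itemset:
        bm &= bitmaps.get(item, 0)
    return bin(bm & ((1 << len(transactions)) - 1)).count("1")

print(support({"bread", "milk"}))   # 2 transactions contain both items
```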

    Explainable temporal data mining techniques to support the prediction task in Medicine

    In the last decades, the increasing amount of data available in all fields raises the necessity to discover new knowledge and explain the hidden information found. On one hand, the rapid increase of interest in, and use of, artificial intelligence (AI) in computer applications has raised a parallel concern about its ability (or lack thereof) to provide understandable, or explainable, results to users. In the biomedical informatics and computer science communities, there is considerable discussion about the "un-explainable" nature of artificial intelligence, where often algorithms and systems leave users, and even developers, in the dark with respect to how results were obtained. Especially in the biomedical context, the necessity to explain the results of an artificial intelligence system is legitimate given the importance of patient safety. On the other hand, current database systems enable us to store huge quantities of data. Their analysis through data mining techniques provides the possibility to extract relevant knowledge and useful hidden information. Relationships and patterns within these data could provide new medical knowledge. The analysis of such healthcare/medical data collections could greatly help to observe the health conditions of the population and extract useful information that can be exploited in the assessment of healthcare/medical processes. In particular, the prediction of medical events is essential for preventing disease, understanding disease mechanisms, and increasing patient quality of care. In this context, an important aspect is to verify whether the database content supports the capability of predicting future events. In this thesis, we start by addressing the problem of explainability, discussing some of the most significant challenges that need to be addressed with scientific and engineering rigor in a variety of biomedical domains. We analyze the "temporal component" of explainability, focusing on different perspectives such as the use of temporal data, the temporal task, temporal reasoning, and the dynamics of explainability with respect to the user perspective and to knowledge. Starting from this panorama, we focus our attention on two different temporal data mining techniques. The first one is based on trend abstractions: starting from the concept of Trend-Event Pattern and moving through the concept of prediction, we propose a new kind of predictive temporal pattern, namely Predictive Trend-Event Patterns (PTE-Ps). The framework aims to combine complex temporal features to extract a compact and non-redundant predictive set of patterns composed of such temporal features. The second one is based on functional dependencies: we propose a methodology for deriving a new kind of approximate temporal functional dependency, called Approximate Predictive Functional Dependencies (APFDs), based on a three-window framework. We then discuss the concept of approximation, the data complexity of deriving an APFD, the introduction of two new error measures, and finally the quality of APFDs in terms of coverage and reliability. Exploiting these methodologies, we analyze intensive care unit data from the MIMIC dataset.
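
    As an illustration of the trend-abstraction step on which patterns such as Trend-Event Patterns build, the sketch below maps consecutive measurements of a clinical variable to symbolic trends (increasing, decreasing, steady) with their time spans. Thresholds, labels, and the example series are assumptions for illustration, not the thesis' exact definitions.

```python
# Illustrative trend abstraction (thresholds and labels are assumptions,
# not the thesis' exact definitions).
def trend_abstraction(timestamps, values, eps=0.5):
    """Return a list of (trend, start_time, end_time) intervals."""
    def label(delta):
        if delta > eps:
            return "I"   # increasing
        if delta < -eps:
            return "D"   # decreasing
        return "S"       # steady

    intervals = []
    for i in range(1, len(values)):
        t = label(values[i] - values[i - 1])
        if intervals and intervals[-1][0] == t:
            intervals[-1] = (t, intervals[-1][1], timestamps[i])  # extend current trend
        else:
            intervals.append((t, timestamps[i - 1], timestamps[i]))
    return intervals

# e.g. hourly lab values for one ICU stay (invented numbers)
print(trend_abstraction([0, 1, 2, 3, 4], [1.0, 1.1, 1.9, 2.8, 2.7]))
# [('S', 0, 1), ('I', 1, 3), ('S', 3, 4)]
```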

    Fraction-score: a generalized support measure for weighted and maximal co-location pattern mining

    Co-location patterns, which capture the phenomenon that objects with certain labels are often located in close geographic proximity, are defined based on a support measure which quantifies the prevalence of a pattern candidate in the form of a label set. Existing support measures share the idea of counting the number of instances of a given label set C as its support, where an instance of C is an object set whose objects collectively carry all labels in C and are located close to one another. However, they suffer from various weaknesses: for example, they fail to capture all possible instances, or they overlook the cases where multiple instances overlap. In this paper, we propose a new measure called Fraction-Score which counts instances fractionally if they overlap. Fraction-Score captures all possible instances and handles the cases where instances overlap appropriately (so that the supports defined are more meaningful and anti-monotonic). We develop efficient algorithms to solve the co-location pattern mining problem defined with Fraction-Score. Furthermore, to obtain representative patterns, we develop an efficient algorithm for mining maximal co-location patterns, which are those patterns without proper superset patterns. We conduct extensive experiments using real and synthetic datasets, which verify the superiority of our proposals.
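
    The sketch below illustrates fractional counting of overlapping instances: an object that participates in several instances of a label set contributes only a fraction of a count to each, so overlaps are no longer counted as full, independent occurrences. This is a simplified illustration of the idea, not the paper's exact Fraction-Score definition.

```python
# Simplified illustration of fractional counting (NOT the exact Fraction-Score definition).
from collections import defaultdict

# instances of label set C = {A, B}: each is a set of object ids located close together
instances = [
    {"a1", "b1"},
    {"a1", "b2"},   # overlaps with the first instance on object a1
    {"a2", "b3"},
]

# how many instances each object participates in
participation = defaultdict(int)
for inst in instances:
    for obj in inst:
        participation[obj] += 1

# each instance is weighted by its most "shared" object
fractional_support = sum(
    min(1.0 / participation[obj] for obj in inst) for inst in instances
)
print(fractional_support)   # 2.0 instead of a naive count of 3
```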

    Measuring the impact of COVID-19 on hospital care pathways

    Care pathways in hospitals around the world reported significant disruption during the recent COVID-19 pandemic, but measuring the actual impact is more problematic. Process mining can be useful for hospital management to measure the conformance of real-life care to what might be considered normal operations. In this study, we aim to demonstrate that process mining can be used to investigate process changes associated with complex disruptive events. We studied perturbations to accident and emergency (A&E) and maternity pathways in a UK public hospital during the COVID-19 pandemic. Coincidentally, the hospital had implemented a Command Centre approach for patient-flow management, affording an opportunity to study both the planned improvement and the disruption due to the pandemic. Our study proposes and demonstrates a method for measuring and investigating the impact of such planned and unplanned disruptions affecting hospital care pathways. We found that during the pandemic, both A&E and maternity pathways had measurable reductions in the mean length of stay and a measurable drop in the percentage of pathways conforming to normative models. There were no distinctive patterns in the monthly mean values of length of stay or conformance throughout the phases of the installation of the hospital’s new Command Centre approach. Due to a deficit in the available A&E data, the findings for A&E pathways could not be interpreted.
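
    A simplified sketch of the two measures used in the study is given below: mean length of stay per pathway and the share of pathways following a normative model. The event data are invented, and exact-sequence matching stands in for a full process-mining conformance check.

```python
# Simplified sketch: mean length of stay and share of conforming pathways
# (hypothetical event data; exact-sequence matching instead of full conformance checking).
from datetime import datetime

# one list of (activity, timestamp) per patient pathway
pathways = {
    "p1": [("arrival", datetime(2020, 3, 1, 10)), ("triage", datetime(2020, 3, 1, 11)),
           ("treatment", datetime(2020, 3, 1, 13)), ("discharge", datetime(2020, 3, 1, 15))],
    "p2": [("arrival", datetime(2020, 3, 2, 9)), ("treatment", datetime(2020, 3, 2, 10)),
           ("discharge", datetime(2020, 3, 2, 12))],
}
normative = ["arrival", "triage", "treatment", "discharge"]

los_hours = [
    (events[-1][1] - events[0][1]).total_seconds() / 3600 for events in pathways.values()
]
conforming = sum(
    1 for events in pathways.values() if [a for a, _ in events] == normative
)

print(f"mean LOS: {sum(los_hours) / len(los_hours):.1f} h")   # 4.0 h
print(f"conformance: {conforming / len(pathways):.0%}")        # 50%
```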

    A genetic algorithm coupled with tree-based pruning for mining closed association rules

    Due to the voluminous number of itemsets that are generated, the association rules extracted from them contain redundancy, and designing an effective approach to address this issue is of paramount importance. Although multiple algorithms have been proposed in recent years for mining closed association rules, most of them underperform in terms of run time or memory. Another issue that remains challenging is the nature of the dataset: while some of the existing algorithms perform well on dense datasets, others perform well on sparse datasets. This paper aims to handle these drawbacks by using a genetic algorithm for mining closed association rules. Recent studies have shown that genetic algorithms perform better than conventional algorithms due to their bitwise operations of crossover and mutation. Bitwise operations are predominantly faster than conventional approaches and bits consume less memory, thereby improving the overall performance of the algorithm. To address the redundancy in the mined association rules, a tree-based pruning algorithm has been designed here, which works on the principle of minimal antecedent and maximal consequent. Experiments have shown that the proposed approach works well on both dense and sparse datasets while surpassing existing techniques with regard to run time and memory.
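
    The sketch below illustrates the bit-level genetic operators the paper relies on: itemsets are encoded as bit strings over the item universe, so crossover and mutation become cheap bitwise operations. Parameters and encoding details are illustrative assumptions, not the authors' implementation.

```python
# Illustrative bitwise GA operators on itemset chromosomes
# (hypothetical parameters, not the authors' implementation).
import random

N_ITEMS = 8                      # size of the item universe

def random_chromosome():
    return random.getrandbits(N_ITEMS)

def crossover(a, b, point):
    """Single-point crossover done with bit masks instead of list slicing."""
    mask = (1 << point) - 1                               # low `point` bits from parent a
    return (a & mask) | (b & ~mask & ((1 << N_ITEMS) - 1))

def mutate(chrom, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    for i in range(N_ITEMS):
        if random.random() < rate:
            chrom ^= 1 << i
    return chrom

parent1, parent2 = random_chromosome(), random_chromosome()
child = mutate(crossover(parent1, parent2, point=4))
print(f"{parent1:08b} x {parent2:08b} -> {child:08b}")
```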

    Representation Learning for Texts and Graphs: A Unified Perspective on Efficiency, Multimodality, and Adaptability

    [...] This thesis is situated between natural language processing and graph representation learning and investigates selected connections. First, we introduce matrix embeddings as an efficient text representation sensitive to word order. [...] Experiments with ten linguistic probing tasks, 11 supervised, and five unsupervised downstream tasks reveal that vector and matrix embeddings have complementary strengths and that a jointly trained hybrid model outperforms both. Second, a popular pretrained language model, BERT, is distilled into matrix embeddings. [...] The results on the GLUE benchmark show that these models are competitive with other recent contextualized language models while being more efficient in time and space. Third, we compare three model types for text classification: bag-of-words, sequence-, and graph-based models. Experiments on five datasets show that, surprisingly, a wide multilayer perceptron on top of a bag-of-words representation is competitive with recent graph-based approaches, questioning the necessity of graphs synthesized from the text. [...] Fourth, we investigate the connection between text and graph data in document-based recommender systems for citations and subject labels. Experiments on six datasets show that the title as side information improves the performance of autoencoder models. [...] We find that the meaning of item co-occurrence is crucial for the choice of input modalities and an appropriate model. Fifth, we introduce a generic framework for lifelong learning on evolving graphs in which new nodes, edges, and classes appear over time. [...] The results show that by reusing previous parameters in incremental training, it is possible to employ smaller history sizes with only a slight decrease in accuracy compared to training with complete history. Moreover, weighting the binary cross-entropy loss function is crucial to mitigate the problem of class imbalance when detecting newly emerging classes. [...
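
    As an illustration of the bag-of-words baseline discussed above, the sketch below trains a wide multilayer perceptron on top of token counts. The toy data and hyperparameters are assumptions, not those used in the thesis.

```python
# Toy sketch of a wide MLP over bag-of-words features
# (data and hyperparameters are illustrative, not the thesis').
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

texts = ["graph neural networks for citations", "transformers for language modelling",
         "frequent itemset mining on baskets", "association rules in retail data"]
labels = ["ml", "ml", "dm", "dm"]

# bag-of-words features followed by a single wide hidden layer
model = make_pipeline(
    CountVectorizer(),
    MLPClassifier(hidden_layer_sizes=(1024,), max_iter=500, random_state=0),
)
model.fit(texts, labels)
print(model.predict(["mining rules from transaction data"]))
```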

    Mining useful knowledge in biomedicine with pattern mining techniques

    The detection of common and unique characteristics of cancer patients is one of the main focuses of current research, which aims to develop personalized therapies against the disease. However, while many of the current proposals base their methodology on predictive approaches, this study proposes a descriptive approach to improve the understanding of the molecular mechanisms that give rise to cancer. This approach does not require prior knowledge and retrieves high-order gene relationships from transcriptomic data. In this sense, it is possible to obtain the set of signalling pathways affected by the disease, as well as complementary information such as the prognosis of patients through additional analyses applied to the set of solutions obtained. For this purpose, after an introduction to the omics study of cancer and the presentation of the data analysis techniques used, a methodology based on pattern mining is proposed for the transcriptomic study of the disease with the adaptation of two techniques belonging to the supervised descriptive pattern mining field. Each of these approaches constitutes a chapter of this study, the first of which is devoted to the most common type of analysis performed in any omics study (and therefore in cancer): case vs. control. This chapter details a methodology based on the use of emerging pattern mining techniques for the descriptive and discriminatory study of the disease. The results obtained, as outlined in the chapter, are of particular interest because they provide both contrasted information (ideal for validating the methodology) and novel information in the form of high-order relationships. The second chapter extends the concept of descriptive discriminant analysis techniques to several conditions, this time applied to the study of several cancer subtypes. For this purpose, contrast set pattern mining techniques are employed. As an added value, this chapter discusses the implications of the obtained results for the survival of patients suffering from the disease.
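
    The sketch below illustrates the growth-rate criterion behind emerging pattern mining: a pattern is emerging for the case class when its support there greatly exceeds its support among controls. The discretised expression profiles and gene names are invented for illustration, not results from the study.

```python
# Illustrative growth-rate computation for emerging pattern mining
# (toy data; gene names and profiles are invented).
def support(pattern, dataset):
    return sum(pattern <= sample for sample in dataset) / len(dataset)

# discretised expression profiles: each sample is the set of "over-expressed" genes
cases    = [{"TP53", "MYC"}, {"TP53", "MYC", "EGFR"}, {"MYC"}]
controls = [{"EGFR"}, {"TP53"}, set()]

pattern = {"TP53", "MYC"}
s_case, s_ctrl = support(pattern, cases), support(pattern, controls)
growth_rate = float("inf") if s_ctrl == 0 else s_case / s_ctrl
print(growth_rate)   # large growth rate => pattern is "emerging" in the case class
```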

    Signature-based Tree for Finding Frequent Itemsets

    The efficiency of a data mining process depends on the data structure used to find frequent itemsets. Two approaches are possible: use the original transaction dataset, or transform it into another, more compact structure. Many algorithms use trees as the compact structure, like the FP-Tree and its associated algorithm FP-Growth. Although this structure reduces the number of scans (only two), its efficiency depends on two criteria: (i) the size of the support (small or large); and (ii) the type of transaction dataset (sparse or dense). These two criteria can generate very large trees. In this paper, we propose a new tree-based structure that emphasizes transactions rather than itemsets. Hence, we avoid the problem of support values that have a negative impact on the generated tree.
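
    As a generic illustration of a transaction-oriented representation (not the paper's signature-based tree), the sketch below indexes transactions by a bit signature over the items, so identical transactions collapse into a single entry and support counting walks the distinct signatures instead of re-scanning the dataset.

```python
# Generic illustration only (not the paper's structure): grouping transactions
# by a bit signature so support counts come from distinct signatures.
from collections import Counter

items = ["bread", "milk", "butter"]
idx = {item: i for i, item in enumerate(items)}

transactions = [{"bread", "milk"}, {"bread", "milk"}, {"milk", "butter"}, {"bread"}]

def signature(basket):
    sig = 0
    for item in basket:
        sig |= 1 << idx[item]
    return sig

# identical transactions share one signature, stored with a multiplicity
groups = Counter(signature(t) for t in transactions)

def support(itemset):
    target = signature(itemset)
    return sum(count for sig, count in groups.items() if sig & target == target)

print(support({"bread", "milk"}))   # 2
```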

    A comprehensive survey of V2X cybersecurity mechanisms and future research paths

    Recent advancements in vehicle-to-everything (V2X) communication have notably improved existing transport systems by enabling increased connectivity and driving autonomy levels. The remarkable benefits of V2X connectivity inevitably come with challenges involving security vulnerabilities and breaches. Addressing security concerns is essential for the seamless and safe operation of mission-critical V2X use cases. This paper surveys the current literature on V2X security and provides a systematic and comprehensive review of the most relevant security enhancements to date. An in-depth classification of V2X attacks is first performed according to key security and privacy requirements. Our methodology continues with a taxonomy of security mechanisms based on their proactive/reactive defensive approach, which helps identify strengths and limitations of state-of-the-art countermeasures for V2X attacks. In addition, this paper delves into the potential of emerging security approaches leveraging artificial intelligence tools to meet security objectives. Promising data-driven solutions tailored to tackle security, privacy, and trust issues are thoroughly discussed, along with new threat vectors inevitably introduced by these enablers. The lessons learned from the detailed review of existing works are also compiled and highlighted. We conclude this survey with a structured synthesis of open challenges and future research directions to foster contributions in this prominent field. This work is supported by the H2020-INSPIRE-5Gplus project (under Grant Agreement No. 871808), the "Ministerio de Asuntos Económicos y Transformación Digital" and the European Union-NextGenerationEU in the frameworks of the "Plan de Recuperación, Transformación y Resiliencia" and of the "Mecanismo de Recuperación y Resiliencia" under references TSI-063000-2021-39/40/41, and the CHIST-ERA-17-BDSI-003 FIREMAN project funded by the Spanish National Foundation (Grant PCI2019-103780).