    Exploiting Large Language Models to Train Automatic Detectors of Sensitive Data

    This thesis proposes an automated system designed to identify sensitive data within text documents, aligning with the definitions and regulations outlined in the General Data Protection Regulation (GDPR). It reviews the current state of the art in Personally Identifiable Information (PII) and sensitive data detection, and how machine learning models for Natural Language Processing (NLP) are tailored to perform these tasks. A critical challenge addressed in this work is the acquisition of suitable datasets for training and evaluating the proposed system. To overcome this obstacle, we explore the use of Large Language Models (LLMs) to generate synthetic datasets, which serve as a valuable resource for training classification models. Both proprietary and open-source LLMs are leveraged to investigate the capabilities of local models in document generation. The thesis then presents a comprehensive framework for sensitive data detection, covering six key domains and proposing specific criteria to identify the disclosure of sensitive data that take into account context and domain relevance. To detect sensitive data, a variety of models are explored, mainly based on the Transformer architecture (Bidirectional Encoder Representations from Transformers, BERT), adapted to perform text classification and Named Entity Recognition (NER). The thesis evaluates the models using fine-grained metrics and shows that the NER model achieves the best results (a 90% score) when trained interchangeably on both datasets, which also confirms the quality of the dataset generated with the open-source LLM.
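
    A minimal sketch of the kind of BERT token-classification fine-tuning the thesis describes for NER-based detection; the label set, example sentence, and tags below are illustrative assumptions, not the thesis's actual data:

```python
# Hedged sketch: fine-tune BERT for token classification (NER-style PII tagging).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-PER", "I-PER", "B-HEALTH", "I-HEALTH"]  # hypothetical tag set
tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels)
)

words = ["John", "Doe", "was", "diagnosed", "with", "asthma", "."]
word_tags = [1, 2, 0, 0, 0, 3, 0]  # indices into `labels`, illustrative only

enc = tok(words, is_split_into_words=True, return_tensors="pt")
# Align word-level tags to subword tokens; special tokens get -100 (ignored by the loss).
aligned = [-100 if i is None else word_tags[i] for i in enc.word_ids()]
labels_t = torch.tensor([aligned])

out = model(**enc, labels=labels_t)   # forward pass returns the cross-entropy loss
out.loss.backward()                   # one gradient step of fine-tuning
torch.optim.AdamW(model.parameters(), lr=5e-5).step()
```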

    Retail payments and the real economy

    This paper examines the fundamental relationship between retail payments and the real economy. Using data from 27 European markets over the period 1995-2009, the results confirm that migration to efficient electronic retail payments stimulates the overall economy, consumption, and trade. Among the different payment instruments, this relationship is strongest for card payments, followed by credit transfers; cheque payments are found to have a relatively low macroeconomic impact. Retail payment transaction technology itself is also positively associated with real economic aggregates. We also show that initiatives to integrate and harmonise retail payment markets foster trade and consumption and thereby benefit the whole economy. Additionally, the findings reveal that the impact of retail payments on the real economy is more pronounced in euro-area countries. Our findings are robust to different regression specifications. The study supports the adoption of policies promoting a swift migration to efficient and harmonised electronic payment instruments.
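
    The paper's headline result is regression-based; a hedged sketch of the kind of two-way fixed-effects panel regression its robustness checks imply (variable names and the toy panel are assumptions, not the paper's data):

```python
# Hedged sketch: country-year panel regression of consumption on card payments.
import pandas as pd
import statsmodels.formula.api as smf

# Toy panel of country-year observations (the paper uses 27 markets, 1995-2009).
df = pd.DataFrame({
    "country": ["AT", "AT", "BE", "BE", "FI", "FI"],
    "year":    [1995, 1996, 1995, 1996, 1995, 1996],
    "log_consumption":   [4.1, 4.2, 4.5, 4.6, 3.9, 4.0],
    "log_card_payments": [2.0, 2.3, 2.5, 2.7, 1.8, 2.1],
})

# Country and year fixed effects absorb country-specific levels and common shocks.
fit = smf.ols(
    "log_consumption ~ log_card_payments + C(country) + C(year)", data=df
).fit()
print(fit.params["log_card_payments"])  # elasticity-style coefficient of interest
```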

    An Optogenetic Method to Modulate Cell Contractility during Tissue Morphogenesis

    Morphogenesis of multicellular organisms is driven by localized cell shape changes. How, and to what extent, changes in the behavior of single cells or groups of cells influence neighboring cells and large-scale tissue remodeling remains an open question. Indeed, our understanding of multicellular dynamics is limited by the lack of methods that allow the modulation of cell behavior with high spatiotemporal precision. Here, we developed an optogenetic approach to achieve local modulation of cell contractility and used it to control morphogenetic movements during Drosophila embryogenesis. We show that local inhibition of apical constriction is sufficient to cause a global arrest of mesoderm invagination. By varying the spatial pattern of inhibition during invagination, we further demonstrate that coordinated contractile behavior responds to local tissue geometrical constraints. Together, these results show the efficacy of this optogenetic approach for dissecting the interplay between cell-cell interaction, force transmission, and tissue geometry during complex morphogenetic processes.

    Assessing Web Services Interfaces with Lightweight Semantic Basis

    In recent years, Web Services have become the technological choice to materialize the Service-Oriented Computing paradigm. However, broad use of Web Services requires efficient approaches to allow service consumption from within applications. Currently, developers are compelled first to search for suitable services, mainly by manually exploring Web catalogs that usually show poorly relevant information, and then to provide the adequate "glue-code" for their assembly. This implies a large effort in discovering, selecting, and adapting services. To overcome these challenges, this paper presents a novel Web Service Selection Method. We have defined an Interface Compatibility procedure to assess structural-semantic aspects of functional specifications - in the form of WSDL documents - of candidate Web Services. Two different semantic bases have been used to define and implement the approach: WordNet, a widely known lexical dictionary of the English language; and DISCO, a database that indexes co-occurrences of terms in very large text collections. We performed a set of experiments to evaluate the approach with respect to the underlying semantic basis and against third-party approaches, using a dataset of real-life Web Services. Promising results have been obtained in terms of well-known metrics from the Information Retrieval field.
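
    As an illustration of the WordNet-backed term matching the Interface Compatibility procedure builds on; the Wu-Palmer scoring used here is an assumption, not necessarily the authors' exact measure:

```python
# Hedged sketch: score the semantic relatedness of two identifiers via WordNet.
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def term_similarity(a: str, b: str) -> float:
    """Best Wu-Palmer similarity across the terms' noun senses in WordNet."""
    synsets_a = wn.synsets(a, pos=wn.NOUN)
    synsets_b = wn.synsets(b, pos=wn.NOUN)
    scores = [s1.wup_similarity(s2) or 0.0
              for s1 in synsets_a for s2 in synsets_b]
    return max(scores, default=0.0)

# Compare identifiers harvested from two WSDL operation signatures.
print(term_similarity("invoice", "bill"))     # high: near-synonyms
print(term_similarity("invoice", "weather"))  # low: unrelated domains
```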

    A Software Tool for Selection and Integrability on Service Oriented Applications

    Connecting services to rapidly developed service-oriented applications is a challenging issue. Selecting adequate services implies an overwhelming assessment effort, even with a reduced set of candidate services. In previous work we presented an approach for service selection addressing the assessment of WSDL interfaces and the expected execution behavior of candidate services. In this paper we present a plugin for the Eclipse IDE to support the approach and to assist developers in their daily task of exploring service integrability. For behavioral compatibility in particular, we make use of two testing frameworks, JUnit and MuClipse, to achieve a compliance-testing strategy, as sketched below.
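
    The plugin itself drives JUnit and MuClipse on the Java side; as a language-neutral analogue, this Python unittest sketch shows the compliance-testing idea: a single suite encoding the expected behavior is run against each candidate service (the service API here is hypothetical):

```python
# Hedged sketch: compliance testing of a candidate service against an expected contract.
import unittest

class CandidateService:
    """Stand-in for a discovered service adapter; hypothetical API."""
    def add(self, a, b):
        return a + b

class ComplianceSuite(unittest.TestCase):
    # Swap in each candidate service and rerun the same suite.
    service = CandidateService()

    def test_addition_contract(self):
        # The behavior the consuming application relies on.
        self.assertEqual(self.service.add(2, 3), 5)

if __name__ == "__main__":
    unittest.main()
```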

    Parameter estimation of binary black holes in the endpoint of the up-down instability

    Black-hole binary spin precession admits equilibrium solutions corresponding to systems with (anti-)aligned spins. Among these, binaries in the up-down configuration, where the spin of the heavier (lighter) black hole is co-(counter-)aligned with the orbital angular momentum, might be unstable to small perturbations of the spin directions. The occurrence of the up-down instability leads to gravitational-wave sources that formed with aligned spins but are detected with precessing spins. We present a Bayesian procedure based on the Savage-Dickey density ratio to test the up-down origin of gravitational-wave events. We apply it both to simulated signals, which indicate that achieving strong evidence is within the reach of current experiments, and to the LIGO/Virgo events released to date, which indicate that current data are not informative enough.
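
    A minimal sketch of the Savage-Dickey density ratio the procedure relies on: for nested models, the Bayes factor in favor of the restricted (null) model is the posterior density divided by the prior density, both evaluated at the null point. The toy prior and posterior samples below are assumptions standing in for a real parameter-estimation run:

```python
# Hedged sketch: Savage-Dickey density ratio from posterior samples.
import numpy as np
from scipy.stats import gaussian_kde, norm

theta_0 = 0.0                      # null value of the perturbation parameter
prior = norm(loc=0.0, scale=1.0)   # assumed analytic prior on the parameter

rng = np.random.default_rng(0)
posterior_samples = rng.normal(0.05, 0.2, size=10_000)  # stand-in posterior draws

# KDE estimate of the posterior density at the null, over the prior density there.
posterior_at_null = gaussian_kde(posterior_samples)(theta_0)[0]
bf_null_vs_full = posterior_at_null / prior.pdf(theta_0)
print(f"BF (null vs. full): {bf_null_vs_full:.2f}")  # >1 favors the null model
```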