8 research outputs found

    Safeguarding Privacy Through Deep Learning Techniques

    Over the last few years, there has been a growing need to meet minimum security and privacy requirements. Both public and private companies have had to comply with increasingly stringent standards, such as the ISO 27000 family, and with the various laws governing the management of personal data. The huge amount of data to be managed has demanded a great effort from employees who, in the absence of automatic techniques, have had to work tirelessly to achieve certification objectives. Unfortunately, because of the sensitive information contained in the documentation related to these problems, it is difficult, if not impossible, to obtain material for research and study purposes on which to experiment with new ideas and techniques aimed at automating these processes, for instance by exploiting recent advances in the scientific community in the fields of ontologies and artificial intelligence for data management. To bypass this problem, it was decided to examine data from the medical domain, which, mainly for important reasons related to individual health, has gradually become more freely accessible over time. This choice does not affect the generality of the proposed methods, which can be reapplied to the most diverse fields in which privacy-sensitive information must be managed.

    A New Italian Cultural Heritage Data Set: Detecting Fake Reviews With BERT and ELECTRA Leveraging the Sentiment

    Consiglio Nazionale delle Ricerche (CNR), Italy, within the CRUI-CARE Agreement.

    Lexicon-Based vs. BERT-Based Sentiment Analysis: A Comparative Study in Italian

    Recent evolutions in the e-commerce market have led consumers to attribute increasing importance to third-party product reviews before proceeding to purchase. Industry, in order to improve its offer by intercepting consumer discontent, has paid increasing attention to systems able to identify the sentiment expressed by buyers, whether positive or negative. From a technological point of view, the literature in recent years has seen the development of two types of methodologies: those based on lexicons and those based on machine and deep learning techniques. This study proposes a comparison between these technologies in the Italian market, one of the largest in the world, exploiting an ad hoc dataset. Scientific evidence generally shows the superiority of language models such as BERT, built on deep neural networks, but the study opens several considerations on the effectiveness of these solutions, compared with lexicon-based ones, in the presence of datasets of reduced size such as the one under study, a common condition for languages other than English or Chinese.
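The lexicon-based family compared in this study can be illustrated with a minimal polarity-summing scorer with simple negation handling. The tiny Italian lexicon below is a hypothetical stand-in for illustration only, not the resource used in the study:

```python
# Minimal sketch of a lexicon-based sentiment scorer: each word carries
# a polarity weight, and a preceding negator ("non") flips the sign of
# the next scored word. The lexicon entries are invented for this example.
TOY_LEXICON = {
    "ottimo": 2, "buono": 1, "bello": 1,
    "pessimo": -2, "cattivo": -1, "brutto": -1,
}

def lexicon_sentiment(text: str) -> str:
    """Classify a sentence as positive/negative/neutral by summed polarity."""
    score, negate = 0, False
    for tok in text.lower().split():
        tok = tok.strip(".,!?")
        if tok == "non":          # negator: invert the next word's polarity
            negate = True
            continue
        polarity = TOY_LEXICON.get(tok, 0)
        score += -polarity if negate else polarity
        negate = False
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

A BERT-based competitor would instead fine-tune a pre-trained Italian language model on labelled reviews; the contrast makes clear why lexicon approaches remain attractive when labelled data is scarce.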

    An Effective BERT-Based Pipeline for Twitter Sentiment Analysis: A Case Study in Italian

    Over the last decade, industrial and academic communities have increased their focus on sentiment analysis techniques, especially as applied to tweets. State-of-the-art results have recently been achieved using language models trained from scratch on corpora made up exclusively of tweets, in order to better handle the Twitter jargon. This work introduces a different approach to Twitter sentiment analysis based on two steps. Firstly, the tweet jargon, including emojis and emoticons, is transformed into plain text, exploiting procedures that are language-independent or easily applicable to different languages. Secondly, the resulting tweets are classified using the language model BERT, pre-trained on plain text instead of tweets, for two reasons: (1) models pre-trained on plain text are readily available in many languages, avoiding resource- and time-consuming model training on tweets from scratch; (2) available plain-text corpora are larger than tweet-only ones, therefore allowing better performance. A case study describing the application of the approach to Italian is presented, with a comparison against other existing Italian solutions. The results obtained show the effectiveness of the approach and indicate that, thanks to its methodologically general basis, it can also be promising for other languages.
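The first, language-independent step described above can be sketched as a small normalization routine. The emoji/emoticon mappings below are illustrative assumptions rather than the paper's actual resources; a real pipeline would then feed the resulting plain text to a pre-trained BERT classifier:

```python
# Sketch of tweet-jargon normalization: emojis and emoticons become plain
# words, mentions are anonymized, URLs dropped, and hashtag markers stripped,
# so that a BERT model pre-trained on plain text can process the result.
import re

EMOJI_MAP = {"\U0001F60A": "smiling face", "\U0001F622": "crying face"}
EMOTICON_MAP = {":)": "smiling face", ":(": "sad face"}

def normalize_tweet(tweet: str) -> str:
    for mark, text in {**EMOJI_MAP, **EMOTICON_MAP}.items():
        tweet = tweet.replace(mark, f" {text} ")
    tweet = re.sub(r"@\w+", "@user", tweet)       # anonymize user mentions
    tweet = re.sub(r"https?://\S+", "", tweet)    # drop URLs
    tweet = re.sub(r"#(\w+)", r"\1", tweet)       # keep hashtag word only
    return re.sub(r"\s+", " ", tweet).strip()     # collapse whitespace
```

Because every rule operates on surface forms rather than vocabulary, the same routine applies unchanged to tweets in any language, which is the property the approach relies on.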

    A Novel COVID-19 Data Set and an Effective Deep Learning Approach for the De-Identification of Italian Medical Records

    In recent years, the need to de-identify privacy-sensitive information within Electronic Health Records (EHRs) has become increasingly felt and extremely relevant to encourage the sharing and publication of their content in accordance with the restrictions imposed by both national and supranational privacy authorities. In the field of Natural Language Processing (NLP), several deep learning techniques for Named Entity Recognition (NER) have been applied to this issue, significantly improving the effectiveness of identifying sensitive information in EHRs written in English. However, the lack of data sets in other languages has strongly limited their applicability and performance evaluation. To this aim, a new de-identification data set in Italian has been developed in this work, starting from the 115 COVID-19 EHRs provided by the Italian Society of Radiology (SIRM): 65 were used for training and development, and the remaining 50 for testing. The data set was labelled following the guidelines of the i2b2 2014 de-identification track. As an additional contribution, combined with the best-performing Bi-LSTM + CRF sequence labeling architecture, a stacked word representation, not yet experimented with for the Italian clinical de-identification scenario, has been tested: it is based both on a contextualized language model, to manage word polysemy and its morpho-syntactic variations, and on sub-word embeddings, to better capture latent syntactic and semantic similarities. Finally, other cutting-edge approaches were compared with the proposed model, which achieved the best performance, confirming the validity of the proposed approach.
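The input/output contract of the de-identification task can be sketched as follows. The actual system described above uses a Bi-LSTM + CRF sequence labeller with stacked embeddings; the regex rules below are only a hypothetical stand-in showing how detected sensitive spans are replaced with i2b2-style category placeholders:

```python
# Toy de-identification sketch: each detected privacy-sensitive span in an
# Italian clinical note is replaced with a category placeholder, mirroring
# the output format of an i2b2-style de-identification system. The rules
# here are illustrative; a real system uses a trained NER model instead.
import re

PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DATE]"),   # dd/mm/yyyy dates
    (re.compile(r"\b\d{1,3} anni\b"), "[AGE]"),             # ages, e.g. "67 anni"
    (re.compile(r"\bSig\.\s+\w+\b"), "[NAME]"),             # "Sig. <surname>"
]

def deidentify(record: str) -> str:
    """Replace every matched sensitive span with its category placeholder."""
    for pattern, placeholder in PATTERNS:
        record = pattern.sub(placeholder, record)
    return record
```

Keeping the placeholder categories aligned with the i2b2 2014 guidelines is what lets the resulting corpus be shared while remaining useful for training and evaluating NER models.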

    D6.4 – Use Cases System Integration, Deployment & Experimentation V1

    Deliverable 6.4 of the MOBISPACES project, "Use Cases System Integration, Deployment & Experimentation v1", describes the development of the MOBISPACES Use Cases.

    For each of the five Use Cases, the document examines the integration with the MOBISPACES systems, the level of deployment, and the experimentation done so far. Despite the short development time, a preliminary evaluation of the results of each Use Case is presented in order to give evidence of their potential.

    In the first months of the project, the main activities carried out by the Use Cases were the definition of the architecture and the setting up of the interfaces between the Use Cases and the general MOBISPACES architecture. Each individual architecture is depicted and described in this Deliverable as an instantiation of the MOBISPACES architecture.

    The Use Cases have the objective of validating the added value and effectiveness of MOBISPACES; the Deliverable therefore focuses on the exploitation and adaptation of the different MOBISPACES components to the real situations of the Use Cases.

    Finally, each Use Case has its own requirements, described in Deliverable 6.1. The document aims to confirm, given the preliminary results obtained, that these requirements can be met and that the configured work is feasible.

    D4.1 – AI-BASED DATA OPERATIONS V1

    This is the first in a series of deliverables related to the activities of WP4 ("AI-based Data Management for Green Data Operations"). Following the MobiSpaces Reference Architecture defined under the scope of T2.1 ("Design of Reference Architecture") and its current release reported in D2.1 ("Conceptual Model & Reference Architecture v1"), this document gives more details about one of the major architectural pillars, the AI-based Data Operations Toolbox.

    The overall WP4 activities reported in this document mostly focus on this particular pillar, with the exception of T4.4 ("Privacy-driven Data Aggregation"). The latter is considered part of the Trustworthy Data Governance Services; however, its progress is reported in this series of deliverables, which summarizes the activities of the whole WP4. In this document, we present the individual software components that are part of the AI-based Data Operations Toolbox, detail their interactions and the background technologies they are currently built upon, and give a more detailed description of their internal building blocks.

    WP4 focuses both on the data management aspects of MobiSpaces and on the data operations of the platform, in terms of automating the definition of AI workflows in a declarative manner together with the corresponding runtime deployment and the orchestration of their entire data lifecycle. The first category of components consists of the Data Management Toolset of the integrated solution, which offers a variety of different but complementary data management systems to be exploited by data users and application developers. For the second category, we provide the tools and algorithms for automating the definition and execution of complex AI workflows, consuming data from the aforementioned Data Management Toolset in a transparent manner. The target objective is to execute these workflows in an energy-efficient manner, using our novel resource allocator to reduce carbon emissions.

    The duration of WP4 spans from M04 to M34. This deliverable reports the work conducted until M10, which accomplishes milestone MS04 ("Software prototypes - Iteration I"). At this phase of the project, we have identified the internal building blocks of the AI-based Data Operations Toolbox and the details of their interactions, and we have delivered the first release of the corresponding prototypes. In this report our primary focus is on the individual evaluation of the components, while D2.7 ("AI-based Data Operations Toolbox v1") will later focus on the integrated solution based on the prototypes described here, to be evaluated by the project's use cases. Given the different maturity levels of the WP4 components at this moment, we provide either initial evaluation results or a concrete plan for the evaluation to be carried out during the next period. Two additional versions are planned for submission in M22 and M34, when the second and third releases of the prototypes will be available, giving more details of the implementation and the final evaluation and implementing all target objectives of WP4.