
    PPGAN: Privacy-preserving Generative Adversarial Network

    Generative Adversarial Networks (GANs) and their variants serve as powerful data generation models, providing researchers with large amounts of high-quality generated data and offering a promising direction for research when data availability is limited. When a GAN learns the semantically rich data distribution of a dataset, the density of the generated distribution tends to concentrate on the training data. Because the gradients of the deep neural network encode the distribution of the training samples, the model can easily memorize them. When a GAN is applied to private or sensitive data, for instance patient medical records, private information may therefore leak. To address this issue, we propose a Privacy-preserving Generative Adversarial Network (PPGAN) model, in which we achieve differential privacy in GANs by adding well-designed noise to the gradients during the model learning procedure. In addition, we introduce the Moments Accountant strategy into the PPGAN training process to improve the stability and compatibility of the model by controlling the privacy loss. We also give a mathematical proof that the discriminator satisfies differential privacy. Through extensive case studies on benchmark datasets, we demonstrate that PPGAN can generate high-quality synthetic data while retaining data utility under a reasonable privacy budget. Comment: This paper was accepted by the IEEE ICPADS 2019 Workshop; 10 pages, 3 figures.
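    The abstract's core mechanism is differentially private gradient perturbation during discriminator training. Below is a minimal sketch of that idea in the DP-SGD style (clip each example's gradient, then add Gaussian noise), assuming PyTorch; the clipping bound, noise multiplier, and function name are illustrative assumptions, not the paper's actual noise design or settings, and the Moments Accountant bookkeeping is only noted in a comment.

    import torch

    def dp_discriminator_step(discriminator, optimizer, loss_fn, batch, labels,
                              clip_norm=1.0, noise_multiplier=1.1):
        """One discriminator update with per-example clipping and Gaussian noise (sketch)."""
        params = [p for p in discriminator.parameters() if p.requires_grad]
        summed_grads = [torch.zeros_like(p) for p in params]

        # Process examples one by one so each example's influence on the update is bounded.
        for x, y in zip(batch, labels):
            loss = loss_fn(discriminator(x.unsqueeze(0)), y.unsqueeze(0))
            grads = torch.autograd.grad(loss, params)
            # Clip the per-example gradient to L2 norm <= clip_norm.
            total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
            scale = (clip_norm / (total_norm + 1e-6)).clamp(max=1.0)
            for acc, g in zip(summed_grads, grads):
                acc.add_(g * scale)

        optimizer.zero_grad()
        for p, acc in zip(params, summed_grads):
            # Gaussian noise calibrated to the clipping bound gives per-step differential
            # privacy; cumulative privacy loss would be tracked with a Moments Accountant.
            noise = torch.randn_like(p) * (noise_multiplier * clip_norm)
            p.grad = (acc + noise) / len(batch)
        optimizer.step()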

    Big Data Management Towards Impact Assessment of Level 3 Automated Driving Functions

    As industrial research in automated driving is rapidly advancing, it is of paramount importance to analyze field data from extensive road tests. This thesis presents research work done in L3Pilot, the first comprehensive test of automated driving functions (ADFs) on public roads in Europe. L3Pilot is now completing the testing of ADFs in vehicles by 13 companies. The tested functions are mainly of Society of Automotive Engineers (SAE) automation level 3, with some at level 4. The overall collaboration among several organizations led to the design and development of a toolchain for processing and managing experimental data sharable among all the vehicle manufacturers, in order to answer a set of 100+ research questions (RQs) about the evaluation of ADFs at various levels, from technical system functioning to overall impact assessment. The toolchain was designed to support a coherent, robust workflow based on the Field opErational teSt supporT Action (FESTA), a well-established reference methodology for automotive piloting. Key challenges included ensuring methodological soundness and data validity while protecting the vehicle manufacturers' intellectual property. Through this toolchain, the project set up what could become a reference architecture for managing research data in automated vehicle tests. In the first step of the workflow, the methodology partners captured the quantitative requirements of each RQ in terms of the relevant data needed from the tests. L3Pilot did not intend to share the original vehicular signal time series, both for confidentiality reasons and because of the enormous amount of data that would have had to be shared. As the factual basis for quantitatively answering the RQs, a set of performance indicators (PIs) was defined. The source vehicular signals were translated from their proprietary formats into the common data format (CDF), which was defined by L3Pilot to support efficient processing by multiple partners' tools and data quality checking. The subsequent performance indicator (PI) computation step consists in synthesizing the vehicular time series into statistical summaries to be stored in the project-shared database, namely the Consolidated Database (CDB). Computation of the PIs is segmented by experimental condition, road type and driving scenario, as required to answer the RQs. The supported analysis concerns both objective data, from vehicular sensors, and subjective data, from questionnaires administered to users (test drivers and passengers). The overall L3Pilot toolchain allowed setting up a data management process involving several partners (vehicle manufacturers, research institutions, suppliers, and developers) with different perspectives and requirements. The system was deployed and used by all the relevant partners at the pilot sites. The experience highlights the importance of the reference methodology to theoretically inform and coherently manage all the steps of the project, and the need for effective and efficient tools to support the everyday work of all the involved research teams, from vehicle manufacturers to data analysts.
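    To make the PI computation step concrete, here is an illustrative sketch of how a partner's tool might reduce CDF time series into statistical summaries segmented by experimental condition, road type and driving scenario before loading them into the shared Consolidated Database. It assumes pandas; the column names, indicator definitions and thresholds are hypothetical, not the actual L3Pilot CDF schema or PI list.

    import pandas as pd

    def compute_performance_indicators(cdf_timeseries: pd.DataFrame) -> pd.DataFrame:
        """Aggregate per-sample vehicular signals into per-segment performance indicators."""
        segments = ["experimental_condition", "road_type", "driving_scenario"]
        pis = cdf_timeseries.groupby(segments).agg(
            mean_speed_mps=("speed_mps", "mean"),
            std_speed_mps=("speed_mps", "std"),
            mean_time_headway_s=("time_headway_s", "mean"),
            hard_braking_events=("accel_mps2", lambda s: int((s < -3.0).sum())),
            automation_active_share=("automation_active", "mean"),
            driven_distance_km=("distance_step_m", lambda s: s.sum() / 1000.0),
        )
        return pis.reset_index()

    # Only these aggregated indicators would be exported to the Consolidated Database
    # (e.g. via DataFrame.to_sql); the proprietary raw time series never leave the
    # vehicle manufacturer's premises.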

    Managing Big Data for Addressing Research Questions in a Collaborative Project on Automated Driving Impact Assessment

    While extracting meaningful information from big data is gaining relevance, the literature offers little guidance on how sensitive data can be handled by different project partners in order to collectively answer research questions (RQs), especially for the impact assessment of new automated driving technologies. This paper presents the application of an established reference piloting methodology and the consequent development of a coherent, robust workflow. Key challenges include ensuring methodological soundness and data validity while protecting partners' intellectual property. The authors draw on their experiences in a 34-partner project aimed at assessing the impact of advanced automated driving functions across 10 European countries. In the first step of the workflow, we captured the quantitative requirements of each RQ in terms of the relevant data needed from the tests. Most of the data come from vehicular sensors, but subjective data from questionnaires are processed as well. Next, we set up a data management process involving several partners (vehicle manufacturers, research institutions, suppliers and developers) with different perspectives and requirements. Finally, we deployed the system so that it is fully integrated within the project's big data toolchain and usable by all the partners. Based on our experience, we highlight the importance of the reference methodology to theoretically inform and coherently manage all the steps of the project, and the need for effective and efficient tools to support the everyday work of all the involved research teams, from vehicle manufacturers to data analysts.
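    As a purely illustrative example of the first workflow step, the quantitative requirements of an RQ could be captured as a small machine-readable specification that each partner checks against the indicators it can compute; the field names, the example RQ and the helper below are hypothetical, not the project's actual requirement catalogue.

    from dataclasses import dataclass, field

    @dataclass
    class ResearchQuestionSpec:
        rq_id: str
        question: str
        required_indicators: list[str]                                 # objective PIs from vehicle sensors
        questionnaire_items: list[str] = field(default_factory=list)  # subjective data from users
        segmentation: list[str] = field(default_factory=list)

    example_rq = ResearchQuestionSpec(
        rq_id="RQ-042",
        question="Does the automated driving function change mean time headway on motorways?",
        required_indicators=["mean_time_headway_s", "mean_speed_mps"],
        questionnaire_items=["perceived_safety"],
        segmentation=["experimental_condition", "road_type"],
    )

    def missing_indicators(spec: ResearchQuestionSpec, available: set[str]) -> list[str]:
        """Indicators a partner still has to compute before this RQ can be answered."""
        return [pi for pi in spec.required_indicators if pi not in available]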

    Applications in Electronics Pervading Industry, Environment and Society

    This book features the manuscripts accepted for the Special Issue “Applications in Electronics Pervading Industry, Environment and Society—Sensing Systems and Pervasive Intelligence” of the MDPI journal Sensors. Most of the papers come from a selection of the best papers of the 2019 edition of the “Applications in Electronics Pervading Industry, Environment and Society” (APPLEPIES) Conference, which was held in November 2019. All these papers have been significantly enhanced with novel experimental results. The papers give an overview of the trends in research and development activities concerning the pervasive application of electronics in industry, the environment, and society. Their focus is on cyber-physical systems (CPS), with research proposals for new sensor acquisition and analog-to-digital converter (ADC) methods, high-speed communication systems, cybersecurity, big data management, and data processing, including emerging machine learning techniques. Physical implementation aspects are discussed, as well as the trade-offs between functional performance and hardware/system costs.

    Az érzékeny kutatási adatok megosztása a személyre szabott orvoslás gyakorlatában = Sharing sensitive research data in the practice of personalised medicine

    Fragmentation of health data and biomedical research data is a major obstacle for precision medicine based on data-driven decisions. The development of personalised medicine requires the efficient exploitation of health data resources that are extraordinary in size and complexity but highly fragmented, as well as technologies that enable data sharing across institutions and even borders. Biobanks are both sample archives and data integration centers. The analysis of large biobank data warehouses in federated datasets promises to yield conclusions with higher statistical power. A prerequisite for data sharing is harmonization, i.e., the mapping of the unique clinical and molecular characteristics of samples into a unified data model and standard codes. These databases, aligned to a common schema, then make healthcare information available for privacy-preserving federated data sharing and learning. The re-evaluation of sensitive health data is inconceivable without the protection of personal data, the legal and conceptual framework for which is set out in the GDPR (General Data Protection Regulation) and the FAIR (findable, accessible, interoperable, reusable) principles. For biobanks in Europe, the BBMRI-ERIC (Biobanking and Biomolecular Research Infrastructure – European Research Infrastructure Consortium) research infrastructure develops common guidelines, which Hungary joined in 2021 as the Hungarian BBMRI Node. As a first step, a federation of biobanks can connect fragmented datasets, making carefully curated datasets, motivated by diverse research goals, accessible. Extending the approach to real-world data could then allow higher-level evaluation of data generated in everyday patient care, taking the evidence produced within the rigorous framework of clinical trials to a new level. In this publication, we present the potential of federated data sharing in the context of the joint project of the Semmelweis University biobanks. Orv Hetil. 2023; 164(21): 811–819.
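    A purely illustrative sketch of the federated principle described above: each biobank harmonizes its records to a common schema with standard codes, computes aggregate statistics locally, and only those aggregates (never row-level patient data) are pooled by the coordinating node. The schema, code mapping and statistics are hypothetical examples, not the actual BBMRI or Semmelweis data model.

    from collections import Counter

    # Hypothetical common schema: every site maps its local fields onto these keys.
    COMMON_SCHEMA = ("diagnosis_code", "sex", "age_years")

    def harmonize(local_record: dict, code_map: dict) -> dict:
        """Map one locally coded record onto the common data model and standard codes."""
        return {
            "diagnosis_code": code_map[local_record["dg"]],  # e.g. local code -> ICD-10
            "sex": local_record["sex"].upper(),
            "age_years": int(local_record["age"]),
        }

    def local_aggregate(records: list[dict]) -> dict:
        """Computed inside the biobank; only these counts leave the institution."""
        return {
            "n": len(records),
            "diagnoses": Counter(r["diagnosis_code"] for r in records),
        }

    def federated_merge(site_aggregates: list[dict]) -> dict:
        """Run by the coordinating node on the aggregates received from all sites."""
        merged = Counter()
        for agg in site_aggregates:
            merged.update(agg["diagnoses"])
        return {"n_total": sum(a["n"] for a in site_aggregates), "diagnoses": merged}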