775 research outputs found

    Big Data Ethics in Research

    This work examines the main problems scientists face when working with Big Data sets, highlighting the main ethical issues in light of European Union legislation. After a brief introduction to Big Data, the Technology section presents specific research applications. Philosophical Aspects addresses the main philosophical issues, and Legal Aspects covers the specific ethical issues raised by the EU Regulation on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, which repealed Directive 95/46/EC (the Data Protection Directive) and is known as the General Data Protection Regulation ("GDPR"). The Ethical Issues section details the aspects specific to Big Data. After a brief section on Big Data research, the work closes with conclusions on research ethics in working with Big Data.
    CONTENTS:
    Abstract
    1. Introduction
    - 1.1 Definitions
    - 1.2 Big Data dimensions
    2. Technology
    - 2.1 Applications
    - - 2.1.1 In research
    3. Philosophical aspects
    4. Legal aspects
    - 4.1 GDPR
    - - Stages of processing of personal data
    - - Principles of data processing
    - - Privacy policy and transparency
    - - Purposes of data processing
    - - Design and implicit confidentiality
    - - The (legal) paradox of Big Data
    5. Ethical issues
    - Ethics in research
    - Awareness
    - Consent
    - Control
    - Transparency
    - Trust
    - Ownership
    - Surveillance and security
    - Digital identity
    - Tailored reality
    - De-identification
    - Digital inequality
    - Privacy
    6. Big Data research
    Conclusions
    Bibliography
    DOI: 10.13140/RG.2.2.11054.4640

    Bridging the demand and the offer in data science

    Over the last several years, we have observed an exponential increase in the demand for Data Scientists in the job market. As a result, a number of trainings, courses, books, and university educational programs (at undergraduate, graduate, and postgraduate levels) have been labeled as "Big Data" or "Data Science"; the common thread among them is the aim of equipping people with the right competencies and skills to satisfy the needs of the business sector. In this paper, we report on some of the exercises done in analyzing the current Data Science education offer and matching it with the needs of the job market, in order to propose a scalable matching service, i.e., COmpetencies ClassificatiOn (E‐CO‐2), based on Data Science techniques. The E‐CO‐2 service can help to extract relevant information from Data Science–related documents (course descriptions, job ads, blogs, or papers), which enables the comparison of demand and offer in the field of Data Science education and HR management, ultimately helping to establish the profession of Data Scientist.

    Protecting sensitive data using differential privacy and role-based access control

    In today's world, where most aspects of modern life are handled and managed by computer systems, privacy has increasingly become a major concern. In addition, data has been generated and processed on a massive scale, especially over the last two years. The rate at which data is generated on the one hand, and the need to store and analyze it efficiently on the other, lead people and organizations to outsource their massive amounts of data (namely Big Data) to cloud environments operated by cloud service providers (CSPs). Such environments can handle the storage and analysis of big data because they mainly rely on the Hadoop MapReduce framework, which is designed to process big data efficiently in parallel. Although outsourcing big data to the cloud facilitates data processing and reduces the maintenance cost of local data storage, it raises new problems concerning privacy protection. The question is how one can perform computations on big and sensitive data while still preserving privacy. Building secure systems for handling and processing such private, massive data is therefore crucial. We need mechanisms that protect private data even when the running computation is untrusted. A number of research efforts have focused on finding solutions to the privacy and security issues of data analytics in cloud environments. In this dissertation, we study some existing work on protecting the privacy of any individual in a data set, specifically a notion of privacy known as differential privacy. Differential privacy has been proposed to better protect privacy in data mining over sensitive data, ensuring that the released aggregate result reveals almost nothing about whether or not any given individual has contributed to the data set. Finally, we propose an idea for combining differential privacy with another available privacy-preserving method.
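To make the notion of differential privacy in the abstract above more concrete, the following is a minimal sketch of the Laplace mechanism applied to a counting query. It is an illustration only and is not taken from the dissertation: the function names `laplace_noise` and `private_count`, the privacy parameter `epsilon`, and the sample records are all assumptions introduced here.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean
    `scale` is Laplace-distributed with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1 / epsilon is added to the true count.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical data: ages of five individuals.
    people = [{"age": a} for a in (25, 41, 67, 70, 33)]
    noisy = private_count(people, lambda p: p["age"] >= 65, epsilon=0.5)
    print(f"noisy count of people aged 65 or over: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and gives stronger privacy; a larger `epsilon` gives a more accurate but less private answer.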

    Internet of Things-aided Smart Grid: Technologies, Architectures, Applications, Prototypes, and Future Research Directions

    Traditional power grids are being transformed into Smart Grids (SGs) to address the issues in existing power systems arising from uni-directional information flow, energy wastage, growing energy demand, reliability, and security. SGs offer bi-directional energy flow between service providers and consumers, involving power generation, transmission, distribution, and utilization systems. SGs employ various devices for the monitoring, analysis, and control of the grid, deployed in very large numbers at power plants, distribution centers, and consumers' premises. Hence, an SG requires connectivity, automation, and the tracking of such devices. This is achieved with the help of the Internet of Things (IoT). IoT helps SG systems to support various network functions throughout the generation, transmission, distribution, and consumption of energy by incorporating IoT devices (such as sensors, actuators, and smart meters), as well as by providing the connectivity, automation, and tracking for such devices. In this paper, we provide a comprehensive survey of IoT-aided SG systems, which includes the existing architectures, applications, and prototypes of IoT-aided SG systems. This survey also highlights the open issues, challenges, and future research directions for IoT-aided SG systems.

    Intelligent and Distributed Data Warehouse for Student’s Academic Performance Analysis

    In the academic world, a large amount of data is handled each day, ranging from students' assessments to their socio-economic data. An interesting alternative for analyzing this historical information is to implement a Data Warehouse. However, Data Warehouses cannot perform predictive analysis by themselves, so machine intelligence techniques can be used for sorting, grouping, and predicting based on historical information to improve the quality of the analysis. This work describes a Data Warehouse architecture for carrying out an academic performance analysis of students.

    Challenges for MapReduce in Big Data

    In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting the continuously increasing demands on computing resources imposed by massive data sets. The reason for this is the high scalability of the MapReduce paradigm, which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data, with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research in this field. The identified challenges are grouped into four main categories corresponding to Big Data task types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Moreover, current efforts aimed at improving and extending MapReduce to address the identified challenges are presented. By identifying the issues and challenges MapReduce faces when handling Big Data, this study encourages future Big Data research.
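For readers unfamiliar with the paradigm this abstract refers to, the following single-process word count sketches the three conceptual phases of MapReduce: map, shuffle, and reduce. It is an illustration only and does not come from the paper. Real frameworks such as Hadoop distribute these phases over many nodes, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big data sets"]))
```

Because each call to `map_phase` depends only on its own document and each call to `reduce_phase` only on its own key, both phases can in principle run in parallel, which is the source of the scalability the abstract describes.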