13 research outputs found

    Technologies and Applications for Big Data Value

    Get PDF
    This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications in the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions in major industrial areas. The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to the technology frameworks that are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is then arranged in two parts. The first part, “Technologies and Methods”, contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part, “Processes and Applications”, details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community's nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry. The book is of interest to two primary audiences: first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, machine learning, and AI; and second, practitioners and industry experts engaged in data-driven systems and in software design and deployment projects who are interested in employing these advanced methods to address real-world problems.

    ExtremeEarth meets satellite data from space

    Get PDF
    Bringing together a number of cutting-edge technologies, ranging from the storage of extremely large volumes of data to the development of scalable machine learning and deep learning algorithms in a distributed manner, and having them operate over the same infrastructure poses unprecedented challenges. One of these challenges is the integration of the European Space Agency's (ESA) Thematic Exploitation Platforms (TEPs) and data information access service platforms with a data platform, namely Hopsworks, that enables scalable data processing, machine learning, and deep learning on Copernicus data, as well as the development of very large training datasets for deep learning architectures targeting the classification of Sentinel images. In this paper, we present the software architecture of ExtremeEarth, which aims at the development of scalable deep learning and geospatial analytics techniques for processing and analyzing petabytes of Copernicus data. The ExtremeEarth software infrastructure seamlessly integrates existing and novel software platforms and tools for storing, accessing, processing, analyzing, and visualizing large amounts of Copernicus data. New techniques in the areas of remote sensing and artificial intelligence, with an emphasis on deep learning, are developed. These techniques and the corresponding software presented in this paper are to be integrated with and used in two ESA TEPs, namely the Polar and Food Security TEPs. Furthermore, we present the integration of Hopsworks with the Polar and Food Security use cases and the flow of events for the products offered through the TEPs.
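
    To make the classification task mentioned above concrete, the following is a minimal sketch of a multi-band Sentinel patch classifier in Keras. The band count, patch size, class count, and helper names are assumptions for illustration; the paper's actual deep learning architectures and the Hopsworks integration are not reproduced here.

```python
# A minimal sketch of a multi-band Sentinel patch classifier (illustrative;
# band count, patch size, and class count are assumptions, not values taken
# from the ExtremeEarth paper).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_BANDS = 13    # assumption: e.g. the 13 spectral bands of Sentinel-2 L1C
PATCH_SIZE = 32   # assumption: small patches cut out of full tiles
NUM_CLASSES = 10  # assumption: e.g. a set of land-cover classes

def build_patch_classifier() -> tf.keras.Model:
    """A small CNN mapping a multi-band image patch to a class label."""
    model = models.Sequential([
        layers.Input(shape=(PATCH_SIZE, PATCH_SIZE, NUM_BANDS)),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.GlobalAveragePooling2D(),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # Synthetic stand-in data; in ExtremeEarth the training patches would be
    # produced from Copernicus archives by the platform's pipelines.
    x = np.random.rand(64, PATCH_SIZE, PATCH_SIZE, NUM_BANDS).astype("float32")
    y = np.random.randint(0, NUM_CLASSES, size=(64,))
    build_patch_classifier().fit(x, y, epochs=1, batch_size=16)
```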

    The Big Data Agenda

    Get PDF
    "This book highlights that the capacity for gathering, analysing, and utilising vast amounts of digital (user) data raises significant ethical issues. Annika Richterich provides a systematic contemporary overview of the field of critical data studies that reflects on practices of digital data collection and analysis. The book assesses in detail one big data research area: biomedical studies, focused on epidemiological surveillance. Specific case studies explore how big data have been used in academic work. The Big Data Agenda concludes that the use of big data in research urgently needs to be considered from the vantage point of ethics and social justice. Drawing upon discourse ethics and critical data studies, Richterich argues that entanglements between big data research and technology/ internet corporations have emerged. In consequence, more opportunities for discussing and negotiating emerging research practices and their implications for societal values are needed. An electronic version of this book is freely available, thanks to the support of libraries working with Knowledge Unlatched. KU is a collaborative initiative designed to make high quality books Open Access for the public good. More information about the initiative and details about KU's Open Access programme can be found at www.knowledgeunlatched.org.

    DARE: A Reflective Platform Designed to Enable Agile Data-Driven Research on the Cloud

    Get PDF
    The DARE platform has been designed to help research developers deliver user-facing applications and solutions over diverse underlying e-infrastructures, data and computational contexts. The platform is Cloud-ready and relies on the exposure of APIs, which are suitable for raising the abstraction level and hiding complexity. At its core, the platform implements the cataloguing and execution of fine-grained, Python-based dispel4py workflows as services. Reflection is achieved via a logical knowledge base comprising multiple internal catalogues, registries and semantics, while the platform supports persistent and pervasive data provenance. This paper presents design and implementation aspects of the DARE platform and provides directions for future development.
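
    To make the notion of a fine-grained dispel4py workflow concrete, here is a minimal sketch of a two-node workflow of the kind the platform catalogues and executes as a service. Class and module paths follow the public dispel4py documentation and may differ between versions; the processing elements themselves are invented for this sketch.

```python
# A minimal two-node dispel4py workflow (illustrative). Class and module
# paths follow the public dispel4py documentation and may differ between
# versions; the PEs are invented for this sketch.
from dispel4py.core import GenericPE
from dispel4py.workflow_graph import WorkflowGraph

class NumberProducer(GenericPE):
    """Emits one integer per invocation."""
    def __init__(self):
        GenericPE.__init__(self)
        self._add_output("output")
        self.counter = 0

    def _process(self, inputs):
        self.counter += 1
        self.write("output", self.counter)

class Squarer(GenericPE):
    """Consumes integers and emits their squares."""
    def __init__(self):
        GenericPE.__init__(self)
        self._add_input("input")
        self._add_output("output")

    def _process(self, inputs):
        self.write("output", inputs["input"] ** 2)

# The fine-grained graph that a platform like DARE would catalogue and
# execute behind its APIs.
graph = WorkflowGraph()
graph.connect(NumberProducer(), "output", Squarer(), "input")
```

    Under dispel4py's simple sequential mapping, such a graph can typically be run from the command line (e.g. `dispel4py simple <module> -i 5`), whereas DARE instead exposes its execution as a service through the platform's APIs.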

    Query Optimization Techniques For Scaling Up To Data Variety

    Get PDF
    Even though Data Lakes are efficient in terms of data storage, they increase the complexity of query processing; this can lead to expensive query execution. Hence, novel techniques for generating query execution plans are needed, and these techniques have to be able to exploit the main characteristics of Data Lakes. Ontario is a federated query engine capable of processing queries over heterogeneous data sources. Ontario uses source descriptions based on RDF Molecule Templates, i.e., abstract descriptions of the properties belonging to the entities in the unified schema of the data in the Data Lake. This thesis proposes new heuristics tailored to the problem of query processing over heterogeneous data sources, including heuristics specifically designed for certain data models. The proposed heuristics are integrated into the Ontario query optimizer. Ontario is compared to state-of-the-art RDF query engines in order to study the overhead introduced by considering heterogeneity during query processing. The results of the empirical evaluation suggest that there is no significant overhead when considering heterogeneity. Furthermore, the baseline version of Ontario is compared to two different sets of additional heuristics, i.e., heuristics specifically designed for certain data models and heuristics that do not consider the data model. The analysis of the obtained experimental results shows that source-specific heuristics are able to improve query performance. Ontario's optimization techniques are able to generate effective and efficient query plans that can be executed over heterogeneous data sources in a Data Lake.
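
    As a simplified illustration of how source descriptions can drive query planning, the sketch below routes triple patterns only to sources whose (here heavily simplified) RDF Molecule Templates cover the requested property. The data structures and the heuristic are invented for illustration and are not Ontario's actual implementation.

```python
# Illustrative source selection over simplified RDF Molecule Templates;
# not Ontario's actual data structures or optimizer.
from dataclasses import dataclass, field

@dataclass
class MoleculeTemplate:
    """Simplified RDF Molecule Template: an entity type in the unified
    schema plus the properties a given source can answer for it."""
    source: str
    rdf_class: str
    properties: set = field(default_factory=set)

def select_sources(triple_patterns, templates):
    """Heuristic: route each (class, predicate) pattern only to sources
    whose templates cover that predicate, shrinking the set of sources
    contacted during federated query execution."""
    routing = {}
    for cls, predicate in triple_patterns:
        routing[(cls, predicate)] = [
            t.source for t in templates
            if t.rdf_class == cls and predicate in t.properties
        ]
    return routing

templates = [
    MoleculeTemplate("sourceA", "Drug", {"label", "indication"}),
    MoleculeTemplate("sourceB", "Drug", {"label", "interactsWith"}),
]
print(select_sources([("Drug", "interactsWith")], templates))
# {('Drug', 'interactsWith'): ['sourceB']}
```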

    Compliance Using Metadata

    Get PDF
    Everybody talks about the data economy. Data is collected, stored, processed, and re-used. In the EU, the GDPR creates a framework with conditions (e.g. consent) for the processing of personal data. But there are also other legal provisions containing requirements and conditions for the processing of data. Even today, most of those are hard-coded into workflows or database schemas, if at all. Data lakes are polluted with unusable data because nobody knows about usage rights or data quality. The approach presented here makes the data lake intelligent: it remembers usage limitations and promises made to the data subject or the contractual partner. Data can be used because the associated risk can be assessed. Such a system reacts easily to new requirements. If processing is recorded back into the data lake, this record makes it possible to prove compliance, which can be shown to authorities on demand as an audit trail. The concept is best exemplified by the SPECIAL project (Scalable Policy-aware Linked Data Architecture For Privacy, Transparency and Compliance, https://specialprivacy.eu). SPECIAL has several use cases, but the basic framework is applicable beyond those cases.
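
    A minimal sketch of the core idea, checking a recorded usage policy before data leaves the lake, is shown below. The policy structure and vocabulary are invented for illustration; SPECIAL expresses such policies in machine-readable, semantics-based vocabularies rather than as Python objects.

```python
# Illustrative policy check for a metadata-aware data lake; the fields and
# values are assumptions, not SPECIAL's actual policy vocabulary.
from dataclasses import dataclass

@dataclass(frozen=True)
class UsagePolicy:
    """Simplified usage policy attached to a data item in the lake."""
    purposes: frozenset    # purposes the data subject consented to
    processing: frozenset  # permitted processing operations
    storage: str           # permitted storage location, e.g. "EU"

def is_permitted(policy: UsagePolicy, purpose: str,
                 operation: str, location: str) -> bool:
    """Check a requested use against the stored policy; a real system
    would also log the decision as part of the audit trail."""
    return (purpose in policy.purposes
            and operation in policy.processing
            and location == policy.storage)

policy = UsagePolicy(frozenset({"billing"}), frozenset({"aggregate"}), "EU")
print(is_permitted(policy, "marketing", "aggregate", "EU"))  # False
print(is_permitted(policy, "billing", "aggregate", "EU"))    # True
```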

    Big Data Platform Architecture Under The Background of Financial Technology

    Full text link
    With the rise of the concept of financial technology, finance and technology are becoming deeply integrated, and scientific and technological means have become an important driving force for financial product innovation, improving financial efficiency, and reducing financial transaction costs. In this context, new technology platforms are re-shaping the financial industry along the dimensions of business philosophy, business model, technical means, sales, internal management, and others. In this paper, we innovate on existing big data platform architecture by adding spatio-temporal data elements and, combined with a practical analysis of the insurance industry, put forward a meaningful product circle and customer circle. Comment: 4 pages, 3 figures, 2018 International Conference on Big Data Engineering and Technology.

    The BigDataEurope platform - supporting the variety dimension of big data

    No full text
    The management and analysis of large-scale datasets - described with the term Big Data - involves the three classic dimensions: volume, velocity, and variety. While the former two are well supported by a plethora of software components, the variety dimension is still rather neglected. We present the BDE platform - an easy-to-deploy, easy-to-use, and adaptable (cluster-based and standalone) platform for the execution of big data components and tools like Hadoop, Spark, Flink, Flume, and Cassandra. The BDE platform was designed based upon the requirements gathered from seven of the societal challenges put forward by the European Commission in the Horizon 2020 programme and targeted by the BigDataEurope pilots. As a result, the BDE platform makes it possible to perform a variety of Big Data flow tasks such as message passing, storage, analysis, and publishing. To facilitate the processing of heterogeneous data, a particular innovation of the platform is the Semantic Layer, which makes it possible to process RDF data directly and to map and transform arbitrary data into RDF. The advantages of the BDE platform are demonstrated through seven pilots, each focusing on a major societal challenge.
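
    As a toy illustration of what such a Semantic Layer does conceptually, the sketch below lifts flat records into RDF with rdflib. The vocabulary (http://example.org/) and the record shape are made up for illustration; this is not the BDE platform's actual mapping machinery.

```python
# Illustrative lifting of flat records into RDF with rdflib; the vocabulary
# and record shape are assumptions made for this sketch.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")

def records_to_rdf(records):
    """Map flat dict records to RDF triples, one resource per record."""
    g = Graph()
    g.bind("ex", EX)
    for i, record in enumerate(records):
        subject = EX[f"record/{i}"]
        g.add((subject, RDF.type, EX.Record))
        for key, value in record.items():
            g.add((subject, EX[key], Literal(value)))
    return g

g = records_to_rdf([{"station": "DE-0042", "temperature": 21.5}])
print(g.serialize(format="turtle"))
```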