
    Supporting security-oriented, inter-disciplinary research: crossing the social, clinical and geospatial domains

    How many people have had a chronic disease for longer than five years in Scotland? How has this affected their employment choices? Are there geographical clusters in Scotland with a high incidence of patients with such long-term illnesses? How does the life expectancy of such individuals compare with the national averages? Such questions are important for understanding the health of nations and the best ways in which health care should be delivered, and for measuring its impact and success. In tackling such research questions, e-Infrastructures need to provide tailored, secure access to an extensible range of distributed resources, including primary and secondary e-Health clinical data, social science data, and geospatial data sets, amongst numerous others. In this paper we describe the security models underlying these e-Infrastructures and demonstrate their implementation in supporting secure, federated access to a variety of distributed and heterogeneous data sets, exploiting the results of a variety of projects at the National e-Science Centre (NeSC) at the University of Glasgow.

    Dataspaces: Concepts, Architectures and Initiatives

    Despite not being a new concept, dataspaces have become a prominent topic due to the increasing availability of data and the need for efficient management and utilization of diverse data sources. In simple terms, a dataspace refers to an environment where data from various sources, formats, and domains can be integrated, shared, and analyzed. It aims to provide a unified view of heterogeneous data by bridging the gap between different data silos, enabling interoperability. The concept of dataspaces promotes the idea that data should be treated as a cohesive entity, rather than being fragmented across different systems and applications. Dataspaces often involve the integration of structured and unstructured data, including databases, documents, sensor data, social media feeds, and more. The goal is to enable organizations to harness the full potential of their data assets by facilitating data discovery, access, and analysis. By bringing together diverse data sources, dataspaces can offer new insights, support decision-making processes, and drive innovation. In the context of European Commission-funded research projects, dataspaces are often explored as part of initiatives focused on data management, data sharing, and the development of data-driven technologies. These projects aim to address challenges related to data integration, data privacy, data governance, and scalability. The goal is to advance the state of the art in data management and enable organizations to leverage data more effectively for societal, economic, and scientific advancements. It is important to note that while dataspaces offer potential benefits, they also come with challenges: data quality assurance, data privacy and security, semantic interoperability, scalability, and the need for appropriate data governance frameworks. Overall, dataspaces represent an approach to managing and utilizing data that emphasizes integration, interoperability, and accessibility. The concept is being explored and researched to develop innovative solutions that can unlock the value of data in various domains and sectors.
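    As a rough illustration of the dataspace idea described in this abstract, the Java sketch below shows heterogeneous sources registered behind a single record abstraction so they can be discovered and queried together. All class and method names are hypothetical assumptions for illustration; they do not come from the paper or from any specific dataspace platform.

        import java.util.*;
        import java.util.stream.*;

        // Hypothetical sketch: heterogeneous sources join a dataspace by exposing
        // their native data (rows, documents, sensor readings) as uniform records.
        public class DataspaceSketch {

            // Source-agnostic record used as the unified view.
            record UnifiedRecord(String sourceId, Map<String, Object> attributes) {}

            // Contract each participating source implements.
            interface DataSource {
                String id();
                Map<String, String> schemaHints();      // lightweight metadata for discovery
                Stream<UnifiedRecord> scan();           // data mapped onto the unified view
            }

            // The dataspace: a catalogue of sources plus cross-source search.
            static class Dataspace {
                private final Map<String, DataSource> sources = new HashMap<>();

                // Pay-as-you-go onboarding: sources are added incrementally.
                void register(DataSource source) { sources.put(source.id(), source); }

                // Query all registered sources through the same interface,
                // regardless of each source's underlying format.
                List<UnifiedRecord> search(String attribute, Object value) {
                    return sources.values().stream()
                            .flatMap(DataSource::scan)
                            .filter(r -> value.equals(r.attributes().get(attribute)))
                            .collect(Collectors.toList());
                }
            }
        }

    The point of the sketch is the incremental, pay-as-you-go style of integration: sources are onboarded with minimal metadata first, and richer mappings can be added later as needed.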

    GeocentraleApps: Practical Approaches to Data Integration for Spatially Enabled Apps

    With modern mobile and internet technologies, spatial data is becoming ubiquitous. In order to realize the vision of a spatially enabled society, however, decision makers in government, public administration, business and society need to understand and actively take location into account as a driver in their decisions. This necessitates decision-support tools that integrate the required spatial and textual data from a variety of sources. GeocentraleApps is a platform for building such modern spatially enabled applications. These applications and the underlying spatial data infrastructures face challenges such as data discovery, matching between disparate data sources, issues with data and service quality, as well as the need for appropriate visualization and presentation. GeocentraleApps meets these challenges by flexibly combining a number of different mechanisms for data integration. This paper presents lessons learned from an analysis of the platform’s data integration approaches with regard to their individual architectures. It points out the advantages and disadvantages of the current solutions and gives an outlook on future developments.
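    The following Java sketch illustrates, in a generic way, one of the basic data integration mechanisms such a platform needs: matching records from a geospatial source with records from a textual source so that an application can present a combined view. The record types and the shared site identifier are assumptions made here for illustration only; this is not GeocentraleApps code.

        import java.util.*;
        import java.util.stream.*;

        // Hypothetical sketch of key-based matching between a spatial and a textual source.
        public class SpatialIntegrationSketch {

            record Site(String siteId, double lat, double lon) {}        // from a geospatial source
            record Report(String siteId, String description) {}          // from a textual source
            record EnrichedSite(Site site, List<String> descriptions) {} // integrated view for the app

            // Simplest mechanism: join on a shared identifier. When no common key
            // exists, fuzzy or spatial-proximity matching would be required instead.
            static List<EnrichedSite> integrate(List<Site> sites, List<Report> reports) {
                Map<String, List<String>> bySite = reports.stream()
                        .collect(Collectors.groupingBy(Report::siteId,
                                Collectors.mapping(Report::description, Collectors.toList())));
                return sites.stream()
                        .map(s -> new EnrichedSite(s, bySite.getOrDefault(s.siteId(), List.of())))
                        .collect(Collectors.toList());
            }
        }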

    Data warehouse automation: trick or treat?

    Data warehousing systems have been around for 25 years, playing a crucial role in collecting data and transforming it into value, allowing users to make decisions based on informed business facts. It is widely accepted that a data warehouse is a critical component of a data-driven enterprise and becomes part of the organisation’s information systems strategy, with a significant impact on the business. However, after 25 years, building a Data Warehouse is still painful: it is too time-consuming, too expensive, and too difficult to change after deployment. Data Warehouse Automation appears with the promise of addressing the limitations of traditional approaches, turning data warehouse development from a prolonged effort into an agile one, with gains in efficiency and effectiveness in data warehousing processes. So, is Data Warehouse Automation a Trick or a Treat? To answer this question, a case study of a data warehousing architecture was developed using a data warehouse automation tool called WhereScape. In addition, a survey was conducted among organisations that use data warehouse automation tools, in order to understand their motivation for adopting such tools in their data warehousing systems. Based on the results of the survey and the case study, automation of the data warehouse building process is necessary to deliver data warehouse systems faster, and it is a solution to consider when modernizing data warehouse architectures, as it helps achieve results more quickly while keeping costs controlled and reducing risk. Data Warehouse Automation may well be a Treat.

    Conceptual Model of a Federated Data Lake

    Valuable insights are frequently only available after combining and analysing data from multiple sources. This paper presents a Conceptual Model of a Federated Data Lake, as a contribution towards formalizing the required components and their relationships so that they can be identified and addressed in the implementation of a comprehensive system that supports on-the-fly query processing over multiple heterogeneous sources and provides adequate data management. The model highlights the concepts of a Data Lake and focuses on the Metadata Management domain as the engine for integrating several Data Lakes.
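    As a hedged sketch of the federation idea outlined in this abstract, the Java fragment below shows a central metadata catalogue that records which Data Lake holds which dataset and routes queries to the owning lake on the fly. The interfaces and names are illustrative assumptions, not the actual components of the paper's conceptual model.

        import java.util.*;
        import java.util.stream.*;

        // Hypothetical sketch: metadata management as the engine of Data Lake federation.
        public class FederatedDataLakeSketch {

            // One participating Data Lake exposes datasets as streams of rows.
            interface DataLake {
                String name();
                Stream<Map<String, Object>> read(String dataset);
            }

            // Catalogue entry: which lake owns a dataset, plus a minimal schema description.
            record DatasetMetadata(String dataset, String lakeName, Map<String, String> schema) {}

            static class FederationCatalog {
                private final Map<String, DatasetMetadata> metadata = new HashMap<>();
                private final Map<String, DataLake> lakes = new HashMap<>();

                void registerLake(DataLake lake) { lakes.put(lake.name(), lake); }
                void registerDataset(DatasetMetadata m) { metadata.put(m.dataset(), m); }

                // On-the-fly query: resolve the dataset through metadata,
                // then push the scan down to the lake that owns it.
                Stream<Map<String, Object>> query(String dataset) {
                    DatasetMetadata m = metadata.get(dataset);
                    if (m == null) throw new NoSuchElementException("Unknown dataset: " + dataset);
                    return lakes.get(m.lakeName()).read(dataset);
                }
            }
        }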

    Data Mesh: concepts and principles of a paradigm shift in data architectures

    The growing use of the most varied forms of software (e.g., social applications) implies the creation and storage of data whose characteristics (volume, variety, and velocity) give rise to the concept of Big Data. Big Data Warehouses and Data Lakes are concepts already well established and implemented by several organizations to serve their decision-making needs. Analyzing the various problems exhibited by those monolithic architectures makes clear the need for a paradigm shift that will make organizations truly data-oriented. In this new paradigm, data is seen as the main concern of the organization, while the pipelining tools and the Data Lake itself are seen as a secondary concern. Thus, the Data Mesh consists of an architecture in which data is intentionally distributed among several Mesh nodes, without creating chaos or data silos, since centralized governance strategies guarantee that the core principles are shared across all Mesh nodes. This paper presents the motivation for the appearance of the Data Mesh paradigm, its features, and approaches for its implementation.
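    To make the principles summarised above slightly more concrete, the Java sketch below models mesh nodes publishing data products under a shared governance contract, so that data stays decentralised while the core principles are enforced uniformly. All interfaces and names are hypothetical and only illustrate the idea; they are not taken from the paper.

        import java.util.*;
        import java.util.stream.*;

        // Hypothetical sketch: decentralised data products, centralised governance rules.
        public class DataMeshSketch {

            // Contract every data product must satisfy before joining the mesh.
            interface DataProduct {
                String domain();                          // owning mesh node / domain team
                Map<String, String> metadata();           // discoverability information
                boolean passesQualityChecks();            // shared governance rule
                Stream<Map<String, Object>> serve();      // data served to consumers
            }

            static class Mesh {
                private final List<DataProduct> products = new ArrayList<>();

                // Only products that satisfy the shared core principles are published,
                // which keeps governance centralised while data remains distributed.
                void publish(DataProduct p) {
                    if (!p.passesQualityChecks()) {
                        throw new IllegalStateException("Product from " + p.domain()
                                + " violates the shared governance contract");
                    }
                    products.add(p);
                }

                List<DataProduct> discover(String key, String value) {
                    return products.stream()
                            .filter(p -> value.equals(p.metadata().get(key)))
                            .collect(Collectors.toList());
                }
            }
        }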

    Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources

    Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization.
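    For readers unfamiliar with how Calcite is embedded, the Java sketch below follows the pattern shown in the Calcite documentation: an in-memory object model is registered as a schema through Calcite's JDBC driver and then queried with standard SQL. The HrSchema and Employee classes are example data defined here, not part of Calcite, and exact behaviour may vary across Calcite versions.

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.Statement;
        import java.util.Properties;

        import org.apache.calcite.adapter.java.ReflectiveSchema;
        import org.apache.calcite.jdbc.CalciteConnection;
        import org.apache.calcite.schema.SchemaPlus;

        // Sketch of embedding Calcite via its JDBC driver (pattern from the Calcite docs).
        public class CalciteSketch {

            // Example data: public fields become columns through ReflectiveSchema.
            public static class Employee {
                public final int empid;
                public final String name;
                public Employee(int empid, String name) { this.empid = empid; this.name = name; }
            }

            public static class HrSchema {
                public final Employee[] emps = { new Employee(100, "Alice"), new Employee(200, "Bob") };
            }

            public static void main(String[] args) throws Exception {
                Properties info = new Properties();
                info.setProperty("lex", "JAVA");   // Java-style, case-sensitive identifiers

                try (Connection connection = DriverManager.getConnection("jdbc:calcite:", info)) {
                    CalciteConnection calcite = connection.unwrap(CalciteConnection.class);
                    SchemaPlus rootSchema = calcite.getRootSchema();

                    // Expose the in-memory objects as the "hr" schema; other adapters
                    // (CSV, JDBC, streaming, geospatial) plug in the same way.
                    rootSchema.add("hr", new ReflectiveSchema(new HrSchema()));

                    try (Statement statement = calcite.createStatement();
                         ResultSet rs = statement.executeQuery(
                                 "select empid, name from hr.emps where empid > 100")) {
                        while (rs.next()) {
                            System.out.println(rs.getInt("empid") + " " + rs.getString("name"));
                        }
                    }
                }
            }
        }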