
    AstroGrid-D: Grid Technology for Astronomical Science

    Full text link
    We present the status and results of AstroGrid-D, a joint effort of astrophysicists and computer scientists to employ grid technology for scientific applications. AstroGrid-D provides access to a network of distributed machines through a set of commands as well as software interfaces. It allows simple use of compute and storage facilities and makes it easy to schedule and monitor compute tasks and data management. It is based on the Globus Toolkit middleware (GT4). Chapter 1 describes the context that led to the demand for advanced software solutions in astrophysics, and states the goals of the project. In Chapter 2 we present characteristic astrophysical applications that have been implemented on AstroGrid-D: simulations of different complexity, compute-intensive calculations running on multiple sites, and advanced applications for specific scientific purposes, such as a connection to robotic telescopes. These examples show how grid execution improves, for example, the scientific workflow. Chapter 3 explains the software tools and services that we adapted or newly developed. Section 3.1 focuses on the administrative aspects of the infrastructure, namely managing users and monitoring activity. Section 3.2 characterises the central components of our architecture: the AstroGrid-D information service to collect and store metadata, a file management system, the data management system, and a job manager for automatic submission of compute tasks. We summarise the successfully established infrastructure in Chapter 4, concluding with our future plans to establish AstroGrid-D as a platform of modern e-Astronomy. Comment: 14 pages, 12 figures. Subjects: data analysis, image processing, robotic telescopes, simulations, grid. Accepted for publication in New Astronomy.
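    As a purely illustrative companion to the workflow described above (submitting compute tasks through a job manager and registering metadata with an information service), the following Python sketch shows what a client-side interaction with such an infrastructure could look like. All names here (GridJobManager, InformationService, submit, register) are hypothetical assumptions for illustration, not the actual AstroGrid-D or Globus Toolkit GT4 interfaces.

```python
# Illustrative sketch only: GridJobManager, InformationService, submit and
# register are hypothetical names; they do not correspond to the real
# AstroGrid-D or Globus Toolkit (GT4) APIs described in the paper.
import time


class GridJobManager:
    """Hypothetical job manager that schedules compute tasks on remote sites."""

    def __init__(self):
        self._jobs = {}

    def submit(self, executable, arguments, site):
        job_id = f"job-{len(self._jobs) + 1}"
        # A real grid middleware would stage files and contact the remote site here.
        self._jobs[job_id] = {"cmd": [executable, *arguments], "site": site,
                              "state": "PENDING", "submitted": time.time()}
        return job_id

    def status(self, job_id):
        return self._jobs[job_id]["state"]


class InformationService:
    """Hypothetical metadata store, analogous to a grid information service."""

    def __init__(self):
        self._records = []

    def register(self, resource, metadata):
        self._records.append({"resource": resource, **metadata})


if __name__ == "__main__":
    manager = GridJobManager()
    info = InformationService()

    job = manager.submit("run_simulation", ["--resolution", "512"], site="site-A")
    info.register(job, {"application": "nbody-simulation", "site": "site-A"})
    print(job, manager.status(job))
```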

    Multi-tenant Pub/Sub processing for real-time data streams

    Get PDF
    Devices and sensors generate streams of data across a diversity of locations and protocols. That data usually reaches a central platform that is used to store and process the streams. Processing can be done in real time, with transformations and enrichment happening on the fly, but it can also happen after data is stored and organized in repositories. In the former case, stream processing technologies are required to operate on the data; in the latter, batch analytics and queries are of common use. This paper introduces a runtime to dynamically construct data stream processing topologies based on user-supplied code. These dynamic topologies are built on the fly using a data subscription model defined by the applications that consume data. Each user-defined processing unit is called a Service Object. Every Service Object consumes input data streams and may produce output streams that others can consume. The subscription-based programming model enables multiple users to deploy their own data-processing services. The runtime handles the dynamic forwarding of data and the execution of Service Objects from different users. Data streams can originate in real-world devices or they can be the outputs of Service Objects. The runtime leverages Apache STORM for parallel data processing, which, combined with dynamic user-code injection, provides multi-tenant stream processing topologies. In this work we describe the runtime, its features and implementation details, and include a performance evaluation of some of its core components. This work is partially supported by the European Research Council (ERC) under the EU Horizon 2020 programme (GA 639595), the Spanish Ministry of Economy, Industry and Competitiveness (TIN2015-65316-P) and the Generalitat de Catalunya (2014-SGR-1051). Peer Reviewed. Postprint (author's final draft).
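    As a purely illustrative sketch of the subscription-based model described above, the following Python fragment shows how user-defined Service Objects might consume input streams and publish derived streams through a shared runtime. The ServiceObject-style classes, the Runtime class, and the subscribe/publish methods are assumptions made for illustration; the paper's actual API, built on Apache STORM with dynamic user-code injection, is not shown here.

```python
# Illustrative sketch only: Runtime, subscribe, publish and the Service Object
# classes below are hypothetical names, not the actual API of the runtime
# described in the paper (which is built on Apache STORM).
from collections import defaultdict


class Runtime:
    """Toy dispatcher that forwards stream items to subscribed Service Objects."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, stream_name, service_object):
        self._subscribers[stream_name].append(service_object)

    def publish(self, stream_name, item):
        for so in self._subscribers[stream_name]:
            so.on_data(self, stream_name, item)


class CelsiusToFahrenheit:
    """One tenant's Service Object: consumes one stream, produces another."""

    def on_data(self, runtime, stream_name, item):
        runtime.publish("temperature-fahrenheit", item * 9 / 5 + 32)


class Printer:
    """Another tenant's Service Object consuming the derived stream."""

    def on_data(self, runtime, stream_name, item):
        print(f"{stream_name}: {item}")


if __name__ == "__main__":
    rt = Runtime()
    rt.subscribe("temperature-celsius", CelsiusToFahrenheit())
    rt.subscribe("temperature-fahrenheit", Printer())
    rt.publish("temperature-celsius", 21.0)  # device-originated reading
```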

    Secure Sharing of Spatio-Temporal Data through Name-based Access Control

    Get PDF
    Named Data Networking (NDN) is proposed as a future Internet architecture that provides name-based data publishing and fetching primitives. Compared to TCP/IP, the benefits of NDN are as follows: NDN removes the need to manage IP addresses; NDN provides semantically meaningful and structured names; NDN has a stateful and name-based forwarding plane; and NDN supports data-centric security and in-network caching. Name-based Access Control is an access control solution proposed over NDN: it enforces content-based access control by encrypting data directly at the time of production, without relying on a third-party service (e.g., cloud storage); it utilizes NDN’s hierarchical naming convention to express access control policy; and it enables automation of key distribution. As more and more mobile data (e.g., mobile-health data) are generated dynamically and continuously over time and space, data owners often want to share their data with others, for example for data analysis or healthcare. To protect their privacy, they may want to share only a subset of the data, restricted by time and/or space. An effective and secure access control solution is required to ensure that only authorized users can access certain data with fine granularity. Inspired by the Name-based Access Control scheme, we take the data attributes (time, location) into account to make access decisions. In this work, we introduce a spatio-temporal access control scheme that allows data owners to specify access control policy and limit data access to a given time interval and/or location area. Specifically, we design a hierarchically structured naming convention to express fine-grained access control policy on spatio-temporal data; we realize publish-subscribe functionality based on PSync for real-time data stream sharing; and we develop a practical spatio-temporal data access control prototype based on the NDN codebase. Moreover, we run experiments using Mini-NDN to evaluate the performance of sharing historical data from storage and sharing data in real time.
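    The Python sketch below illustrates the general idea of encoding spatio-temporal scope in hierarchical, NDN-style names and checking whether a data name falls under a policy prefix. The specific name layout used here (an /org/alice/health prefix, hour-level timestamps, a coarse location grid) is an assumption for illustration only, not the naming convention or key-distribution machinery defined in this work.

```python
# Illustrative sketch only: the name layout below is an assumption used to
# demonstrate spatio-temporal scoping with hierarchical names; it is not the
# naming convention or NAC key machinery defined in the paper.
from datetime import datetime, timezone


def data_name(owner_prefix, stream, timestamp, lat, lon):
    """Build an NDN-style hierarchical name with hour and coarse-grid components."""
    hour = timestamp.strftime("%Y%m%d%H")      # time granularity: one hour
    cell = f"{int(lat)}_{int(lon)}"            # location granularity: ~1-degree cell
    return f"{owner_prefix}/{stream}/{hour}/{cell}"


def policy_allows(policy_prefix, name):
    """Access is granted only if the data name falls under the policy prefix."""
    return name == policy_prefix or name.startswith(policy_prefix + "/")


if __name__ == "__main__":
    ts = datetime(2022, 5, 1, 14, 30, tzinfo=timezone.utc)
    name = data_name("/org/alice/health", "heart-rate", ts, 40.1, -75.2)

    # Policy granting access to one hour of heart-rate data, any location.
    policy = "/org/alice/health/heart-rate/2022050114"
    print(name)
    print(policy_allows(policy, name))   # True: within the granted hour
```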

    Combined AI Capabilities for Enhancing Maritime Safety in a Common Information Sharing Environment

    Get PDF
    The complexity of maritime traffic operations indicates an unprecedented necessity for the joint introduction and exploitation of artificial intelligence (AI) technologies that take advantage of the vast amounts of vessel data offered by disparate surveillance systems to face challenges at sea. This paper reviews recent Big Data and AI technology implementations for enhancing the maritime safety level in the common information sharing environment (CISE) of the maritime agencies, including vessel behavior and anomaly monitoring, and ship collision risk assessment. Specifically, trajectory fusion implemented with the InSyTo soft information fusion and management toolbox, and the Early Notification module for Vessel Collision, are presented within the EFFECTOR project. The focus is on elaborating the technical architecture of these modules and their combined AI capabilities for achieving the desired interoperability and complementarity between maritime systems, aiming to provide better decision support and appropriate information to be distributed among CISE maritime safety stakeholders.
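    As background for the collision-risk capability mentioned above, one common building block of such assessments is the closest point of approach (CPA) between two vessels assumed to keep constant course and speed. The short Python sketch below computes the time to CPA and the distance at CPA; it is a generic textbook kinematics calculation shown only for illustration, not the EFFECTOR Early Notification module itself.

```python
# Generic closest-point-of-approach (CPA) calculation between two vessels,
# each assumed to keep constant course and speed. Standard kinematics shown
# only to illustrate one ingredient of collision risk assessment; it is not
# the EFFECTOR project's Early Notification module.
import math


def cpa(p1, v1, p2, v2):
    """Return (time_to_cpa, distance_at_cpa) for positions p* and velocities v*."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]      # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]      # relative velocity
    v_sq = vx * vx + vy * vy
    if v_sq == 0.0:                            # identical velocities: range is constant
        return 0.0, math.hypot(rx, ry)
    t = -(rx * vx + ry * vy) / v_sq            # time of closest approach
    t = max(t, 0.0)                            # closest approach already passed
    return t, math.hypot(rx + vx * t, ry + vy * t)


if __name__ == "__main__":
    # Own ship at origin heading east at 10 kn; target 5 nm north heading south at 8 kn.
    t, d = cpa((0.0, 0.0), (10.0, 0.0), (0.0, 5.0), (0.0, -8.0))
    print(f"TCPA = {t:.2f} h, DCPA = {d:.2f} nm")
```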

    ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation

    Full text link
    Recent cross-lingual cross-modal works attempt to extend Vision-Language Pre-training (VLP) models to non-English inputs and achieve impressive performance. However, these models focus only on understanding tasks using encoder-only architectures. In this paper, we propose ERNIE-UniX2, a unified cross-lingual cross-modal pre-training framework for both generation and understanding tasks. ERNIE-UniX2 integrates multiple pre-training paradigms (e.g., contrastive learning and language modeling) based on an encoder-decoder architecture and attempts to learn a better joint representation across languages and modalities. Furthermore, ERNIE-UniX2 can be seamlessly fine-tuned for a variety of downstream generation and understanding tasks. Pre-trained on both multilingual text-only and image-text datasets, ERNIE-UniX2 achieves SOTA results on various cross-lingual cross-modal generation and understanding tasks such as multimodal machine translation and multilingual visual question answering. Comment: 13 pages, 2 figures.
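    To illustrate the idea of integrating multiple pre-training paradigms into one objective, the sketch below combines a toy image-text contrastive loss with a toy language-modeling loss into a single scalar training loss. It uses small random NumPy arrays and an assumed 1:1 weighting; it is only a schematic of the loss combination, not the ERNIE-UniX2 architecture or its actual training code.

```python
# Schematic only: combines an InfoNCE-style contrastive loss over paired
# image/text embeddings with a token-level language-modeling loss. The tiny
# random arrays and the equal 1:1 weighting are assumptions; this is not the
# ERNIE-UniX2 model or its training code.
import numpy as np


def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """InfoNCE over a batch: matching image/text pairs sit on the diagonal."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    targets = np.arange(len(logits))
    i2t = -log_softmax(logits, axis=1)[targets, targets].mean()
    t2i = -log_softmax(logits.T, axis=1)[targets, targets].mean()
    return (i2t + t2i) / 2


def lm_loss(token_logits, token_targets):
    """Cross-entropy of next-token predictions from a decoder."""
    logp = log_softmax(token_logits, axis=-1)
    return -logp[np.arange(len(token_targets)), token_targets].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img_emb = rng.normal(size=(4, 16))       # 4 images, 16-dim embeddings
    txt_emb = rng.normal(size=(4, 16))       # 4 captions
    logits = rng.normal(size=(10, 100))      # 10 tokens, vocabulary of 100
    targets = rng.integers(0, 100, size=10)

    total = contrastive_loss(img_emb, txt_emb) + lm_loss(logits, targets)
    print(f"joint pre-training loss: {total:.3f}")
```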

    Professionals’ digital training for child maltreatment prevention in the COVID-19 era: a pan-European model

    Get PDF
    Funding: This study is part of the ERICA project funded by the European Union’s Rights, Equality and Citizenship Programme (2014–2020), GA 856760. The responsiveness of professionals working with children and families is of key importance for the early identification of child maltreatment. However, this might be undermined when multifaceted circumstances, such as the COVID-19 pandemic, reduce interdisciplinary educational activities. Thanks to technological developments, digital platforms seem promising for dealing with new challenges in professionals’ training. We examined a digital approach to child maltreatment training through the experience of the ERICA project (Stopping Child Maltreatment through Pan-European Multiprofessional Training Programme). ERICA was piloted during the pandemic in seven European centers, involving interconnected sectors of professionals working with children and families. The training consisted of interactive modules embedded in a digital learning framework. Different aspects (i.e., technology, interaction, and organization) were evaluated, and trainers’ feedback on digital features was sought. Technical issues were the main barrier, but they did not significantly disrupt the training. The trainers perceived reduced interaction between participants, although distinct factors were uncovered as potential favorable mediators. Based on participants’ subjective experiences and perspectives, digital learning frameworks for professionals working with children and families, such as the ERICA model and its necessary adaptation to an e-learning mode, can represent a novel interactive approach to empower trainers and trainees to tackle child maltreatment during critical times such as a pandemic, and an alternative to more traditional learning frameworks. Publisher PDF. Peer reviewed.

    Data sharing in neurodegenerative disease research: challenges and learnings from the innovative medicines initiative public-private partnership model

    Get PDF
    Efficient data sharing is hampered by an array of organizational, ethical, behavioral, and technical challenges, slowing research progress and reducing the utility of data generated by clinical research studies on neurodegenerative diseases. There is a particular need to address the differences between public and private sector environments for research and data sharing, which have varying standards, expectations, motivations, and interests. The Neuronet data sharing Working Group was set up to understand the existing barriers to data sharing in public-private partnership projects, and to provide guidance on overcoming these barriers, by convening data sharing experts from diverse projects in the IMI neurodegeneration portfolio. In this policy and practice review, we outline the challenges and learnings of the Working Group, providing the neurodegeneration community with examples of good practices and recommendations on how to overcome obstacles to data sharing. These obstacles range from organizational issues linked to the unique structure of cross-sectoral, collaborative research initiatives, to technical issues that affect the storage, structure and annotation of individual datasets. We also identify sociotechnical hurdles, such as academic recognition and reward systems that disincentivise data sharing, and legal challenges linked to heightened perceptions of data privacy risk, compounded by a lack of clear guidance on GDPR compliance mechanisms for public-private research. Focusing on real-world, neuroimaging and digital biomarker data, we highlight particular challenges and learnings for data sharing, such as data management planning, the development of ethical codes of conduct, and the harmonization of protocols and curation processes. Cross-cutting solutions and enablers include the principles of transparency, standardization and co-design: from open, accessible metadata catalogs that enhance the findability of data, to measures that increase visibility and trust in data reuse.

    Scalable processing of aggregate functions for data streams in resource-constrained environments

    Get PDF
    The fast evolution of data analytics platforms has resulted in an increasing demand for real-time data stream processing. From Internet of Things applications to the monitoring of telemetry generated in large datacenters, a common demand in currently emerging scenarios is the need to process vast amounts of data with low latencies, generally performing the analysis as close to the data source as possible. Devices and sensors generate streams of data across a diversity of locations and protocols. That data usually reaches a central platform that is used to store and process the streams. Processing can be done in real time, with transformations and enrichment happening on the fly, but it can also happen after data is stored and organized in repositories. In the former case, stream processing technologies are required to operate on the data; in the latter, batch analytics and queries are of common use.
    Stream processing platforms are required to be malleable and to absorb spikes generated by fluctuations of data generation rates. Data is usually produced as time series that have to be aggregated using multiple operators, with sliding windows being one of the most common abstractions used to process data in real time. To satisfy the above-mentioned demands, efficient stream processing techniques that aggregate data with minimal computational cost need to be developed. However, data analytics might require aggregating extensive windows of data. Approximate computing has for decades been a central paradigm in data analytics for improving performance and reducing the resources needed, such as memory, computation time, bandwidth or energy. In exchange for these improvements, the aggregated results suffer from a level of inaccuracy that in some cases can be predicted and constrained.
    This doctoral thesis aims to demonstrate that it is possible to have constant-time, memory-efficient aggregation functions with approximate computing mechanisms for constrained environments. To achieve this goal, the work is structured around three research challenges. First, we introduce a runtime to dynamically construct data stream processing topologies based on user-supplied code. These dynamic topologies are built on the fly using a data subscription model defined by the applications that consume data. The subscription-based programming model enables multiple users to deploy their own data-processing services.
    On top of this runtime, we present the Amortized Monoid Tree Aggregator (AMTA), a general sliding window aggregation framework which seamlessly combines the following features: amortized O(1) time complexity with a worst case of O(log n) between insertions; a window aggregation mechanism and a window slide policy that are both user-programmable; enforcement of the window sliding policy with amortized O(1) computational cost for single evictions and support for bulk evictions with cost O(log n); and a local memory space requirement of O(log n). The framework can compute aggregations over multiple data dimensions, and has been designed to support decoupling computation and data storage through the use of distributed Key-Value Stores to keep window elements and partial aggregations.
    Especially motivated by edge computing scenarios, we contribute the Approximate and Amortized Monoid Tree Aggregator (A2MTA). It is, to our knowledge, the first general-purpose programmable sliding window framework that combines constant-time aggregations with error-bounded approximate computing techniques.
    A2MTA uses statistical analysis of the stream data to perform approximate aggregations, critically reducing the resources needed for massive stream data aggregation and improving performance.
    The fast evolution of data analytics platforms has resulted in an increased demand for real-time processing of continuous data streams. From the Internet of Things to the monitoring of telemetry generated in large servers, a recurring demand in emerging scenarios is the need to process large amounts of data with very low latencies, generally processing the data as close to its sources as possible. Data is generated as continuous streams by devices that use a variety of locations and protocols. This data can be processed in real time, with transformations performed on the fly, in which case the use of stream processing platforms is necessary. Stream processing platforms need to absorb spikes in data rates. Data is generated as time series that are aggregated using multiple operators, with windows being the most common abstraction. To satisfy the required low latencies and malleability, the operators need to have minimal computational cost, even with extensive windows of data to aggregate. Approximate computing has for decades been a relevant paradigm in data analysis where the performance of algorithms must be improved and their computation time, required memory, bandwidth or energy consumption reduced. In exchange for these improvements, the results may suffer from a lack of accuracy that can be estimated and controlled.
    This doctoral thesis aims to demonstrate that it is possible to have aggregation functions for stream processing that have constant time cost, are memory efficient, and make use of approximate computing. To achieve these objectives, the thesis is divided into three challenges. First, we present a runtime for the dynamic construction of data stream processing topologies using user code. These topologies are built using a stream subscription model in which the data-consuming applications extend the topologies while they are running. This runtime allows multiple entities to extend the same topology. On top of this runtime, we present a general-purpose framework for the aggregation of data windows called AMTA (Amortized Monoid Tree Aggregator). This framework combines constant amortized time for all operations, with a logarithmic worst case, and is programmable both in terms of aggregation and in terms of evicting elements from the window. Bulk eviction of elements from the window is treated as an atomic operation with constant amortized cost, and the framework requires local memory for O(log n) window elements. It can compute aggregations over multiple data dimensions, and has been designed to decouple the computation from the storage of the data, so that the window contents can be distributed across different machines. Motivated by edge computing, we have contributed A2MTA (Approximate and Amortized Monoid Tree Aggregator). To our knowledge, it is the first general-purpose window computation framework that combines constant cost for all of its operations with error-controlled approximate computing techniques. A2MTA uses statistical analysis to perform aggregations with bounded error, critically reducing the resources needed to process large amounts of data.
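    To make the monoid-based window aggregation idea concrete, here is a small, self-contained Python sketch of the classic two-stack technique, which gives amortized O(1) insert, evict and query for any associative operation with an identity element (a monoid). It is a simplified illustration of the general approach only, not the AMTA or A2MTA data structures contributed in this thesis, and it includes no approximate computing.

```python
# Simplified illustration of monoid-based sliding window aggregation using the
# classic two-stack technique: amortized O(1) insert, evict and query for any
# associative operation with an identity element. This is NOT the AMTA/A2MTA
# structure from the thesis (no tree, no bulk eviction, no approximation).
class TwoStackAggregator:
    def __init__(self, op, identity):
        self.op = op                      # associative binary operation
        self.identity = identity
        self.front = []                   # (value, running aggregate of front)
        self.back = []                    # (value, running aggregate of back)

    def insert(self, value):
        agg = self.op(self.back[-1][1], value) if self.back else value
        self.back.append((value, agg))

    def evict(self):
        """Remove the oldest element of the window."""
        if not self.front:                # amortized O(1): flip back onto front
            while self.back:
                value, _ = self.back.pop()
                agg = self.op(value, self.front[-1][1]) if self.front else value
                self.front.append((value, agg))
        self.front.pop()

    def query(self):
        f = self.front[-1][1] if self.front else self.identity
        b = self.back[-1][1] if self.back else self.identity
        return self.op(f, b)


if __name__ == "__main__":
    window = TwoStackAggregator(op=max, identity=float("-inf"))
    for x in [3, 1, 4, 1, 5]:
        window.insert(x)
    print(window.query())   # 5: max over the whole window
    window.evict()          # slide the window past the oldest element (3)
    print(window.query())   # still 5
```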

    Realtime Processing and Presentation of Environmental data on Global Sensor Network web framework to study climate change on Glaciers.

    Get PDF
    This thesis covers the study and implementation of a real-time web framework for monitoring environmental data provided by an Automatic Weather Station (AWS) located on the La Mare glacier. The aim of this work is to make the world of glaciology accessible both to researchers and to non-expert users who are curious about the subject. The work was divided into two phases: the first studies an existing web framework, developed by the CCES (ETH Center for Competence Environment and Sustainability), for data acquisition and visualization, evaluating its potential and its conformity with our specifications. The second phase consists of the implementation of the framework, the development of virtual sensors useful for our purpose, and the development of additional modules to complete the required functionality. This work led to the creation of a web service that manages 7 virtual sensors, which can be used to perform graphical comparisons of the collected data.