On requirements for federated data integration as a compilation process
Data integration problems are commonly viewed as interoperability issues, where the burden of reaching a common ground for exchanging data is distributed across the peers involved in the process. While apparently an effective approach towards standardization and interoperability, it constrains data providers who, for a variety of reasons, require backwards compatibility with proprietary or non-standard mechanisms. Publishing a holistic data API is one such use case, where a single peer performs most of the integration work in a many-to-one scenario. Incidentally, this is also the base setting of software compilers, whose operational model comprises phases that perform analysis, linkage and assembly of source code, and generation of intermediate code. There are several analogies with a data integration process, more so with data that live in the Semantic Web; but what requirements would a data provider need to satisfy for an integrator to be able to query and transform its data effectively, with no further enforcement on the provider? With this paper, we inquire into what practices and essential prerequisites could turn this intuition into a concrete and exploitable vision, within Linked Data and beyond.
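As a rough illustration of the compiler analogy, the following Python sketch runs one record through three compiler-like phases (analysis, linkage, generation). The field names, the mapping, and the `ex:doc1` subject are hypothetical, not taken from the paper:

```python
# Hypothetical mapping from one provider's field names to the
# integrator's target vocabulary; purely illustrative.
MAPPING = {"name": "title", "desc": "abstract"}

def analyse(record: dict) -> dict:
    """Analysis phase: keep only the fields the mapping knows about."""
    return {k: v for k, v in record.items() if k in MAPPING}

def link(record: dict) -> dict:
    """Linkage phase: rename provider fields to the target vocabulary."""
    return {MAPPING[k]: v for k, v in record.items()}

def generate(subject: str, record: dict) -> list:
    """Generation phase: emit triples, as one might for Linked Data."""
    return [(subject, pred, obj) for pred, obj in record.items()]

def integrate(subject, record):
    return generate(subject, link(analyse(record)))

triples = integrate("ex:doc1", {"name": "A title", "desc": "Text", "extra": 1})
```

In this toy pipeline the unmapped `extra` field is silently dropped during analysis, mirroring how a compiler front end discards tokens that carry no semantics for later phases.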
Enabling Adaptive Grid Scheduling and Resource Management
Wider adoption of the Grid concept has led to an increasing amount of federated
computational, storage and visualisation resources being available to scientists and
researchers. Distributed and heterogeneous nature of these resources renders most of the
legacy cluster monitoring and management approaches inappropriate, and poses new
challenges in workflow scheduling on such systems. Effective resource utilisation monitoring
and highly granular yet adaptive measurements are prerequisites for a more efficient Grid
scheduler. We present a suite of measurement applications able to monitor per-process
resource utilisation, and a customisable tool for emulating observed utilisation models. We
also outline our future work on a predictive and probabilistic Grid scheduler. The research is
undertaken as part of UK e-Science EPSRC sponsored project SO-GRM (Self-Organising
Grid Resource Management), in cooperation with BT.
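Emulating an observed utilisation model might, in a much-simplified form, look like the sketch below: fit a summary model to observed per-process CPU samples, then replay a synthetic trace from it. The samples and the Gaussian model are illustrative assumptions, not the project's actual tool:

```python
import random
from statistics import mean, stdev

def fit_model(samples):
    """Summarise observed per-process CPU utilisation (%) as mean/stdev."""
    return {"mean": mean(samples), "stdev": stdev(samples)}

def emulate(model, n, seed=0):
    """Replay a synthetic utilisation trace drawn from the fitted model,
    clamped to the valid 0-100% range."""
    rng = random.Random(seed)
    return [min(100.0, max(0.0, rng.gauss(model["mean"], model["stdev"])))
            for _ in range(n)]

observed = [12.0, 18.5, 15.0, 22.0, 17.5]   # illustrative samples (%)
trace = emulate(fit_model(observed), n=10)
```

A real emulator would additionally reproduce temporal correlation in the load, which a memoryless Gaussian draw ignores.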
DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Integrated data analysis (IDA) pipelines—that combine data management (DM) and query processing, high-performance computing
(HPC), and machine learning (ML) training and scoring—become
increasingly common in practice. Interestingly, systems in these
areas share many compilation and runtime techniques, and the
increasingly heterogeneous hardware infrastructure they use is converging as well. Yet, the programming paradigms, cluster resource
management, data formats and representations, as well as execution
strategies differ substantially. DAPHNE is an open and extensible
system infrastructure for such IDA pipelines, including language abstractions, compilation and runtime techniques, multi-level scheduling, hardware (HW) accelerators, and computational storage for
increasing productivity and eliminating unnecessary overheads. In
this paper, we make a case for IDA pipelines, describe the overall
DAPHNE system architecture, its key components, and the design
of a vectorized execution engine for computational storage, HW
accelerators, as well as local and distributed operations. Preliminary experiments that compare DAPHNE with MonetDB, Pandas,
DuckDB, and TensorFlow show promising results.
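The tile-at-a-time idea behind a vectorized execution engine can be sketched roughly as follows; the operators and tile size are illustrative and not DAPHNE's actual API:

```python
def tiles(data, tile_size):
    """Split the input into fixed-size tiles, the unit of vectorized execution."""
    for i in range(0, len(data), tile_size):
        yield data[i:i + tile_size]

def run_pipeline(data, operators, tile_size=4):
    """Apply the whole operator pipeline to one tile at a time, so
    intermediates stay small and tiles could run on different workers."""
    out = []
    for tile in tiles(data, tile_size):
        for op in operators:
            tile = op(tile)
        out.extend(tile)
    return out

# Illustrative operators: element-wise scale, then a selection.
scale = lambda t: [2 * x for x in t]
keep_small = lambda t: [x for x in t if x < 10]

result = run_pipeline(list(range(10)), [scale, keep_small], tile_size=3)
```

Fusing the whole pipeline per tile, rather than materialising each operator's full result, is the property that lets such an engine target HW accelerators and computational storage with small working sets.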
BlogForever D3.2: Interoperability Prospects
This report evaluates the interoperability prospects of the BlogForever platform. To this end, existing interoperability models are reviewed; a Delphi study is conducted to identify crucial aspects for the interoperability of web archives and digital libraries; technical interoperability standards and protocols are reviewed with regard to their relevance for BlogForever; a simple approach to considering interoperability in specific usage scenarios is proposed; and a tangible approach is presented for developing a succession plan that would allow a reliable transfer of content from the current digital archive to other digital repositories.
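One widely used interoperability protocol for digital libraries that such a review would likely cover is OAI-PMH. The minimal harvesting sketch below parses a ListIdentifiers response; the sample response and repository identifiers are fabricated for illustration:

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Minimal, hand-written ListIdentifiers response, for illustration only.
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:1</identifier></header>
    <header><identifier>oai:example.org:2</identifier></header>
  </ListIdentifiers>
</OAI-PMH>"""

def harvested_identifiers(xml_text):
    """Extract record identifiers from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    return [h.findtext(f"{OAI}identifier")
            for h in root.iter(f"{OAI}header")]

ids = harvested_identifiers(SAMPLE)
```

A real harvester would issue HTTP requests and follow `resumptionToken` elements across pages; the parsing step, however, stays essentially this simple.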
Collaborative Improvement of Smart Manufacturing using Privacy-Preserving Federated Learning
Nowadays, data sharing among different sources is very challenging in the manufacturing domain, mainly due to industry competition, complicated bureaucratic processes, and privacy and security concerns. Centralized Machine Learning (ML) has become an essential tool in several industries, including smart manufacturing. However, this approach may lead to several issues regarding security and performance.
In response to these problems, Federated Learning (FL) was created. FL is an innovative and decentralized approach to ML, focused on collaboration and data privacy. In this approach, data is kept at each source, where training takes place locally, and only model weights or gradients are shared to create a global model.
Although several works have already addressed this problem, many issues remain unresolved concerning the application of FL frameworks in smart manufacturing scenarios. Among the issues found in the analysed works, it is important to emphasize the disregard for Industry 4.0 architectures and strategies, and the difficulty of improving those frameworks further.
This work aims to build an FL framework for smart manufacturing with specific concerns for privacy and applicability in industrial scenarios. The main focus of this framework is to facilitate a collaborative approach to applying ML to manufacturing by enabling knowledge sharing for this purpose, with privacy as a special concern. In addition, the implementation and testing of privacy-preserving algorithms, while improving the framework for industrial scenarios, are emphasized. A modular approach is chosen to create a framework adaptable to various industrial cases by implementing several nodes that focus on specific aspects of data collection, data treatment, connection with the FL system, and ML model management.
The results revealed a competitive model performance of the framework compared to the centralized approach, while keeping data at each source and protecting its privacy. The implemented framework also proved to be compliant with the IEEE Std 3652.1-2020
standard guidelines, attaining the established requirement levels.
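The weight-sharing step described in the abstract can be sketched as a FedAvg-style weighted average, where each client contributes in proportion to its local data size; the client weights and sizes below are illustrative:

```python
def fed_avg(client_weights, client_sizes):
    """Weighted average of client model weights (FedAvg-style):
    only weights leave each site, never the raw training data."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
            for i in range(n_params)]

# Two hypothetical clients with 100 and 300 local samples respectively.
global_model = fed_avg(
    client_weights=[[1.0, 2.0], [3.0, 4.0]],
    client_sizes=[100, 300],
)
```

Privacy-preserving variants, such as those the thesis tests, would additionally mask or encrypt the individual weight vectors before the server ever sees them; the aggregation rule itself stays the same.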
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.
Comment: 46 pages, 16 figures, Technical Report
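As a hedged illustration of the data replication aspect covered by the taxonomy, the sketch below chooses replica sites with rendezvous (highest-random-weight) hashing; the node names and replica count are hypothetical:

```python
import hashlib

def rank(node, item):
    """Rendezvous (highest-random-weight) score for a node/item pair."""
    return hashlib.sha256(f"{node}:{item}".encode()).hexdigest()

def place_replicas(item, nodes, k=2):
    """Pick the k highest-scoring nodes as replica holders. Adding or
    removing a node only moves replicas that involve that node, which
    suits wide-area Data Grids with churning resources."""
    return sorted(nodes, key=lambda n: rank(n, item), reverse=True)[:k]

nodes = ["site-a", "site-b", "site-c", "site-d"]
replicas = place_replicas("dataset-42", nodes, k=2)
```

Real Data Grid replication schemes also weigh network proximity, storage cost, and access history; the hash-based placement here only captures the deterministic, low-coordination core of the problem.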