On requirements for federated data integration as a compilation process
Data integration problems are commonly viewed as interoperability issues, where the burden of reaching a common ground for exchanging data is distributed across the peers involved in the process. While apparently an effective approach towards standardization and interoperability, it constrains data providers who, for a variety of reasons, require backwards compatibility with proprietary or non-standard mechanisms. Publishing a holistic data API is one such use case, where a single peer performs most of the integration work in a many-to-one scenario. Incidentally, this is also the base setting of software compilers, whose operational model comprises phases that perform analysis, linkage and assembly of source code, and generation of intermediate code. There are several analogies with a data integration process, more so with data that live in the Semantic Web; but what requirements would a data provider need to satisfy for an integrator to be able to query and transform its data effectively, with no further enforcement on the provider? With this paper, we inquire into what practices and essential prerequisites could turn this intuition into a concrete and exploitable vision, within Linked Data and beyond.
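As a rough illustration of the compiler analogy, the following Python sketch runs one record through three compiler-like phases (analysis, linkage, generation). The field names, the mapping, and the `ex:doc1` subject are hypothetical, not taken from the paper:

```python
# Hypothetical mapping from one provider's field names to the
# integrator's target vocabulary; purely illustrative.
MAPPING = {"name": "title", "desc": "abstract"}

def analyse(record: dict) -> dict:
    """Analysis phase: keep only the fields the mapping knows about."""
    return {k: v for k, v in record.items() if k in MAPPING}

def link(record: dict) -> dict:
    """Linkage phase: rename provider fields to the target vocabulary."""
    return {MAPPING[k]: v for k, v in record.items()}

def generate(subject: str, record: dict) -> list:
    """Generation phase: emit triples, as one might for Linked Data."""
    return [(subject, pred, obj) for pred, obj in record.items()]

def integrate(subject, record):
    return generate(subject, link(analyse(record)))

triples = integrate("ex:doc1", {"name": "A title", "desc": "Text", "extra": 1})
```

In this toy pipeline the unmapped `extra` field is silently dropped during analysis, mirroring how a compiler front end discards tokens that carry no semantics for later phases.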
Enabling Adaptive Grid Scheduling and Resource Management
Wider adoption of the Grid concept has led to an increasing amount of federated
computational, storage and visualisation resources being available to scientists and
researchers. Distributed and heterogeneous nature of these resources renders most of the
legacy cluster monitoring and management approaches inappropriate, and poses new
challenges in workflow scheduling on such systems. Effective resource utilisation monitoring
and highly granular yet adaptive measurements are prerequisites for a more efficient Grid
scheduler. We present a suite of measurement applications able to monitor per-process
resource utilisation, and a customisable tool for emulating observed utilisation models. We
also outline our future work on a predictive and probabilistic Grid scheduler. The research is
undertaken as part of UK e-Science EPSRC sponsored project SO-GRM (Self-Organising
Grid Resource Management), in cooperation with BT.
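Emulating an observed utilisation model might, in a much-simplified form, look like the sketch below: fit a summary model to observed per-process CPU samples, then replay a synthetic trace from it. The samples and the Gaussian model are illustrative assumptions, not the project's actual tool:

```python
import random
from statistics import mean, stdev

def fit_model(samples):
    """Summarise observed per-process CPU utilisation (%) as mean/stdev."""
    return {"mean": mean(samples), "stdev": stdev(samples)}

def emulate(model, n, seed=0):
    """Replay a synthetic utilisation trace drawn from the fitted model,
    clamped to the valid 0-100% range."""
    rng = random.Random(seed)
    return [min(100.0, max(0.0, rng.gauss(model["mean"], model["stdev"])))
            for _ in range(n)]

observed = [12.0, 18.5, 15.0, 22.0, 17.5]   # illustrative samples (%)
trace = emulate(fit_model(observed), n=10)
```

A real emulator would additionally reproduce temporal correlation in the load, which a memoryless Gaussian draw ignores.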
DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Integrated data analysis (IDA) pipelines—that combine data management (DM) and query processing, high-performance computing
(HPC), and machine learning (ML) training and scoring—become
increasingly common in practice. Interestingly, systems in these
areas share many compilation and runtime techniques, and the
increasingly heterogeneous hardware infrastructure they use is converging as well. Yet, the programming paradigms, cluster resource
management, data formats and representations, as well as execution
strategies differ substantially. DAPHNE is an open and extensible
system infrastructure for such IDA pipelines, including language abstractions, compilation and runtime techniques, multi-level scheduling, hardware (HW) accelerators, and computational storage for
increasing productivity and eliminating unnecessary overheads. In
this paper, we make a case for IDA pipelines, describe the overall
DAPHNE system architecture, its key components, and the design
of a vectorized execution engine for computational storage, HW
accelerators, as well as local and distributed operations. Preliminary experiments that compare DAPHNE with MonetDB, Pandas,
DuckDB, and TensorFlow show promising results.
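The tile-at-a-time idea behind a vectorized execution engine can be sketched roughly as follows; the operators and tile size are illustrative and not DAPHNE's actual API:

```python
def tiles(data, tile_size):
    """Split the input into fixed-size tiles, the unit of vectorized execution."""
    for i in range(0, len(data), tile_size):
        yield data[i:i + tile_size]

def run_pipeline(data, operators, tile_size=4):
    """Apply the whole operator pipeline to one tile at a time, so
    intermediates stay small and tiles could run on different workers."""
    out = []
    for tile in tiles(data, tile_size):
        for op in operators:
            tile = op(tile)
        out.extend(tile)
    return out

# Illustrative operators: element-wise scale, then a selection.
scale = lambda t: [2 * x for x in t]
keep_small = lambda t: [x for x in t if x < 10]

result = run_pipeline(list(range(10)), [scale, keep_small], tile_size=3)
```

Fusing the whole pipeline per tile, rather than materialising each operator's full result, is the property that lets such an engine target HW accelerators and computational storage with small working sets.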
BlogForever D3.2: Interoperability Prospects
This report evaluates the interoperability prospects of the BlogForever platform. To this end, existing interoperability models are reviewed; a Delphi study is conducted to identify crucial aspects for the interoperability of web archives and digital libraries; technical interoperability standards and protocols are reviewed with regard to their relevance for BlogForever; a simple approach to considering interoperability in specific usage scenarios is proposed; and a tangible approach is presented for developing a succession plan that would allow a reliable transfer of content from the current digital archive to other digital repositories.
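One widely used interoperability protocol for digital libraries that such a review would likely cover is OAI-PMH. The minimal harvesting sketch below parses a ListIdentifiers response; the sample response and repository identifiers are fabricated for illustration:

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Minimal, hand-written ListIdentifiers response, for illustration only.
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:1</identifier></header>
    <header><identifier>oai:example.org:2</identifier></header>
  </ListIdentifiers>
</OAI-PMH>"""

def harvested_identifiers(xml_text):
    """Extract record identifiers from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    return [h.findtext(f"{OAI}identifier")
            for h in root.iter(f"{OAI}header")]

ids = harvested_identifiers(SAMPLE)
```

A real harvester would issue HTTP requests and follow `resumptionToken` elements across pages; the parsing step, however, stays essentially this simple.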
Collaborative Improvement of Smart Manufacturing using Privacy-Preserving Federated Learning
Nowadays, data sharing among different sources is very challenging in the manufacturing domain, mainly due to industry competition, complicated bureaucratic processes, and privacy and security concerns. Centralized Machine Learning (ML) has become an essential tool in several industries, including smart manufacturing. However, this approach may lead to several issues regarding security and performance.
In response to these problems, Federated Learning (FL) was created. FL is an innovative and decentralized approach to ML, focused on collaboration and data privacy. In this approach, data is kept at each source, where training takes place locally, and only model weights or gradients are shared to create a global model.
Although several works have already addressed this problem, many issues remain unresolved concerning the application of FL frameworks in smart manufacturing scenarios. Among the issues found in the analysed works, it is important to emphasize the disregard for Industry 4.0 architectures and strategies, and the difficulty of improving those frameworks further.
This work aims to build an FL framework for smart manufacturing with specific concerns for privacy and applicability in industrial scenarios. The main focus of this framework is to facilitate a collaborative approach to applying ML to manufacturing by enabling knowledge sharing for this purpose, with privacy as a special concern. In addition, the implementation and testing of privacy-preserving algorithms, while improving the framework for industrial scenarios, are emphasized. A modular approach is chosen to create a framework adaptable to various industrial cases by implementing several nodes that focus on specific aspects of data collection, data treatment, connection with the FL system, and ML model management.
The results revealed a competitive model performance of the framework compared to the centralized approach, while keeping data at each source and protecting its privacy. The implemented framework also proved to be compliant with the IEEE Std 3652.1-2020
standard guidelines, attaining the established requirement levels.
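The weight-sharing step described in the abstract can be sketched as a FedAvg-style weighted average, where each client contributes in proportion to its local data size; the client weights and sizes below are illustrative:

```python
def fed_avg(client_weights, client_sizes):
    """Weighted average of client model weights (FedAvg-style):
    only weights leave each site, never the raw training data."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
            for i in range(n_params)]

# Two hypothetical clients with 100 and 300 local samples respectively.
global_model = fed_avg(
    client_weights=[[1.0, 2.0], [3.0, 4.0]],
    client_sizes=[100, 300],
)
```

Privacy-preserving variants, such as those the thesis tests, would additionally mask or encrypt the individual weight vectors before the server ever sees them; the aggregation rule itself stays the same.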
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.
Comment: 46 pages, 16 figures, Technical Report
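As a hedged illustration of the data replication aspect covered by the taxonomy, the sketch below chooses replica sites with rendezvous (highest-random-weight) hashing; the node names and replica count are hypothetical:

```python
import hashlib

def rank(node, item):
    """Rendezvous (highest-random-weight) score for a node/item pair."""
    return hashlib.sha256(f"{node}:{item}".encode()).hexdigest()

def place_replicas(item, nodes, k=2):
    """Pick the k highest-scoring nodes as replica holders. Adding or
    removing a node only moves replicas that involve that node, which
    suits wide-area Data Grids with churning resources."""
    return sorted(nodes, key=lambda n: rank(n, item), reverse=True)[:k]

nodes = ["site-a", "site-b", "site-c", "site-d"]
replicas = place_replicas("dataset-42", nodes, k=2)
```

Real Data Grid replication schemes also weigh network proximity, storage cost, and access history; the hash-based placement here only captures the deterministic, low-coordination core of the problem.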