Distributed Holistic Clustering on Linked Data

Author: A Saeedi
A-C Ngonga Ngomo
E Rahm
I Megdiche
K Hildebrandt
M Nentwig
Publication date: 30/08/2017

Link discovery is an active field of research to support data integration in the Web of Data. Due to the huge size and number of available data sources, efficient and effective link discovery is a very challenging task. Common pairwise link discovery approaches do not scale to many sources with very large entity sets. Here we propose a distributed holistic approach to link many data sources based on a clustering of entities that represent the same real-world object. Our clustering approach provides a compact and fused representation of entities, and can identify errors in existing links as well as many new links. We support a distributed execution of the clustering approach to achieve faster execution times and scalability for large real-world data sets. We provide a novel gold standard for multi-source clustering, and evaluate our methods with respect to effectiveness and efficiency for large data sets from the geographic and music domains.

arXiv.org e-Print Archive

Crossref

Designing Traceability into Big Data Systems

Author: Branson Andrew
Consortium the CRISTAL-ISE
Kovacs Zsolt
McClatchey Richard
Shamdasani Jetendr
Publication date: 01/01/2015

Providing an appropriate level of accessibility and traceability to data or process elements (so-called Items) in large volumes of data, often Cloud-resident, is an essential requirement in the Big Data era. Enterprise-wide data systems need to be designed from the outset to support usage of such Items across the spectrum of business use rather than from any specific application view. The design philosophy advocated in this paper is to drive the design process using a so-called description-driven approach which enriches models with meta-data and description and focuses the design process on Item re-use, thereby promoting traceability. Details are given of the description-driven design of big data systems at CERN, in health informatics and in business process management. Evidence is presented that the approach leads to design simplicity and consequent ease of management thanks to loose typing and the adoption of a unified approach to Item management and usage. Comment: 10 pages; 6 figures in Proceedings of the 5th Annual International Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore July 2015. arXiv admin note: text overlap with arXiv:1402.5764, arXiv:1402.575

arXiv.org e-Print Archive

CERN Document Server

A conceptual model for semantically-based e-government portals

Author: Cabral Liliana
Domingue John
Gugliotta Alessio
Roberto Vito
Publication date: 01/01/2005

Issues of semantic interoperability and service integration for e-government portals are the domain of interest of the present paper. We propose a Conceptual Model for One-Stop e-Government Portals based on Semantic Web Service technology. We describe our research into building the three basic ontologies and their integration with standard ontologies. The result is a project-independent reusable model. At the same time, we outline a simple methodology for applying the proposed conceptual model to a specific scenario.

Open Research Online (The Open University)

SciTokens: Capability-Based Secure Access to Remote Scientific Data

Author: Basney Jim
Bockelman Brian
Brown Duncan
Gaynor Jeff
Miller Zach
Tannenbaum Todd
Weitzel Derek
Withers Alex
Publication venue: Association for Computing Machinery (ACM)
Publication date: 12/07/2018

The management of security credentials (e.g., passwords, secret keys) for computational science workflows is a burden for scientists and information security officers. Problems with credentials (e.g., expiration, privilege mismatch) cause workflows to fail to fetch needed input data or store valuable scientific results, distracting scientists from their research by requiring them to diagnose the problems, re-run their computations, and wait longer for their results. In this paper, we introduce SciTokens, open source software to help scientists manage their security credentials more reliably and securely. We describe the SciTokens system architecture, design, and implementation addressing use cases from the Laser Interferometer Gravitational-Wave Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey Telescope (LSST) projects. We also present our integration with widely-used software that supports distributed scientific computing, including HTCondor, CVMFS, and XrootD. SciTokens uses IETF-standard OAuth tokens for capability-based secure access to remote scientific data. The access tokens convey the specific authorizations needed by the workflows, rather than general-purpose authentication impersonation credentials, to address the risks of scientific workflows running on distributed infrastructure including NSF resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds (e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the interoperability and security of scientific workflows, SciTokens 1) enables use of distributed computing for scientific domains that require greater data protection and 2) enables use of more widely distributed computing resources by reducing the risk of credential abuse on remote systems. Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22--26, 2018, Pittsburgh, PA, US

arXiv.org e-Print Archive

Crossref

Towards business integration as a service 2.0 (BIaaS 2.0)

Author: Chang V
Walters RJ
Wills G
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/2011

Cloud Computing Business Framework (CCBF) is a framework for the design and implementation of Cloud Computing solutions. This proposal focuses on how CCBF can help to address linkage in Cloud Computing implementations. This leads to the development of Business Integration as a Service 1.0 (BIaaS 1.0), which allows different services, roles and functionalities to work together in a linkage-oriented framework where the outcome of one service can be input to another, without the need to translate between domains or languages. BIaaS 2.0 aims to allow automation, enhanced security, advanced risk modelling and improved collaboration between processes in BIaaS 1.0. The benefits from adopting BIaaS 1.0 and developing BIaaS 2.0 are illustrated using a case study from the University of Southampton and several collaborators including IBM US. BIaaS 2.0 can work with mainstream technologies such as scientific workflows, and the proposal and demonstration of BIaaS 2.0 aim to benefit industry and academia. © 2011 IEEE

Crossref

Greenwich Academic Literature Archive

Leeds Beckett Repository