48,224 research outputs found
Distributed Holistic Clustering on Linked Data
Link discovery is an active field of research to support data integration in
the Web of Data. Due to the huge size and number of available data sources,
efficient and effective link discovery is a very challenging task. Common
pairwise link discovery approaches do not scale to many sources with very large
entity sets. We here propose a distributed holistic approach to link many data
sources based on a clustering of entities that represent the same real-world
object. Our clustering approach provides a compact and fused representation of
entities, and can identify errors in existing links as well as many new links.
We support a distributed execution of the clustering approach to achieve faster
execution times and scalability for large real-world data sets. We provide a
novel gold standard for multi-source clustering, and evaluate our methods with
respect to effectiveness and efficiency for large data sets from the geographic
and music domains
Designing Traceability into Big Data Systems
Providing an appropriate level of accessibility and traceability to data or
process elements (so-called Items) in large volumes of data, often
Cloud-resident, is an essential requirement in the Big Data era.
Enterprise-wide data systems need to be designed from the outset to support
usage of such Items across the spectrum of business use rather than from any
specific application view. The design philosophy advocated in this paper is to
drive the design process using a so-called description-driven approach which
enriches models with meta-data and description and focuses the design process
on Item re-use, thereby promoting traceability. Details are given of the
description-driven design of big data systems at CERN, in health informatics
and in business process management. Evidence is presented that the approach
leads to design simplicity and consequent ease of management thanks to loose
typing and the adoption of a unified approach to Item management and usage.Comment: 10 pages; 6 figures in Proceedings of the 5th Annual International
Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore
July 2015. arXiv admin note: text overlap with arXiv:1402.5764,
arXiv:1402.575
Recommended from our members
A conceptual model for semantically-based e-government portals
Issues of semantic interoperability and service integration for e-government portals are the domain of interest of the present paper. We propose a Conceptual Model for One-Stop e-Government Portals based on the Semantic Web Service technology. We describe our research into building the three basic ontologies and their integration with standard ontologies. The result is a project-independent reusable model. At the same time, we outline a simple methodology for applying the proposed conceptual model into a specific scenario
SciTokens: Capability-Based Secure Access to Remote Scientific Data
The management of security credentials (e.g., passwords, secret keys) for
computational science workflows is a burden for scientists and information
security officers. Problems with credentials (e.g., expiration, privilege
mismatch) cause workflows to fail to fetch needed input data or store valuable
scientific results, distracting scientists from their research by requiring
them to diagnose the problems, re-run their computations, and wait longer for
their results. In this paper, we introduce SciTokens, open source software to
help scientists manage their security credentials more reliably and securely.
We describe the SciTokens system architecture, design, and implementation
addressing use cases from the Laser Interferometer Gravitational-Wave
Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey
Telescope (LSST) projects. We also present our integration with widely-used
software that supports distributed scientific computing, including HTCondor,
CVMFS, and XrootD. SciTokens uses IETF-standard OAuth tokens for
capability-based secure access to remote scientific data. The access tokens
convey the specific authorizations needed by the workflows, rather than
general-purpose authentication impersonation credentials, to address the risks
of scientific workflows running on distributed infrastructure including NSF
resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds
(e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the
interoperability and security of scientific workflows, SciTokens 1) enables use
of distributed computing for scientific domains that require greater data
protection and 2) enables use of more widely distributed computing resources by
reducing the risk of credential abuse on remote systems.Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced
Research Computing, July 22--26, 2018, Pittsburgh, PA, US
Towards business integration as a service 2.0 (BIaaS 2.0)
Cloud Computing Business Framework (CCBF) is a framework for designing and implementation of Could Computing solutions. This proposal focuses on how CCBF can help to address linkage in Cloud Computing implementations. This leads to the development of Business Integration as a Service 1.0 (BIaaS 1.0) allowing different services, roles and functionalities to work together in a linkage-oriented framework where the outcome of one service can be input to another, without the need to translate between domains or languages. BIaaS 2.0 aims to allow automation, enhanced security, advanced risk modelling and improved collaboration between processes in BIaaS 1.0. The benefits from adopting BIaaS 1.0 and developing BIaaS 2.0 are illustrated using a case study from the University of Southampton and several collaborators including IBM US. BIaaS 2.0 can work with mainstream technologies such as scientific workflows, and the proposal and demonstration of BIaaS 2.0 will be aimed to certainly benefit industry and academia. © 2011 IEEE
- …