8,061 research outputs found
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions
HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing
The increasing demand for Exa-byte-scale storage capacity by high end computing applications requires a higher level of scalability and dependability than that provided by current file and storage systems. The proposal deals with file systems research for metadata management of scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit to extend features of and fully leverage the peak performance promised by state-of-the-art cluster-based parallel and distributed file storage systems used by the high performance computing community. There is a large body of research on data movement and management scaling, however, the need to scale up the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. An understanding of the characteristics of metadata traffic, and an application of proper load-balancing, caching, prefetching and grouping mechanisms to perform metadata management correspondingly, will lead to a high scalability. It is anticipated that by appropriately plugging the scalable and adaptive metadata management components into the state-of-the-art cluster-based parallel and distributed file storage systems one could potentially increase the performance of applications and file systems, and help translate the promise and potential of high peak performance of such systems to real application performance improvements.
The project involves the following components:
1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns. 2. Develop scalable and adaptive file name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability 3. Develop decentralized, locality-aware metadata grouping schemes to facilitate the bulk metadata operations such as prefetching. 4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching. 5. Prototype the SAM2 components into the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework for a DOE CMS Tier 2 site at University of Nebraska-Lincoln and conduct benchmark, evaluation and validation studies
Creating a Relational Distributed Object Store
In and of itself, data storage has apparent business utility. But when we can
convert data to information, the utility of stored data increases dramatically.
It is the layering of relation atop the data mass that is the engine for such
conversion. Frank relation amongst discrete objects sporadically ingested is
rare, making the process of synthesizing such relation all the more
challenging, but the challenge must be met if we are ever to see an equivalent
business value for unstructured data as we already have with structured data.
This paper describes a novel construct, referred to as a relational distributed
object store (RDOS), that seeks to solve the twin problems of how to
persistently and reliably store petabytes of unstructured data while
simultaneously creating and persisting relations amongst billions of objects.Comment: 12 pages, 5 figure
From Sensor to Observation Web with Environmental Enablers in the Future Internet
This paper outlines the grand challenges in global sustainability research and the objectives of the FP7 Future Internet PPP program within the Digital Agenda for Europe. Large user communities are generating significant amounts of valuable environmental observations at local and regional scales using the devices and services of the Future Internet. These communitiesâ environmental observations represent a wealth of information which is currently hardly used or used only in isolation and therefore in need of integration with other information sources. Indeed, this very integration will lead to a paradigm shift from a mere Sensor Web to an Observation Web with semantically enriched content emanating from sensors, environmental simulations and citizens. The paper also describes the research challenges to realize the Observation Web and the associated environmental enablers for the Future Internet. Such an environmental enabler could for instance be an electronic sensing device, a web-service application, or even a social networking group affording or facilitating the capability of the Future Internet applications to consume, produce, and use environmental observations in cross-domain applications. The term ?envirofied? Future Internet is coined to describe this overall target that forms a cornerstone of work in the Environmental Usage Area within the Future Internet PPP program. Relevant trends described in the paper are the usage of ubiquitous sensors (anywhere), the provision and generation of information by citizens, and the convergence of real and virtual realities to convey understanding of environmental observations. The paper addresses the technical challenges in the Environmental Usage Area and the need for designing multi-style service oriented architecture. Key topics are the mapping of requirements to capabilities, providing scalability and robustness with implementing context aware information retrieval. Another essential research topic is handling data fusion and model based computation, and the related propagation of information uncertainty. Approaches to security, standardization and harmonization, all essential for sustainable solutions, are summarized from the perspective of the Environmental Usage Area. The paper concludes with an overview of emerging, high impact applications in the environmental areas concerning land ecosystems (biodiversity), air quality (atmospheric conditions) and water ecosystems (marine asset management)
Metadata and ontologies for organizing studentsâ memories and learning: standards and convergence models for context awareness
Este artĂculo trata de las ontologĂas que sirven para la comprensiĂłn en contexto y la GestiĂłn de la InformaciĂłn Personal (PIM)y su aplicabilidad al proyecto Memex Metadata(M2). M2 es un proyecto de investigaciĂłn de la Universidad de Carolina del Norte en Chapel Hill para mejorar la memoria digital de los alumnos utilizando tablet PC, la tecnologĂa SenseCam de Microsoft y otras tecnologĂas mĂłviles(p.ej. un dispositivo de GPS) para capturar el contexto del aprendizaje. Este artĂculo presenta el proyecto M2, dicute el concepto de los portafolios digitales en las actuales tendencias educativas, relacionĂĄndolos con las tecnologĂas emergentes, revisa las ontologĂas relevantes y su relaciĂłn con el proyecto CAF (Context Awareness Framework), y concluye identificando las lĂneas de investigaciĂłn futuras.This paper focuses on ontologies supporting context awareness and Personal Information Management (PIM) and their
applicability in Memex Metadata (M2) project. M2 is a research project of the University of North Carolina at Chapel Hill to
improve student digital memories using the tablet PC, Microsoftâs SenseCam technology, and other mobile technologies (e.g.,
a GPS device) to capture context. The M2 project offers new opportunities studying studentsâ learning with digital
technologies. This paper introduces the M2 project; discusses E-portfolios and current educational trends related to pervasive
computing; reviews relevant ontologies and their relationship to the projectsâ CAF (context awareness framework), and
concludes by identifying future research directions
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we have identified and analyzed gaps within European research effort during our second year.
In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio-
economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown
of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on
requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as
National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core
technological gaps that involve research challenges, and âenablersâ, which are not necessarily technical research
challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal
challenges
Geospatial information infrastructures
Manual of Digital Earth / Editors: Huadong Guo, Michael F. Goodchild, Alessandro Annoni .- Springer, 2020 .- ISBN: 978-981-32-9915-3Geospatial information infrastructures (GIIs) provide the technological, semantic,organizationalandlegalstructurethatallowforthediscovery,sharing,and use of geospatial information (GI). In this chapter, we introduce the overall concept and surrounding notions such as geographic information systems (GIS) and spatial datainfrastructures(SDI).WeoutlinethehistoryofGIIsintermsoftheorganizational andtechnologicaldevelopmentsaswellasthecurrentstate-of-art,andreïŹectonsome of the central challenges and possible future trajectories. We focus on the tension betweenincreasedneedsforstandardizationandtheever-acceleratingtechnological changes. We conclude that GIIs evolved as a strong underpinning contribution to implementation of the Digital Earth vision. In the future, these infrastructures are challengedtobecomeïŹexibleandrobustenoughtoabsorbandembracetechnological transformationsandtheaccompanyingsocietalandorganizationalimplications.With this contribution, we present the reader a comprehensive overview of the ïŹeld and a solid basis for reïŹections about future developments
DRIVER Technology Watch Report
This report is part of the Discovery Workpackage (WP4) and is the third report out of four deliverables. The objective of this report is to give an overview of the latest technical developments in the world of digital repositories, digital libraries and beyond, in order to serve as theoretical and practical input for the technical DRIVER developments, especially those focused on enhanced publications. This report consists of two main parts, one part focuses on interoperability standards for enhanced publications, the other part consists of three subchapters, which give a landscape picture of current and surfacing technologies and communities crucial to DRIVER. These three subchapters contain the GRID, CRIS and LTP communities and technologies. Every chapter contains a theoretical explanation, followed by case studies and the outcomes and opportunities for DRIVER in this field
Nonparametric Bayesian Modeling for Automated Database Schema Matching
The problem of merging databases arises in many government and commercial
applications. Schema matching, a common first step, identifies equivalent
fields between databases. We introduce a schema matching framework that builds
nonparametric Bayesian models for each field and compares them by computing the
probability that a single model could have generated both fields. Our
experiments show that our method is more accurate and faster than the existing
instance-based matching algorithms in part because of the use of nonparametric
Bayesian models
- âŠ