8,061 research outputs found

    Towards Exascale Scientific Metadata Management

    Full text link
    Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions

    HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing

    Get PDF
    The increasing demand for Exa-byte-scale storage capacity by high end computing applications requires a higher level of scalability and dependability than that provided by current file and storage systems. The proposal deals with file systems research for metadata management of scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit to extend features of and fully leverage the peak performance promised by state-of-the-art cluster-based parallel and distributed file storage systems used by the high performance computing community. There is a large body of research on data movement and management scaling, however, the need to scale up the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. An understanding of the characteristics of metadata traffic, and an application of proper load-balancing, caching, prefetching and grouping mechanisms to perform metadata management correspondingly, will lead to a high scalability. It is anticipated that by appropriately plugging the scalable and adaptive metadata management components into the state-of-the-art cluster-based parallel and distributed file storage systems one could potentially increase the performance of applications and file systems, and help translate the promise and potential of high peak performance of such systems to real application performance improvements. The project involves the following components: 1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns. 2. Develop scalable and adaptive file name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability 3. Develop decentralized, locality-aware metadata grouping schemes to facilitate the bulk metadata operations such as prefetching. 4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching. 5. Prototype the SAM2 components into the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework for a DOE CMS Tier 2 site at University of Nebraska-Lincoln and conduct benchmark, evaluation and validation studies

    Creating a Relational Distributed Object Store

    Full text link
    In and of itself, data storage has apparent business utility. But when we can convert data to information, the utility of stored data increases dramatically. It is the layering of relation atop the data mass that is the engine for such conversion. Frank relation amongst discrete objects sporadically ingested is rare, making the process of synthesizing such relation all the more challenging, but the challenge must be met if we are ever to see an equivalent business value for unstructured data as we already have with structured data. This paper describes a novel construct, referred to as a relational distributed object store (RDOS), that seeks to solve the twin problems of how to persistently and reliably store petabytes of unstructured data while simultaneously creating and persisting relations amongst billions of objects.Comment: 12 pages, 5 figure

    From Sensor to Observation Web with Environmental Enablers in the Future Internet

    Get PDF
    This paper outlines the grand challenges in global sustainability research and the objectives of the FP7 Future Internet PPP program within the Digital Agenda for Europe. Large user communities are generating significant amounts of valuable environmental observations at local and regional scales using the devices and services of the Future Internet. These communities’ environmental observations represent a wealth of information which is currently hardly used or used only in isolation and therefore in need of integration with other information sources. Indeed, this very integration will lead to a paradigm shift from a mere Sensor Web to an Observation Web with semantically enriched content emanating from sensors, environmental simulations and citizens. The paper also describes the research challenges to realize the Observation Web and the associated environmental enablers for the Future Internet. Such an environmental enabler could for instance be an electronic sensing device, a web-service application, or even a social networking group affording or facilitating the capability of the Future Internet applications to consume, produce, and use environmental observations in cross-domain applications. The term ?envirofied? Future Internet is coined to describe this overall target that forms a cornerstone of work in the Environmental Usage Area within the Future Internet PPP program. Relevant trends described in the paper are the usage of ubiquitous sensors (anywhere), the provision and generation of information by citizens, and the convergence of real and virtual realities to convey understanding of environmental observations. The paper addresses the technical challenges in the Environmental Usage Area and the need for designing multi-style service oriented architecture. Key topics are the mapping of requirements to capabilities, providing scalability and robustness with implementing context aware information retrieval. Another essential research topic is handling data fusion and model based computation, and the related propagation of information uncertainty. Approaches to security, standardization and harmonization, all essential for sustainable solutions, are summarized from the perspective of the Environmental Usage Area. The paper concludes with an overview of emerging, high impact applications in the environmental areas concerning land ecosystems (biodiversity), air quality (atmospheric conditions) and water ecosystems (marine asset management)

    Metadata and ontologies for organizing students’ memories and learning: standards and convergence models for context awareness

    Get PDF
    Este artículo trata de las ontologías que sirven para la comprensión en contexto y la Gestión de la Información Personal (PIM)y su aplicabilidad al proyecto Memex Metadata(M2). M2 es un proyecto de investigación de la Universidad de Carolina del Norte en Chapel Hill para mejorar la memoria digital de los alumnos utilizando tablet PC, la tecnología SenseCam de Microsoft y otras tecnologías móviles(p.ej. un dispositivo de GPS) para capturar el contexto del aprendizaje. Este artículo presenta el proyecto M2, dicute el concepto de los portafolios digitales en las actuales tendencias educativas, relacionándolos con las tecnologías emergentes, revisa las ontologías relevantes y su relación con el proyecto CAF (Context Awareness Framework), y concluye identificando las líneas de investigación futuras.This paper focuses on ontologies supporting context awareness and Personal Information Management (PIM) and their applicability in Memex Metadata (M2) project. M2 is a research project of the University of North Carolina at Chapel Hill to improve student digital memories using the tablet PC, Microsoft’s SenseCam technology, and other mobile technologies (e.g., a GPS device) to capture context. The M2 project offers new opportunities studying students’ learning with digital technologies. This paper introduces the M2 project; discusses E-portfolios and current educational trends related to pervasive computing; reviews relevant ontologies and their relationship to the projects’ CAF (context awareness framework), and concludes by identifying future research directions

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    Get PDF
    After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we have identified and analyzed gaps within European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio- economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges

    Geospatial information infrastructures

    Get PDF
    Manual of Digital Earth / Editors: Huadong Guo, Michael F. Goodchild, Alessandro Annoni .- Springer, 2020 .- ISBN: 978-981-32-9915-3Geospatial information infrastructures (GIIs) provide the technological, semantic,organizationalandlegalstructurethatallowforthediscovery,sharing,and use of geospatial information (GI). In this chapter, we introduce the overall concept and surrounding notions such as geographic information systems (GIS) and spatial datainfrastructures(SDI).WeoutlinethehistoryofGIIsintermsoftheorganizational andtechnologicaldevelopmentsaswellasthecurrentstate-of-art,andreïŹ‚ectonsome of the central challenges and possible future trajectories. We focus on the tension betweenincreasedneedsforstandardizationandtheever-acceleratingtechnological changes. We conclude that GIIs evolved as a strong underpinning contribution to implementation of the Digital Earth vision. In the future, these infrastructures are challengedtobecomeïŹ‚exibleandrobustenoughtoabsorbandembracetechnological transformationsandtheaccompanyingsocietalandorganizationalimplications.With this contribution, we present the reader a comprehensive overview of the ïŹeld and a solid basis for reïŹ‚ections about future developments

    DRIVER Technology Watch Report

    Get PDF
    This report is part of the Discovery Workpackage (WP4) and is the third report out of four deliverables. The objective of this report is to give an overview of the latest technical developments in the world of digital repositories, digital libraries and beyond, in order to serve as theoretical and practical input for the technical DRIVER developments, especially those focused on enhanced publications. This report consists of two main parts, one part focuses on interoperability standards for enhanced publications, the other part consists of three subchapters, which give a landscape picture of current and surfacing technologies and communities crucial to DRIVER. These three subchapters contain the GRID, CRIS and LTP communities and technologies. Every chapter contains a theoretical explanation, followed by case studies and the outcomes and opportunities for DRIVER in this field

    Nonparametric Bayesian Modeling for Automated Database Schema Matching

    Full text link
    The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models
    • 

    corecore