810 research outputs found

    BlogForever: D3.1 Preservation Strategy Report

    Get PDF
    This report describes preservation planning approaches and strategies recommended by the BlogForever project as a core component of a weblog repository design. More specifically, we start by discussing why we would want to preserve weblogs in the first place and what it is exactly that we are trying to preserve. We further present a review of past and present work and highlight why current practices in web archiving do not address the needs of weblog preservation adequately. We make three distinctive contributions in this volume: a) we propose transferable practical workflows for applying a combination of established metadata and repository standards in developing a weblog repository, b) we provide an automated approach to identifying significant properties of weblog content that uses the notion of communities and how this affects previous strategies, c) we propose a sustainability plan that draws upon community knowledge through innovative repository design

    Proceedings of the 12th International Conference on Digital Preservation

    Get PDF
    The 12th International Conference on Digital Preservation (iPRES) was held on November 2-6, 2015 in Chapel Hill, North Carolina, USA. There were 327 delegates from 22 countries. The program included 12 long papers, 15 short papers, 33 posters, 3 demos, 6 workshops, 3 tutorials and 5 panels, as well as several interactive sessions and a Digital Preservation Showcase

    Proceedings of the 12th International Conference on Digital Preservation

    Get PDF
    The 12th International Conference on Digital Preservation (iPRES) was held on November 2-6, 2015 in Chapel Hill, North Carolina, USA. There were 327 delegates from 22 countries. The program included 12 long papers, 15 short papers, 33 posters, 3 demos, 6 workshops, 3 tutorials and 5 panels, as well as several interactive sessions and a Digital Preservation Showcase

    Ontology based data warehousing for mining of heterogeneous and multidimensional data sources

    Get PDF
    Heterogeneous and multidimensional big-data sources are virtually prevalent in all business environments. System and data analysts are unable to fast-track and access big-data sources. A robust and versatile data warehousing system is developed, integrating domain ontologies from multidimensional data sources. For example, petroleum digital ecosystems and digital oil field solutions, derived from big-data petroleum (information) systems, are in increasing demand in multibillion dollar resource businesses worldwide. This work is recognized by Industrial Electronic Society of IEEE and appeared in more than 50 international conference proceedings and journals

    Toward Large Scale Semantic Image Understanding and Retrieval

    Get PDF
    Semantic image retrieval is a multifaceted, highly complex problem. Not only does the solution to this problem require advanced image processing and computer vision techniques, but it also requires knowledge beyond what can be inferred from the image content alone. In contrast, traditional image retrieval systems are based upon keyword searches on filenames or metadata tags, e.g. Google image search, Flickr search, etc. These conventional systems do not analyze the image content and their keywords are not guaranteed to represent the image. Thus, there is significant need for a semantic image retrieval system that can analyze and retrieve images based upon the content and relationships that exist in the real world.In this thesis, I present a framework that moves towards advancing semantic image retrieval in large scale datasets. At a conceptual level, semantic image retrieval requires the following steps: viewing an image, understanding the content of the image, indexing the important aspects of the image, connecting the image concepts to the real world, and finally retrieving the images based upon the index concepts or related concepts. My proposed framework addresses each of these components in my ultimate goal of improving image retrieval. The first task is the essential task of understanding the content of an image. Unfortunately, typically the only data used by a computer algorithm when analyzing images is the low-level pixel data. But, to achieve human level comprehension, a machine must overcome the semantic gap, or disparity that exists between the image data and human understanding. This translation of the low-level information into a high-level representation is an extremely difficult problem that requires more than the image pixel information. I describe my solution to this problem through the use of an online knowledge acquisition and storage system. This system utilizes the extensible, visual, and interactable properties of Scalable Vector Graphics (SVG) combined with online crowd sourcing tools to collect high level knowledge about visual content.I further describe the utilization of knowledge and semantic data for image understanding. Specifically, I seek to incorporate knowledge in various algorithms that cannot be inferred from the image pixels alone. This information comes from related images or structured data (in the form of hierarchies and ontologies) to improve the performance of object detection and image segmentation tasks. These understanding tasks are crucial intermediate steps towards retrieval and semantic understanding. However, the typical object detection and segmentation tasks requires an abundance of training data for machine learning algorithms. The prior training information provides information on what patterns and visual features the algorithm should be looking for when processing an image. In contrast, my algorithm utilizes related semantic images to extract the visual properties of an object and also to decrease the search space of my detection algorithm. Furthermore, I demonstrate the use of related images in the image segmentation process. Again, without the use of prior training data, I present a method for foreground object segmentation by finding the shared area that exists in a set of images. I demonstrate the effectiveness of my method on structured image datasets that have defined relationships between classes i.e. parent-child, or sibling classes.Finally, I introduce my framework for semantic image retrieval. I enhance the proposed knowledge acquisition and image understanding techniques with semantic knowledge through linked data and web semantic languages. This is an essential step in semantic image retrieval. For example, a car class classified by an image processing algorithm not enhanced by external knowledge would have no idea that a car is a type of vehicle which would also be highly related to a truck and less related to other transportation methods like a train . However, a query for modes of human transportation should return all of the mentioned classes. Thus, I demonstrate how to integrate information from both image processing algorithms and semantic knowledge bases to perform interesting queries that would otherwise be impossible. The key component of this system is a novel property reasoner that is able to translate low level image features into semantically relevant object properties. I use a combination of XML based languages such as SVG, RDF, and OWL in order to link to existing ontologies available on the web. My experiments demonstrate an efficient data collection framework and novel utilization of semantic data for image analysis and retrieval on datasets of people and landmarks collected from sources such as IMDB and Flickr. Ultimately, my thesis presents improvements to the state of the art in visual knowledge representation/acquisition and computer vision algorithms such as detection and segmentation toward the goal of enhanced semantic image retrieval

    Sixth Goddard Conference on Mass Storage Systems and Technologies Held in Cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems

    Get PDF
    This document contains copies of those technical papers received in time for publication prior to the Sixth Goddard Conference on Mass Storage Systems and Technologies which is being held in cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems at the University of Maryland-University College Inn and Conference Center March 23-26, 1998. As one of an ongoing series, this Conference continues to provide a forum for discussion of issues relevant to the management of large volumes of data. The Conference encourages all interested organizations to discuss long term mass storage requirements and experiences in fielding solutions. Emphasis is on current and future practical solutions addressing issues in data management, storage systems and media, data acquisition, long term retention of data, and data distribution. This year's discussion topics include architecture, tape optimization, new technology, performance, standards, site reports, vendor solutions. Tutorials will be available on shared file systems, file system backups, data mining, and the dynamics of obsolescence

    ACEWATER2 Regional database: hydro-climatology data-analysis

    Get PDF
    The report presents the architecture of a regional hydro-climatology information system, developed in the framework of the ACEWATER2 project, in order to support effective organization of information. Information includes both freely available large and regional scale data sources, as well as databases compiled by the CoEs (Centers of Excellence) and submitted as part of their scientific undertakings. The information system builds upon and specializes the JRC knowledge sharing platform Aquaknow (https://aquaknow.jrc.ec.europa.eu/), including: • at the system core, a relational database; its schema has been designed to store both detailed metadata and, where relevant (avoiding duplication of information otherwise accessible), data themselves. Metadata include, among others, datasets extended description, spatial extent, temporal frequency, reference Institutions/authors, credits and limitations, web links to access original data and/or any further documentation. Data can be stored as public or private, depending upon confidentialy and sharing policies; • user friendly facilities, supporting the end user in efficiently browsing, querying, uploading and downloading information (metadata and data). System access is limited to accredited audience, via password authentication. Dedicated groups for the three ACEWATER CoE networks (Western, Southern and Central-Eastern Africa) have been setup and scientists invited to register. Currently the system is operational and we submitted databases documented and, depending upon confidentiality and authorization issues, also stored. A general review and classification of freely available information at continenal, regional and local scale of interest to ACEWATER2 project, and particularly to selected study areas (Senegal, Gambia and Niger; Zambezi; Blue Nile and Lake Victoria), have been completed. Metadata and, where relevant, data themselves have been stored to the information system database. Information submitted by the CoE (a continuous ongoing process) is migrated to the database as well, depending upon sharing authorization and/or limitations. The report also documents the ongoing scientific research at JRC on climate variability analysis based on L-Moments statistics. In particular maps of estimated precipitation deficit for different return periods at the river basins of interest are presented and included in the database.JRC.D.2-Water and Marine Resource

    Resource Sharing for Multi-Tenant Nosql Data Store in Cloud

    Get PDF
    Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2015Multi-tenancy hosting of users in cloud NoSQL data stores is favored by cloud providers because it enables resource sharing at low operating cost. Multi-tenancy takes several forms depending on whether the back-end file system is a local file system (LFS) or a parallel file system (PFS), and on whether tenants are independent or share data across tenants In this thesis I focus on and propose solutions to two cases: independent data-local file system, and shared data-parallel file system. In the independent data-local file system case, resource contention occurs under certain conditions in Cassandra and HBase, two state-of-the-art NoSQL stores, causing performance degradation for one tenant by another. We investigate the interference and propose two approaches. The first provides a scheduling scheme that can approximate resource consumption, adapt to workload dynamics and work in a distributed fashion. The second introduces a workload-aware resource reservation approach to prevent interference. The approach relies on a performance model obtained offline and plans the reservation according to different workload resource demands. Results show the approaches together can prevent interference and adapt to dynamic workloads under multi-tenancy. In the shared data-parallel file system case, it has been shown that running a distributed NoSQL store over PFS for shared data across tenants is not cost effective. Overheads are introduced due to the unawareness of the NoSQL store of PFS. This dissertation targets the key-value store (KVS), a specific form of NoSQL stores, and proposes a lightweight KVS over a parallel file system to improve efficiency. The solution is built on an embedded KVS for high performance but uses novel data structures to support concurrent writes, giving capability that embedded KVSs are not designed for. Results show the proposed system outperforms Cassandra and Voldemort in several different workloads
    • …
    corecore