3 research outputs found

    LDM: Lineage-Aware Data Management in Multi-tier Storage Systems

    We design and develop LDM, a novel data management solution that caters to the needs of applications exhibiting the lineage property, i.e., applications in which current writes are future reads. In this class of applications, slow writes significantly hurt the overall performance of jobs, because current writes determine the fate of subsequent reads. We believe that in a large-scale shared production cluster, the issues associated with data management can be mitigated at a much higher layer of the I/O path, even before data access requests are made. In contrast to current data management solutions, which are mostly reactive and/or heuristic, LDM is both deterministic and proactive. We develop block-graphs, which enable LDM to capture the complete time-based data-task dependency associations and to use them for life-cycle management through tiering of data blocks. LDM amalgamates information from across the data center ecosystem, from application code to file system mappings and the topology of compute and storage devices, to make oracle-like deterministic data management decisions. In trace-driven experiments, LDM achieves a 29–52% reduction in overall data center workload execution time. Moreover, deploying LDM with extensive pre-processing creates efficient data consumption pipelines, which also reduce write and read delays significantly.
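    The abstract does not spell out the block-graph's concrete form. The following minimal Python sketch is only an illustration under assumptions of our own: data blocks are nodes, future read dependencies are timestamped edges to tasks, and a block whose next reader falls within a promotion window is placed on the fast tier. Names such as BlockGraph, record_future_read, and PROMOTE_WINDOW are hypothetical, not LDM's actual interfaces.

```python
from dataclasses import dataclass, field

# Hypothetical illustration of a lineage "block-graph": tasks write blocks that
# later tasks read, so a block's next scheduled reader determines its tier.
# All names and the promotion window are assumptions, not LDM's real API.

FAST_TIER, SLOW_TIER = "ssd", "hdd"
PROMOTE_WINDOW = 60.0  # seconds until the next read that justifies the fast tier (assumed)

@dataclass
class Block:
    block_id: str
    tier: str = SLOW_TIER

@dataclass
class BlockGraph:
    # block_id -> sorted list of (time, task_id) future read dependencies
    future_reads: dict = field(default_factory=dict)
    blocks: dict = field(default_factory=dict)

    def record_write(self, block_id: str) -> None:
        self.blocks.setdefault(block_id, Block(block_id))

    def record_future_read(self, block_id: str, task_id: str, at_time: float) -> None:
        self.future_reads.setdefault(block_id, []).append((at_time, task_id))
        self.future_reads[block_id].sort()

    def plan_tiering(self, now: float) -> None:
        """Deterministically place each block based on its next known reader."""
        for block_id, block in self.blocks.items():
            reads = [t for t, _ in self.future_reads.get(block_id, []) if t >= now]
            if reads and reads[0] - now <= PROMOTE_WINDOW:
                block.tier = FAST_TIER   # next read is imminent: keep the block hot
            else:
                block.tier = SLOW_TIER   # no reader soon: demote to cold storage

graph = BlockGraph()
graph.record_write("b1")
graph.record_future_read("b1", "task_reduce", at_time=30.0)
graph.plan_tiering(now=0.0)
print(graph.blocks["b1"].tier)  # "ssd", because a read is due within the window
```

    Keeping placement a pure function of the recorded future reads is what makes such a policy deterministic and proactive rather than reactive.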

    Hydrologic Information Systems: Advancing Cyberinfrastructure for Environmental Observatories

    Recently, community initiatives have emerged for the establishment of large-scale environmental observatories. Cyberinfrastructure is the backbone upon which these observatories will be built, and scientists' ability to access and use the data collected within observatories to address research questions will depend on the successful implementation of cyberinfrastructure. The research described in this dissertation advances the cyberinfrastructure available for supporting environmental observatories. This has been accomplished both through development of new cyberinfrastructure components and through the demonstration and application of existing tools, with a specific focus on point observations data. The cyberinfrastructure that was developed and deployed to support collection, management, analysis, and publication of data generated by an environmental sensor network in the Little Bear River environmental observatory test bed is described, as is the sensor network design and deployment. Results of several analyses are presented that demonstrate how high-frequency data enable identification of trends and analysis of physical, chemical, and biological behavior that would be impossible using traditional, low-frequency monitoring data. This dissertation also illustrates how the cyberinfrastructure components demonstrated in the Little Bear River test bed have been integrated into a data publication system that is now supporting a nationwide network of 11 environmental observatory test bed sites, as well as other research sites within and outside of the United States. Enhancements to the infrastructure for research and education enabled by this research are impacting a diverse community, including the national community of researchers involved with prospective Water and Environmental Research Systems (WATERS) Network environmental observatories as well as other observatory efforts, research watersheds, and test beds. The results of this research provide insight into, and potential solutions for, some of the bottlenecks associated with design and implementation of cyberinfrastructure for observatory support.
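    The abstract does not give the underlying data model, but a minimal sketch of a point-observation record (site, variable, timestamp, value, units) illustrates the kind of data such systems manage and why high-frequency sampling matters: a short-lived event visible in 15-minute data can disappear entirely when the series is subsampled to a traditional low-frequency schedule. All names and values below are illustrative assumptions, not the dissertation's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative point-observation record: one value of one variable at one site
# and time. Field names are assumptions, not the dissertation's schema.

@dataclass
class Observation:
    site_code: str        # e.g. a hypothetical Little Bear River monitoring site
    variable: str         # e.g. "turbidity"
    timestamp: datetime
    value: float
    units: str

def subsample(series, every_nth):
    """Mimic low-frequency monitoring by keeping only every n-th observation."""
    return series[::every_nth]

# Synthetic 24 h series at 15-minute resolution with a ~1 h turbidity spike.
start = datetime(2007, 6, 1)
series = [
    Observation("LBR-1", "turbidity", start + timedelta(minutes=15 * i),
                50.0 if 40 <= i <= 44 else 5.0, "NTU")
    for i in range(96)
]

high_freq_max = max(o.value for o in series)                 # 50.0: spike detected
low_freq_max  = max(o.value for o in subsample(series, 48))  # 5.0: spike missed
print(high_freq_max, low_freq_max)
```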

    Qualitätskontrolle mittels semantischer Technologien in digitalen Bibliotheken

    Controlled content quality, especially in terms of indexing, is one of the major advantages of digital libraries compared to general Web sources or Web search engines. Therefore, more and more digital libraries offer corpora related to a specialized domain. Beyond simple keyword-based searches, the resulting information systems often rely on entity-centered searches. To offer this kind of search, high-quality document processing is essential. However, given today's information flood, the mostly manual effort of acquiring new sources and creating suitable (semantic) metadata for content indexing and retrieval is already prohibitive. A recent solution is the automatic generation of metadata, where mostly statistical techniques such as document classification and entity extraction are becoming more widespread. In this case, however, neglecting quality assurance is even more problematic, because heuristic generation often fails, and the resulting low-quality metadata directly diminishes the quality of service that a digital library provides. Thus, it must be possible to assess the quality of the metadata annotations that an information system uses for subsequent querying of its collections. In this thesis we discuss the importance of metadata quality assessment for information systems and the benefits gained from controlled and guaranteed quality.
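    The thesis abstract does not describe a specific assessment procedure. As one minimal illustration of the idea, the sketch below gates automatically extracted entity annotations on extractor confidence and reports a simple per-document quality score, so that heuristic extraction errors do not flow straight into entity-centered search. The class names and the 0.8 threshold are assumptions, not taken from the thesis.

```python
from dataclasses import dataclass

# Illustrative quality gate for automatically generated metadata: entity
# annotations below a confidence threshold are withheld from the index.
# Names and the 0.8 threshold are assumptions, not from the thesis.

@dataclass
class EntityAnnotation:
    surface_form: str   # text span as it appears in the document
    entity_id: str      # identifier in a domain vocabulary or ontology
    confidence: float   # score reported by the extractor

MIN_CONFIDENCE = 0.8

def assess(annotations):
    """Split annotations into accepted/rejected and compute a simple quality score."""
    accepted = [a for a in annotations if a.confidence >= MIN_CONFIDENCE]
    rejected = [a for a in annotations if a.confidence < MIN_CONFIDENCE]
    score = len(accepted) / len(annotations) if annotations else 1.0
    return accepted, rejected, score

doc_annotations = [
    EntityAnnotation("aspirin", "CHEM:50-78-2", 0.95),
    EntityAnnotation("gold", "CHEM:7440-57-5", 0.41),   # likely a false positive
]
accepted, rejected, score = assess(doc_annotations)
print(score)                                 # 0.5: half of the generated metadata passes
print([a.surface_form for a in accepted])    # ['aspirin']
```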