4,541 research outputs found

    ArrayBridge: Interweaving declarative array processing with high-performance computing

    Full text link
    Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats, that aims to make declarative array manipulations interoperable with imperative file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and seamlessly integrates into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it can read from it. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency. Our extensive performance evaluation in NERSC, a large-scale scientific computing facility, shows that ArrayBridge exhibits statistically indistinguishable performance and I/O scalability to the native SciDB storage engine.Comment: 12 pages, 13 figure

    Integration of browser-to-browser architectures with third party legacy cloud storage

    Get PDF
    An increasing number of web applications run totally or partially in the client machines - from collaborative editing tools to multi-user games. Avoiding to continuously contact the server allows to reduce latency between clients and to minimize the load on the centralized component. Novel implementation techniques, such as WebRTC, allows to address network problems that previously prevented these systems to be deployed in practice. Legion is a newly developed framework that exploits these mechanisms, allowing client web applications to replicate data from servers, and synchronize these replicas directly among them. This work aims to extend the current Legion framework with the integration of an additional legacy storage system that can be used to support web applications, Antidote. We study the best way to support Legion’s data model into Antidote, we design a synchronization mechanism, and finally we measure the performance cost of such an integration

    Tools for distributed application management

    Get PDF
    Distributed application management consists of monitoring and controlling an application as it executes in a distributed environment. It encompasses such activities as configuration, initialization, performance monitoring, resource scheduling, and failure response. The Meta system (a collection of tools for constructing distributed application management software) is described. Meta provides the mechanism, while the programmer specifies the policy for application management. The policy is manifested as a control program which is a soft real-time reactive program. The underlying application is instrumented with a variety of built-in and user-defined sensors and actuators. These define the interface between the control program and the application. The control program also has access to a database describing the structure of the application and the characteristics of its environment. Some of the more difficult problems for application management occur when preexisting, nondistributed programs are integrated into a distributed application for which they may not have been intended. Meta allows management functions to be retrofitted to such programs with a minimum of effort

    Tools for distributed application management

    Get PDF
    Distributed application management consists of monitoring and controlling an application as it executes in a distributed environment. It encompasses such activities as configuration, initialization, performance monitoring, resource scheduling, and failure response. The Meta system is described: a collection of tools for constructing distributed application management software. Meta provides the mechanism, while the programmer specifies the policy for application management. The policy is manifested as a control program which is a soft real time reactive program. The underlying application is instrumented with a variety of built-in and user defined sensors and actuators. These define the interface between the control program and the application. The control program also has access to a database describing the structure of the application and the characteristics of its environment. Some of the more difficult problems for application management occur when pre-existing, nondistributed programs are integrated into a distributed application for which they may not have been intended. Meta allows management functions to be retrofitted to such programs with a minimum of effort

    Collaboration and end-user information management tools

    Get PDF
    Knowledge and information workers work as individuals within virtual team structures. Those teams may be small, or they may in fact form part of larger groups which continue to share similar objectives. Knowledge workers therefore acquire and manage personal and group information which they may choose to manage themselves using Personal and small-Group Information Management tools and techniques (PIM, GIM). This paper summarises some earlier research into PIM and GIM before addressing its objectives, which are to: * Consider the use of PIM / GIM techniques as the basis for collaboration within those virtual teams * Discuss collaboration and community involvement in the actual development of PIM/GIM tools * Identify the need for formalisms and abstraction skills in effective personal information management

    Indeterministic Handling of Uncertain Decisions in Duplicate Detection

    Get PDF
    In current research, duplicate detection is usually considered as a deterministic approach in which tuples are either declared as duplicates or not. However, most often it is not completely clear whether two tuples represent the same real-world entity or not. In deterministic approaches, however, this uncertainty is ignored, which in turn can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all these worlds can be modeled in the resulting data. This approach minimizes the negative impacts of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic and human effort can be reduced to a large extent. Unfortunately, a full-indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministic handled decisions in a meaningful way

    Effective Interpretation, Integration and Querying of Web Tables

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    A Review of Relational Machine Learning for Knowledge Graphs

    Get PDF
    Relational machine learning studies methods for the statistical analysis of relational, or graph-structured, data. In this paper, we provide a review of how such statistical models can be “trained” on large knowledge graphs, and then used to predict new facts about the world (which is equivalent to predicting new edges in the graph). In particular, we discuss two different kinds of statistical relational models, both of which can scale to massive datasets. The first is based on tensor factorization methods and related latent variable models. The second is based on mining observable patterns in the graph. We also show how to combine these latent and observable models to get improved modeling power at decreased computational cost. Finally, we discuss how such statistical models of graphs can be combined with text-based information extraction methods for automatically constructing knowledge graphs from the Web. In particular, we discuss Google’s Knowledge Vault project.This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF - 1231216
    corecore