    Archiving scientific data

    We present an archiving technique for hierarchical data with key structure. Our approach is based on the notion of timestamps, whereby an element appearing in multiple versions of the database is stored only once, along with a compact description of the versions in which it appears. The basic idea of timestamping was introduced by Driscoll et al. in the context of persistent data structures, where one wishes to track the sequence of changes made to a data structure. We extend this idea to develop an archiving tool for XML data that can provide meaningful change descriptions and can efficiently support a variety of basic operations concerning the evolution of data, such as retrieving any specific version from the archive and querying the temporal history of any element. This is in contrast to diff-based approaches, where such operations may require undoing a large number of changes or significant reasoning with the deltas. Surprisingly, our archiving technique does not incur any significant space overhead compared with other approaches. Our experimental results support this and also show that the compacted archive file interacts well with other compression techniques. Finally, another useful property of our approach is that the resulting archive is itself in XML and hence can directly leverage existing XML tools.
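    The timestamping idea described above can be illustrated with a small sketch. It assumes each database version is given as nested Python dicts whose keys identify elements (standing in for keyed XML), and it stores every (key path, value) pair once together with the versions it appears in. The names `Archive`, `compact`, `timestamps` and `retrieve` are hypothetical, and this is not the authors' XML-based archiver.

```python
from typing import Dict, List, Tuple, Union

# Assumed input: one database version as nested dicts whose keys identify
# elements (the key structure); leaves are string values.
Tree = Dict[str, Union["Tree", str]]
Path = Tuple[str, ...]


def compact(versions: List[int]) -> List[Tuple[int, int]]:
    """Compress a sorted list of version numbers into inclusive ranges."""
    ranges: List[Tuple[int, int]] = []
    for v in versions:
        if ranges and ranges[-1][1] == v - 1:
            ranges[-1] = (ranges[-1][0], v)  # extend the current run
        else:
            ranges.append((v, v))  # start a new run
    return ranges


class Archive:
    def __init__(self) -> None:
        # Each (key path, value) pair is stored once, with the list of versions
        # it appears in. Inner elements use None as their value.
        self.nodes: Dict[Path, Dict[object, List[int]]] = {}
        self.count = 0

    def add_version(self, tree: Tree) -> int:
        version = self.count
        self.count += 1
        self._merge(tree, (), version)
        return version

    def _merge(self, tree: Tree, path: Path, version: int) -> None:
        for key, child in tree.items():
            p = path + (key,)
            value = None if isinstance(child, dict) else child
            self.nodes.setdefault(p, {}).setdefault(value, []).append(version)
            if isinstance(child, dict):
                self._merge(child, p, version)

    def timestamps(self, path: Path) -> Dict[object, List[Tuple[int, int]]]:
        """Temporal history of one element: each value with its version ranges."""
        return {val: compact(vs) for val, vs in self.nodes.get(path, {}).items()}

    def retrieve(self, version: int) -> Tree:
        """Rebuild one version by keeping only entries whose timestamp contains it."""
        root: Tree = {}
        for path in sorted(self.nodes, key=len):  # parents before children
            for value, vs in self.nodes[path].items():
                if version in vs:
                    parent = root
                    for key in path[:-1]:
                        parent = parent[key]
                    parent[path[-1]] = {} if value is None else value
        return root
```

    Retrieving a version here is a single pass over the stored elements, with no deltas to undo, which is the contrast with diff-based archives drawn in the abstract.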

    XSQ: Streaming XPath Queries

    We describe the design and implementation of XSQ, a system for evaluating XPath 1.0 queries on streaming XML data. Each XML element in the input data is presented to the system only once, in a serial order determined by the data source. It is not possible to seek forward or backward in the data stream, and data cannot be recalled unless explicitly buffered by the system. Processing XPath queries correctly and efficiently in this environment is a challenging task, and, to the best of our knowledge, XSQ is the first system that efficiently implements XPath queries with features such as closures and multiple predicates. XSQ is efficient in both time and space. Stream query processing typically adds only 25% to the time required for parsing the stream (and discarding results). XSQ's space usage is optimal in the sense that it buffers only data that must be buffered by all streaming query processors. We describe the formal framework of hierarchical pushdown transducers that forms the basis of the XSQ system and highlight experimental results on real and synthetic data. (Also UMIACS-TR-2002-81)
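    To make the one-pass constraint concrete, the sketch below evaluates predicate-free location paths with child (`/`) and closure (`//`) steps over a stream of start, end and text events. It keeps a stack of automaton state sets, one per open element, so it never seeks or buffers. This is a plain stack-of-NFA-states matcher chosen for illustration, not XSQ's hierarchical pushdown transducers; the event format and the names `parse_path` and `stream_eval` are assumptions.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Event = Tuple[str, str]  # ("start", tag) | ("end", tag) | ("text", content)
Step = Tuple[str, str]   # ("child" | "descendant", tag name or "*")


def parse_path(path: str) -> List[Step]:
    """Turn '/a//b/c' into [('child','a'), ('descendant','b'), ('child','c')]."""
    steps: List[Step] = []
    i = 0
    while i < len(path):
        if path.startswith("//", i):
            axis, i = "descendant", i + 2
        else:
            axis, i = "child", i + 1  # assumes path[i] == '/'
        j = path.find("/", i)
        j = len(path) if j == -1 else j
        steps.append((axis, path[i:j]))
        i = j
    return steps


def stream_eval(path: str, events: Iterable[Event]) -> Iterator[str]:
    """Yield the direct text of every element matched by path, in one pass."""
    steps = parse_path(path)
    n = len(steps)
    # State s means "the first s steps are matched"; {0} is the document node.
    stack: List[Set[int]] = [{0}]
    for kind, value in events:
        if kind == "start":
            nxt: Set[int] = set()
            for s in stack[-1]:
                if s == n:
                    continue  # already a full match; children need no further steps
                axis, name = steps[s]
                if axis == "descendant":
                    nxt.add(s)  # closure: the step may still match deeper down
                if name in ("*", value):
                    nxt.add(s + 1)
            stack.append(nxt)
        elif kind == "end":
            stack.pop()
        elif kind == "text" and n in stack[-1]:
            yield value  # text directly inside a matching element
```

    Memory here is proportional to document depth, since only the stack of state sets for open elements is retained.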

    XSQ: A Streaming XPath Engine

    We have implemented and released the XSQ system for evaluating XPath queries on streaming XML data. XSQ supports XPath features such as multiple predicates, closures, and aggregation, which pose interesting challenges for streaming evaluation. Our implementation is based on a hierarchical arrangement of pushdown transducers augmented with buffers. A notable feature of XSQ is that it buffers data only for as long as it must be buffered by any streaming XPath query engine. We present a detailed experimental study that characterizes the performance of XSQ and related systems and illustrates the performance implications of XPath features such as closures. (UMIACS-TR-2003-62)
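    Buffering becomes unavoidable once predicates are involved: in a query of the shape `//ctx[pred]/out`, an `out` child may arrive before the sibling that decides the predicate. The hand-written sketch below holds such results until a `pred` child appears (then flushes them) or until the `ctx` element closes (then discards them). It handles only this one query shape and assumes `ctx` elements never nest; the function name and event format are hypothetical, and it does not reproduce XSQ's transducer construction.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[str, str]  # ("start", tag) | ("end", tag) | ("text", content)


def eval_predicate_query(ctx: str, pred: str, out: str,
                         events: Iterable[Event]) -> Iterator[str]:
    """Stream //ctx[pred]/out, assuming ctx elements never nest.

    Text of an out child is yielded at once if a pred child has already been
    seen, buffered otherwise, and the buffer is dropped if ctx closes
    without a pred child.
    """
    depth = 0
    ctx_depth = -1          # depth of the open ctx element, -1 if none
    satisfied = False       # has the current ctx element shown a pred child?
    buffer: List[str] = []  # out-child text waiting on the predicate
    out_depth = -1          # depth of the open out child, -1 if none
    for kind, value in events:
        if kind == "start":
            depth += 1
            if ctx_depth == -1 and value == ctx:
                ctx_depth, satisfied, buffer = depth, False, []
            elif ctx_depth != -1 and depth == ctx_depth + 1:
                if value == pred and not satisfied:
                    satisfied = True
                    yield from buffer  # predicate decided: release held results
                    buffer = []
                if value == out:
                    out_depth = depth
        elif kind == "end":
            if depth == out_depth:
                out_depth = -1
            if depth == ctx_depth:
                ctx_depth, buffer = -1, []  # discard results if pred never appeared
            depth -= 1
        elif kind == "text" and depth == out_depth:
            if satisfied:
                yield value
            else:
                buffer.append(value)
```

    Data is held only between the moment an `out` result appears and the moment the predicate is decided, which mirrors the "buffer only as long as necessary" property in spirit. A simple aggregate such as a count can be taken over the streamed results, e.g. `sum(1 for _ in eval_predicate_query(...))`.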

    Context-Sensitive Search and Exploration of XML Text

    XML permits documents with arbitrarily nested context (tag structure). We investigate how this context may be used to aid the task of searching and exploring XML text. We describe the design and implementation of the Cextor system, which includes a context-sensitive text-search engine and a novel technique for organizing and exploring very large search results based on context. A distinguishing feature of this technique is that it does not assume search results are of modest size. Rather, it is designed to cope with search results that are potentially the size of the database. We present the results of an experimental evaluation of Cextor on data derived from the Web. (Cross-referenced as UMIACS-TR-2001-12)
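    A minimal illustration of context-sensitive search: each keyword hit is reported together with its tag path, and hits are then summarized by path so a very large result set can be explored context by context instead of as one flat list. This sketch uses the standard `xml.etree.ElementTree` module; the functions `context_hits` and `summarize_by_context` are hypothetical and are not Cextor's interface.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List, Tuple


def context_hits(xml_text: str, word: str) -> List[Tuple[str, str]]:
    """Return (tag path, text) for each element whose leading text contains word.

    Only elem.text is searched; tail text after child elements is ignored.
    """
    root = ET.fromstring(xml_text)
    hits: List[Tuple[str, str]] = []

    def walk(elem: ET.Element, path: str) -> None:
        path = f"{path}/{elem.tag}"
        if elem.text and word.lower() in elem.text.lower():
            hits.append((path, elem.text.strip()))
        for child in elem:
            walk(child, path)

    walk(root, "")
    return hits


def summarize_by_context(hits: List[Tuple[str, str]]) -> Dict[str, int]:
    """Group hits by tag path, most frequent context first."""
    return dict(Counter(path for path, _ in hits).most_common())
```

    A user could first inspect the per-context counts and then expand only the contexts of interest, which keeps the interaction manageable even when the hit list itself is huge.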

    MRI: Acquisition of a High Performance Cluster for the University of Maine Scientific Grid Portal

    This project, which acquires a cluster to establish a scientific grid portal in Maine, aims to enable projects requiring large datasets. The work makes available to the wider community results such as widely used whole-ice-sheet models, tools for climate change research, prototype versions of an object-based caching system (bundled with an MPI-IO implementation developed at Argonne National Lab), the data management system, real-time animations, videos, etc. Additionally, the portal provides the larger community with the compute power, storage capacity, and rendering engine to execute very high-resolution models and to receive animations and other visualized information in real time. Broader Impact: The infrastructure enhances understanding of global issues and contributes to the development of educational tools for K-12 students. The scientific grid portal contributes to the dissemination of important scientific discoveries. The portal also provides a showcase for research being performed in the state.

    Efficient Peer-to-Peer Namespace Searches

    In this paper, we describe new methods for efficient and exact search (keyword and full-text) in distributed namespaces. Our methods can be used in conjunction with existing distributed lookup schemes, such as Distributed Hash Tables, and with distributed directories. We describe how indexes for implementing distributed searches can be efficiently created, located, and stored. We describe techniques for creating approximate indexes that can be used to bound the space requirement at individual hosts; such techniques are particularly useful for full-text searches, which may require a very large number of individual indexes to be created and maintained. Our methods use a new distributed data structure called the view tree. View trees can be used to efficiently cache and locate results from prior queries. We describe how view trees are created and maintained. We present experimental results, using large namespaces and realistic data, showing that the techniques introduced in this paper can reduce search overheads (both network and processing costs) by more than an order of magnitude. (UMIACS-TR-2004-13)
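    The general setting, keyword indexes placed on hosts by a distributed lookup scheme plus a cache of earlier query results, can be sketched as follows. A toy consistent-hashing ring stands in for a DHT, and a flat cache keyed by the set of query keywords stands in for result caching; this is deliberately much simpler than the view tree described in the abstract. All class and method names (`Ring`, `Searcher`, `publish`, `search`) are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, FrozenSet, List, Optional, Set


def ring_hash(key: str) -> int:
    """Map a string onto a 64-bit identifier space."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class Ring:
    """Toy consistent-hashing ring standing in for a DHT (an assumption)."""

    def __init__(self, hosts: List[str]) -> None:
        pairs = sorted((ring_hash(h), h) for h in hosts)
        self.ids = [i for i, _ in pairs]
        self.hosts = [h for _, h in pairs]
        # Per-host keyword index: keyword -> names published under it.
        self.index: Dict[str, Dict[str, Set[str]]] = {h: {} for h in hosts}

    def owner(self, key: str) -> str:
        """First host clockwise from the key's hash."""
        i = bisect.bisect_left(self.ids, ring_hash(key))
        return self.hosts[i % len(self.hosts)]

    def publish(self, name: str, keywords: List[str]) -> None:
        for kw in keywords:
            self.index[self.owner(kw)].setdefault(kw, set()).add(name)

    def lookup(self, kw: str) -> Set[str]:
        return set(self.index[self.owner(kw)].get(kw, set()))


class Searcher:
    """Exact conjunctive search with a cache of prior query results."""

    def __init__(self, ring: Ring) -> None:
        self.ring = ring
        self.cache: Dict[FrozenSet[str], Set[str]] = {}
        self.remote_lookups = 0  # counts simulated network requests

    def search(self, keywords: List[str]) -> Set[str]:
        query = frozenset(keywords)
        if query not in self.cache:
            result: Optional[Set[str]] = None
            for kw in sorted(query):
                self.remote_lookups += 1
                postings = self.ring.lookup(kw)
                result = postings if result is None else result & postings
            self.cache[query] = result if result is not None else set()
        return set(self.cache[query])
```

    Repeating a query (in any keyword order) is answered from the cache without further remote lookups, which is the kind of saving in network cost that caching prior results is meant to deliver.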

    Preferential survival in models of complex ad hoc networks

    There has been a rich interplay in recent years between (i) empirical investigations of real-world dynamic networks, (ii) analytical modeling of the microscopic mechanisms that drive the emergence of such networks, and (iii) harnessing of these mechanisms to either manipulate existing networks or engineer new networks for specific tasks. We continue in this vein and study the deletion phenomenon in the web by following two different sets of websites (each comprising more than 150,000 pages) over a one-year period. Empirical data show that there is a significant deletion component in the underlying web networks, but the deletion process is not uniform. This motivates us to introduce a new mechanism of preferential survival (PS), where nodes are removed according to a degree-dependent deletion kernel. We use the mean-field rate equation approach to study a general dynamic model driven by Preferential Attachment (PA), Double PA (DPA), and a tunable PS, where c nodes (c < 1) are deleted per node added to the network, and we verify our predictions via large-scale simulations. One of our results shows that, unlike in the case of uniform deletion, the PS kernel, when coupled with the standard PA mechanism, can lead to heavy-tailed power-law networks even in the presence of extreme turnover in the network. Moreover, a weak DPA mechanism, coupled with PS, can help make the network even more heavy-tailed, especially in the limit when deletion and insertion rates are almost equal and the overall network growth is minimal. The dynamics reported in this work can be used to design and engineer stable ad hoc networks and to explain the stability of the power-law exponents observed in real-world networks. Comment: 9 pages, 6 figures
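    A toy simulation of PA growth combined with degree-dependent deletion might look like the sketch below. Several details are assumptions rather than the paper's model: attachment weights are degree + 1 (so isolated nodes can still be chosen), each new node links to m = 2 distinct existing nodes, "c nodes per added node" is realized as one deletion with probability c per step, and the deletion kernel 1/(k + 1) is a hypothetical choice that makes low-degree nodes likelier to be removed, so high-degree nodes preferentially survive. DPA is not modeled.

```python
import random
from typing import Dict, Set


def simulate(steps: int, c: float, m: int = 2, seed: int = 0) -> Dict[int, Set[int]]:
    """Grow an undirected graph by PA with degree-dependent (PS-style) deletion."""
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {0: {1}, 1: {0}}  # start from a single edge
    next_id = 2
    for _ in range(steps):
        # Preferential attachment: m distinct targets, weight = degree + 1 (assumed)
        nodes = list(adj)
        weights = [len(adj[v]) + 1 for v in nodes]
        targets: Set[int] = set()
        while len(targets) < min(m, len(nodes)):
            targets.add(rng.choices(nodes, weights=weights)[0])
        adj[next_id] = set(targets)
        for t in targets:
            adj[t].add(next_id)
        next_id += 1
        # Preferential survival: with probability c remove one node, chosen with
        # hypothetical kernel 1/(k+1), so low-degree nodes are removed more often
        if rng.random() < c and len(adj) > 2:
            nodes = list(adj)
            kernel = [1.0 / (len(adj[v]) + 1) for v in nodes]
            victim = rng.choices(nodes, weights=kernel)[0]
            for u in adj.pop(victim):
                adj[u].discard(victim)
    return adj
```

    Tabulating `len(neighbors)` over the returned graph for several values of c gives a rough, purely empirical view of how turnover affects the degree distribution; it is no substitute for the mean-field analysis in the paper.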

    A formal model based on Game Theory for the analysis of cooperation in distributed service discovery

    New systems can be designed, developed, and managed as societies of agents that interact with each other by offering and providing services. These systems can be viewed as complex networks in which nodes are boundedly rational agents. In order to deal with complex goals, agents must cooperate with other agents to locate the required services. The aim of this paper is to formally and empirically analyze under what circumstances cooperation emerges in decentralized search for services. We propose a repeated game model that formalizes the interactions among agents in a search process where each agent is free to choose whether or not to cooperate with other agents. Agents make decisions based on the cost of their actions and the expected reward if they participate by forwarding queries in a search process that ends successfully. We propose a strategy based on random walks, and we study under what conditions the strategy is a Nash equilibrium. We performed several experiments to evaluate the model and the strategy and to analyze which network structures are most appropriate for promoting cooperation.
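    The search process can be pictured as a random walk in which every agent on the path decides whether forwarding is worth its cost. The sketch assumes a simple hypothetical decision rule, cooperate if `p_success * reward - cost >= 0`, and a time-to-live bound on the walk; it illustrates one round of the search and does not reproduce the paper's game model or its Nash equilibrium analysis. The function name and parameters are hypothetical.

```python
import random
from typing import Dict, List, Optional


def random_walk_search(graph: Dict[str, List[str]], services: Dict[str, str],
                       start: str, wanted: str, cost: Dict[str, float],
                       reward: float, p_success: float, ttl: int,
                       rng: random.Random) -> Optional[List[str]]:
    """Return the agents visited until one offers `wanted`, or None on failure."""
    path = [start]
    node = start
    for _ in range(ttl):
        if services.get(node) == wanted:
            return path  # search ends successfully at a provider
        # Hypothetical rule: forward only if the expected reward covers the cost
        if p_success * reward - cost[node] < 0:
            return None  # agent defects and the query is dropped
        node = rng.choice(graph[node])  # random-walk step to a neighbor
        path.append(node)
    return path if services.get(node) == wanted else None
```

    Repeating such rounds while agents update their estimate of `p_success` would give a simple repeated-game flavor; under this rule, a single agent whose cost exceeds its expected reward is enough to break every walk that passes through it.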