283 research outputs found

Archiving scientific data

Author: Altinel M.
Avilla-Campillo I.
Buneman P.
Chawathe S. S.
Chien S.
Cobena G.
Diao Y.
Keishi Tajima
Marian A.
Peter Buneman
Sanjeev Khanna
Schmidt A. R.
Tufte K.
Wang-Chiew Tan
Publication date: 01/01/2002

We present an archiving technique for hierarchical data with key structure. Our approach is based on the notion of timestamps, whereby an element appearing in multiple versions of the database is stored only once along with a compact description of the versions in which it appears. The basic idea of timestamping was discovered by Driscoll et al. in the context of persistent data structures, where one wishes to track the sequence of changes made to a data structure. We extend this idea to develop an archiving tool for XML data that is capable of providing meaningful change descriptions and can also efficiently support a variety of basic functions concerning the evolution of data, such as retrieval of any specific version from the archive and querying the temporal history of any element. This is in contrast to diff-based approaches, where such operations may require undoing a large number of changes or significant reasoning with the deltas. Surprisingly, our archiving technique does not incur any significant space overhead when contrasted with other approaches. Our experimental results support this and also show that the compacted archive file interacts well with other compression techniques. Finally, another useful property of our approach is that the resulting archive is also in XML and hence can directly leverage existing XML tools.
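The timestamping idea in this abstract can be illustrated with a minimal sketch. It is not the authors' tool: it assumes each version is a nested Python dict whose dictionary keys stand in for the key structure, and the names `Node`, `merge`, `retrieve`, and `history` are hypothetical. Each element is stored once, annotated with the set of versions in which it appears.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    versions: set = field(default_factory=set)    # versions in which this element appears
    values: dict = field(default_factory=dict)    # leaf value -> versions holding that value
    children: dict = field(default_factory=dict)  # element key -> Node

def merge(node, doc, version):
    """Merge one version (nested dicts with string leaves) into the archive."""
    node.versions.add(version)
    for key, sub in doc.items():
        child = node.children.setdefault(key, Node())
        if isinstance(sub, dict):
            merge(child, sub, version)
        else:
            child.versions.add(version)
            child.values.setdefault(sub, set()).add(version)

def retrieve(node, version):
    """Rebuild a single version directly, without undoing any deltas."""
    out = {}
    for key, child in node.children.items():
        if version not in child.versions:
            continue
        leaf = [v for v, vs in child.values.items() if version in vs]
        out[key] = leaf[0] if leaf else retrieve(child, version)
    return out

def history(node, path):
    """Return the sorted versions in which the element at `path` exists."""
    for key in path:
        node = node.children[key]
    return sorted(node.versions)

archive = Node()
v1 = {"gene1": {"name": "abc", "seq": "AAT"}, "gene2": {"name": "x"}}
v2 = {"gene1": {"name": "abc", "seq": "AAG"}}
merge(archive, v1, 1)
merge(archive, v2, 2)
```

In this sketch, retrieving a version and querying an element's temporal history are both simple traversals of one merged structure, which is the contrast with diff-based archives that the abstract draws.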

CiteSeerX

Crossref

Edinburgh Research Explorer

ScholarlyCommons@Penn

XSQ: Streaming XPath Queries

Author: Chawathe Sudarshan S.
Peng Feng
Publication date: 18/09/2002

We describe the design and implementation of XSQ, a system for evaluating XPath 1.0 queries on streaming XML data. Each XML element in the input data is presented to the system only once, in a serial order determined by the data source. It is not possible to seek forward or backward in the data stream, and data cannot be recalled unless explicitly buffered by the system. Processing XPath queries correctly and efficiently in this environment is a challenging task and, to the best of our knowledge, XSQ is the first system that efficiently implements XPath queries with features such as closures and multiple predicates. XSQ is efficient in both time and space. Stream query processing typically adds only 25% to the time required for parsing the stream (and discarding results). XSQ's space usage is optimal in the sense that it buffers only data that must be buffered by all streaming query processors. We describe the formal framework of hierarchical pushdown transducers that forms the basis of the XSQ system and highlight experimental results on real and synthetic data. (Also UMIACS-TR-2002-81)
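The single-pass constraint described in this abstract can be sketched with a much simpler technique than XSQ's hierarchical pushdown transducers: a stack of open tags over a stream of parse events. The sketch below is an illustration only. It handles plain child-axis paths, not the closures, predicates, or buffering that XSQ supports, and the function name `stream_match` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def stream_match(source, path):
    """Yield the text of each element whose root-to-element tag path equals `path`."""
    stack = []  # tags of the currently open elements, outermost first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack == path:
                yield (elem.text or "").strip()
            stack.pop()
            elem.clear()  # drop the element's content once it has been seen

doc = (b"<pub><book><title>A</title><price>10</price></book>"
       b"<book><title>B</title></book><year>2002</year></pub>")
titles = list(stream_match(io.BytesIO(doc), ["pub", "book", "title"]))
```

Each element is visited once in document order and nothing is revisited, which mirrors the streaming setting; deciding what must be buffered when predicates are present is the harder problem the abstract addresses.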

Digital Repository at the University of Maryland

XSQ: A Streaming XPath Engine

Author: Chawathe Sudarshan S.
Peng Feng
Publication date: 01/08/2003

We have implemented and released the XSQ system for evaluating XPath queries on streaming XML data. XSQ supports XPath features such as multiple predicates, closures, and aggregation, which pose interesting challenges for streaming evaluation. Our implementation is based on a hierarchical arrangement of pushdown transducers augmented with buffers. A notable feature of XSQ is that it buffers data for only as long as it must be buffered by any streaming XPath query engine. We present a detailed experimental study that characterizes the performance of XSQ and related systems, and illustrates the performance implications of XPath features such as closures. (UMIACS-TR-2003-62)

Digital Repository at the University of Maryland

Context-Sensitive Search and Exploration of XML Text

Author: Baby Thomas
Chawathe Sudarshan S.
Publication date: 10/05/2001

XML permits documents with arbitrarily nested context (tag structure). We investigate how this context may be used to aid the task of searching and exploring XML text. We describe the design and implementation of the Cextor system, which includes a context-sensitive text-search engine and a novel technique for organizing and exploring very large search results based on context. A distinguishing feature of this technique is that it does not assume search results are of modest size. Rather, it is designed to cope with search results that are potentially the size of the database. We present the results of an experimental evaluation of Cextor on derived data from the Web. (Cross-referenced as UMIACS-TR-2001-12)

Digital Repository at the University of Maryland

MRI: Acquisition of a High Performance Cluster for the University of Maine Scientific Grid Portal

Author: Chawathe Sudarshan S.
Dickens Phillip M.
Fastook James L.
Segee Bruce E.
Zhu Yifeng
Publication venue: DigitalCommons@UMaine
Publication date: 07/12/2009

This project, acquiring a cluster to establish a scientific grid portal in Maine, aims to enable projects requiring large datasets. The work makes available to the wider community results such as widely used whole-ice-sheet models, tools for climate change research, prototype versions of an object-based caching system (bundled with the MPI-IO implementation developed at Argonne National Lab), the data management system, real-time animations, videos, etc. Additionally, the portal provides the larger community with the compute power, storage capacity, and rendering engine to execute very high-resolution models and to receive animations and other visualized information in real time. Broader Impact: The infrastructure enhances understanding of global issues and contributes to the development of educational tools for K-12 students. The scientific grid portal contributes to the dissemination of important scientific discoveries. The portal also provides a showcase for research being performed in the state.

University of Maine

Efficient Peer-to-Peer Namespace Searches

Author: Bhattacharjee Bobby
Chawathe Sudarshan
Gopalakrishnan Vijay
Keleher Pete
Publication date: 19/04/2004

In this paper we describe new methods for efficient and exact search (keyword and full-text) in distributed namespaces. Our methods can be used in conjunction with existing distributed lookup schemes, such as Distributed Hash Tables and distributed directories. We describe how indexes for implementing distributed searches can be efficiently created, located, and stored. We describe techniques for creating approximate indexes that can be used to bound the space requirement at individual hosts; such techniques are particularly useful for full-text searches that may require a very large number of individual indexes to be created and maintained. Our methods use a new distributed data structure called the view tree. View trees can be used to efficiently cache and locate results from prior queries. We describe how view trees are created and maintained. We present experimental results, using large namespaces and realistic data, showing that the techniques introduced in this paper can reduce search overheads (both network and processing costs) by more than an order of magnitude. (UMIACS-TR-2004-13)

Digital Repository at the University of Maryland

Preferential survival in models of complex ad hoc networks

Author: Adamic
Adamic
Albert
Barabasi
Bianconi
Borgs
Bornholdt
Chawathe
Chen
Cho
Chung
Cooper
Deng
Dorogovtsev
Geng
Gomes
Gulli
Huberman
Jeong
Joseph S. Kong
Kleinberg
Krapivsky
Kumar
Moore
Motwani
Newman
Pennock
Sarshar
Sarshar
Sarshar
Simon
Vwani P. Roychowdhury
Vázquez
Willis
Wouhaybi
Publication venue: 'Elsevier BV'
Publication date: 21/11/2007

There has been a rich interplay in recent years between (i) empirical investigations of real-world dynamic networks, (ii) analytical modeling of the microscopic mechanisms that drive the emergence of such networks, and (iii) harnessing of these mechanisms to either manipulate existing networks or engineer new networks for specific tasks. We continue in this vein and study the deletion phenomenon in the web by following two different sets of web sites (each comprising more than 150,000 pages) over a one-year period. Empirical data show that there is a significant deletion component in the underlying web networks, but the deletion process is not uniform. This motivates us to introduce a new mechanism of preferential survival (PS), where nodes are removed according to a degree-dependent deletion kernel. We use the mean-field rate equation approach to study a general dynamic model driven by Preferential Attachment (PA), Double PA (DPA), and a tunable PS, where c nodes (c<1) are deleted per node added to the network, and verify our predictions via large-scale simulations. One of our results shows that, unlike in the case of uniform deletion, the PS kernel, when coupled with the standard PA mechanism, can lead to heavy-tailed power-law networks even in the presence of extreme turnover in the network. Moreover, a weak DPA mechanism, coupled with PS, can help make the network even more heavy-tailed, especially in the limit when deletion and insertion rates are almost equal and the overall network growth is minimal. The dynamics reported in this work can be used to design and engineer stable ad hoc networks and explain the stability of the power-law exponents observed in real-world networks. Comment: 9 pages, 6 figures
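A growth model of this kind can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' model: the abstract does not give the form of the deletion kernel, so removal probability proportional to (k+1)^(-alpha) for a node of degree k is assumed here, attachment weights use k+1 so isolated nodes remain reachable, DPA is omitted, and the name `simulate` and its parameters `m`, `c`, `alpha` are hypothetical.

```python
import random

def simulate(steps, m=2, c=0.5, alpha=1.0, seed=0):
    """Grow a graph by preferential attachment with degree-dependent deletion."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(m + 1)}
    for i in adj:                       # start from a small complete graph
        adj[i] = set(adj) - {i}
    next_id = m + 1
    for _ in range(steps):
        # Preferential attachment: pick m distinct targets, weight = degree + 1.
        nodes = list(adj)
        weights = [len(adj[n]) + 1 for n in nodes]
        targets = set()
        while len(targets) < min(m, len(nodes)):
            targets.add(rng.choices(nodes, weights)[0])
        adj[next_id] = set()
        for t in targets:
            adj[next_id].add(t)
            adj[t].add(next_id)
        next_id += 1
        # Preferential survival (assumed kernel): with probability c, remove one
        # node chosen with weight (degree + 1) ** -alpha, favoring low degree.
        if rng.random() < c and len(adj) > m + 1:
            nodes = list(adj)
            weights = [(len(adj[n]) + 1) ** -alpha for n in nodes]
            victim = rng.choices(nodes, weights)[0]
            for nb in adj.pop(victim):
                adj[nb].discard(victim)
    return adj

graph = simulate(500)
degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
```

Setting `alpha=0` gives uniform deletion, so comparing the resulting `degrees` for different `alpha` values is one way to see the qualitative effect of the kernel in this toy setting.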

arXiv.org e-Print Archive

Crossref

A formal model based on Game Theory for the analysis of cooperation in distributed service discovery

Author: Backstrom
Ban
Cajueiro
Chawathe
Costa
de Weerdt
Del Val
Del Val
Del Val
El-Azouzi
Elena Del Val
Fersi
Fronczak
Fudenberg
Gkantsidis
Gu
Guillem Martínez-Cánovas
Hofmann
Janzadeh
Jaramillo
Kleinberg
Lee
Li
Lopes
Luck
Lv
MacKenzie
Miguel Rebollo
Newman
Newman
Noh
Noh
Ohtsuki
Penélope Hernández
Pu
Pujol
Ramzan
Sierra
Srivastava
Tseng
Vicente Botti
Yang
Zhang
Zhong
Zhou
Publication date: 01/01/2016

New systems can be designed, developed, and managed as societies of agents that interact with each other by offering and providing services. These systems can be viewed as complex networks where nodes are bounded rational agents. In order to deal with complex goals, agents must cooperate with other agents to be able to locate the required services. The aim of this paper is to formally and empirically analyze under what circumstances cooperation emerges in decentralized search for services. We propose a repeated game model that formalizes the interactions among agents in a search process where each agent has the freedom to choose whether or not to cooperate with other agents. Agents make decisions based on the cost of their actions and the expected reward if they participate by forwarding queries in a search process that ends successfully. We propose a strategy that is based on random walks, and we study under what conditions the strategy is a Nash Equilibrium. We performed several experiments in order to evaluate the model and the strategy and to analyze which network structures are the most appropriate for promoting cooperation.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

RiuNet

A methodology for clustering XML documents by structure

Author: Abiteboul
Abiteboul
Carmel
Chawathe
Chawathe
Cobena
Cormen
Dalamagas
Direen
Flesca
Fuhr
Garcia-Molina
Garofalakis
Goldman
Gower
Halkidi
Hearst
Hubert
Jardine
Klaas-Jan Winkel
Lewie
Liu
Milligan
Myers
Nierman
Papakonstantinou
Polyzotis
Rasmussen
Sankoff
Selkow
Shanmugasundaram
Tai
Tang
Tao Cheng
Theodore Dalamagas
Timos Sellis
van Rijsbergen
Wagner
Wang
Wilson
Yang
Yoon
Zhang
Publication venue: 'Elsevier BV'

Crossref