
    On distributed data processing in data grid architecture for a virtual repository

    The article describes the problem of integrating distributed, heterogeneous and fragmented collections of data using the virtual repository and data grid concepts. The technology involves wrappers enveloping external resources, a virtual network (based on peer-to-peer technology) responsible for integrating data into one global schema, and a distributed index for speeding up data retrieval. The authors present a method for obtaining data from heterogeneously structured external databases and a procedure for integrating the data into one commonly available global schema. The core of the described solution is based on the Stack-Based Query Language (SBQL) and virtual updatable SBQL views. The system's transport and indexing layer is based on the P2P architecture.
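
    SBQL and its virtual updatable views are specific to the platform the article targets, so they are not reproduced here. As a loose illustration of the wrapper-and-global-schema idea, the following Python sketch envelops two heterogeneous sources and exposes their union under one schema; every class and field name in it is an assumption made for illustration, not an interface from the paper.

        # Minimal sketch of the wrapper/global-schema idea. All names here
        # are illustrative assumptions, not the paper's SBQL interfaces.

        class Wrapper:
            """Envelops one external resource and maps its records into
            the global schema."""
            def __init__(self, source, mapping):
                self.source = source    # iterable of local records (dicts)
                self.mapping = mapping  # local field name -> global field name

            def query(self):
                for record in self.source:
                    yield {glob: record[loc] for loc, glob in self.mapping.items()}

        class VirtualRepository:
            """Stands in for the P2P integration layer: presents the union
            of all wrapped resources under one global schema."""
            def __init__(self):
                self.wrappers = []

            def register(self, wrapper):
                self.wrappers.append(wrapper)

            def select(self, predicate):
                for w in self.wrappers:
                    for record in w.query():
                        if predicate(record):
                            yield record

        # Two heterogeneous sources with different local field names.
        hr_db = [{"emp_name": "Ann", "pay": 3000}]
        erp_db = [{"fullName": "Bob", "salary": 4200}]

        repo = VirtualRepository()
        repo.register(Wrapper(hr_db, {"emp_name": "name", "pay": "salary"}))
        repo.register(Wrapper(erp_db, {"fullName": "name", "salary": "salary"}))

        print(list(repo.select(lambda r: r["salary"] > 3500)))
        # -> [{'name': 'Bob', 'salary': 4200}]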

    Peer-to-peer systems for simple and flexible information sharing

    Peer-to-peer (P2P) computing is an architecture that enables applications to access shared resources, with peers having similar capabilities and responsibilities. The ubiquity of P2P computing and its increasing adoption as a decentralized data-sharing mechanism have fueled my research interests. P2P networks are useful for sharing content files containing audio, video, and data. This research aims to address the problem of simple and flexible access to data from a variety of data sources across peers with different operating systems, databases and hardware. The proposed architecture makes use of SQL queries, web services, heterogeneous database servers and XML data transformation for the peer-to-peer data-sharing prototype. SQL queries and web services provide a data-sharing mechanism that allows both simple and flexible data access.
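
    The abstract names SQL queries, web services and XML transformation as the building blocks. As a rough sketch of the SQL-to-XML step, assuming nothing about the prototype's actual schema or tag names, the following Python snippet answers a query against a local SQLite database and serializes the rows to platform-neutral XML; in the described architecture such a function would sit behind a peer's web-service endpoint.

        # Hedged sketch: turn an SQL result set into XML for cross-platform
        # sharing. Table, column and tag names are illustrative assumptions.
        import sqlite3
        import xml.etree.ElementTree as ET

        def rows_to_xml(cursor):
            """Serialize a query result to a platform-neutral XML document."""
            root = ET.Element("resultset")
            columns = [d[0] for d in cursor.description]
            for row in cursor:
                rec = ET.SubElement(root, "record")
                for col, val in zip(columns, row):
                    ET.SubElement(rec, col).text = str(val)
            return ET.tostring(root, encoding="unicode")

        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE songs (title TEXT, artist TEXT)")
        conn.execute("INSERT INTO songs VALUES ('Aja', 'Steely Dan')")
        cur = conn.execute("SELECT title, artist FROM songs")
        print(rows_to_xml(cur))
        # -> <resultset><record><title>Aja</title><artist>Steely Dan</artist></record></resultset>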

    Connected Information Management

    Society is currently inundated with more information than ever, making efficient management a necessity. Alas, most current information management suffers from several levels of disconnectedness: applications partition data into segregated islands, small notes don't fit into traditional application categories, navigating the data is different for each kind of data, and data is available either on a certain computer or only online, but rarely both. Connected information management (CoIM) is an approach to information management that avoids these kinds of disconnectedness. The core idea of CoIM is to keep all information in a central repository, with generic means of organization such as tagging. The heterogeneity of data is taken into account by offering specialized editors. The central repository eliminates the islands of application-specific data and is formally grounded by a CoIM model. The foundation for structured data is an RDF repository. The RDF editing meta-model (REMM) enables form-based editing of this data, similar to database applications such as MS Access. Further kinds of data are supported by extending RDF, as follows. Wiki text is stored as RDF and can both contain structured text and be combined with structured data. Files are also supported by the CoIM model and are kept externally. Notes can be quickly captured and annotated with meta-data. Generic means of organization and navigation apply to all kinds of data. Ubiquitous availability of data is ensured via two CoIM implementations, the web application HYENA/Web and the desktop application HYENA/Eclipse. All data can be synchronized between these applications. The applications were used to validate the CoIM ideas.
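
    The abstract's central idea, one RDF repository with generic organization such as tagging across heterogeneous items, can be illustrated with a few triples. The sketch below uses the third-party rdflib library; the namespace and property names are assumptions for illustration, not the REMM or HYENA vocabularies.

        # Minimal sketch: a note and an externally kept file live in one
        # RDF graph and share a generic "tag" property for navigation.
        from rdflib import Graph, Literal, Namespace, URIRef

        EX = Namespace("http://example.org/coim/")  # assumed namespace
        g = Graph()

        note = URIRef(EX["note/1"])
        g.add((note, EX.content, Literal("Call Alice about the demo")))
        g.add((note, EX.tag, Literal("work")))

        doc = URIRef(EX["file/report.pdf"])  # file kept externally, referenced here
        g.add((doc, EX.path, Literal("/home/user/report.pdf")))
        g.add((doc, EX.tag, Literal("work")))

        # Generic navigation: one lookup spans all kinds of data.
        for item in g.subjects(EX.tag, Literal("work")):
            print(item)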

    Efficient Indexing for Structured and Unstructured Data

    The collection of digital data is growing at an exponential rate. Data originates from a wide range of sources such as text feeds, biological sequencers, internet traffic over routers, sensors and many others. To mine intelligent information from these sources, users have to query the data. Indexing techniques aim to reduce query time by preprocessing the data. The diversity of data sources in the real world makes it imperative to develop application-specific indexing solutions based on the data to be queried. Data can be structured, i.e., relational tables, or unstructured, i.e., free text. Moreover, increasingly many applications need to seamlessly analyze both kinds of data, making data integration a central issue. Integrating text with structured data needs to account for missing values, errors in the data, etc. Probabilistic models have been proposed recently for this purpose. These models are also useful for applications where uncertainty is inherent in the data, e.g., sensor networks. This dissertation aims to propose efficient indexing solutions for several problems that lie at the intersection of databases and information retrieval, such as joining ranked inputs and full-text document searching. Other well-known problems of ranked retrieval and pattern matching are also studied under probabilistic settings. For each problem, the worst-case theoretical bounds of the proposed solutions are established and/or their practicality is demonstrated by thorough experimentation.
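
    The abstract does not spell out its algorithms, but the classic baseline for its "joining ranked inputs" problem is Fagin's threshold algorithm, sketched below for two score lists and top-1 retrieval on made-up data: scan both lists in parallel, look up the full score of each newly seen object, and stop once the best score found can no longer be beaten by any unseen object.

        # Threshold-algorithm sketch (a textbook baseline, not necessarily
        # the dissertation's method). Objects and scores are made up.
        list_a = [("x", 0.9), ("y", 0.8), ("z", 0.1)]  # sorted by score desc
        list_b = [("y", 0.95), ("z", 0.7), ("x", 0.2)]
        score_a, score_b = dict(list_a), dict(list_b)   # random-access lookups

        best_obj, best_score = None, float("-inf")
        for (oa, sa), (ob, sb) in zip(list_a, list_b):
            # Aggregate the full score of each object seen under sorted access.
            for obj in (oa, ob):
                total = score_a[obj] + score_b[obj]
                if total > best_score:
                    best_obj, best_score = obj, total
            # Threshold: the best any unseen object could still score.
            if best_score >= sa + sb:
                break

        print(best_obj, best_score)  # -> y 1.75, found after two rounds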

    A survey on tree matching and XML retrieval

    With the increasing number of available XML documents, numerous approaches for retrieval have been proposed in the literature. They usually use the tree representation of documents and queries to process them, whether in an implicit or explicit way. Although retrieving XML documents can be considered a tree matching problem between the query tree and the document trees, only a few approaches take advantage of the algorithms and methods proposed by graph theory. In this paper, we aim to study the theoretical approaches proposed in the literature for tree matching and to see how these approaches have been adapted to XML querying and retrieval, from both an exact and an approximate matching perspective. This study will allow us to highlight theoretical aspects of graph theory that have not yet been explored in XML retrieval.
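
    As a taste of what exact tree matching means for XML retrieval, the sketch below matches a query tree against a document tree: a query node matches a document node when their tags agree and every query child matches some child of that node. For brevity this ignores the injectivity of the matching and all ranking, so it is a simplification of the approaches the survey covers, not one of them.

        # Simplified exact tree matching over XML trees.
        import xml.etree.ElementTree as ET

        def matches(query, doc):
            if query.tag != doc.tag:
                return False
            # Every query child must match some child of the document node.
            return all(any(matches(q, d) for d in doc) for q in query)

        def retrieve(query, doc):
            """Yield every document node at which the query tree matches."""
            if matches(query, doc):
                yield doc
            for child in doc:
                yield from retrieve(query, child)

        doc = ET.fromstring("<lib><book><title/><year/></book><cd><title/></cd></lib>")
        query = ET.fromstring("<book><title/></book>")
        for node in retrieve(query, doc):
            print(ET.tostring(node, encoding="unicode"))
        # -> <book><title /><year /></book>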

    An infrastructure for delivering geospatial data to field users

    Federal agencies collect and analyze data to carry out their missions. A significant portion of these activities requires geospatial data collection in the field. Models for computer-assisted survey information collection are still largely based on the client-server paradigm with symbolic data representation. Little attention has been given to digital geospatial information resources or emerging mobile computing environments. This paper discusses an infrastructure design for delivering geospatial data to users in a mobile field computing environment. Mobile field computing environments vary widely and generally offer extremely limited computing resources, visual display, and bandwidth relative to the resources usually required for distributed geospatial data. Key to handling heterogeneity in the field is an infrastructure design that provides flexibility in the location of computing tasks and returns information in forms appropriate for the field computing environment. A view-agent-based infrastructure has been developed with several components. Wrappers are used to encapsulate not only the data sources but the mobile field environment as well, localizing the details associated with heterogeneity in data sources and field environments. Within the boundaries of the wrappers, mediators and object-oriented views implemented as mobile agents work in a relatively homogeneous environment to generate query results. Mediators receive a request from the user application via the field wrapper and generate a sequence of mobile view agents to search for, retrieve, and process data. The internal infrastructure environment is populated with computation servers that provide a location for processing, especially for combining data from multiple locations. Each computation server has a local object-oriented data warehouse equipped with a set of tools for working with geospatial data. Since query reuse is likely for a field worker, we store final and intermediate results in the data warehouse, allowing the warehouse to act as an active cache. Even when field computing capacity is ample, the warehouse is used to process data so that network traffic can be minimized.
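
    One concrete idea in the abstract, the warehouse acting as an active cache so a field worker's repeated queries skip the expensive retrieval step, can be sketched in a few lines of Python. The mediator, source and key names below are assumptions for illustration, not the paper's components.

        # Hedged sketch: a mediator that consults its warehouse cache before
        # dispatching work to the data sources.
        class Mediator:
            def __init__(self, sources):
                self.sources = sources  # callables standing in for mobile view agents
                self.warehouse = {}     # final/intermediate results, keyed by query

            def query(self, request):
                if request in self.warehouse:      # reuse a cached result
                    return self.warehouse[request]
                # Dispatch to each data source and combine the pieces.
                result = [piece for fetch in self.sources for piece in fetch(request)]
                self.warehouse[request] = result   # store for likely reuse
                return result

        def roads_source(request):
            return [f"roads tile for {request}"]

        def elevation_source(request):
            return [f"elevation tile for {request}"]

        m = Mediator([roads_source, elevation_source])
        print(m.query("cell 42"))  # fetched from both sources and combined
        print(m.query("cell 42"))  # served from the warehouse cache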

    EXODuS: Exploratory OLAP over Document Stores

    OLAP has been extensively used for a couple of decades as a data analysis approach to support decision making on enterprise structured data. Now, with the wide diffusion of NoSQL databases holding semi-structured data, there is a growing need to enable OLAP on document stores as well, to allow non-expert users to get new insights and make better decisions. Unfortunately, due to their schemaless nature, document stores are hardly accessible via direct OLAP querying. In this paper we propose EXODuS, an interactive, schema-on-read approach to enable OLAP querying of document stores in the context of self-service BI and exploratory OLAP. To discover multidimensional hierarchies in document stores we adopt a data-driven approach based on the mining of approximate functional dependencies; to ensure good performance, we incrementally build local portions of hierarchies for the levels involved in the current user query. Users execute an analysis session by expressing well-formed multidimensional queries related by OLAP operations; these queries are then translated into the native query language of MongoDB, one of the most popular document-based DBMSs. An experimental evaluation on real-world datasets shows the efficiency of our approach and its compatibility with a real-time setting.
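
    The hierarchy-discovery step rests on approximate functional dependencies. A common way to measure how closely a dependency holds is the g3 error: the fraction of documents that would have to be dropped for the dependency to hold exactly. The sketch below computes it over a toy collection; field names, data and the 0.3 threshold are assumptions for illustration, not EXODuS's actual parameters.

        # Approximate-FD check: does city -> country (nearly) hold?
        from collections import Counter, defaultdict

        docs = [
            {"city": "Lyon",  "country": "France"},
            {"city": "Lyon",  "country": "France"},
            {"city": "Turin", "country": "Italy"},
            {"city": "Turin", "country": "France"},  # noisy document
        ]

        def fd_error(docs, lhs, rhs):
            groups = defaultdict(Counter)
            for d in docs:
                groups[d[lhs]][d[rhs]] += 1
            # Keep, per lhs value, the documents agreeing with the majority rhs.
            kept = sum(c.most_common(1)[0][1] for c in groups.values())
            return 1 - kept / len(docs)

        err = fd_error(docs, "city", "country")
        print(err)         # -> 0.25
        print(err <= 0.3)  # -> True: accept city -> country as approximate FD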