
    Heterogeneous Relational Databases for a Grid-enabled Analysis Environment

    Grid based systems require a database access mechanism that can provide seamless homogeneous access to the requested data through a virtual data access system, i.e. a system which can take care of tracking the data that is stored in geographically distributed heterogeneous databases. This system should provide an integrated view of the data that is stored in the different repositories by using a virtual data access mechanism, i.e. a mechanism which can hide the heterogeneity of the backend databases from the client applications. This paper focuses on accessing data stored in disparate relational databases through a web service interface, and exploits the features of a Data Warehouse and Data Marts. We present a middleware that enables applications to access data stored in geographically distributed relational databases without being aware of their physical locations and underlying schema. A web service interface is provided to enable applications to access this middleware in a language and platform independent way. A prototype implementation was created based on Clarens [4], Unity [7] and POOL [8]. This ability to access the data stored in the distributed relational databases transparently is likely to be a very powerful one for Grid users, especially the scientific community wishing to collate and analyze data distributed over the Grid

    Set-oriented data mining in relational databases

    Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing. In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases
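
    As a rough illustration of what "expressed as SQL queries" can mean here, the sketch below counts frequent item pairs with a self-join and a GROUP BY. It is not the SETM algorithm from the paper, only a minimal set-oriented example; the table name `sales`, its columns, the function name `frequent_pairs`, and the use of SQLite are all assumptions made for illustration.

```python
import sqlite3


def frequent_pairs(transactions, min_support):
    """Return (item_a, item_b, support) for item pairs that occur together
    in at least min_support transactions.

    Hypothetical helper: the schema and names are illustrative only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (trans_id INTEGER, item TEXT)")

    # One row per (transaction, distinct item).
    rows = [
        (tid, item)
        for tid, items in enumerate(transactions)
        for item in sorted(set(items))
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    # Self-join on the transaction id generates candidate pairs;
    # a.item < b.item keeps each unordered pair exactly once.
    # GROUP BY / HAVING then keeps only pairs meeting the support threshold.
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM sales AS a
        JOIN sales AS b
          ON a.trans_id = b.trans_id AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
    """
    result = conn.execute(query, (min_support,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "milk", "eggs"],
        ["milk", "eggs"],
        ["bread", "eggs", "milk"],
    ]
    print(frequent_pairs(baskets, min_support=3))
```

    Because the whole computation is a join followed by an aggregation, the database's query optimizer is free to choose the join strategy, which is the kind of benefit the abstract attributes to a set-oriented formulation.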

    XML content warehousing: Improving sociological studies of mailing lists and web data

    In this paper, we present the guidelines for an XML-based approach for the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many adjunctions or crossings of data sources, without modifying existing data sets, while allowing possible structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing allows altogether exhaustive warehousing and recursive queries through contents, with far less dependence on the initial storage. We finally present the possibility of exporting the data stored in the warehouse to commonly-used advanced software devoted to sociological analysis

    Grid tool integration within the eMinerals Project

    In this article we describe the eMinerals mini grid, which is now running in production mode. This is an integration of both compute and data components, the former built upon Condor, PBS and the functionality of Globus v2, and the latter being based on the combined use of the Storage Resource Broker and the CCLRC data portal. We describe how we have integrated the middleware components, and the different facilities provided to the users for submitting jobs within such an environment. We will also describe additional functionality we found it necessary to provide ourselves

    Competing Technologies in the Database Management Systems Market

    In this paper, we study the dynamics of the market for Database Management Systems (DBMS), which is commonly assumed to possess network effects and where there is still some viable competition in our study period, 2000 – 2004. Specifically, we make use of a unique and detailed dataset on several thousand UK firms to study individual organizations’ incentives to adopt a particular technology. We find that there are significant internal complement effects – in other words, using an operating system and a DBMS from the same vendor seems to confer some complementarities. We also find evidence for complementarities between enterprise resource planning systems (ERP) and DBMS and find that as ERP are frequently specific and customized, DBMS are unlikely to be changed once they have been customized to an ERP. We also find that organizations have an increasing tendency to use multiple DBMS on one site, which contradicts the notion that different DBMS are near-perfect substitutes

    The use of hypermedia to increase the productivity of software development teams

    Rapid progress in low-cost commercial PC-class multimedia workstation technology will potentially have a dramatic impact on the productivity of distributed work groups of 50-100 software developers. Hypermedia/multimedia involves the seamless integration in a graphical user interface (GUI) of a wide variety of data structures, including high-resolution graphics, maps, images, voice, and full-motion video. Hypermedia will normally require the manipulation of large dynamic files for which relational database technology and SQL servers are essential. Basic machine architecture, special-purpose video boards, video equipment, optical memory, software needed for animation, network technology, and the anticipated increase in productivity that will result from the introduction of hypermedia technology are covered. It is suggested that the cost of the hardware and software to support an individual multimedia workstation will be on the order of $10,000