Heterogeneous Relational Databases for a Grid-enabled Analysis Environment
Grid based systems require a database access mechanism that can provide seamless homogeneous access to the requested data through a virtual data access system, i.e. a system which can take care of tracking the data that is stored in geographically distributed heterogeneous databases. This system should provide an integrated view of the data that is stored in the different repositories by using a virtual data access mechanism, i.e. a mechanism which can hide the heterogeneity of the backend databases from the client applications. This paper focuses on accessing data stored in disparate relational databases through a web service interface, and exploits the features of a Data Warehouse and Data Marts. We present a middleware that enables applications to access data stored in geographically distributed relational databases without being aware of their physical locations and underlying schema. A web service interface is provided to enable applications to access this middleware in a language and platform independent way. A prototype implementation was created based on Clarens [4], Unity [7] and POOL [8]. This ability to access the data stored in the distributed relational databases transparently is likely to be a very powerful one for Grid users, especially the scientific community wishing to collate and analyze data distributed over the Grid
Set-oriented data mining in relational databases
Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing.
In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases
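The set-oriented idea in this abstract, expressing a mining step as an ordinary SQL query, can be illustrated with a minimal sketch. This is not the SETM algorithm itself: it only shows a single self-join that counts co-occurring item pairs, run on an in-memory SQLite database. The table name `sales`, its columns, the function name `frequent_pairs`, and the sample data are all assumptions made for illustration.

```python
import sqlite3


def frequent_pairs(transactions, min_support):
    """Return (item_a, item_b, support) for pairs found in >= min_support transactions.

    Illustrative only: one self-join on a hypothetical sales(tid, item) table,
    not the SETM algorithm described in the abstract.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
    # One row per (transaction, distinct item).
    rows = [
        (tid, item)
        for tid, items in enumerate(transactions)
        for item in set(items)
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # Self-join on the transaction id; a.item < b.item keeps each pair once.
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM sales AS a
        JOIN sales AS b ON a.tid = b.tid AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
    """
    result = conn.execute(query, (min_support,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "milk", "eggs"],
        ["milk", "eggs"],
        ["bread"],
    ]
    for pair in frequent_pairs(baskets, min_support=2):
        print(pair)
```

Because the counting is a plain join with grouping, the database's own query optimizer chooses how to execute it, which is the benefit the abstract attributes to the set-oriented approach.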
XML content warehousing: Improving sociological studies of mailing lists and web data
In this paper, we present the guidelines for an XML-based approach for the
sociological study of Web data such as the analysis of mailing lists or
databases available online. The use of an XML warehouse is a flexible solution
for storing and processing this kind of data. We propose an implemented
solution and show possible applications with our case study of profiles of
experts involved in W3C standard-setting activity. We illustrate the
sociological use of semi-structured databases by presenting our XML Schema for
mailing-list warehousing. An XML Schema allows many adjunctions or crossings of
data sources, without modifying existing data sets, while allowing possible
structural evolution. We also show that the existence of hidden data implies
increased complexity for traditional SQL users. XML content warehousing allows
both exhaustive warehousing and recursive queries over the contents, with
far less dependence on the initial storage. Finally, we present the possibility
of exporting the data stored in the warehouse to commonly-used advanced
software devoted to sociological analysis
Grid tool integration within the eMinerals Project
In this article we describe the eMinerals mini grid, which is now running in production mode. This is an integration of both compute and data components, the former built upon Condor, PBS and the functionality of Globus v2, and the latter being based on the combined use of the Storage Resource Broker and the CCLRC data portal. We describe how we have integrated the middleware components, and the different facilities provided to the users for submitting jobs within such an environment. We will also describe additional functionality we found it necessary to provide ourselves
Competing Technologies in the Database Management Systems Market
In this paper, we study the dynamics of the market for Database
Management Systems (DBMS), which is commonly assumed to possess network
effects and where there is still some viable competition in our study
period, 2000 – 2004. Specifically, we make use of a unique and
detailed dataset on several thousand UK firms to study individual
organizations’ incentives to adopt a particular technology. We
find that there are significant internal complement effects – in
other words, using an operating system and a DBMS from the same vendor
seems to confer some complementarities. We also find evidence for
complementarities between enterprise resource planning systems (ERP) and
DBMS and find that as ERP are frequently specific and customized, DBMS
are unlikely to be changed once they have been customized to an ERP. We
also find that organizations have an increasing tendency to use multiple
DBMS on one site, which contradicts the notion that different DBMS are
near-perfect substitutes
The use of hypermedia to increase the productivity of software development teams
Rapid progress in low-cost commercial PC-class multimedia workstation technology will potentially have a dramatic impact on the productivity of distributed work groups of 50-100 software developers. Hypermedia/multimedia involves the seamless integration in a graphical user interface (GUI) of a wide variety of data structures, including high-resolution graphics, maps, images, voice, and full-motion video. Hypermedia will normally require the manipulation of large dynamic files for which relational database technology and SQL servers are essential. Basic machine architecture, special-purpose video boards, video equipment, optical memory, software needed for animation, network technology, and the anticipated increase in productivity that will result from the introduction of hypermedia technology are covered. It is suggested that the cost of the hardware and software to support an individual multimedia workstation will be on the order of $10,000