
13,889 research outputs found

Towards Large-Scale Knowledge Discovery in Databases (KDD) by Exploiting Parallelism in Generic KDD Primitives

Author: Freitas Alex A.
Publication date: 01/07/1997

Heterogeneous Relational Databases for a Grid-enabled Analysis Environment

Author: Ali Arshad
Anjum Ashiq
Azim Tahir
Bunn Julian
Iqbal Saima
McClatchey Richard
Newman Harvey
Shah S. Yousaf
Solomonides Tony
Steenberg Conrad
Thomas Michael
van Lingen Frank
Willers Ian
Publication date: 01/01/2005

Grid-based systems require a database access mechanism that can provide seamless homogeneous access to the requested data through a virtual data access system, i.e. a system which can take care of tracking the data that is stored in geographically distributed heterogeneous databases. This system should provide an integrated view of the data that is stored in the different repositories by using a virtual data access mechanism, i.e. a mechanism which can hide the heterogeneity of the backend databases from the client applications. This paper focuses on accessing data stored in disparate relational databases through a web service interface, and exploits the features of a Data Warehouse and Data Marts. We present a middleware that enables applications to access data stored in geographically distributed relational databases without being aware of their physical locations and underlying schema. A web service interface is provided to enable applications to access this middleware in a language- and platform-independent way. A prototype implementation was created based on Clarens [4], Unity [7] and POOL [8]. This ability to access the data stored in the distributed relational databases transparently is likely to be a very powerful one for Grid users, especially the scientific community wishing to collate and analyze data distributed over the Grid

arXiv.org e-Print Archive

Caltech Authors

Set-oriented data mining in relational databases

Author: Houtsma Maurice
Swami Arun
Publication venue: North Holland
Publication date: 01/01/1995

Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing. In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases

CiteSeerX

Crossref

University of Twente Research Information

XML content warehousing: Improving sociological studies of mailing lists and web data

Author: Colazzo Dario
Dudouet François-Xavier
Manolescu Ioana
Nguyen Benjamin
Senellart Pierre
Vion Antoine
Publication date: 01/01/2011

In this paper, we present the guidelines for an XML-based approach for the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many adjunctions or crossings of data sources, without modifying existing data sets, while allowing possible structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing allows altogether exhaustive warehousing and recursive queries through contents, with far less dependence on the initial storage. We finally present the possibility of exporting the data stored in the warehouse to commonly-used advanced software devoted to sociological analysis

arXiv.org e-Print Archive

Base de publications de l'université Paris-Dauphine

Crossref

INRIA a CCSD electronic archive server

HAL UVSQ

HAL-Rennes 1

Grid tool integration within the eMinerals Project

Author: Alexandrof V
Blanshard L
Brodholt J
Bruin R
Calleja M
Chapman C
Dove MT
Emmerich W
Kleese van Dam K
Thandavan A
Tyer R
Wilson P
Publication venue: UK Engineering and Physical Science Research Council
Publication date: 01/01/2004

In this article we describe the eMinerals mini grid, which is now running in production mode. This is an integration of both compute and data components, the former built upon Condor, PBS and the functionality of Globus v2, and the latter being based on the combined use of the Storage Resource Broker and the CCLRC data portal. We describe how we have integrated the middleware components, and the different facilities provided to the users for submitting jobs within such an environment. We will also describe additional functionality we found it necessary to provide ourselves

CiteSeerX

UCL Discovery

Competing Technologies in the Database Management Systems Market

Author: Kretschmer Tobias - London School of Economics
Publication date: 01/01/2005

In this paper, we study the dynamics of the market for Database Management Systems (DBMS), which is commonly assumed to possess network effects and where there is still some viable competition in our study period, 2000 – 2004. Specifically, we make use of a unique and detailed dataset on several thousand UK firms to study individual organizations’ incentives to adopt a particular technology. We find that there are significant internal complement effects – in other words, using an operating system and a DBMS from the same vendor seems to confer some complementarities. We also find evidence for complementarities between enterprise resource planning systems (ERP) and DBMS and find that as ERP are frequently specific and customized, DBMS are unlikely to be changed once they have been customized to an ERP. We also find that organizations have an increasing tendency to use multiple DBMS on one site, which contradicts the notion that different DBMS are near-perfect substitutes

New York University Faculty Digital Archive

GIS And Web-GIS, Commercial and Open Source Platforms: General Rules for Cultural Heritage Documentation

Author: Agosto Eros
Ardissone Paolo
Rinaudo Fulvio
Publication venue: ISPRS
Publication date: 01/01/2007

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

The use of hypermedia to increase the productivity of software development teams

Author: Coles L. Stephen

Rapid progress in low-cost commercial PC-class multimedia workstation technology will potentially have a dramatic impact on the productivity of distributed work groups of 50-100 software developers. Hypermedia/multimedia involves the seamless integration in a graphical user interface (GUI) of a wide variety of data structures, including high-resolution graphics, maps, images, voice, and full-motion video. Hypermedia will normally require the manipulation of large dynamic files for which relational database technology and SQL servers are essential. Basic machine architecture, special-purpose video boards, video equipment, optical memory, software needed for animation, network technology, and the anticipated increase in productivity that will result from the introduction of hypermedia technology are covered. It is suggested that the cost of the hardware and software to support an individual multimedia workstation will be on the order of $10,000

NASA Technical Reports Server