Probabilistic models of information retrieval based on measuring the divergence from randomness

Author: Allan J.
Amati G.
Bookstein A.
Carpineto C.
Cornelis Joost Van Rijsbergen
Croft W.
Damerau F.
Gianni Amati
Harman D.
Harter S. P.
Lafferty J.
Margulis E.
Ponte J.
Robertson S.
Robertson S. E.
Solomonoff R.
van Rijsbergen C.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/10/2002

We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose-Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document-query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
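The idea summarized in this abstract, weighting a term by how far its observed frequency diverges from what a random process would produce, can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration: the binomial model with success probability 1/N over the term's total occurrences, the gain factor 1/(tfn + 1) standing in for the first normalization, the length rescaling tf * log2(1 + avg_len/doc_len) standing in for the second, the order in which they are combined, and all function and parameter names.

```python
import math


def binomial_information(tf, total_term_freq, num_docs):
    """Return -log2 P(tf) under a binomial model (assumed for illustration).

    The term's total_term_freq occurrences are treated as independent
    trials, each landing in a given document with probability 1/num_docs.
    lgamma is used so that a non-integer (normalized) tf is accepted.
    Requires num_docs > 1 and 0 <= tf <= total_term_freq.
    """
    p = 1.0 / num_docs
    log_prob = (
        math.lgamma(total_term_freq + 1)
        - math.lgamma(tf + 1)
        - math.lgamma(total_term_freq - tf + 1)
        + tf * math.log(p)
        + (total_term_freq - tf) * math.log(1.0 - p)
    )
    return -log_prob / math.log(2)


def length_normalized_tf(tf, doc_len, avg_doc_len):
    """Rescale tf by document length (assumed form of the second normalization)."""
    return tf * math.log2(1.0 + avg_doc_len / doc_len)


def dfr_weight(tf, doc_len, avg_doc_len, total_term_freq, num_docs):
    """Combine the two normalizations with the binomial information content."""
    tfn = length_normalized_tf(tf, doc_len, avg_doc_len)
    # Guard: the binomial model cannot see more hits than total occurrences.
    tfn = min(tfn, total_term_freq)
    information = binomial_information(tfn, total_term_freq, num_docs)
    # Assumed form of the first normalization: information gain 1 / (tfn + 1).
    gain = 1.0 / (tfn + 1.0)
    return gain * information


if __name__ == "__main__":
    # A term occurring 100 times in a 1000-document collection.
    for tf in (1, 5):
        print(tf, round(dfr_weight(tf, 100, 100, 100, 1000), 3))
```

Under these assumptions, a term that appears far more often in a document than chance predicts receives a higher weight, and the same raw frequency counts for less in a longer-than-average document.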

Crossref

Enlighten

Automated design of multidimensional clustering tables for relational databases

Author: B BHATTACHARJEE
S LIGHTSTONE
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

Crossref

Automated design of multidimensional clustering tables for relational databases

Author: Agrawal
Bhattacharjee
Chaudhuri
Frank
Haas
Lightstone
Liou
Lohman
Padmanabhan
Rao
Schiefer
Stohr
Stöhr
Transaction Processing Performance Council
Publication venue: 'Elsevier BV'
Publication date: 01/01/2004

Crossref

ReoptSMART: A Learning Query Plan Cache

Author: Fan Wei
Lohman Guy
Markl Volker
Rao Jun
Ross Kenneth A.
Stoyanovich Julia
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2008

The task of query optimization in modern relational database systems is important but can be computationally expensive. Parametric query optimization (PQO) has as its goal the prediction of optimal query execution plans based on historical results, without consulting the query optimizer. We develop machine learning techniques that can accurately model the output of a query optimizer. Our algorithms handle non-linear boundaries in plan space and achieve high prediction accuracy even when a limited amount of data is available for training. We use both predicted and actual query execution times for learning, and are the first to demonstrate a total net win of a PQO method over a state-of-the-art query optimizer for some workloads. ReoptSMART realizes savings not only in optimization time, but also in query execution time, for an overall improvement by more than an order of magnitude in some cases.

Columbia University Academic Commons

Knowledge and Metadata Integration for Warehousing Complex Data

Author: Darmont Jérôme
Ralaivao Jean-Christian
Publication date: 01/01/2007

With the ever-growing availability of so-called complex data, especially on the Web, decision-support systems such as data warehouses must store and process data that are not only numerical or symbolic. Warehousing and analyzing such data requires the joint exploitation of metadata and domain-related knowledge, which must thereby be integrated. In this paper, we survey the types of knowledge and metadata that are needed for managing complex data, discuss the issue of knowledge and metadata integration, and propose a CWM-compliant integration solution that we incorporate into an XML complex data warehousing framework we previously designed. Comment: 6th International Conference on Information Systems Technology and its Applications (ISTA 07), Kharkiv, Ukraine (2007)

arXiv.org e-Print Archive

HAL Descartes

Remote fabrication of integrated circuits : software support for the M.I.T. computer aided fabrication environment

Author: Kwon Jimmy Y. (Jimmy Yongil)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1995

Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (p. 73-74). By Jimmy Y. Kwon. M.S.

DSpace@MIT

Cracking the database store

Author: Kersten M.L. (Martin)
Manegold S. (Stefan)
Publication venue: CWI
Publication date: 01/01/2004

Query performance strongly depends on finding an execution plan that touches as few superfluous tuples as possible. The access structures d

CWI's Institutional Repository

Evolutionary techniques for updating query cost models in a dynamic multidatabase environment

Author: Larson Per-Åke
Rahal Amira
Zhu Qiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2004

Deriving local cost models for query optimization in a dynamic multidatabase system (MDBS) is a challenging issue. In this paper, we study how to evolve a query cost model to capture a slowly-changing dynamic MDBS environment so that the cost model is kept up-to-date all the time. Two novel evolutionary techniques, i.e., the shifting method and the block-moving method, are proposed. The former updates a cost model by taking up-to-date information from a new sample query into consideration at each step, while the latter considers a block (batch) of new sample queries at each step. The relevant issues, including derivation of recurrence updating formulas, development of efficient algorithms, analysis and comparison of complexities, and design of an integrated scheme to apply the two methods adaptively, are studied. Our theoretical and experimental results demonstrate that the proposed techniques are quite promising in maintaining accurate cost models efficiently for a slowly changing dynamic MDBS environment. Besides the application to MDBSs, the proposed techniques can also be applied to the automatic maintenance of cost models in self-managing database systems. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/47868/1/778_2003_Article_110.pd

Crossref

Deep Blue Documents at the University of Michigan