11,099 research outputs found
Towards a Holistic Integration of Spreadsheets with Databases: A Scalable Storage Engine for Presentational Data Management
Spreadsheet software is the tool of choice for interactive ad-hoc data
management, with adoption by billions of users. However, spreadsheets are not
scalable, unlike database systems. On the other hand, database systems, while
highly scalable, do not support interactivity as a first-class primitive. We
are developing DataSpread, to holistically integrate spreadsheets as a
front-end interface with databases as a back-end datastore, providing
scalability to spreadsheets, and interactivity to databases, an integration we
term presentational data management (PDM). In this paper, we make a first step
towards this vision: developing a storage engine for PDM, studying how to
flexibly represent spreadsheet data within a database and how to support and
maintain access by position. We first conduct an extensive survey of
spreadsheet use to motivate our functional requirements for a storage engine
for PDM. We develop a natural set of mechanisms for flexibly representing
spreadsheet data and demonstrate that identifying the optimal representation is
NP-Hard; however, we develop an efficient approach to identify the optimal
representation from an important and intuitive subclass of representations. We
extend our mechanisms with positional access mechanisms that don't suffer from
cascading update issues, leading to constant time access and modification
performance. We evaluate these representations on a workload of typical
spreadsheets and spreadsheet operations, providing up to 20% reduction in
storage, and up to 50% reduction in formula evaluation time
Using Visualization to Support Data Mining of Large Existing Databases
In this paper. we present ideas how visualization technology can be used to improve the difficult process of querying very large databases. With our VisDB system, we try to provide visual support not only for the query specification process. but also for evaluating query results and. thereafter, refining the query accordingly. The main idea of our system is to represent as many data items as possible by the pixels of the display device. By arranging and coloring the pixels according to the relevance for the query, the user gets a visual impression of the resulting data set and of its relevance for the query. Using an interactive query interface, the user may change the query dynamically and receives immediate feedback by the visual representation of the resulting data set. By using multiple windows for different parts of the query, the user gets visual feedback for each part of the query and, therefore, may easier understand the overall result. To support complex queries, we introduce the notion of approximate joins which allow the user to find data items that only approximately fulfill join conditions. We also present ideas how our technique may be extended to support the interoperation of heterogeneous databases. Finally, we discuss the performance problems that are caused by interfacing to existing database systems and present ideas to solve these problems by using data structures supporting a multidimensional search of the database
A survey on the use of relevance feedback for information access systems
Users of online search engines often find it difficult to express their need for information in the form of a query. However, if the user can identify examples of the kind of documents they require then they can employ a technique known as relevance feedback. Relevance feedback covers a range of techniques intended to improve a user's query and facilitate retrieval of information relevant to a user's information need. In this paper we survey relevance feedback techniques. We study both automatic techniques, in which the system modifies the user's query, and interactive techniques, in which the user has control over query modification. We also consider specific interfaces to relevance feedback systems and characteristics of searchers that can affect the use and success of relevance feedback systems
The NASA Astrophysics Data System: Architecture
The powerful discovery capabilities available in the ADS bibliographic
services are possible thanks to the design of a flexible search and retrieval
system based on a relational database model. Bibliographic records are stored
as a corpus of structured documents containing fielded data and metadata, while
discipline-specific knowledge is segregated in a set of files independent of
the bibliographic data itself.
The creation and management of links to both internal and external resources
associated with each bibliography in the database is made possible by
representing them as a set of document properties and their attributes.
To improve global access to the ADS data holdings, a number of mirror sites
have been created by cloning the database contents and software on a variety of
hardware and software platforms.
The procedures used to create and manage the database and its mirrors have
been written as a set of scripts that can be run in either an interactive or
unsupervised fashion.
The ADS can be accessed at http://adswww.harvard.eduComment: 25 pages, 8 figures, 3 table
- …