137,528 research outputs found
Code Generation for Efficient Query Processing in Managed Runtimes
In this paper we examine opportunities arising from the convergence of two trends in data management: in-memory database systems (IMDBs), which have received renewed attention following the availability of affordable, very large main memory systems; and language-integrated query, which transparently integrates database queries with programming languages (thus addressing the famous ‘impedance mismatch’ problem). Language-integrated query not only gives application developers a more convenient way to query external data sources like IMDBs, but also lets them use the same query language to query an application’s in-memory collections. The latter offers further transparency to developers, as the query language and all data are represented in the data model of the host programming language. However, compared to IMDBs, this additional freedom comes at a higher cost for query evaluation. Our vision is to improve in-memory query processing of application objects by introducing database technologies to managed runtimes. We focus on querying and leverage query compilation to improve query processing on application objects. We explore different query compilation strategies and study how they improve the performance of query processing over application data. We take C# as the host programming language as it supports language-integrated query through the LINQ framework. Our techniques deliver significant performance improvements over the default LINQ implementation. Our work makes important first steps towards a future where data processing applications will commonly run on machines that can store their entire datasets in-memory, and will be written in a single programming language employing language-integrated query and IMDB-inspired runtimes to provide transparent and highly efficient querying.
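As a rough illustration of what query compilation means here, the following Python sketch contrasts an operator-at-a-time query with one whose filter and projection are fused into a single generated loop. The paper itself targets C# and LINQ; the function names `interpret_query` and `compile_query`, the string-based mini query form, and the sample data are all assumptions made for this example and are not the authors' implementation.

```python
def interpret_query(rows, predicate, projection):
    """Interpreted style: chained operators, one callback per operator per row."""
    return list(map(projection, filter(predicate, rows)))


def compile_query(predicate_src, projection_src):
    """Compiled style: generate source for one fused loop and compile it once.

    predicate_src and projection_src are Python expressions over the variable
    `r` (a hypothetical mini query form chosen for this sketch). Because the
    text is passed to exec, it must come from a trusted source.
    """
    source = (
        "def query(rows):\n"
        "    out = []\n"
        "    for r in rows:\n"
        f"        if {predicate_src}:\n"
        f"            out.append({projection_src})\n"
        "    return out\n"
    )
    namespace = {}
    exec(compile(source, "<generated-query>", "exec"), namespace)
    return namespace["query"]


# Hypothetical in-memory collection of application objects.
orders = [
    {"id": 1, "total": 40},
    {"id": 2, "total": 250},
    {"id": 3, "total": 120},
]

big_orders = compile_query("r['total'] > 100", "r['id']")
print(big_orders(orders))
print(interpret_query(orders, lambda r: r["total"] > 100, lambda r: r["id"]))
```

Both calls return the same ids; the difference is that the compiled form runs one specialised loop instead of dispatching through generic operator callbacks for every row.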
Use of an object-based system with reasoning capabilities to integrate relational databases
The integration of heterogeneous and autonomous information sources
is a requirement for the new type of cooperative information systems.
In this paper we show the advantages of using a terminological system
for integrating pre-existing relational databases. From the resulting
integrated schema point of view, using a terminological system allows
for the definition of semantically richer integrated schema. From the
integrated schema generation process point of view, the use of a terminological
system permits the definition of a more consistent, broad
and automatic process. Last, from the query processing point of view,
terminological systems provide interesting features for incorporating
semantic and caching query optimization techniques. The advantages
are presented in detail for each main step of the integration process:
translation, integration and query processing.
Query processing of geometric objects with free form boundaries in spatial databases
The increasing demand for the use of database systems as an integrating
factor in CAD/CAM applications has necessitated the development of database
systems with appropriate modelling and retrieval capabilities. One essential
problem is the treatment of geometric data which has led to the development of
spatial databases. Unfortunately, most proposals only deal with simple geometric
objects like multidimensional points and rectangles. On the other hand, there has
been a rapid development in the field of representing geometric objects with free
form curves or surfaces, initiated by engineering applications such as mechanical
engineering, aviation or astronautics. Therefore, we propose a concept for the realization
of spatial retrieval operations on geometric objects with free form
boundaries, such as B-spline or Bezier curves, which can easily be integrated in
a database management system. The key concept is the encapsulation of geometric
operations in a so-called query processor. First, this enables the definition of
an interface allowing the integration into the data model and the definition of the
query language of a database system for complex objects. Second, the approach
allows the use of an arbitrary representation of the geometric objects. After a
short description of the query processor, we propose some representations for free
form objects determined by B-spline or Bezier curves. The goal of efficient query
processing in a database environment is achieved using a combination of decomposition
techniques and spatial access methods. Finally, we present some experimental
results indicating that the performance of decomposition techniques is
clearly superior to traditional query processing strategies for geometric objects
with free form boundaries.
An office document retrieval system with the capability of processing incomplete and vague queries
TEXPROS (TEXt PROcessing System) is an intelligent document processing system. The system is a combination of filing and retrieval systems, which supports storing, classifying, categorizing, retrieving and reproducing documents, as well as extracting, browsing, retrieving and synthesizing information from a variety of documents. This dissertation presents a retrieval system for TEXPROS, which is capable of processing incomplete or vague queries and providing semantically meaningful responses to the users. The design of the retrieval system is highly integrated with various mechanisms for achieving these goals. First, a system catalog including a thesaurus is used to store the knowledge about the database. Secondly, there is a query transformation mechanism which consists of context construction and algebraic query formulation modules. Given an incomplete query, the context construction module searches the system for the required terms and constructs a query that has a complete representation. The resulting query is then formulated into an algebraic query. Thirdly, in practice, the user may not have a precise notion of what he is looking for. A browsing mechanism is employed for such situations to assist the user in the retrieval process. With the browser, vague queries can be entered into the system until sufficient information is obtained to the extent that the user is able to construct a query for his request. Finally, when processing of queries responds with an empty answer to the user, a query generalization mechanism is used to give the user a cooperative explanation for the empty answer. The generalizations of any given failed queries (i.e., with an empty answer) are derived by applying both the folder and type substitutions and weakening the search criteria in the original query. 
An efficient way is investigated for determining whether the empty answer is genuine and whether the original query reflects erroneous presuppositions, and therefore answering any failed query with a meaningful and cooperative response. It incorporates a methodical approach to reducing the search space of generalized subqueries by analyzing the results of executing the query generalization and by efficiently applying the possible substitutions in a query to generate a small subset of relevant subqueries which are to be evaluated.
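The idea of generalizing a failed query by weakening its search criteria can be sketched as follows. This is only an illustration under stated assumptions: queries are modelled as simple field/value conjunctions, the names `run_query` and `generalize` and the sample documents are hypothetical, and the folder and type substitutions that TEXPROS also applies are not modelled.

```python
from itertools import combinations


def run_query(documents, criteria):
    """Return the documents that satisfy every (field, value) criterion."""
    return [d for d in documents if all(d.get(f) == v for f, v in criteria.items())]


def generalize(documents, criteria):
    """Weaken a failed query by dropping criteria, fewest drops first.

    Returns a list of (dropped_fields, answers) pairs for the smallest number
    of dropped criteria that yields any answer, or [] if nothing matches.
    """
    fields = list(criteria)
    for n_dropped in range(1, len(fields) + 1):
        hits = []
        for dropped in combinations(fields, n_dropped):
            weaker = {f: v for f, v in criteria.items() if f not in dropped}
            answers = run_query(documents, weaker)
            if answers:
                hits.append((dropped, answers))
        if hits:
            return hits
    return []


# Hypothetical document base and a query that fails as posed.
documents = [
    {"type": "memo", "sender": "Smith", "year": 1994},
    {"type": "letter", "sender": "Jones", "year": 1995},
]
query = {"type": "memo", "sender": "Jones", "year": 1995}

print(run_query(documents, query))
for dropped, answers in generalize(documents, query):
    print("dropping", dropped, "gives", answers)
```

Reporting which criterion had to be dropped is what makes the response cooperative: here the user learns that Jones sent something in 1995, but not a memo.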
Techniques for improving efficiency and scalability for the integration of information retrieval and databases
This PhD thesis is on the topic of integration of Information Retrieval (IR) and Databases (DB), with
a particular focus on improving the efficiency and scalability of integrated IR and DB technology
(IR+DB). The main purpose of this study is to develop efficient and scalable techniques for
supporting integrated IR and DB technology, which is a popular approach today for handling
complex queries over text and structured data.
Our specific interest in this thesis is how to efficiently handle queries over large-scale text
and structured data. The work is based on a technology that integrates probability theory and
relational algebra, where retrievals for text and data are to be expressed in probabilistic logical
programs such as probabilistic relational algebra or probabilistic Datalog. To support efficient
processing of probabilistic logical programs, we proposed three optimization techniques
that cover aspects of the logical and physical layers: scoring-driven query
optimization using scoring expressions, query processing with a top-k incorporated pipeline, and
indexing with a relational inverted index.
Specifically, scoring expressions are proposed for expressing the scoring or probabilistic semantics
of implied scoring functions of PRA expressions, so that an efficient query execution plan
can be generated by a rule-based, scoring-driven optimizer. Secondly, to balance efficiency and
effectiveness and thereby improve query response time, we studied methods for incorporating top-k
algorithms into a pipelined query execution engine for IR+DB systems. Thirdly, the proposed
relational inverted index integrates an IR-style inverted index and a DB-style tuple-based index, and
can be used to support efficient probability estimation and aggregation as well as conventional
relational operations.
Experiments were carried out to investigate the performance of the proposed techniques. Experimental
results showed that the efficiency and scalability of an IR+DB prototype have been
improved, and that the system can handle queries efficiently on considerably large data sets for a
number of IR tasks.
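To give a feel for top-k retrieval over an inverted index, here is a minimal Python sketch. It is not the thesis's relational inverted index or its probabilistic relational algebra: the names `build_index`, `scored` and `top_k`, the summed term-frequency score, and the sample documents are assumptions for illustration only. The point shown is that a bounded heap keeps just k candidates instead of sorting every scored document.

```python
import heapq
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: term frequency} (a plain IR-style inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def scored(index, query_terms):
    """Yield (score, doc_id) pairs; the score is summed term frequency (a toy assumption)."""
    totals = defaultdict(int)
    for term in query_terms:
        for doc_id, tf in index.get(term, {}).items():
            totals[doc_id] += tf
    for doc_id, score in totals.items():
        yield score, doc_id


def top_k(index, query_terms, k):
    """Keep only the k best candidates with a bounded heap rather than a full sort."""
    return heapq.nlargest(k, scored(index, query_terms))


# Hypothetical document collection.
docs = {
    "d1": "query processing query",
    "d2": "spatial query",
    "d3": "agent plans",
}
index = build_index(docs)
print(top_k(index, ["query", "processing"], 2))
```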
Ontology Based Data Access in Statoil
Ontology Based Data Access (OBDA) is a prominent approach to querying databases which uses an ontology to expose data in a conceptually clear manner by abstracting away from the technical schema-level details of the underlying data. The ontology is ‘connected’ to the data via mappings that allow queries posed over the ontology to be translated automatically into data-level queries that can be executed by the underlying database management system. Despite a lot of attention from the research community, there are still few instances of real world industrial use of OBDA systems. In this work we present data access challenges in the data-intensive petroleum company Statoil and our experience in addressing these challenges with OBDA technology. In particular, we have developed a deployment module to create ontologies and mappings from relational databases in a semi-automatic fashion; a query processing module to perform and optimise the process of translating ontological queries into data queries and their execution over either a single DB or federated DBs; and a query formulation module to support query construction for engineers with a limited IT background. Our modules have been integrated in one OBDA system, deployed at Statoil, integrated with Statoil’s infrastructure, and evaluated with Statoil’s engineers and data
The role of expert systems in federated distributed multi-database systems/Ince Levent
A shared information system is a series of computer systems interconnected by some kind of communication network. There are data repositories residing on each computer. These data repositories must somehow be integrated. The purpose of using distributed and multi-database systems is to allow users to view collections of data repositories as if they were a single entity. Multidatabase systems, better known as heterogeneous multidatabase systems, are characterized by dissimilar data models, concurrency and optimization strategies, and access methods. Unlike in homogeneous systems, the participant databases that compose the global database can be based on different types of data models; it is not necessary that they all use the same one. Federated distributed database systems are a special case of multidatabase systems. They are completely autonomous and do not rely on the global data dictionary to process distributed queries. Processing distributed query requests in federated databases is very difficult since there are multiple independent databases with their own rules for query optimization, deadlock detection, and concurrency. Expert systems can play a role in this type of environment by supplying a knowledge base that contains rules for data object conversion, rules for resolving naming conflicts, and rules for exchanging data. http://archive.org/details/theroleofexperts109459362 Turkish Navy author. Approved for public release; distribution is unlimited.
An Expressive Language and Efficient Execution System for Software Agents
Software agents can be used to automate many of the tedious, time-consuming
information processing tasks that humans currently have to complete manually.
However, to do so, agent plans must be capable of representing the myriad of
actions and control flows required to perform those tasks. In addition, since
these tasks can require integrating multiple sources of remote information
(typically a slow, I/O-bound process), it is desirable to make execution as
efficient as possible. To address both of these needs, we present a flexible
software agent plan language and a highly parallel execution system that enable
the efficient execution of expressive agent plans. The plan language allows
complex tasks to be more easily expressed by providing a variety of operators
for flexibly processing the data as well as supporting subplans (for
modularity) and recursion (for indeterminate looping). The executor is based on
a streaming dataflow model of execution to maximize the amount of operator and
data parallelism possible at runtime. We have implemented both the language and
executor in a system called THESEUS. Our results from testing THESEUS show that
streaming dataflow execution can yield significant speedups over both
traditional serial (von Neumann) and non-streaming dataflow-style
execution that existing software and robot agent execution systems currently
support. In addition, we show how plans written in the language we present can
represent certain types of subtasks that cannot be accomplished using the
languages supported by network query engines. Finally, we demonstrate that the
increased expressivity of our plan language does not hamper performance;
specifically, we show how data can be integrated from multiple remote sources
just as efficiently using our architecture as is possible with a
state-of-the-art streaming-dataflow network query engine.
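The streaming aspect of such an executor can be illustrated with Python generators. This is a simplified sketch, not the THESEUS plan language or executor: the operator names `fetch`, `select` and `emit` and the sample data are hypothetical, and because generators run in a single thread the example shows tuple-at-a-time pipelining only, not the operator and data parallelism the abstract describes.

```python
def fetch(source_name, items, log):
    """Simulated remote source: yields one tuple at a time."""
    for item in items:
        log.append(f"fetch {source_name}:{item}")
        yield item


def select(stream, predicate):
    """Streaming filter: passes each tuple on as soon as it arrives."""
    for item in stream:
        if predicate(item):
            yield item


def emit(stream, log):
    """Sink: records each result at the moment it is produced."""
    results = []
    for item in stream:
        log.append(f"emit {item}")
        results.append(item)
    return results


log = []
plan = select(fetch("prices", [5, 12, 7, 30], log), lambda x: x > 6)
results = emit(plan, log)
print(results)
for event in log:
    print(event)
```

The event log interleaves fetches and emits, so the first result leaves the plan before the source has been fully read. A non-streaming plan would instead fetch everything before emitting anything.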