1,895 research outputs found
A Join Index for XML Data Warehouses
XML data warehouses form an interesting basis for decision-support
applications that exploit complex data. However, native-XML database management
systems (DBMSs) currently bear limited performances and it is necessary to
research for ways to optimize them. In this paper, we propose a new join index
that is specifically adapted to the multidimensional architecture of XML
warehouses. It eliminates join operations while preserving the information
contained in the original warehouse. A theoretical study and experimental
results demonstrate the efficiency of our join index. They also show that
native XML DBMSs can compete with XML-compatible, relational DBMSs when
warehousing and analyzing XML data.Comment: 2008 International Conference on Information Resources Management
(Conf-IRM 08), Niagra Falls : Canada (2008
XWeB: the XML Warehouse Benchmark
With the emergence of XML as a standard for representing business data, new
decision support applications are being developed. These XML data warehouses
aim at supporting On-Line Analytical Processing (OLAP) operations that
manipulate irregular XML data. To ensure feasibility of these new tools,
important performance issues must be addressed. Performance is customarily
assessed with the help of benchmarks. However, decision support benchmarks do
not currently support XML features. In this paper, we introduce the XML
Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from
the relational decision support benchmark TPC-H. It is mainly composed of a
test data warehouse that is based on a unified reference model for XML
warehouses and that features XML-specific structures, and its associate XQuery
decision support workload. XWeB's usage is illustrated by experiments on
several XML database management systems
SODA: Generating SQL for Business Users
The purpose of data warehouses is to enable business analysts to make better
decisions. Over the years the technology has matured and data warehouses have
become extremely successful. As a consequence, more and more data has been
added to the data warehouses and their schemas have become increasingly
complex. These systems still work great in order to generate pre-canned
reports. However, with their current complexity, they tend to be a poor match
for non tech-savvy business analysts who need answers to ad-hoc queries that
were not anticipated. This paper describes the design, implementation, and
experience of the SODA system (Search over DAta Warehouse). SODA bridges the
gap between the business needs of analysts and the technical complexity of
current data warehouses. SODA enables a Google-like search experience for data
warehouses by taking keyword queries of business users and automatically
generating executable SQL. The key idea is to use a graph pattern matching
algorithm that uses the metadata model of the data warehouse. Our results with
real data from a global player in the financial services industry show that
SODA produces queries with high precision and recall, and makes it much easier
for business users to interactively explore highly-complex data warehouses.Comment: VLDB201
Modeling views in the layered view model for XML using UML
In data engineering, view formalisms are used to provide flexibility to users and user applications by allowing them to extract and elaborate data from the stored data sources. Conversely, since the introduction of Extensible Markup Language (XML), it is fast emerging as the dominant standard for storing, describing, and interchanging data among various web and heterogeneous data sources. In combination with XML Schema, XML provides rich facilities for defining and constraining user-defined data semantics and properties, a feature that is unique to XML. In this context, it is interesting to investigate traditional database features, such as view models and view design techniques for XML. However, traditional view formalisms are strongly coupled to the data language and its syntax, thus it proves to be a difficult task to support views in the case of semi-structured data models. Therefore, in this paper we propose a Layered View Model (LVM) for XML with conceptual and schemata extensions. Here our work is three-fold; first we propose an approach to separate the implementation and conceptual aspects of the views that provides a clear separation of concerns, thus, allowing analysis and design of views to be separated from their implementation. Secondly, we define representations to express and construct these views at the conceptual level. Thirdly, we define a view transformation methodology for XML views in the LVM, which carries out automated transformation to a view schema and a view query expression in an appropriate query language. Also, to validate and apply the LVM concepts, methods and transformations developed, we propose a view-driven application development framework with the flexibility to develop web and database applications for XML, at varying levels of abstraction
Extracting, Transforming and Archiving Scientific Data
It is becoming common to archive research datasets that are not only large
but also numerous. In addition, their corresponding metadata and the software
required to analyse or display them need to be archived. Yet the manual
curation of research data can be difficult and expensive, particularly in very
large digital repositories, hence the importance of models and tools for
automating digital curation tasks. The automation of these tasks faces three
major challenges: (1) research data and data sources are highly heterogeneous,
(2) future research needs are difficult to anticipate, (3) data is hard to
index. To address these problems, we propose the Extract, Transform and Archive
(ETA) model for managing and mechanizing the curation of research data.
Specifically, we propose a scalable strategy for addressing the research-data
problem, ranging from the extraction of legacy data to its long-term storage.
We review some existing solutions and propose novel avenues of research.Comment: 8 pages, Fourth Workshop on Very Large Digital Libraries, 201
- …