Enhancing XML Data Warehouse Query Performance by Fragmentation
XML data warehouses form an interesting basis for decision-support
applications that exploit heterogeneous data from multiple sources. However,
XML-native database systems currently suffer from limited performance in terms
of manageable data volume and response time for complex analytical queries.
Fragmenting and distributing XML data warehouses (e.g., on data grids)
addresses both these issues. In this paper, we work on XML warehouse
fragmentation. In relational data warehouses, several studies recommend the use
of derived horizontal fragmentation. Hence, we propose to adapt it to the XML
context. We particularly focus on the initial horizontal fragmentation of
dimensions' XML documents and exploit two alternative algorithms. We
experimentally validate our proposal and compare these alternatives with
respect to a unified XML warehouse model we advocate for.
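The derived horizontal fragmentation mentioned above can be sketched in miniature: first partition a dimension by a predicate attribute, then split the fact data by semi-join with each dimension fragment. The sketch below is a hypothetical illustration in Python over plain dictionaries (all names and data are invented); the paper itself applies the idea to dimensions' XML documents, not relational rows.

```python
# Invented star-schema toy data: one dimension and one fact collection.
customers = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"},
]
sales = [
    {"cust_id": 1, "amount": 10.0},
    {"cust_id": 2, "amount": 25.0},
    {"cust_id": 3, "amount": 7.5},
]

def horizontal_fragments(rows, key):
    """Primary horizontal fragmentation: partition rows by a predicate attribute."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments

def derived_fragments(facts, dim_fragments, fk, pk="id"):
    """Derived fragmentation: split facts by semi-join with each dimension fragment."""
    out = {}
    for name, frag in dim_fragments.items():
        keys = {r[pk] for r in frag}
        out[name] = [f for f in facts if f[fk] in keys]
    return out

dims = horizontal_fragments(customers, "region")
facts = derived_fragments(sales, dims, "cust_id")
# facts["EU"] now holds the sales of EU customers, facts["US"] those of US customers,
# so each fragment can be placed on a different grid node.
```

The point of the derived step is that the fact data inherits the dimension's partitioning, so a query restricted to one region touches only one fragment.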
Data Mining-based Fragmentation of XML Data Warehouses
With the multiplication of XML data sources, many XML data warehouse models
have been proposed to handle data heterogeneity and complexity in a way
relational data warehouses fail to achieve. However, XML-native database
systems currently suffer from limited performance, both in terms of manageable
data volume and response time. Fragmentation helps address both these issues.
Derived horizontal fragmentation is typically used in relational data
warehouses and can definitely be adapted to the XML context. However, the
number of fragments produced by classical algorithms is difficult to control.
In this paper, we propose a k-means-based fragmentation approach
that allows the number of fragments to be controlled through its k parameter. We
experimentally compare its efficiency to classical derived horizontal
fragmentation algorithms adapted to XML data warehouses and show its
superiority.
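The abstract's key idea, that k directly fixes the number of fragments, can be illustrated with a toy one-dimensional k-means that clusters a numeric dimension attribute into exactly k fragments. This is an invented sketch, not the paper's algorithm (which mines the query workload); the attribute, data, and deterministic initialization are assumptions, and the init assumes 1 < k <= the number of values.

```python
def kmeans_1d(values, k, iters=20):
    """Tiny 1-D k-means: returns k clusters (fragments) of the input values."""
    vals = sorted(values)
    # Deterministic init: k evenly spaced picks from the sorted values
    # (assumes 1 < k <= len(values)), so the result is reproducible.
    centers = [vals[i * (len(vals) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vals:
            # Assign each value to its nearest center.
            i = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[i].append(v)
        # Recompute each center as its cluster mean (keep old center if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

# E.g. fragment a customer dimension on an "age" attribute into k = 3 fragments.
ages = [21, 22, 23, 40, 41, 42, 65, 66, 67]
fragments = kmeans_1d(ages, k=3)
```

Unlike predicate-based fragmentation, where the fragment count falls out of the data and workload, here the parameter k pins it in advance, which is the control property the abstract claims.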
Supply Chain Infrastructure Restoration Calculator Software Tool -- Developer Guide and User Manual
This report describes a software tool that calculates costs associated with the reconstruction of supply chain interdependent critical infrastructure in the event of a catastrophic failure caused by either outside forces (extreme events) or internal forces (fatigue). This tool fills a gap between the search-and-recovery strategies of the Federal Emergency Management Agency (FEMA) and construction techniques under full recovery. In addition to overall construction costs, the tool calculates reconstruction needs in terms of personnel and their required support. From these estimates, total costs (or the cost of each element to be restored) can be calculated. Estimates are based upon historic reconstruction data, although decision managers have the option of entering their own input data to tailor the results to a local area.
GenoQuery: a new querying module for functional annotation in a genomic warehouse
Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by the high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve the functional annotation of genomes. This requires designing elaborate integration systems, such as warehouses, for storing and querying these data.
Regional Data Archiving and Management for Northeast Illinois
This project studies the feasibility and implementation options for establishing a regional data archiving system to help monitor
and manage traffic operations and planning for the northeastern Illinois region. It aims to provide clear guidance to the
regional transportation agencies, from both technical and business perspectives, about building such a comprehensive
transportation information system. Several implementation alternatives are identified and analyzed. This research is carried
out in three phases.
In the first phase, existing documents related to ITS deployments in the broader Chicago area are summarized, and a
thorough review is conducted of similar systems across the country. Various stakeholders are interviewed to collect
information on all data elements they store, including format, system, and granularity. Their perceptions of a data
archive system, including its potential benefits and costs, are also surveyed. In the second phase, a conceptual design of the
database is developed. This conceptual design includes system architecture, functional modules, user interfaces, and
examples of usage. In the last phase, the possible business models for the archive system to sustain itself are reviewed. We
estimate initial capital and recurring operational/maintenance costs for the system based on realistic information on the
hardware, software, labor, and resource requirements. We also identify possible revenue opportunities.
A few implementation options for the archive system are summarized in this report; namely:
1. System hosted by a partnering agency
2. System contracted to a university
3. System contracted to a national laboratory
4. System outsourced to a service provider
The costs, advantages and disadvantages for each of these recommended options are also provided.
BioSilicoSystems - A Multipronged Approach Towards Analysis and Representation of Biological Data (PhD Thesis)
The rising field of integrative bioinformatics provides vital methods to integrate, manage and analyze diverse data, allowing new and deeper insights into, and a clearer understanding of, intricate biological systems. The difficulty is not only to facilitate the study of heterogeneous data within the biological context; more fundamentally, it is how to represent the available knowledge and make it accessible. Moreover, adding valuable information and functions that encourage the user to discover the interesting relations hidden within the data is, in itself, a great challenge. Cumulative information can also provide greater biological insight than is possible with individual information sources. Furthermore, the rapidly growing number of databases and data types poses the challenge of integrating heterogeneous data types, especially in biology. This rapid increase in the volume and number of data resources calls for polymorphic views of the same data, which often overlaps across multiple resources.

In this thesis, a multi-pronged approach is proposed that deals with various methods for the analysis and representation of the diverse biological data present in different data sources. It is an effort to explain and emphasize the different concepts developed for the analysis of molecular data, and to explain their biological significance. The hypotheses proposed are placed in context with other results and findings published in the past. The approach also demonstrates different ways to integrate molecular data from various sources, along with the need for a comprehensive understanding and clear projection of each concept or algorithm and its results, achieved with simple means and methods. The multifarious approach proposed in this work comprises different tools and methods spanning significant areas of bioinformatics research, such as data integration, data visualization, biological network construction / reconstruction and alignment of biological pathways. Each tool takes a unique approach to utilizing molecular data for a different area of biological research and is built on the kernel of the thesis. Furthermore, these methods are combined with graphical representations that make things simple and comprehensible, and that help the reader grasp the underlying biological complexity with ease, since the human eye is accustomed to, and more comfortable with, visual representations of facts.
Big Data Harmonization – Challenges and Applications
As data grow, the need for big data solutions increases day by day. The concept of data harmonization has existed for two decades. Data are collected from various heterogeneous sources, and data harmonization techniques bring them into a single format in a single place, commonly called a data warehouse. Many advances have been made in analyzing historical data using data warehousing, and innovations continually uncover the challenges and problems data warehousing faces. When the volume and variety of data increase exponentially, existing tools may no longer support OLAP operations under the traditional warehouse approach. In this paper, we survey, category-wise, the research being done in the field of big data warehousing. Research issues and proposed approaches for various kinds of datasets are presented. The challenges and advantages of applying a data warehouse before the data mining task are also explained in detail.