7,258 research outputs found
SODA: Generating SQL for Business Users
The purpose of data warehouses is to enable business analysts to make better
decisions. Over the years the technology has matured and data warehouses have
become extremely successful. As a consequence, more and more data has been
added to the data warehouses and their schemas have become increasingly
complex. These systems still work great in order to generate pre-canned
reports. However, with their current complexity, they tend to be a poor match
for non tech-savvy business analysts who need answers to ad-hoc queries that
were not anticipated. This paper describes the design, implementation, and
experience of the SODA system (Search over DAta Warehouse). SODA bridges the
gap between the business needs of analysts and the technical complexity of
current data warehouses. SODA enables a Google-like search experience for data
warehouses by taking keyword queries of business users and automatically
generating executable SQL. The key idea is to use a graph pattern matching
algorithm that uses the metadata model of the data warehouse. Our results with
real data from a global player in the financial services industry show that
SODA produces queries with high precision and recall, and makes it much easier
for business users to interactively explore highly-complex data warehouses.Comment: VLDB201
Heterogeneous biomedical database integration using a hybrid strategy: a p53 cancer research database.
Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.)
Regional Data Archiving and Management for Northeast Illinois
This project studies the feasibility and implementation options for establishing a regional data archiving system to help monitor
and manage traffic operations and planning for the northeastern Illinois region. It aims to provide a clear guidance to the
regional transportation agencies, from both technical and business perspectives, about building such a comprehensive
transportation information system. Several implementation alternatives are identified and analyzed. This research is carried
out in three phases.
In the first phase, existing documents related to ITS deployments in the broader Chicago area are summarized, and a
thorough review is conducted of similar systems across the country. Various stakeholders are interviewed to collect
information on all data elements that they store, including the format, system, and granularity. Their perception of a data
archive system, such as potential benefits and costs, is also surveyed. In the second phase, a conceptual design of the
database is developed. This conceptual design includes system architecture, functional modules, user interfaces, and
examples of usage. In the last phase, the possible business models for the archive system to sustain itself are reviewed. We
estimate initial capital and recurring operational/maintenance costs for the system based on realistic information on the
hardware, software, labor, and resource requirements. We also identify possible revenue opportunities.
A few implementation options for the archive system are summarized in this report; namely:
1. System hosted by a partnering agency
2. System contracted to a university
3. System contracted to a national laboratory
4. System outsourced to a service provider
The costs, advantages and disadvantages for each of these recommended options are also provided.ICT-R27-22published or submitted for publicationis peer reviewe
A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing
The overwhelmingly increasing amount of stored data has spurred researchers
seeking different methods in order to optimally take advantage of it which
mostly have faced a response time problem as a result of this enormous size of
data. Most of solutions have suggested materialization as a favourite solution.
However, such a solution cannot attain Real- Time answers anyhow. In this paper
we propose a framework illustrating the barriers and suggested solutions in the
way of achieving Real-Time OLAP answers that are significantly used in decision
support systems and data warehouses
On-line analytical processing
On-line analytical processing (OLAP) describes an approach to decision support, which aims to extract knowledge from a data warehouse, or more specifically, from data marts. Its main idea is providing navigation through data to non-expert users, so that they are able to interactively generate ad hoc queries without the intervention of IT professionals. This name was introduced in contrast to on-line transactional processing (OLTP), so that it reflected the different requirements and characteristics between these classes of uses. The concept falls in the area of business intelligence.Peer ReviewedPostprint (author's final draft
- …