146 research outputs found
Integrating data warehouses with web data: a survey
This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML
technologies that are currently being used to integrate, store, query, and retrieve Web data and their application to DWs. The paper
reviews different DW distributed architectures and the use of XML languages as an integration tool in these systems. It also introduces
the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for
XML data sources, and the XML extensions of OnLine Analytical Processing techniques. The paper addresses the application of
information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to identify
the main limitations and opportunities offered by the combination of the DW and Web fields, as well as open research
lines.
Knowledge visualizations: a tool to achieve optimized operational decision making and data integration
The overabundance of data created by modern information systems (IS) has led to a breakdown in cognitive decision-making. Without authoritative source data, commanders’ decision-making processes are hindered as they attempt to paint an accurate shared operational picture (SOP). Further impeding the decision-making process is the lack of proper interface interaction to provide a visualization that aids in the extraction of the most relevant and accurate data. Utilizing a Decision Support System (DSS) to present visualizations based on OLAP-cube-integrated data allows decision-makers to rapidly glean information and build their situation awareness (SA). This yields a competitive advantage to the organization, whether in garrison or in combat. Additionally, OLAP cube data integration enables analysis to be performed on an organization’s data flows. This analysis is used to identify the critical path of data throughout the organization. Linking a decision-maker to the authoritative data along this critical path eliminates the many decision layers in a hierarchical command structure that can introduce latency or error into the decision-making process. Furthermore, the organization gains an integrated SOP from which to rapidly build SA and make effective and efficient decisions. http://archive.org/details/knowledgevisuali1094545877
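The OLAP-cube integration described above can be sketched with a pivot table, which behaves like a tiny cube: dimensions on the axes, an aggregated measure in the cells. This is a minimal illustration only; the data, column names, and the use of pandas are assumptions, not the thesis's actual tooling.

```python
import pandas as pd

# Hypothetical operational records; all values and column names are invented.
records = pd.DataFrame({
    "unit":          ["1stBn", "1stBn", "2ndBn", "2ndBn"],
    "quarter":       ["Q1", "Q2", "Q1", "Q2"],
    "reports_filed": [12, 15, 9, 11],
})

# A pivot table acts as a small OLAP cube: "unit" and "quarter" are
# dimensions, the summed measure sits in the cells.
cube = records.pivot_table(index="unit", columns="quarter",
                           values="reports_filed", aggfunc="sum")

# A roll-up along the quarter dimension gives the per-unit totals a
# decision-maker would see at a higher level of the hierarchy.
rollup = cube.sum(axis=1)
print(rollup["1stBn"])  # 27
```

Slicing the same `cube` by a single quarter would correspond to the drill-down direction of the analysis.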
Enrichment of the Phenotypic and Genotypic Data Warehouse analysis using Question Answering systems to facilitate the decision making process in cereal breeding programs
Currently there is an overwhelming number of scientific publications in the Life Sciences, especially in Genetics and Biotechnology. This huge amount of information is structured in corporate Data Warehouses (DW) or in Biological Databases (e.g. UniProt, RCSB Protein Data Bank, CEREALAB or GenBank), whose main drawback is the cost of keeping them up to date, which makes them become obsolete quickly. However, these Databases are the main tool for enterprises when they want to update their internal information, for example when a plant breeding enterprise needs to enrich its genetic information (internal structured Database) with recently discovered genes related to specific phenotypic traits (external unstructured data) in order to choose the desired parentals for breeding programs. In this paper, we propose to complement the internal information with external data from the Web using Question Answering (QA) techniques. We go a step further by providing a complete framework for integrating unstructured and structured information by combining traditional Database and DW architectures with QA systems. The great advantage of our framework is that decision makers can instantly compare internal data with external data from competitors, allowing them to make quick strategic decisions based on richer data. This paper has been partially supported by the MESOLAP (TIN2010-14860) and GEODAS-BI (TIN2012-37493-C03-03) projects from the Spanish Ministry of Education and Competitivity. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298).
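The enrichment idea above can be sketched as a merge between internal records and externally retrieved answers. The QA step is faked here with a lookup; a real system would query a QA engine over web documents, and every gene name, trait, and field below is invented for illustration.

```python
# Internal structured database: genes the enterprise already knows about.
# All identifiers and traits are hypothetical.
internal = {"GeneX": {"trait": "drought tolerance", "source": "internal DB"}}

def fake_qa(question):
    # Stand-in for a Question Answering system extracting answers from
    # external unstructured web text; hard-coded for the sketch.
    answers = {"Which genes relate to drought tolerance?": ["GeneX", "GeneY"]}
    return answers.get(question, [])

# Externally discovered genes not yet in the internal database become
# candidate enrichments for the decision maker to review.
external = fake_qa("Which genes relate to drought tolerance?")
candidates = [g for g in external if g not in internal]
print(candidates)  # ['GeneY']
```

The point of the framework is precisely this side-by-side view: internal data and fresh external answers in one place, so the comparison needs no manual literature search.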
Context-aware OLAP for textual data warehouses
Decision Support Systems (DSS) that leverage business intelligence are based on numerical data, and On-Line Analytical Processing (OLAP) is often used to implement them. However, business decisions are increasingly dependent on textual data as well. Existing research on textual data warehouses is limited to capturing contextual relationships only when comparing strongly related documents. This paper proposes an Information System (IS) based context-aware model that uses word embeddings in conjunction with agglomerative hierarchical clustering algorithms to dynamically categorize documents in order to form the concept hierarchy. The results of the experimental evaluation provide evidence of the effectiveness of integrating textual data into a data warehouse and improving decision making through various OLAP operations.
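The clustering step described above can be sketched as follows: document embeddings are grouped bottom-up, and the resulting clusters serve as levels of a dynamically built concept hierarchy. The two-dimensional vectors below are toy stand-ins for real document embeddings, and the use of scikit-learn is an assumption, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy stand-ins for document embeddings (a real system would derive these
# from word embeddings); values are made up and deliberately well separated.
doc_vectors = np.array([
    [0.90, 0.10],  # doc A: finance-like
    [0.85, 0.15],  # doc B: finance-like
    [0.10, 0.90],  # doc C: medical-like
    [0.05, 0.95],  # doc D: medical-like
])

# Agglomerative clustering merges the closest documents first; cutting the
# resulting dendrogram at different heights yields the concept hierarchy
# levels used for OLAP roll-up and drill-down.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(doc_vectors)
print(labels)
```

Cutting with a larger `n_clusters` would produce the finer-grained lower levels of the same hierarchy.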
A new multidimensional model with text dimensions: definition and implementation
We present a new multidimensional model with textual dimensions based on a knowledge structure extracted
from the texts, where any textual attribute in a database can be processed, and not only XML texts.
This dimension allows textual data to be treated in the same way as non-textual data in an automatic
way, without user intervention, so all the classical operations in the multidimensional model can be
defined for this textual dimension. While most of the models dealing with texts that can be found in the
literature are not implemented, in this proposal the multidimensional model and the OLAP system have
been implemented in a software tool, so it can be tested on real data. A case study with medical data is
included in this work. Junta de Andalucía grants P07-TIC02786, P10-TIC6109 and P11-TIC746.
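The core idea of a textual dimension can be sketched minimally: each free-text value is mapped to a term in a concept hierarchy extracted from the texts, after which it can be grouped and rolled up like any conventional dimension. The two-level hierarchy, the medical terms, and the measures below are all invented for illustration.

```python
from collections import defaultdict

# Invented two-level concept hierarchy over textual values: term -> concept.
hierarchy = {"aspirin": "analgesic",
             "ibuprofen": "analgesic",
             "amoxicillin": "antibiotic"}

# Fact rows: (textual attribute value, numeric measure). Data is illustrative.
facts = [("aspirin", 10), ("ibuprofen", 5), ("amoxicillin", 7)]

# Roll-up along the textual dimension: aggregate the measure at the
# parent-concept level, exactly as one would for a non-textual dimension.
totals = defaultdict(int)
for term, qty in facts:
    totals[hierarchy[term]] += qty

print(dict(totals))  # {'analgesic': 15, 'antibiotic': 7}
```

Drill-down is the inverse: keeping the leaf terms instead of their parents recovers the original detail rows.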
Where are we headed in business analytics? A framework based on a paradigmatic analysis of the history of analytics
The explosion of interest in business analytics (BA) comes with multiple problems. With as many as eleven distinct disciplines teaching analytics, it is not clear which areas of study constitute the BA field. If the information systems (IS) field is to exert a significant influence in analytics, what the IS researcher and practitioner need to focus on has to be made clear. Using a paradigmatic historiographical analysis of the field of analytics, this study provides evidence for the bifurcation of analytics into data science and BA as the founding disciplines of computer science, mathematics and statistics, machine learning, and IS contribute to the analytics movement. The results from this analysis also identify a set of conceptual foundations for BA that take advantage of the intellectual strengths of the IS field without sacrificing the necessary depth of data science.
Implementation of a Linked Open Data Solution for the Statistics Agency of Cantabria's Metadata and Data Bank
Statistics is a fundamental piece of the Open Government philosophy,
being a basic tool for citizens to know and make informed decisions about the society in which
they participate. Due to the great number of organizations and agencies that collect, process
and publish statistical data all over the world, several standards and methodologies for
information exchange have been created in recent years in order to improve interoperability
between data producers and consumers, of which SDMX is one of the most renowned examples.
Despite having been developed independently of this, the global Semantic Web effort (backed
mainly by the W3C-driven Linked Open Data initiatives) presents itself as an extremely useful
tool for publishing both completely contextualized metadata and data, therefore making them
easily understandable and processable by third parties. This report details the changes made
to the IT systems of the Statistical Agency of Cantabria (Instituto Cántabro de Estadística,
ICANE) with the purpose of implementing a Linked Open Data solution for its website and
statistical data bank, making all data and metadata published by this Agency available not
only to humans, but to automatized consumers, too. Multiple standards, recommendations and
vocabularies were used for this task, ranging from Dublin Core metadata RDFa tagging, through
the creation of several SKOS concept schemes, to providing statistical data using the RDF
Data Cube vocabulary.
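The final step mentioned above, publishing observations with the RDF Data Cube vocabulary, can be sketched as plain subject–predicate–object triples. Only the `qb` namespace URI is real; the base URI, dataset, dimensions, and figures below are invented examples, not ICANE's actual identifiers, and a real implementation would use an RDF library rather than bare strings.

```python
# Real RDF Data Cube namespace; everything under EX is hypothetical.
QB = "http://purl.org/linked-data/cube#"
EX = "http://example.org/statistics/"  # illustrative base URI, not ICANE's

# One qb:Observation: the population of a fictional area in 2020.
# Each triple ties the observation to its dataset, dimensions, and measure.
observation = [
    (EX + "obs1", "rdf:type",        QB + "Observation"),
    (EX + "obs1", QB + "dataSet",    EX + "dataset/population"),
    (EX + "obs1", EX + "refArea",    EX + "area/039"),
    (EX + "obs1", EX + "refPeriod",  "2020"),
    (EX + "obs1", EX + "population", "52000"),
]

for s, p, o in observation:
    print(s, p, o)
```

Serializing such triples (e.g. as Turtle) is what makes the statistics consumable by automated Linked Data clients as well as humans.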
- …