
    Integrating data warehouses with web data : a survey

    This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML technologies currently used to integrate, store, query, and retrieve Web data, and their application to DWs. The paper reviews different distributed DW architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources, and the XML extensions of OnLine Analytical Processing techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to discover the main limitations and opportunities offered by the combination of the DW and Web fields, and to identify open research lines.
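As a minimal sketch of the kind of XML-to-DW integration the survey discusses, the Python below flattens a small XML document (standing in for Web data) into fact rows and then rolls the measure up along one dimension. The `<sale>` element, its attributes, and the helpers `xml_to_facts` and `roll_up` are hypothetical; the systems the survey reviews rely on XML query languages and full DW loaders rather than this hand-rolled parser.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XML fragment standing in for Web data; element and
# attribute names are illustrative assumptions only.
WEB_XML = """
<sales>
  <sale region="North" product="A" amount="120.0"/>
  <sale region="North" product="B" amount="80.5"/>
  <sale region="South" product="A" amount="200.0"/>
</sales>
"""

def xml_to_facts(xml_text):
    """Flatten each <sale> element into a (region, product, amount) fact row."""
    root = ET.fromstring(xml_text.strip())
    return [
        (s.get("region"), s.get("product"), float(s.get("amount")))
        for s in root.iter("sale")
    ]

def roll_up(facts, dim_index):
    """Sum the amount measure along one dimension (0 = region, 1 = product)."""
    totals = defaultdict(float)
    for row in facts:
        totals[row[dim_index]] += row[2]
    return dict(totals)

facts = xml_to_facts(WEB_XML)
print(roll_up(facts, 0))  # {'North': 200.5, 'South': 200.0}
```

Once the semistructured source is reduced to flat rows, the usual OLAP operations apply unchanged; the difficult part the survey highlights is designing that mapping for irregular XML.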

    Knowledge visualizations: a tool to achieve optimized operational decision making and data integration

    The overabundance of data created by modern information systems (IS) has led to a breakdown in cognitive decision-making. Without authoritative source data, commanders' decision-making processes are hindered as they attempt to paint an accurate shared operational picture (SOP). The decision-making process is further impeded by the lack of proper interface interaction to provide a visualization that aids in extracting the most relevant and accurate data. Using a decision support system (DSS) to present visualizations based on OLAP-cube-integrated data allows decision-makers to rapidly glean information and build their situation awareness (SA). This yields a competitive advantage to the organization, whether in garrison or in combat. Additionally, OLAP cube data integration enables analysis of an organization's data flows, which is used to identify the critical path of data through the organization. Linking a decision-maker to the authoritative data along this critical path eliminates the many decision layers in a hierarchical command structure that can introduce latency or error into the decision-making process. Furthermore, the organization gains an integrated SOP from which to rapidly build SA and make effective and efficient decisions.

    http://archive.org/details/knowledgevisuali1094545877 · Outstanding Thesis · Major, United States Marine Corps; Captain, United States Marine Corps · Approved for public release; distribution is unlimited
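The thesis does not define how the "critical path of data" is computed. One plausible reading, assumed here, is the highest-latency path through an acyclic graph of data flows between units. The sketch below finds that path by dynamic programming over a topological order (Python 3.9+ for `graphlib`). The unit names and latency values in `FLOWS` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical data-flow graph: edge (src, dst) -> latency (e.g. hours)
# for data moving between two units. Names and numbers are illustrative.
FLOWS = {
    ("sensor", "battalion"): 2,
    ("battalion", "regiment"): 4,
    ("sensor", "regiment"): 3,
    ("regiment", "division"): 5,
    ("battalion", "division"): 6,
}

def critical_path(flows):
    """Return (total_latency, path) of the longest-latency path in an acyclic flow graph."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    preds = {}
    for src, dst in flows:
        preds.setdefault(dst, set()).add(src)
        preds.setdefault(src, set())
    best = {}  # node -> (latency of the longest path ending at node, that path)
    for node in TopologicalSorter(preds).static_order():
        # Every predecessor is already in `best` because of the topological order.
        candidates = [
            (best[p][0] + flows[(p, node)], best[p][1] + [node]) for p in preds[node]
        ]
        best[node] = max(candidates, default=(0, [node]))
    return max(best.values())

print(critical_path(FLOWS))
```

Under this assumption, the layers on the returned path are the ones contributing most latency, which are the candidates for the direct decision-maker linkage the abstract describes.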

    Enrichment of the Phenotypic and Genotypic Data Warehouse analysis using Question Answering systems to facilitate the decision making process in cereal breeding programs

    There is currently an overwhelming number of scientific publications in the Life Sciences, especially in Genetics and Biotechnology. This huge amount of information is structured in corporate Data Warehouses (DW) or in Biological Databases (e.g. UniProt, RCSB Protein Data Bank, CEREALAB or GenBank), whose main drawback is the cost of keeping them up to date, which makes them quickly obsolete. However, these Databases are the main tool enterprises use to update their internal information, for example when a plant breeding company needs to enrich its genetic information (an internal structured Database) with recently discovered genes related to specific phenotypic traits (external unstructured data) in order to choose the desired parental lines for breeding programs. In this paper, we propose to complement the internal information with external data from the Web using Question Answering (QA) techniques. We go a step further by providing a complete framework for integrating unstructured and structured information by combining traditional Databases and DW architectures with QA systems. The main advantage of our framework is that decision makers can instantly compare internal data with external data from competitors, allowing them to make strategic decisions quickly based on richer data.

    This paper has been partially supported by the MESOLAP (TIN2010-14860) and GEODAS-BI (TIN2012-37493-C03-03) projects from the Spanish Ministry of Education and Competitivity. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298)
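The framework's final comparison step, internal structured data against externally found facts, can be pictured with the short sketch below. It assumes the QA system has already returned answers as `(trait, gene, source)` triples; the QA stage itself is not shown, and all trait names, gene names, and sources are hypothetical placeholders.

```python
# Hypothetical internal DW content: genes already recorded per phenotypic trait.
INTERNAL = {
    "drought_tolerance": {"GeneA", "GeneB"},
    "grain_yield": {"GeneC"},
}

# Hypothetical QA output, assumed already extracted from Web documents
# as (trait, gene, source) triples.
QA_ANSWERS = [
    ("drought_tolerance", "GeneB", "paper-1"),
    ("drought_tolerance", "GeneD", "paper-2"),
    ("grain_yield", "GeneE", "paper-3"),
]

def enrich(internal, answers):
    """Per trait, list the (gene, source) pairs found externally but missing internally."""
    report = {}
    for trait, gene, source in answers:
        if gene not in internal.get(trait, set()):
            report.setdefault(trait, []).append((gene, source))
    return report

for trait, new_genes in enrich(INTERNAL, QA_ANSWERS).items():
    print(trait, new_genes)
```

Keeping the source with each external fact lets a breeder trace a candidate gene back to the publication before acting on it.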

    Context-aware OLAP for textual data warehouses

    Decision Support Systems (DSS) that leverage business intelligence are based on numerical data, and On-line Analytical Processing (OLAP) is often used to implement them. However, business decisions increasingly depend on textual data as well. Existing research on textual data warehouses can capture contextual relationships only when comparing strongly related documents. This paper proposes a context-aware model, based on an Information System (IS), that uses word embeddings in conjunction with agglomerative hierarchical clustering algorithms to dynamically categorize documents and thereby form the concept hierarchy. The results of the experimental evaluation provide evidence of the effectiveness of integrating textual data into a data warehouse and of improving decision making through various OLAP operations.
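To illustrate how agglomerative clustering over embeddings yields a hierarchy, here is a small pure-Python sketch using cosine similarity and average linkage. The paper's exact linkage criterion and embedding model are not stated here, so both are assumptions, and the two-dimensional vectors in `DOCS` are toy stand-ins for learned word embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def agglomerate(vectors):
    """Average-linkage agglomerative clustering on cosine similarity.

    Returns the merges bottom-up as (cluster_a, cluster_b) pairs of frozensets
    of document ids; the merge sequence is the dendrogram (concept hierarchy).
    """
    clusters = [frozenset([doc]) for doc in vectors]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean similarity over all cross-cluster pairs.
                sims = [cosine(vectors[a], vectors[b]) for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Toy 2-d "embeddings" (hypothetical); real ones would come from a trained model.
DOCS = {
    "invoice": [1.0, 0.1],
    "payment": [0.9, 0.2],
    "shipment": [0.1, 1.0],
}
for left, right in agglomerate(DOCS):
    print(sorted(left), "+", sorted(right))
```

Cutting the resulting dendrogram at different heights gives the coarser or finer concept levels that OLAP roll-up and drill-down would navigate.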

    A new multidimensional model with text dimensions: definition and implementation

    We present a new multidimensional model with textual dimensions based on a knowledge structure extracted from the texts, in which any textual attribute in a database can be processed, not only XML texts. This dimension allows textual data to be treated in the same way as non-textual data, automatically and without user intervention, so all the classical operations of the multidimensional model can be defined for the textual dimension. While most of the text-handling models found in the literature have not been implemented, in this proposal the multidimensional model and the OLAP system have been implemented in a software tool, so they can be tested on real data. A case study with medical data is included in this work.

    Junta de Andalucia P07-TIC02786 P10-TIC6109 P11-TIC746
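The idea that classical operations carry over to a textual dimension can be sketched with a roll-up: once each free-text value is mapped to a term in a concept hierarchy, counting facts at a coarser level is ordinary aggregation. In the proposed model the hierarchy is extracted automatically from the texts; here `PARENT` and `FACTS` are hand-written, hypothetical data loosely in the spirit of the medical case study.

```python
from collections import Counter

# Hypothetical concept hierarchy: term -> broader concept.
PARENT = {
    "influenza": "respiratory disease",
    "pneumonia": "respiratory disease",
    "gastritis": "digestive disease",
    "respiratory disease": "disease",
    "digestive disease": "disease",
}

# Hypothetical fact table: each record's free-text diagnosis is assumed
# to be already mapped to its most specific extracted term.
FACTS = [
    {"patient": 1, "diagnosis_term": "influenza"},
    {"patient": 2, "diagnosis_term": "pneumonia"},
    {"patient": 3, "diagnosis_term": "gastritis"},
    {"patient": 4, "diagnosis_term": "influenza"},
]

def ancestor(term, levels):
    """Climb `levels` steps up the hierarchy, stopping at the root."""
    for _ in range(levels):
        term = PARENT.get(term, term)
    return term

def roll_up(facts, levels):
    """Count facts after rolling the textual dimension up `levels` levels."""
    return Counter(ancestor(f["diagnosis_term"], levels) for f in facts)

print(roll_up(FACTS, 1))
print(roll_up(FACTS, 2))
```

Drill-down is the same aggregation with a smaller `levels` value, which is why the textual dimension can be handled like any other once the hierarchy exists.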

    Where are we headed in business analytics? A framework based on a paradigmatic analysis of the history of analytics

    The explosion of interest in business analytics (BA) comes with multiple problems. With as many as eleven distinct disciplines teaching analytics, it is not clear which areas of study constitute the BA field. If the information systems (IS) field is to exert a significant influence on analytics, IS researchers and practitioners need a clear picture of what to focus on. Using a paradigmatic historiographical analysis of the field of analytics, this study provides evidence that analytics is bifurcating into data science and BA as the founding disciplines of computer science, mathematics and statistics, machine learning, and IS contribute to the analytics movement. The results of this analysis also identify a set of conceptual foundations for BA that takes advantage of the intellectual strengths of the IS field without sacrificing the necessary depth of data science.

    Implementation of a Linked Open Data Solution for the Statistics Agency of Cantabria's Metadata and Data Bank

    Statistics is a fundamental part of the Open Government philosophy, being a basic tool for citizens to know, and make informed decisions about, the society in which they participate. Because of the great number of organizations and agencies that collect, process, and publish statistical data all over the world, several standards and methodologies for information exchange have been created in recent years to improve interoperability between data producers and consumers; SDMX is one of the best-known examples. Although it was developed independently of these standards, the global Semantic Web effort (backed mainly by the W3C-driven Linked Open Data initiatives) is an extremely useful tool for publishing fully contextualized metadata and data, making them easy for third parties to understand and process. This report details the changes made to the IT systems of the Statistical Agency of Cantabria (Instituto Cántabro de Estadística, ICANE) to implement a Linked Open Data solution for its website and statistical data bank, making all data and metadata published by the Agency available not only to humans but also to automated consumers. Multiple standards, recommendations, and vocabularies were used for this task, ranging from Dublin Core metadata RDFa tagging, through the creation of several SKOS concept schemes, to providing statistical data using the RDF Data Cube vocabulary.
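For readers unfamiliar with the RDF Data Cube vocabulary, the sketch below emits, as N-Triples, a single `qb:Observation` linked to its `qb:DataSet`, with two SDMX-RDF dimensions and one measure. The `qb:`, `sdmx-dimension:` and `sdmx-measure:` namespace URIs are the published ones; every `example.org` URI, the area and period codes, and the value are hypothetical and do not reflect ICANE's actual URI scheme or data structure.

```python
# Published vocabulary namespaces.
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical base URI for the publishing agency.
BASE = "http://example.org/stats/"

def dataset_triples(dataset):
    """Declare the dataset resource as a qb:DataSet."""
    return [f"<{BASE}{dataset}> <{RDF_TYPE}> <{QB}DataSet> ."]

def observation_triples(dataset, obs_id, ref_area, ref_period, value):
    """Return N-Triples lines describing one qb:Observation with an integer value."""
    obs = f"<{BASE}{dataset}/obs/{obs_id}>"
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{QB}dataSet> <{BASE}{dataset}> .",
        f"{obs} <{SDMX_DIM}refArea> <{BASE}area/{ref_area}> .",
        f"{obs} <{SDMX_DIM}refPeriod> <{BASE}period/{ref_period}> .",
        f'{obs} <{SDMX_MEASURE}obsValue> "{int(value)}"^^<{XSD}integer> .',
    ]

lines = dataset_triples("population") + observation_triples(
    "population", "obs-1", "region-a", "2011", 1234
)
print("\n".join(lines))
```

A production pipeline would also publish a `qb:DataStructureDefinition` describing the dimensions and measures, and would normally use an RDF library instead of string formatting.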