
    Using Ontologies for the Design of Data Warehouses

    Obtaining an implementation of a data warehouse is a complex task that forces designers to acquire wide knowledge of the domain, thus requiring a high level of expertise and making it a failure-prone task. Based on our experience, we have identified a set of situations faced in real-world projects in which we believe the use of ontologies would improve several aspects of the design of data warehouses. The aim of this article is to describe several shortcomings of current data warehouse design approaches and to discuss the benefits of using ontologies to overcome them. This work is a starting point for discussing the convenience of using ontologies in data warehouse design. (Comment: 15 pages, 2 figures)

    A Biased Topic Modeling Approach for Case Control Study from Health Related Social Media Postings

    Online social networks are the hubs of social activity in cyberspace, and using them to exchange knowledge, experiences, and opinions is common. In this work, an advanced topic modeling framework is designed to analyse complex longitudinal health information from social media with minimal human annotation; Adverse Drug Events and Reactions (ADR) information is extracted and processed automatically using a biased topic modeling method. This framework improves and extends existing topic modelling algorithms that incorporate background knowledge. Using this approach, background knowledge such as ADR terms and other biomedical knowledge can be incorporated during the text mining process, generating scores that indicate the presence of ADRs. A case control study was performed on a data set of Twitter timelines of women who announced their pregnancy; the goal of the study is to compare the ADR risk of medication usage from each medication category during pregnancy. In addition, to evaluate the predictive power of this approach, another important aspect of personalized medicine was addressed: the prediction of medication usage through the identification of risk groups. During the prediction process, the health information in a Twitter timeline, such as diseases, symptoms, treatments, and effects, is summarized by the topic modelling processes, and the summarization results are used for prediction. Dimension reduction and topic similarity measurement are integrated into this framework for timeline classification and prediction. This work could be applied to provide guidelines for FDA drug risk categories, a process currently based on laboratory results and reported cases. Finally, a multi-dimensional text data warehouse (MTD) to manage the output of the topic modelling is proposed, and attempts have also been made to incorporate the topic structure (ontology) and the MTD hierarchy.
    Results demonstrate that the proposed methods show promise and that this system represents a low-cost approach to drug safety early warning. (Doctoral Dissertation, Computer Science, 201)
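    The biased-scoring idea described above can be sketched minimally: weight a post's tokens by a seed lexicon of ADR terms so that posts mentioning adverse events score higher. The lexicon, weights, and normalisation below are illustrative assumptions, not the dissertation's actual model.

```python
from collections import Counter

# Hypothetical seed lexicon of ADR terms with bias weights (illustrative only).
ADR_SEED_TERMS = {"nausea": 2.0, "headache": 1.5, "rash": 1.5, "dizziness": 1.0}

def adr_score(post: str) -> float:
    """Sum seed-term weights over the tokens of a post, normalised by length."""
    tokens = post.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Each seed term contributes its weight once per occurrence.
    weighted = sum(ADR_SEED_TERMS.get(t, 0.0) * c for t, c in counts.items())
    return weighted / len(tokens)
```

    In the full framework such scores would bias topic assignments during inference rather than score posts directly; this sketch only conveys how background knowledge enters the pipeline.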

    Enrichment of the Phenotypic and Genotypic Data Warehouse analysis using Question Answering systems to facilitate the decision making process in cereal breeding programs

    There is currently an overwhelming number of scientific publications in the Life Sciences, especially in Genetics and Biotechnology. This huge amount of information is structured in corporate Data Warehouses (DW) or in biological databases (e.g. UniProt, RCSB Protein Data Bank, CEREALAB, or GenBank), whose main drawback is the cost of updating, which causes them to become obsolete quickly. However, these databases are the main tool for enterprises when they want to update their internal information, for example when a plant breeding enterprise needs to enrich its genetic information (an internal structured database) with recently discovered genes related to specific phenotypic traits (external unstructured data) in order to choose the desired parentals for breeding programs. In this paper, we propose to complement the internal information with external data from the Web using Question Answering (QA) techniques. We go a step further by providing a complete framework for integrating unstructured and structured information, combining traditional database and DW architectures with QA systems. The great advantage of our framework is that decision makers can instantly compare internal data with external data from competitors, allowing them to take quick strategic decisions based on richer data. This paper has been partially supported by the MESOLAP (TIN2010-14860) and GEODAS-BI (TIN2012-37493-C03-03) projects from the Spanish Ministry of Education and Competitiveness. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298).
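    The enrichment step could look roughly like the following: each internal DW record about a phenotypic trait is complemented with an answer retrieved from an external QA system. The record fields, the question template, and `qa_lookup` (standing in for a real QA backend) are all assumptions for illustration, not the paper's actual framework.

```python
def enrich(internal_rows, qa_lookup):
    """Attach an external QA answer (or None) to each internal record.

    `internal_rows` are dicts from the structured DW; `qa_lookup` is any
    callable that takes a natural-language question and returns an answer.
    """
    enriched = []
    for row in internal_rows:
        # Hypothetical question template built from the internal record.
        question = f"Which genes are associated with {row['trait']}?"
        enriched.append({**row, "external_answer": qa_lookup(question)})
    return enriched
```

    Keeping the QA backend behind a plain callable lets the same integration code work whether answers come from a live QA system or a cached answer store.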

    A Prototyped NL-Based Approach for the Design of Multidimensional Data Warehouse

    Organizations are increasingly interested in Data Warehouse (DW) technology and data analytics to base their decision-making processes on scientific arguments instead of intuition. Despite the efforts invested, DW design remains a challenging research domain. The design quality of a DW depends on several aspects, such as requirements gathering. In this context, we propose a Natural Language (NL) based design approach that is twofold. First, it facilitates the involvement of decision-makers in the DW design process; indeed, NL can encourage decision-makers to express their requirements as English-like sentences conforming to NL-templates. Second, our approach aims to generate a DW schema semi-automatically from a set of requirements gathered as analytical queries compliant with the NL-templates. This design approach relies on (i) two easy-to-use NL-templates for specifying the analysis components, and (ii) a set of five heuristic rules for extracting the multidimensional concepts from the requirements. We demonstrate the feasibility of our approach through the prototype Natural Language Decisional Requirements to DW Schema (NLDR2DWS).
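    To make the template idea concrete, here is a minimal sketch of one English-like analytical-query template and the extraction of multidimensional roles from it. The pattern, field names, and example query are assumptions; the paper's two NL-templates and five heuristic rules are not reproduced here.

```python
import re

# Illustrative template: "Analyze <measure> by <dimension> [for <slice>]".
TEMPLATE = re.compile(
    r"analy[sz]e (?P<measure>[\w ]+?) by (?P<dimension>[\w ]+?)"
    r"(?: for (?P<slice>[\w ]+))?$",
    re.IGNORECASE,
)

def extract_md_concepts(query: str):
    """Map an English-like analytical query onto multidimensional roles."""
    m = TEMPLATE.match(query.strip())
    if not m:
        return None  # query does not conform to the template
    # Drop optional groups that did not match.
    return {k: v for k, v in m.groupdict().items() if v}
```

    A real system would run several such templates plus heuristic rules over the matched fragments (e.g. proposing numeric attributes as measures) before assembling the DW schema.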

    Ontology-assisted database integration to support natural language processing and biomedical data-mining

    Successful biomedical data mining and information extraction require a complete picture of biological phenomena such as genes, biological processes, and diseases, as these exist at different levels of granularity. To realize this goal, several freely available heterogeneous databases, as well as proprietary structured datasets, have to be integrated into a single global customizable scheme. We present a tool that integrates different biological data sources by mapping them to a proprietary biomedical ontology developed for the purpose of making computers understand medical natural language.

    Word Sense Disambiguation for Ontology Learning

    Ontology learning aims to automatically extract ontological concepts and relationships from related text repositories and is expected to be more efficient and scalable than manual ontology development. One of the challenging issues associated with ontology learning is word sense disambiguation (WSD). Most WSD research employs resources such as WordNet, text corpora, or a hybrid approach. Motivated by the large volume and richness of user-generated content in social media, this research explores the role of social media in ontology learning. Specifically, our approach exploits social media as a dynamic, context-rich data source for WSD. This paper presents a method and preliminary evidence of its efficacy. The research is in progress toward a formal evaluation of the social-media-based WSD method, and we plan to incorporate the WSD routine into an ontology learning system in the future.
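    The core intuition, picking the sense of an ambiguous term that best matches words drawn from surrounding social media posts, can be sketched with a simplified Lesk-style overlap score. The toy sense inventory below is an assumption for illustration, not the paper's data or exact method.

```python
# Toy sense inventory: each sense has a small gloss/signature word set.
SENSES = {
    "bank": {
        "finance": {"money", "account", "loan", "deposit"},
        "river": {"water", "shore", "fishing", "stream"},
    }
}

def disambiguate(word: str, context: str) -> str:
    """Return the sense whose signature overlaps most with the context words.

    `context` would be built from social media posts around the term,
    giving a dynamic, constantly refreshed disambiguation context.
    """
    context_words = set(context.lower().split())
    scores = {
        sense: len(signature & context_words)
        for sense, signature in SENSES[word].items()
    }
    return max(scores, key=scores.get)
```

    Replacing the static gloss sets with term co-occurrence statistics harvested from social media is what would distinguish this from classic WordNet-based Lesk.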

    GEM: requirement-driven generation of ETL and multidimensional conceptual designs

    Technical Report. At the early stages of a data warehouse design project, the main objective is to collect the business requirements and needs and translate them into an appropriate conceptual, multidimensional design. Typically, this task is performed manually, through a series of interviews involving two different parties: the business analysts and the technical designers. Producing an appropriate conceptual design is an error-prone task that undergoes several rounds of reconciliation and redesign until the business needs are satisfied. It is of great importance for the business of an enterprise to facilitate and automate such a process. The goal of our research is to provide designers with a semi-automatic means of producing conceptual multidimensional designs, as well as conceptual representations of the extract-transform-load (ETL) processes that orchestrate the data flow from the operational sources to the data warehouse constructs. In particular, we describe a method that combines information about the data sources with the business requirements, validating and completing these requirements if necessary, producing a multidimensional design, and identifying the ETL operations needed. We present our method in terms of the TPC-DS benchmark and show its applicability and usefulness. (Preprint)
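    The validation-and-classification step described above can be sketched naively: each requested attribute is checked against the source schema, numeric attributes are proposed as measures, other resolvable attributes as dimension levels, and unresolved ones are flagged for reconciliation with the business analysts. The schema, column names, and typing rule are invented for illustration and are far simpler than the method in the report.

```python
# Hypothetical operational-source schema (attribute name -> data type).
SOURCE_SCHEMA = {"amount": "numeric", "quantity": "numeric",
                 "store": "text", "date": "date"}

def classify_requirements(requested_attrs):
    """Validate requested attributes against the sources and propose roles."""
    design = {"measures": [], "dimensions": [], "unresolved": []}
    for attr in requested_attrs:
        dtype = SOURCE_SCHEMA.get(attr)
        if dtype is None:
            # Not found in any source: triggers a reconciliation round.
            design["unresolved"].append(attr)
        elif dtype == "numeric":
            design["measures"].append(attr)
        else:
            design["dimensions"].append(attr)
    return design
```

    In the actual method, the same source metadata would also drive the identification of the ETL operations needed to populate each proposed construct.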

    Explicitating semantics in Enterprise Information Systems Models

    140-page report for the Post-Doctorate diploma of the Université Henri Poincaré. Supervisors: Hervé Panetto and Alexis Aubry. Interoperability can be defined as the ability of two or more systems to share, understand, and consume information (IEEE, 1990). The work (Chen et al., 2006) in the INTEROP NoE project identified three different levels of barriers to interoperability: technical, conceptual, and organisational. Our research focuses on the conceptual level of interoperability, namely the ability to understand the exchanged information. Information may be defined as data linked to knowledge about that data. This report presents the results obtained during the post-doctoral study, referring to the published works. It covers a first phase of our general research, which focuses on the study of the semantic loss that appears in the exchange of information about business concepts. In order to quantify the semantic gap between interoperating information systems, their semantics needs to be made explicit and structured by enriching, normalising, and analysing their conceptual models. We propose a conceptualisation approach for making explicit the finest-grained semantics embedded in conceptual models, in order to facilitate semantic matching between two different information systems that have to interoperate. The structure of the document reflects the different steps and the research domain on which the study focused.