440 research outputs found

    Context-aware OLAP for textual data warehouses

    Decision Support Systems (DSS) that leverage business intelligence are based on numerical data, and On-line Analytical Processing (OLAP) is often used to implement them. However, business decisions increasingly depend on textual data as well. Existing research on textual data warehouses is limited in capturing contextual relationships when only strongly related documents are compared. This paper proposes an Information System (IS) based context-aware model that uses word embedding in conjunction with agglomerative hierarchical clustering algorithms to dynamically categorize documents and form the concept hierarchy. The experimental evaluation provides evidence that integrating textual data into a data warehouse is effective and that it improves decision making through various OLAP operations.
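
The abstract above gives no implementation details, so the following is only a minimal sketch of the general technique it names: documents are represented by averaged word embeddings and merged bottom-up by agglomerative clustering, and the order of the merges serves as a candidate concept hierarchy. The toy two-dimensional embeddings, the sample documents, the cosine distance, the average linkage, and all function names are assumptions made for illustration, not the paper's method.

```python
import math

# Hypothetical 2-D word embeddings; a real system would load trained vectors.
EMBEDDINGS = {
    "sales": [1.0, 0.1], "revenue": [0.9, 0.2], "profit": [0.8, 0.3],
    "server": [0.1, 1.0], "outage": [0.2, 0.9],
}

# Hypothetical tokenized documents.
DOCS = [
    ["sales", "revenue"],
    ["revenue", "profit"],
    ["server", "outage"],
    ["outage", "server", "sales"],
]

def document_vector(tokens, embeddings):
    """Average the embeddings of the tokens that have a known vector."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        raise ValueError("document has no known tokens")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def agglomerative_merges(vectors):
    """Average-linkage agglomerative clustering.

    Returns the merges in the order they happen, each as
    (left_cluster, right_cluster, distance), where a cluster is a
    tuple of document indices.
    """
    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [cosine_distance(vectors[i], vectors[j])
                         for i in clusters[a] for j in clusters[b]]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        dist, a, b = best
        merges.append((clusters[a], clusters[b], dist))
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

vectors = [document_vector(doc, EMBEDDINGS) for doc in DOCS]
merges = agglomerative_merges(vectors)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 3))
```

Each merge corresponds to one internal node of the hierarchy: early merges group closely related documents, and later merges form the broader levels along which an OLAP roll-up could aggregate.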

    A Biased Topic Modeling Approach for Case Control Study from Health Related Social Media Postings

    Online social networks are the hubs of social activity in cyberspace, and using them to exchange knowledge, experiences, and opinions is common. In this work, an advanced topic modeling framework is designed to analyze complex longitudinal health information from social media with minimal human annotation, and Adverse Drug Events and Reaction (ADR) information is extracted and automatically processed using a biased topic modeling method. This framework improves and extends existing topic modeling algorithms that incorporate background knowledge. Using this approach, background knowledge such as ADR terms and other biomedical knowledge can be incorporated during the text mining process, and scores indicating the presence of ADR are generated. A case control study was performed on a data set of Twitter timelines of women who announced their pregnancy; the goal of the study is to compare the ADR risk of medication usage for each medication category during pregnancy. In addition, to evaluate the predictive power of this approach, another important aspect of personalized medicine was addressed: the prediction of medication usage through the identification of risk groups. During the prediction process, the health information from a Twitter timeline, such as diseases, symptoms, treatments, and effects, is summarized by the topic modeling processes, and the summarization results are used for prediction. Dimension reduction and topic similarity measurement are integrated into this framework for timeline classification and prediction. This work could be applied to provide guidelines for FDA drug risk categories; currently, this categorization is based on laboratory results and reported cases. Finally, a multi-dimensional text data warehouse (MTD) is proposed to manage the output of the topic modeling. Some attempts have also been made to incorporate topic structure (ontology) into the MTD hierarchy.
Results demonstrate that the proposed methods show promise and that this system represents a low-cost approach for drug safety early warning.
Dissertation/Thesis. Doctoral Dissertation, Computer Science, 201

    Trustworthiness in Social Big Data Incorporating Semantic Analysis, Machine Learning and Distributed Data Processing

    This thesis presents several state-of-the-art approaches constructed for the purpose of (i) studying the trustworthiness of users in Online Social Network platforms, (ii) deriving concealed knowledge from their textual content, and (iii) classifying and predicting the domain knowledge of users and their content. The developed approaches are refined through proof-of-concept experiments, several benchmark comparisons, and appropriate and rigorous evaluation metrics to verify and validate their effectiveness and efficiency, and hence, those of the applied frameworks.

    Dynamic topic hierarchies and segmented rankings in textual OLAP technology.

    Graduate Program in Computer Science, Department of Computer Science, Institute of Exact and Biological Sciences, Universidade Federal de Ouro Preto. OLAP technology emerged 20 years ago and has recently been redesigned so that its dimensions, hierarchies, and measures can support the particularities of textual data. The task of organizing textual data hierarchically can be solved by building topic hierarchies. Currently, the topic hierarchy is defined only once in the data cube, i.e., for the entire lattice of cuboids. However, such a hierarchy is sensitive to the content of the document collection. Thus, cells in the same data cube can aggregate entirely different document collections, causing potential changes in the topic hierarchy. Furthermore, the text segment used in OLAP analysis also directly influences the topics listed by the hierarchy. In this work, we present a textual data cube with multiple, dynamic topic hierarchies: multiple because they are built from different text segments, and dynamic because they are built for each cube cell. Another contribution of this work concerns the response to multidimensional queries. The state of the art normally returns the top-k documents most relevant to a given topic. We go beyond this by returning other text segments, such as the most significant titles, abstracts, and paragraphs. The approach is designed in four complementary steps, each of which further attenuates the impact of building multiple topic hierarchies and segment rankings per cube cell. Experiments using part of the DBLP papers as a document collection reinforce our hypotheses.
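
The idea of ranking text segments per cube cell, described in the abstract above, can be illustrated with a small sketch. It is not the authors' algorithm: the sample records, the `venue` dimension, the term-frequency score, and every function name below are hypothetical, and the topic terms are supplied by hand rather than derived from a topic hierarchy. The sketch only shows that the ranking is computed inside one cell and that the top result can change with the segment chosen (title versus abstract).

```python
from collections import defaultdict

# Hypothetical document records with one dimension and two text segments.
DOCS = [
    {"venue": "VLDB",
     "title": "Topic hierarchies for text cubes",
     "abstract": "We build topic hierarchies over a text cube and rank documents"},
    {"venue": "VLDB",
     "title": "Indexing spatial data",
     "abstract": "Topic hierarchies label each tile"},
    {"venue": "SIGIR",
     "title": "Ranking topic models",
     "abstract": "Topic models are ranked by coherence"},
]

# Hand-picked topic terms; a real system would take them from a topic node.
TOPIC_TERMS = {"topic", "hierarchies"}

def build_cells(docs, dims):
    """Group documents into cube cells keyed by their dimension values."""
    cells = defaultdict(list)
    for doc in docs:
        cells[tuple(doc[d] for d in dims)].append(doc)
    return cells

def score(text, topic_terms):
    """Fraction of tokens in the text that are topic terms."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(tokens.count(term) for term in topic_terms) / len(tokens)

def top_k_segments(cell_docs, segment, topic_terms, k):
    """Return the k highest-scoring segments of one kind within a cell."""
    ranked = sorted(cell_docs,
                    key=lambda doc: score(doc[segment], topic_terms),
                    reverse=True)
    return [doc[segment] for doc in ranked[:k]]

cells = build_cells(DOCS, ["venue"])
vldb = cells[("VLDB",)]
best_title = top_k_segments(vldb, "title", TOPIC_TERMS, 1)
best_abstract = top_k_segments(vldb, "abstract", TOPIC_TERMS, 1)
print(best_title)
print(best_abstract)
```

In this toy cell the best title and the best abstract come from different documents, which is the kind of segment-dependent behavior that motivates keeping one ranking per segment type.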

    Enhancing Business Intelligence Quality with Visualization: An Experiment on Stakeholder Network Analysis

    Business intelligence (BI) has gained strategic importance in today’s global competitive environment. However, high-quality BI is not easy to obtain on the Web because of information overload and the difficulty of presenting complicated relationships among various types of business stakeholders. Unfortunately, existing BI tools lack the capability to analyze and visualize such relationships, and research on BI systems is sparse. In this paper, we review the current market of BI tools and related research, describe an approach to support the development of tools that provide high-quality BI, and report the findings of a user evaluation study of a prototype developed with the proposed approach. The approach combines information visualization and Web mining techniques with human knowledge to enable business analysts to analyze and visualize complicated business stakeholder relationships. Results of an experiment involving 62 subjects show that the prototype significantly outperformed a traditional method of BI analysis in terms of efficiency, quality of BI, and user satisfaction. The subjects provided favorable comments and expressed strong preferences for the prototype in most applications. This research contributes to advancing BI research and provides new empirical findings for BI systems evaluation. Available at: https://aisel.aisnet.org/pajais/vol1/iss1/9