
300 research outputs found

Data Cube Approximation and Mining using Probabilistic Modeling

Author: Boujenoui Ameur
Goutte Cyril
Missaoui Rokia
Publication date: 01/01/2007

On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data. Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions, and discover possible outliers in data cells. A real-life example will be used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches.
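As a rough illustration of the first technique, the sketch below fits a non-negative CP (PARAFAC) model to a small three-way cube using multiplicative updates that minimize squared error, then answers a roll-up query directly from the factors. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify the loss (squared error is assumed here; a KL-divergence variant is also common), the update rule, or how the rank is chosen. The function names (`nonneg_cp`, `reconstruct`, `rollup_dim0`) and the store x product x month cube are hypothetical.

```python
import numpy as np

def nonneg_cp(X, rank, n_iter=500, eps=1e-12, seed=0):
    """Non-negative rank-`rank` CP model of a 3-way array X, fitted with
    multiplicative updates that decrease the squared reconstruction error."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank)) + 0.1
    B = rng.random((J, rank)) + 0.1
    C = rng.random((K, rank)) + 0.1
    for _ in range(n_iter):
        # Each factor is rescaled by (data term) / (model term); both terms are
        # non-negative, so the factors stay non-negative throughout.
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

def reconstruct(A, B, C):
    """Superpose the rank-one components: X_hat[i,j,k] = sum_r A[i,r] B[j,r] C[k,r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def rollup_dim0(A, B, C):
    """Approximate roll-up (sum over dimensions 1 and 2) computed from the
    factors alone, without materializing the reconstructed cube."""
    return A @ (B.sum(axis=0) * C.sum(axis=0))

# Hypothetical store x product x month cube, built to be exactly rank 2.
gen = np.random.default_rng(42)
cube = reconstruct(gen.random((4, 2)), gen.random((3, 2)), gen.random((5, 2))) * 100

A, B, C = nonneg_cp(cube, rank=2)
approx = reconstruct(A, B, C)
rel_err = np.linalg.norm(approx - cube) / np.linalg.norm(cube)
store_totals = rollup_dim0(A, B, C)  # approximate per-store totals
```

The roll-up shortcut works because summing a rank-one component over a dimension only requires the column sums of that dimension's factor, so storing the factors (4x2, 3x2, 5x2 here) is enough to approximate aggregates of the full 4x3x5 cube.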

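For the second technique, the simplest log-linear model of a three-way cube is mutual independence, log m_ijk = λ + λ_i^A + λ_j^B + λ_k^C, whose maximum-likelihood fitted counts have the closed form n_i.. n_.j. n_..k / N². The sketch below computes those fitted counts, the G² deviance with its degrees of freedom, and cell-wise Pearson residuals used to flag candidate outliers. It is only an illustration: it does not perform the search for a parsimonious model with interaction terms that the abstract describes, and the counts, dimension names, and the |r| > 2 threshold are assumptions chosen for the example.

```python
import numpy as np

def mutual_independence_fit(X):
    """Fitted counts of the mutual-independence log-linear model:
    E[i,j,k] = n_i.. * n_.j. * n_..k / N**2 (closed-form MLE)."""
    N = X.sum()
    return np.einsum("i,j,k->ijk",
                     X.sum(axis=(1, 2)), X.sum(axis=(0, 2)), X.sum(axis=(0, 1))) / N**2

def deviance_g2(X, E):
    """Likelihood-ratio statistic G^2 = 2 * sum X log(X / E); empty cells contribute 0."""
    mask = X > 0
    return 2.0 * np.sum(X[mask] * np.log(X[mask] / E[mask]))

# Hypothetical 2 regions x 2 product lines x 3 quarters count cube.
cube = np.array([[[20, 22, 25], [10, 11, 12]],
                 [[18, 21, 60], [ 9, 10, 11]]], dtype=float)

E = mutual_independence_fit(cube)
g2 = deviance_g2(cube, E)
I, J, K = cube.shape
df = I * J * K - 1 - (I - 1) - (J - 1) - (K - 1)   # residual degrees of freedom
pearson = (cube - E) / np.sqrt(E)                  # cell-wise Pearson residuals
candidate_outliers = np.argwhere(np.abs(pearson) > 2)
```

With these made-up counts, the planted value 60 at index (1, 0, 2) has a fitted value of about 44.1 and a Pearson residual of roughly 2.4, so it is the kind of cell this residual screen surfaces. A large G² relative to `df` would suggest adding interaction terms, which is where a parsimonious-model search would begin.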
OLAP over Probabilistic Data Cubes II: Parallel Materialization and Extended Aggregates

Author: Hao X.
Jin Peiquan
Pedersen T. B.
Xie X.
Yang W.
Zou K.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/10/2020

VBN

Ontology Based Statistical Automated Inference - New Approach to Artificial Intelligence

Author: Borkowski Wlodzimierz
Mielniczuk Hanna
Publication venue: Lifescience Global
Publication date: 20/12/2012

Statistical analysis requires understanding the nature of the phenomenon under study as well as the meaning of mathematical statistics. Bridging the gap between the semantic web, which is based on knowledge representation languages, and concepts described by mathematical formulas is a challenge for AI. To overcome this gap, the ontology language P-ONT (based on a directed graph) has been invented. To illustrate the capabilities of the P-ONT language, a semantic web (built on the P-ONT ontology), an OLAP cube, relational databases, and generalized hierarchical statistical regression models are presented.

A Biased Topic Modeling Approach for Case Control Study from Health Related Social Media Postings

Publication date: 01/01/2017

Online social networks are the hubs of social activity in cyberspace, and using them to exchange knowledge, experiences, and opinions is common. In this work, an advanced topic modeling framework is designed to analyze complex longitudinal health information from social media with minimal human annotation, and Adverse Drug Events and Reaction (ADR) information is extracted and automatically processed using a biased topic modeling method. This framework improves and extends existing topic modelling algorithms that incorporate background knowledge. Using this approach, background knowledge such as ADR terms and other biomedical knowledge can be incorporated during the text mining process, generating scores that indicate the presence of ADRs. A case-control study was performed on a data set of Twitter timelines of women who announced their pregnancy; the goal of the study is to compare the ADR risk of medication usage across medication categories during pregnancy. In addition, to evaluate the predictive power of this approach, another important aspect of personalized medicine was addressed: the prediction of medication usage through the identification of risk groups. During the prediction process, the health information from Twitter timelines, such as diseases, symptoms, treatments, and effects, is summarized by the topic modelling processes, and the summarization results are used for prediction. Dimension reduction and topic similarity measurement are integrated into this framework for timeline classification and prediction. This work could be applied to provide guidelines for FDA drug risk categories, which are currently assigned based on laboratory results and reported cases. Finally, a multi-dimensional text data warehouse (MTD) to manage the output of the topic modelling is proposed. Some attempts have also been made to integrate the topic structure (ontology) with the MTD hierarchy.
Results demonstrate that the proposed methods show promise and that this system represents a low-cost approach for drug safety early warning.

Dissertation/Thesis, Doctoral Dissertation, Computer Science 201
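The abstract does not say how the topic model is biased or which similarity measure is used, so the sketch below is only one plausible reading, not the dissertation's method. It assumes background knowledge enters as an asymmetric Dirichlet prior (ADR seed words get extra pseudo-counts in a designated ADR topic) inside a standard collapsed Gibbs sampler for LDA, takes a timeline's share of that topic as its ADR score, and uses Jensen-Shannon divergence to a group's mean topic distribution as the similarity measure for classification. All names (`seeded_lda`, `js_divergence`, `nearest_centroid`), the vocabulary, the documents, and the labels are hypothetical.

```python
import numpy as np

def seeded_lda(docs, vocab_size, n_topics, seed_words, seed_topic=0,
               alpha=0.1, beta=0.01, seed_boost=1.0, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA with an asymmetric topic-word prior:
    words in `seed_words` get `seed_boost` extra pseudo-counts in `seed_topic`."""
    rng = np.random.default_rng(seed)
    prior = np.full((n_topics, vocab_size), beta)
    prior[seed_topic, list(seed_words)] += seed_boost
    prior_sum = prior.sum(axis=1)
    ndk = np.zeros((len(docs), n_topics))   # document-topic counts
    nkw = np.zeros((n_topics, vocab_size))  # topic-word counts
    nk = np.zeros(n_topics)                 # tokens per topic
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token's assignment from the counts.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + prior[:, w]) / (nk + prior_sum)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    # Smoothed per-document topic proportions (each row sums to 1).
    return (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence in bits (0 = identical, at most 1)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * np.sum(p * np.log2(p / m)) + 0.5 * np.sum(q * np.log2(q / m))

def nearest_centroid(train, labels, x):
    """Assign x to the group whose mean topic distribution is closest in JS divergence."""
    centroids = {g: np.mean([t for t, l in zip(train, labels) if l == g], axis=0)
                 for g in set(labels)}
    return min(centroids, key=lambda g: js_divergence(x, centroids[g]))

# Hypothetical vocabulary; ids 0-2 play the role of ADR seed terms.
vocab = ["nausea", "headache", "rash", "baby", "ultrasound", "coffee"]
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 3, 4, 0], [5, 3, 4, 5, 3]]
theta = seeded_lda(docs, vocab_size=len(vocab), n_topics=2, seed_words={0, 1, 2})
adr_score = theta[:, 0]  # share of each timeline assigned to the ADR topic
labels = ["medication", "no_medication", "medication", "no_medication"]
predicted = nearest_centroid(list(theta[:-1]), labels[:-1], theta[-1])
```

Raising `seed_boost` pulls seed words more strongly into the ADR topic; setting it to 0 recovers plain LDA with a symmetric prior. Other similarity measures (e.g. cosine or Hellinger distance) could be swapped into `nearest_centroid` without changing the rest.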