24 research outputs found

    Data warehouse structuring methodologies for efficient mining of Western Australian petroleum data sources

    Representing the knowledge domain of a petroleum system is a complex problem. In the present study, logical modelling of the shared attributes of resources-industry entities (dimensions or objects) has been used to construct a dynamic, time-variant metadata model. This work demonstrates the effectiveness of multidimensional data modelling for the petroleum industry, which will be further investigated for fine-grained data presentation and interpretation in support of quality knowledge discovery.
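
    As an illustration of what such a multidimensional model can look like, the following is a minimal, hypothetical sketch rather than the study's actual schema: a star-schema-style layout with a shared basin dimension, a time-variant well dimension, and a production fact table. All entity names and fields here are assumptions.

# Illustrative sketch only: a star-schema-style multidimensional model with a
# time-variant (slowly changing) dimension, loosely inspired by the abstract.
# All entity names (BasinDim, WellDim, ProductionFact) are hypothetical.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class BasinDim:                # shared dimension: geological basin
    basin_key: int
    name: str

@dataclass
class WellDim:                 # time-variant dimension: attributes carry validity dates
    well_key: int
    well_name: str
    operator: str
    valid_from: date
    valid_to: Optional[date]   # None = current record

@dataclass
class ProductionFact:          # fact table referencing the dimensions by key
    date_key: date
    well_key: int
    basin_key: int
    oil_bbl: float
    gas_mcf: float

# Example: the same physical well appears twice because its operator changed,
# which is what makes the dimension "time-variant".
wells = [
    WellDim(1, "W-01", "Operator A", date(2018, 1, 1), date(2020, 6, 30)),
    WellDim(2, "W-01", "Operator B", date(2020, 7, 1), None),
]
facts = [ProductionFact(date(2021, 3, 1), 2, 10, 1250.0, 340.0)]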

    A Biased Topic Modeling Approach for Case Control Study from Health Related Social Media Postings

    Online social networks are the hubs of social activity in cyberspace, and using them to exchange knowledge, experiences, and opinions is common. In this work, an advanced topic modeling framework is designed to analyse complex longitudinal health information from social media with minimal human annotation, and Adverse Drug Event and Reaction (ADR) information is extracted and automatically processed using a biased topic modeling method. This framework improves and extends existing topic modelling algorithms that incorporate background knowledge. Using this approach, background knowledge such as ADR terms and other biomedical knowledge can be incorporated during the text mining process, and scores indicating the presence of ADRs are generated. A case control study has been performed on a data set of Twitter timelines of women who announced their pregnancy; the goal of the study is to compare the ADR risk of medication usage from each medication category during pregnancy. In addition, to evaluate the predictive power of this approach, another important aspect of personalized medicine was addressed: the prediction of medication usage through the identification of risk groups. During the prediction process, the health information in a Twitter timeline, such as diseases, symptoms, treatments, and effects, is summarized by the topic modelling processes, and the summarization results are used for prediction. Dimension reduction and topic similarity measurement are integrated into this framework for timeline classification and prediction. This work could be applied to provide guidelines for FDA drug risk categories, a process currently based on laboratory results and reported cases. Finally, a multi-dimensional text data warehouse (MTD) is proposed to manage the output of the topic modelling, and some attempts have been made to incorporate the topic structure (ontology) into the MTD hierarchy. Results demonstrate that the proposed methods show promise and that this system represents a low-cost approach to early warning for drug safety. Dissertation/Thesis: Doctoral Dissertation, Computer Science, 201
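
    To make the "biased topic modeling" idea concrete, the following is a minimal sketch, not the thesis's actual framework: a seeded-LDA-style bias implemented by inflating the topic-word prior (gensim's eta) for a hypothetical list of ADR seed terms, with the weight of the ADR-biased topic used as a crude presence score. The timelines and seed terms are toy assumptions.

# Minimal sketch of a "biased" (seeded) topic model: ADR seed terms get an
# inflated topic-word prior so one topic is nudged toward adverse-event language.
# This illustrates the general idea only; the thesis's actual framework differs.
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Hypothetical toy timelines (tokenized posts) and ADR seed terms.
timelines = [
    ["started", "new", "medication", "felt", "nausea", "and", "headache"],
    ["great", "day", "baby", "kicking", "so", "happy"],
    ["dizziness", "after", "taking", "the", "pill", "again"],
    ["weekend", "plans", "with", "family", "and", "friends"],
]
adr_seeds = {"nausea", "headache", "dizziness", "rash"}

dictionary = Dictionary(timelines)
corpus = [dictionary.doc2bow(doc) for doc in timelines]

num_topics = 2
vocab_size = len(dictionary)

# Biased prior: topic 0 is the "ADR" topic; seed terms get a much larger eta.
eta = np.full((num_topics, vocab_size), 0.01)
for term in adr_seeds:
    if term in dictionary.token2id:
        eta[0, dictionary.token2id[term]] = 1.0

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics,
               eta=eta, alpha="auto", passes=50, random_state=0)

# Score each timeline by the weight of the ADR-biased topic (a crude "ADR presence" score).
for i, bow in enumerate(corpus):
    weights = dict(lda.get_document_topics(bow, minimum_probability=0.0))
    print(f"timeline {i}: ADR-topic weight = {weights[0]:.2f}")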

    Ontology based data warehousing for mining of heterogeneous and multidimensional data sources

    Heterogeneous and multidimensional big-data sources are prevalent in virtually all business environments, yet system and data analysts struggle to access and rapidly exploit them. A robust and versatile data warehousing system is developed that integrates domain ontologies from multidimensional data sources. For example, petroleum digital ecosystems and digital oil field solutions, derived from big-data petroleum (information) systems, are in increasing demand in multibillion-dollar resource businesses worldwide. This work has been recognized by the Industrial Electronics Society of the IEEE and has appeared in more than 50 international conference proceedings and journals.

    Reordering Columns for Smaller Indexes

    Column-oriented indexes, such as projection or bitmap indexes, are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns into the right order before sorting can reduce the number of runs by a factor of two or more. Unfortunately, determining the best column order is NP-hard. For many cases, we prove that the number of runs in the table columns is minimized if we sort the columns by increasing cardinality. Experimentally, sorting based on Hilbert space-filling curves is poor at minimizing the number of runs. Comment: to appear in Information Sciences.
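
    The increasing-cardinality heuristic is easy to demonstrate on synthetic data. The following sketch (not the paper's code) counts value runs after lexicographically sorting the rows under two column orders; the table and its column cardinalities are assumptions chosen for illustration.

# Toy illustration (not the paper's code) of the column-reordering idea:
# lexicographically sort rows, then count value runs per column under
# different column orders. Lower-cardinality columns placed first tend
# to yield fewer runs, which helps run-length encoding.
import random

def count_runs(table):
    """Total number of value runs across all columns of a row-major table."""
    if not table:
        return 0
    runs = 0
    for col in range(len(table[0])):
        runs += 1 + sum(1 for r in range(1, len(table))
                        if table[r][col] != table[r - 1][col])
    return runs

def runs_for_order(rows, order):
    """Reorder columns, sort rows lexicographically, count runs."""
    permuted = [[row[c] for c in order] for row in rows]
    permuted.sort()
    return count_runs(permuted)

random.seed(0)
# Hypothetical table: column cardinalities of roughly 2, 8, and 64.
rows = [(random.randint(0, 1), random.randint(0, 7), random.randint(0, 63))
        for _ in range(10_000)]

cardinalities = [len({row[c] for row in rows}) for c in range(3)]
increasing = sorted(range(3), key=lambda c: cardinalities[c])
decreasing = list(reversed(increasing))

print("runs, increasing-cardinality order:", runs_for_order(rows, increasing))
print("runs, decreasing-cardinality order:", runs_for_order(rows, decreasing))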

    IDEAS-1997-2021-Final-Programs

    This document records the final program for each of the 26 meetings of the International Database Engineering and Applications Symposium (IDEAS) held from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of the IEEE (1997-2007) or the ACM (2008-2021).

    31st International Conference on Information Modelling and Knowledge Bases

    Information modelling is becoming an increasingly important topic for researchers, designers, and users of information systems. The amount and complexity of information itself, the number of abstraction levels of information, and the size of databases and knowledge bases are continuously growing. Conceptual modelling is one of the sub-areas of information modelling. The aim of this conference is to bring together experts from different areas of computer science and other disciplines who have a common interest in understanding and solving problems of information modelling and knowledge bases, as well as in applying the results of research to practice. We also aim to recognize and study new areas of modelling and knowledge bases to which more attention should be paid. Therefore philosophy and logic, cognitive science, knowledge management, linguistics and management science are relevant areas, too. The conference features three categories of presentations: full papers, short papers and position papers.

    Six papers on computational methods for the analysis of structured and unstructured data in the economic domain

    This work investigates the application of computational methods to structured and unstructured data. The domains of application are two closely connected fields with the common goal of promoting the stability of the financial system: systemic risk and bank supervision. The work explores different families of models and applies them to different tasks: graphical Gaussian network models to address bank interconnectivity, topic models to monitor bank news, and deep learning for text classification. New applications and variants of these models are investigated, paying particular attention to the combined use of textual and structured data. The penultimate chapter introduces a sentiment polarity classification tool for Italian, based on deep learning, to simplify future research relying on sentiment analysis. The different models have proven useful for leveraging numerical (structured) and textual (unstructured) data: graphical Gaussian models and topic models have been adopted for inspection and descriptive tasks, while deep learning has been applied more to predictive (classification) problems. Overall, the integration of textual (unstructured) and numerical (structured) information has proven useful for analyses related to systemic risk and bank supervision. Integrating textual data with numerical data has, in fact, led either to higher predictive performance or to an enhanced capability of explaining phenomena and correlating them to other events.
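
    As a concrete illustration of the first family of models, the following is a minimal sketch rather than the thesis code: a graphical Gaussian network estimated with scikit-learn's GraphicalLasso on simulated bank return series, where non-negligible off-diagonal precision entries define the interconnectivity edges. The bank names and data are hypothetical.

# Sketch only: estimate a graphical Gaussian network over banks from
# (hypothetical, simulated) daily return series using the graphical lasso.
# Non-zero off-diagonal entries of the precision matrix are the network edges.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
banks = ["BankA", "BankB", "BankC", "BankD"]

# Simulate 500 days of returns with a common market factor plus idiosyncratic noise.
market = rng.normal(0.0, 1.0, size=(500, 1))
returns = 0.5 * market + rng.normal(0.0, 1.0, size=(500, len(banks)))

model = GraphicalLasso(alpha=0.05).fit(returns)
precision = model.precision_

# Report conditional-dependence edges (non-negligible off-diagonal precision entries).
for i in range(len(banks)):
    for j in range(i + 1, len(banks)):
        if abs(precision[i, j]) > 1e-3:
            print(f"edge: {banks[i]} -- {banks[j]} (precision {precision[i, j]:.3f})")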

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    LIPIcs, Volume 261, ICALP 2023, Complete Volume