44 research outputs found

Creation and management of versions in multiversion data warehouse

Author: Bartosz Bębel
Christian Koncilia
Johann Eder
Robert Wrembel
Tadeusz Morzy
Publication date: 01/01/2004

A data warehouse (DW) provides information for analytical processing, decision making, and data mining tools. On the one hand, the structure and content of a data warehouse reflect the real world, i.e. data stored in a DW come from real production systems. On the other hand, a DW and its tools may be used for predicting trends and simulating virtual business scenarios. This activity is often called what-if analysis. Traditional DW systems have static schema structures and static relationships between data, and therefore cannot support any dynamics in their structure and content. For these purposes, multiversion data warehouses seem very promising. In this paper we present a concept and an ongoing implementation of a multiversion data warehouse that is capable of handling changes in the structure of its schema as well as simulating alternative business scenarios.

CiteSeerX

Preferred analysis methods for single genomic regions in RNA sequencing revealed by processing the shape of coverage

Author: Leśniewska Anna
Morzy Tadeusz
Okoniewski Michał J.
Ryan Martin
Schlapbach Ralph
Schäfer Beat
Szabelska Alicja
Wachtel Marco
Zyprych-Walczak Joanna
Publication date: 02/08/2017

The informational content of RNA sequencing is currently far from being completely explored. Most of the analyses focus on processing tables of counts or finding isoform deconvolution via exon junctions. This article presents a comparison of several techniques that can be used to estimate differential expression of exons or small genomic regions of expression, based on their coverage function shapes. The problem is defined as finding the differentially expressed exons between two samples using local expression profile normalization and statistical measures to spot the differences between two profile shapes. Initial experiments have been done using synthetic data, and real data modified with synthetically created differential patterns. Then, 160 pipelines (5 types of generator × 4 normalizations × 8 difference measures) are compared. As a result, the best analysis pipelines are selected based on linearity of the differential expression estimation and the area under the ROC curve. These platform-independent techniques have been implemented in the Bioconductor package rnaSeqMap. They point out the exons with differential expression or internal splicing, even if the counts of reads may not show this. The areas of application include significant difference searches, splicing identification algorithms and finding suitable regions for QPCR primer
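
The pipeline count in the abstract above (5 × 4 × 8 = 160) and the idea of comparing coverage shapes can be illustrated with a minimal Python sketch. It is not the rnaSeqMap implementation. The abstract gives only the number of generators, normalizations and difference measures, so the labels below are placeholders, and the sum-to-one normalization and the L1 difference are assumptions chosen for illustration.

```python
from itertools import product

# Placeholder labels: the abstract gives only the counts (5, 4, 8), not the names.
GENERATORS = [f"generator_{i}" for i in range(1, 6)]
NORMALIZATIONS = [f"normalization_{i}" for i in range(1, 5)]
MEASURES = [f"measure_{i}" for i in range(1, 9)]

# Every combination of one generator, one normalization and one measure.
pipelines = list(product(GENERATORS, NORMALIZATIONS, MEASURES))


def normalize_by_total(profile):
    """Scale a coverage profile so its values sum to 1 (assumed normalization)."""
    total = sum(profile)
    return [v / total for v in profile] if total else list(profile)


def l1_difference(profile_a, profile_b):
    """Sum of absolute per-position differences (assumed difference measure)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same positions")
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


# Hypothetical per-base coverage of one exon in three samples.
sample_1 = [1, 2, 3, 2, 1]
sample_2 = [2, 4, 6, 4, 2]  # same shape as sample_1, twice the depth
sample_3 = [3, 3, 1, 1, 1]  # same read total as sample_1, different shape

same_shape = l1_difference(normalize_by_total(sample_1), normalize_by_total(sample_2))
diff_shape = l1_difference(normalize_by_total(sample_1), normalize_by_total(sample_3))
```

In this toy data, sample_1 and sample_3 have the same total coverage, so a table of counts would not separate them, while the shape difference does. This is the kind of case the abstract refers to when it says the counts of reads may not show a difference.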

RERO DOC Digital Library

Preferred analysis methods for single genomic regions in RNA sequencing revealed by processing the shape of coverage

Author: Alicja Szabelska
Anna Leśniewska
Beat Schäfer
Joanna Zyprych-Walczak
Marco Wachtel
Martin Ryan
Michał J. Okoniewski
Ralph Schlapbach
Tadeusz Morzy
Publication venue: Oxford University Press
Publication date: 01/01/2012

Repository for Publications and Research Data

Crossref

PubMed Central

ZORA

Data mining

Author: Morzy Tadeusz
Publication venue: Polska Akademia Nauk. Czytelnia Czasopism PAN
Publication date: 01/01/2007

Recent advances in data capture, data transmission and data storage technologies have resulted in a growing gap between more powerful database systems and users' ability to understand and effectively analyze the information collected. Many companies and organizations gather gigabytes or terabytes of business transactions, scientific data, web logs, satellite pictures and text reports, which are simply too large and too complex to support a decision-making process. Traditional database and data warehouse querying models are not sufficient to extract trends, similarities and correlations hidden in very large databases. The value of the existing databases and data warehouses can be significantly enhanced with the help of data mining. Data mining is a new research area which aims at nontrivial extraction of implicit, previously unknown and potentially useful information from large databases and data warehouses. Data mining, also referred to as database mining or knowledge discovery in databases, can help answer business questions that were too time-consuming to resolve with traditional data processing techniques. The process of mining the data can be perceived as a new way of querying, with questions such as "which clients are likely to respond to our next promotional mailing, and why?". The aim of this paper is to present an overall picture of the data mining field and to briefly present a few data mining methods. Finally, we summarize the concepts presented in the paper and discuss some problems related to data mining technology.

Biblioteka Nauki - repozytorium artykułów

On querying versions of multiversion data warehouse

Author: Robert Wrembel
Tadeusz Morzy
Publication date: 01/01/2004

A data warehouse (DW) is fed with data that come from external data sources, which are production systems. External data sources, which are usually autonomous, often change not only their content but also their structure. The evolution of external data sources has to be reflected in a DW that uses the sources. Traditional DW systems offer limited support for handling dynamics in their structure and content. A promising approach to handling changes in DW structure and content is based on a multiversion data warehouse. In such a DW, each DW version describes a schema and data at a certain period of time or for a given business scenario created for simulation purposes. In order to appropriately analyze multiversion data, an extension to traditional SQL is required. In this paper we propose an approach to querying a multiversion DW. To this end, we extended SQL and built a multiversion query language interface with functionality that allows: (1) expressing queries that address several DW versions and (2) presenting their results annotated with metadata information.
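
The two capabilities listed in the abstract above, addressing several DW versions in one query and annotating results with metadata, can be sketched in Python with the standard sqlite3 module. This is not the authors' query language extension. The table names, the version catalogue and its fields are hypothetical, and each version is modelled simply as a separate table whose schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical DW versions; v2 adds a 'region' column to the schema.
    CREATE TABLE sales_v1 (product TEXT, amount REAL);
    CREATE TABLE sales_v2 (product TEXT, region TEXT, amount REAL);
    INSERT INTO sales_v1 VALUES ('tea', 10.0), ('coffee', 5.0);
    INSERT INTO sales_v2 VALUES ('tea', 'north', 7.0), ('tea', 'south', 4.0);
""")

# Hypothetical version catalogue: metadata that travels with every result row.
versions = [
    {"name": "v1", "table": "sales_v1", "valid": "2003", "kind": "real"},
    {"name": "v2", "table": "sales_v2", "valid": "2004", "kind": "real"},
]


def query_versions(conn, versions):
    """Run the same aggregate on each version and tag rows with version metadata."""
    results = []
    for v in versions:
        # Table names come from the trusted catalogue above, never from user input.
        sql = (
            f"SELECT product, SUM(amount) FROM {v['table']} "
            "GROUP BY product ORDER BY product"
        )
        for product, total in conn.execute(sql):
            results.append(
                {
                    "version": v["name"],
                    "valid": v["valid"],
                    "kind": v["kind"],
                    "product": product,
                    "total": total,
                }
            )
    return results


rows = query_versions(conn, versions)
```

Each returned row carries the version name, its validity period and whether it is a real or a simulated version, so results from versions with different schemas can be read side by side.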

CiteSeerX

Efficient Constraint-Based Sequential Pattern Mining Using Dataset Filtering Techniques

Author: Maciej Zakrzewicz
Marek Wojciechowski
Tadeusz Morzy
Publication date: 07/02/2008

The basic formulation of the sequential pattern discovery problem assumes that the only constraint to be satisfied by discovered patterns is the minimum support threshold. However, users very often want to restrict the set of patterns to be discovered by adding extra constraints on the structure of patterns. Data mining systems should be able to exploit such constraints to speed up the mining process. In this paper we discuss efficient constraint-based sequential pattern mining using dataset filtering techniques. We show how to transform a given data mining task into an equivalent one operating on a smaller dataset. We present an extension of the GSP algorithm using dataset filtering techniques and experimentally evaluate performance gains offered by the proposed method.
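
The idea of transforming a mining task into an equivalent one on a smaller dataset can be illustrated with a small Python sketch. It is not the paper's GSP extension. It assumes one simple constraint type (every pattern must contain a given item) and simplifies each sequence to a list of single items. Under that assumption, a sequence that lacks the required item cannot support any qualifying pattern, so it can be dropped before mining.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order."""
    remaining = iter(sequence)
    # 'item in remaining' consumes the iterator, which enforces the ordering.
    return all(item in remaining for item in pattern)


def support(pattern, dataset):
    """Number of sequences in dataset that contain pattern (absolute support)."""
    return sum(is_subsequence(pattern, seq) for seq in dataset)


def filter_dataset(dataset, required_item):
    """Keep only sequences that can support a pattern containing required_item."""
    return [seq for seq in dataset if required_item in seq]


# Hypothetical toy dataset of item sequences.
dataset = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "d"],
    ["c", "a", "b"],
]

required = "a"      # assumed constraint: the pattern must contain item 'a'
pattern = ["a", "b"]

filtered = filter_dataset(dataset, required)
support_full = support(pattern, dataset)
support_filtered = support(pattern, filtered)
```

The support of any pattern that satisfies the constraint is the same on both datasets, while the filtered dataset is smaller. If the minimum support threshold is given as a fraction, it has to be converted to an absolute count against the original dataset size before filtering.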

CiteSeerX

Advances in Databases and Information Systems

Author: Tadeusz Morzy Theo H
Publication venue: Springer Berlin Heidelberg

Open Library