31,777 research outputs found

    Integrating Data Transformation in Principal Components Analysis

    Get PDF
    Principal component analysis (PCA) is a popular dimension-reduction method used to reduce the complexity of high-dimensional datasets and extract their informative aspects. When the data distribution is skewed, a data transformation is commonly applied before PCA. Such a transformation is usually taken from previous studies, prior knowledge, or trial and error. In this work, we develop a model-based method that integrates data transformation into PCA and selects an appropriate transformation by maximum profile likelihood. Extensions of the method to functional data and missing values are also developed, and several numerical algorithms are provided for efficient computation. The proposed method is illustrated on simulated and real-world data examples. Supplementary materials for this article are available online.
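    The paper's estimator is more general, but the core idea — choosing a power transformation jointly with a low-rank Gaussian model by profile likelihood — can be sketched with a Box-Cox family and a probabilistic-PCA-style likelihood. All function names, the likelihood form, and the grid of exponents below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox power transformation (x must be strictly positive)."""
    return np.log(x) if lam == 0 else (x**lam - 1.0) / lam

def pca_profile_loglik(X, k=1):
    """Profile log-likelihood of a rank-k Gaussian PCA model,
    up to an additive constant: top-k eigenvalues model the signal,
    the remaining eigenvalues are pooled into a noise variance."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    eig = np.linalg.eigvalsh(Xc.T @ Xc / n)[::-1]   # descending eigenvalues
    noise = eig[k:].mean()                          # pooled residual variance
    return -0.5 * n * (np.sum(np.log(eig[:k])) + (p - k) * np.log(noise))

def fit_transform_pca(X, k=1, lambdas=np.linspace(-1, 2, 31)):
    """Grid-search the Box-Cox exponent maximizing the profile likelihood.
    The Jacobian term keeps likelihoods comparable across exponents."""
    best = None
    for lam in lambdas:
        Z = box_cox(X, lam)
        jac = (lam - 1.0) * np.log(X).sum()         # log |dZ/dX|
        score = pca_profile_loglik(Z, k) + jac
        if best is None or score > best[0]:
            best = (score, lam)
    return best[1]
```

    For strictly positive data generated as the exponential of a low-rank Gaussian, this search tends to select an exponent near zero, i.e. the log transform.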

    A Data Transformation System for Biological Data Sources

    Get PDF
    Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but also in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as in sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data.
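    As an illustration of the kind of restructuring involved (the records below are hypothetical stand-ins, not the actual ASN.1 or ACE schemas), a nested entry with a list-valued field can be unnested into flat relational-style rows:

```python
def flatten(entry):
    """Yield one flat row per (entry, citation) pair, unnesting the list field."""
    for cit in entry.get("citations", [{}]):
        yield {
            "locus": entry["locus"],
            "organism": entry["source"]["organism"],   # nested record field
            "title": cit.get("title"),                 # unnested list element
        }

entries = [
    {"locus": "HUMBB", "source": {"organism": "Homo sapiens"},
     "citations": [{"title": "Beta-globin study"}, {"title": "Follow-up"}]},
]
rows = [row for e in entries for row in flatten(e)]   # two rows, one per citation
```

    Handling variant-typed fields would add a per-case branch to the same pattern; the query languages the paper describes express such unnesting declaratively rather than in host-language code.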

    A Randomized Procedure for Choosing Data Transformation

    Get PDF
    Standard unit root and stationarity tests (see e.g. Dickey and Fuller (1979)) assume linearity under both the null and the alternative hypothesis. Violation of this linearity assumption can result in severe size and power distortion, both in finite and large samples. Thus, it is reasonable to address the problem of data transformation before running a unit root test. In this paper we propose a simple randomized procedure, coupled with sample conditioning, for choosing between levels and log-levels specifications in the presence of deterministic and/or stochastic trends. In particular, we add a randomized component to a basic test statistic, proceed by conditioning on the sample, and show that for all samples except a set of measure zero, the statistic has a chi-squared limiting distribution under the null hypothesis (log linearity), while it diverges under the alternative hypothesis (level linearity). Once we have chosen the proper data transformation, we remain with the standard problem of testing for a unit root, either in levels or in logs. Monte Carlo findings suggest that the proposed test has good finite sample properties for samples of at least 300 observations. In addition, an examination of the King, Plosser, Stock and Watson (1991) data set is carried out, and evidence in favor of using logged data is provided.
    Keywords: deterministic trend, nonlinear transformation, nonstationarity, randomized procedure.
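    The randomized statistic itself is beyond a short sketch, but the underlying ingredient — a Dickey-Fuller regression whose behavior depends on whether it is run on levels or logs — fits in a few lines. This is a minimal no-constant, no-trend variant for illustration, not the paper's procedure:

```python
import numpy as np

def df_tstat(y):
    """t-statistic for rho = 0 in the Dickey-Fuller regression
    dy_t = rho * y_{t-1} + e_t  (no constant, no trend)."""
    dy, ylag = np.diff(y), y[:-1]
    rho = (ylag @ dy) / (ylag @ ylag)           # OLS slope
    resid = dy - rho * ylag
    s2 = resid @ resid / (len(dy) - 1)          # residual variance
    return rho / np.sqrt(s2 / (ylag @ ylag))    # slope / standard error
```

    Running this on y versus np.log(y) is exactly the levels-versus-logs choice that the randomized procedure is designed to make before the unit root test.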

    Software Usability: A Comparison Between Two Tree-Structured Data Transformation Languages

    Get PDF
    This paper presents the results of a software usability study involving both subjective and objective evaluation. It compares a popular XML data transformation language (XSLT) with a general-purpose rule-based tree manipulation language that addresses some of the limitations of XML and XSLT. The benefits of the evaluation study are discussed.

    Two-Sample Tests for High Dimensional Means with Thresholding and Data Transformation

    Get PDF
    We consider testing for the equality of two-sample means of high-dimensional populations by thresholding. Two tests are investigated, designed for better power when the two population mean vectors differ only in sparsely populated coordinates. The first test is constructed by thresholding to remove the non-signal-bearing dimensions. The second test combines a data transformation via the precision matrix with the thresholding. The benefits of the thresholding and the data transformation are shown by a reduced variance of the thresholded test statistics, improved power, and a wider detection region of the tests. Simulation experiments and an empirical study are performed to confirm the theoretical findings and to demonstrate the practical implementation.

    Prototyping Virtual Data Technologies in ATLAS Data Challenge 1 Production

    Full text link
    For efficiency of large production tasks distributed worldwide, it is essential to provide shared production management tools comprising integrable and interoperable services. To enhance the ATLAS DC1 production toolkit, we introduced and tested a Virtual Data services component. For each major data transformation step identified in the ATLAS data processing pipeline (event generation, detector simulation, background pile-up and digitization, etc.), the Virtual Data Cookbook (VDC) catalogue encapsulates the specific data transformation knowledge and the validated parameter settings that must be provided before the data transformation is invoked. To provide local-remote transparency during DC1 production, the VDC database server delivered, in a controlled way, both the validated production parameters and the templated production recipes for thousands of event generation and detector simulation jobs around the world, simplifying the production management solutions.
    Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics conference (CHEP03), La Jolla, CA, USA, March 2003; 5 pages, 3 figures. PSN TUCP01

    Data Transformation and Forecasting in Models with Unit Roots and Cointegration

    Get PDF
    We perform a series of Monte Carlo experiments to evaluate the impact of data transformation on forecasting models, and find that vector error-correction models dominate differenced-data vector autoregressions when the correct data transformation is used, but not when the data are incorrectly transformed, even if the true model contains cointegrating restrictions. We argue that one reason for this is the failure of standard unit root and cointegration tests under incorrect data transformation.
    Keywords: integratedness, cointegratedness, nonlinear transformation.