31,777 research outputs found

    Integrating Data Transformation in Principal Components Analysis

    Get PDF
    Principal component analysis (PCA) is a popular dimension-reduction method used to reduce the complexity of high-dimensional datasets and extract their informative aspects. When the data distribution is skewed, a data transformation is commonly applied before PCA. Such a transformation is usually taken from previous studies, prior knowledge, or trial and error. In this work, we develop a model-based method that integrates data transformation into PCA and selects an appropriate transformation by maximum profile likelihood. Extensions of the method to functional data and missing values are also developed, and several numerical algorithms are provided for efficient computation. The proposed method is illustrated on simulated and real-world data examples. Supplementary materials for this article are available online.
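    The paper's estimator is more general, but the core idea — choosing a power transformation jointly with a low-rank Gaussian model by profile likelihood — can be sketched with a Box-Cox family and a probabilistic-PCA-style likelihood. All function names, the likelihood form, and the grid of exponents below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox power transformation (x must be strictly positive)."""
    return np.log(x) if lam == 0 else (x**lam - 1.0) / lam

def pca_profile_loglik(X, k=1):
    """Profile log-likelihood of a rank-k Gaussian PCA model,
    up to an additive constant: top-k eigenvalues model the signal,
    the remaining eigenvalues are pooled into a noise variance."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    eig = np.linalg.eigvalsh(Xc.T @ Xc / n)[::-1]   # descending eigenvalues
    noise = eig[k:].mean()                          # pooled residual variance
    return -0.5 * n * (np.sum(np.log(eig[:k])) + (p - k) * np.log(noise))

def fit_transform_pca(X, k=1, lambdas=np.linspace(-1, 2, 31)):
    """Grid-search the Box-Cox exponent maximizing the profile likelihood.
    The Jacobian term keeps likelihoods comparable across exponents."""
    best = None
    for lam in lambdas:
        Z = box_cox(X, lam)
        jac = (lam - 1.0) * np.log(X).sum()         # log |dZ/dX|
        score = pca_profile_loglik(Z, k) + jac
        if best is None or score > best[0]:
            best = (score, lam)
    return best[1]
```

    For strictly positive data generated as the exponential of a low-rank Gaussian, this search tends to select an exponent near zero, i.e. the log transform.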

    A Data Transformation System for Biological Data Sources

    Get PDF
    Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but also in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as in sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data.
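    As an illustration of the kind of restructuring involved (the records below are hypothetical stand-ins, not the actual ASN.1 or ACE schemas), a nested entry with a list-valued field can be unnested into flat relational-style rows:

```python
def flatten(entry):
    """Yield one flat row per (entry, citation) pair, unnesting the list field."""
    for cit in entry.get("citations", [{}]):
        yield {
            "locus": entry["locus"],
            "organism": entry["source"]["organism"],   # nested record field
            "title": cit.get("title"),                 # unnested list element
        }

entries = [
    {"locus": "HUMBB", "source": {"organism": "Homo sapiens"},
     "citations": [{"title": "Beta-globin study"}, {"title": "Follow-up"}]},
]
rows = [row for e in entries for row in flatten(e)]   # two rows, one per citation
```

    Handling variant-typed fields would add a per-case branch to the same pattern; the query languages the paper describes express such unnesting declaratively rather than in host-language code.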

    A Randomized Procedure for Choosing Data Transformation

    Get PDF
    Standard unit root and stationarity tests (see e.g. Dickey and Fuller (1979)) assume linearity under both the null and the alternative hypothesis. Violation of this linearity assumption can result in severe size and power distortion, both in finite and large samples. Thus, it is reasonable to address the problem of data transformation before running a unit root test. In this paper we propose a simple randomized procedure, coupled with sample conditioning, for choosing between levels and log-levels specifications in the presence of deterministic and/or stochastic trends. In particular, we add a randomized component to a basic test statistic, proceed by conditioning on the sample, and show that for all samples except a set of measure zero, the statistic has a chi-squared limiting distribution under the null hypothesis (log linearity), while it diverges under the alternative hypothesis (level linearity). Once we have chosen the proper data transformation, we remain with the standard problem of testing for a unit root, either in levels or in logs. Monte Carlo findings suggest that the proposed test has good finite sample properties for samples of at least 300 observations. In addition, an examination of the King, Plosser, Stock and Watson (1991) data set is carried out, and evidence in favor of using logged data is provided.
    Keywords: deterministic trend, nonlinear transformation, nonstationarity, randomized procedure.
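    The randomized statistic itself is beyond a short sketch, but the underlying ingredient — a Dickey-Fuller regression whose behavior depends on whether it is run on levels or logs — fits in a few lines. This is a minimal no-constant, no-trend variant for illustration, not the paper's procedure:

```python
import numpy as np

def df_tstat(y):
    """t-statistic for rho = 0 in the Dickey-Fuller regression
    dy_t = rho * y_{t-1} + e_t  (no constant, no trend)."""
    dy, ylag = np.diff(y), y[:-1]
    rho = (ylag @ dy) / (ylag @ ylag)           # OLS slope
    resid = dy - rho * ylag
    s2 = resid @ resid / (len(dy) - 1)          # residual variance
    return rho / np.sqrt(s2 / (ylag @ ylag))    # slope / standard error
```

    Running this on y versus np.log(y) is exactly the levels-versus-logs choice that the randomized procedure is designed to make before the unit root test.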

    Software Usability: A Comparison Between Two Tree-Structured Data Transformation Languages

    Get PDF
    This paper presents the results of a software usability study involving both subjective and objective evaluation. It compares a popular XML data transformation language (XSLT) with a general-purpose rule-based tree manipulation language that addresses some of the limitations of XML and XSLT. The benefits of the evaluation study are discussed.

    Two-Sample Tests for High Dimensional Means with Thresholding and Data Transformation

    Get PDF
    We consider testing for the equality of two-sample means of high-dimensional populations by thresholding. Two tests are investigated, designed for better power when the two population mean vectors differ only in sparsely populated coordinates. The first test is constructed by thresholding to remove the non-signal-bearing dimensions. The second test combines a data transformation via the precision matrix with the thresholding. The benefits of the thresholding and the data transformation are shown by a reduced variance of the thresholded test statistics, improved power, and a wider detection region of the tests. Simulation experiments and an empirical study are performed to confirm the theoretical findings and to demonstrate the practical implementation.

    Prototyping Virtual Data Technologies in ATLAS Data Challenge 1 Production

    Full text link
    For efficiency of large production tasks distributed worldwide, it is essential to provide shared production management tools comprising integrable and interoperable services. To enhance the ATLAS DC1 production toolkit, we introduced and tested a Virtual Data services component. For each major data transformation step identified in the ATLAS data processing pipeline (event generation, detector simulation, background pile-up and digitization, etc.), the Virtual Data Cookbook (VDC) catalogue encapsulates the specific data transformation knowledge and the validated parameter settings that must be provided before the data transformation is invoked. To provide local-remote transparency during DC1 production, the VDC database server delivered, in a controlled way, both the validated production parameters and the templated production recipes for thousands of event generation and detector simulation jobs around the world, simplifying the production management solutions.
    Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics conference (CHEP03), La Jolla, CA, USA, March 2003; 5 pages, 3 figures. PSN TUCP01

    Data Transformation and Forecasting in Models with Unit Roots and Cointegration

    Get PDF
    We perform a series of Monte Carlo experiments to evaluate the impact of data transformation on forecasting models, and find that vector error-correction models dominate differenced-data vector autoregressions when the correct data transformation is used, but not when the data are incorrectly transformed, even if the true model contains cointegrating restrictions. We argue that one reason for this is the failure of standard unit root and cointegration tests under incorrect data transformation.
    Keywords: integratedness, cointegratedness, nonlinear transformation.