
    Use of pre-transformation to cope with outlying values in important candidate genes

    Outlying values in predictors often strongly affect the results of statistical analyses in high-dimensional settings. Although they frequently occur with most high-throughput techniques, the problem is often ignored in the literature. We suggest using a very simple transformation, previously proposed in a different context by Royston and Sauerbrei, as an intermediate step between array normalization and high-level statistical analysis. This straightforward univariate transformation identifies extreme values and considerably reduces the influence of outlying values in all further steps of statistical analysis, without eliminating the affected observation or feature. The use of the transformation and its effects are demonstrated for diverse univariate and multivariate statistical analyses using nine publicly available microarray data sets.
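    To make the idea concrete, below is a minimal, generic sketch of a univariate transformation that damps extreme values without discarding any observation or feature. It illustrates the concept only; it is not the specific Royston and Sauerbrei based transformation evaluated in the paper, and the cutoff value is an arbitrary assumption.

```python
# Illustration only: a generic univariate pre-transformation that damps outliers
# without removing any observation or feature (NOT the paper's exact transformation).
import numpy as np

def damp_outliers(x, z_cut=3.0):
    """Center and scale one feature robustly, then smoothly compress
    values beyond roughly +/- z_cut robust standard deviations."""
    med = np.median(x)
    mad = np.median(np.abs(x - med)) * 1.4826   # robust scale estimate
    z = (x - med) / mad
    # tanh shrinkage: close to the identity near zero, bounded for extreme values
    return z_cut * np.tanh(z / z_cut)

x = np.array([0.1, -0.3, 0.2, 0.0, 12.5])       # one extreme value
print(damp_outliers(x))
```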

    Comparison of Data Normalization for Wine Classification Using K-NN Algorithm

    A range of values that is not balanced across attributes can degrade the quality of data mining results, so the data need to be preprocessed. This preprocessing is expected to increase the accuracy of classification results on the wine dataset. The preprocessing method used is data transformation with normalization, which can be done in three ways: min-max normalization, z-score normalization, and decimal scaling. The data produced by each normalization method are compared to find the best classification accuracy using the K-NN algorithm, with K values of 1, 3, 5, 7, 9, and 11. Before classification, the normalized wine dataset is split into training and test data with k-fold cross validation using k = 10. The classification tests with the K-NN algorithm show that the best accuracy, 65.92%, is obtained on the wine dataset normalized with min-max normalization at K = 1; the average accuracy obtained is 59.68%.
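    A minimal sketch of the comparison described above, assuming the scikit-learn copy of the wine dataset as a stand-in for the data actually used: each of the three normalization schemes is applied, and K-NN accuracy is estimated with 10-fold cross validation for K = 1, 3, 5, 7, 9, 11.

```python
# Sketch (not the paper's code): compare min-max, z-score, and decimal-scaling
# normalization before K-NN classification with 10-fold cross validation.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

def min_max(X):
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def z_score(X):
    return (X - X.mean(axis=0)) / X.std(axis=0)

def decimal_scaling(X):
    # Divide each attribute by 10^j, with j chosen so the largest absolute value falls below 1.
    j = np.floor(np.log10(np.abs(X).max(axis=0))) + 1
    return X / 10.0 ** j

for name, Xn in [("min-max", min_max(X)), ("z-score", z_score(X)), ("decimal", decimal_scaling(X))]:
    for k in (1, 3, 5, 7, 9, 11):
        acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), Xn, y, cv=10).mean()
        print(f"{name:8s} K={k:2d} mean accuracy={acc:.4f}")
```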

    Preprocessing Strategies for Multiplex Bead Assay Data for Use in Quantitative Trait Loci Analysis

    INTRODUCTION: Host genetic variants are known to impact infectious disease susceptibility and outcomes. However, the genes underlying these impacts are not well characterized. Multiplex bead assays (MBA) provide an affordable and rapid means of large-scale screening for multiple phenotypic measures of immune response. Transformation and normalization approaches for MBA data have not been agreed upon, especially concerning screening applications. AIM: To compare preprocessing techniques for improving the validity of quantitative trait loci analyses that use MBA phenotypic data with high levels of technical variability, using experimental data. METHODS: This research uses primary dendritic cells derived from a set of sixty-one genetically diverse mouse strains to study the activation response of an antiviral pathway (RIG-I). Primary outcomes were IFNα and IFNβ secretion following RIG-I agonist treatment. Multiple transformation and normalization approaches were used to estimate true IFNα and IFNβ responses. Evaluation criteria included three quantitative measures (tail length, kurtosis, skewness) and three qualitative measures (QQ-plot, Bland-Altman plot, Mean-SD plot). RESULTS: Most qualitative and quantitative measures indicated that log transformation with quantile normalization was most appropriate for normalizing the data and reducing technical variability between batches and replicates. Unfortunately, no statistically significant (α = 0.90) loci of interest were identified with the normalized data. DISCUSSION: The data used to test these methods had notable limitations, mainly only two phenotypic markers and dramatic variability in both technical and biological replicates. While normalization and transformation techniques did ameliorate these issues, additional approaches such as mixed effects modeling may be able to further improve these types of analysis.
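    A small sketch of the preprocessing combination the abstract reports as most appropriate, log transformation followed by quantile normalization. The array layout and the values below are illustrative assumptions, not the study's MBA data.

```python
# Sketch: log transformation followed by quantile normalization across replicates.
import numpy as np

def quantile_normalize(X):
    """Quantile-normalize the columns of X (features in rows, replicates in columns):
    each column is mapped onto the mean distribution of all columns."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)   # rank of each value within its column
    mean_dist = np.sort(X, axis=0).mean(axis=1)         # reference distribution (mean of sorted columns)
    return mean_dist[ranks]

# Toy fluorescence intensities, one column per replicate (made-up numbers)
raw = np.array([[120.0,  90.0, 200.0],
                [ 15.0,  22.0,  30.0],
                [560.0, 480.0, 900.0]])
normalized = quantile_normalize(np.log(raw))
print(normalized)
```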

    Automatic Radiometric Normalization of Multitemporal Satellite Imagery

    The linear scale invariance of the multivariate alteration detection (MAD) transformation is used to obtain invariant pixels for automatic relative radiometric normalization of time series of multispectral data. Normalization by means of ordinary least squares regression is compared with normalization using orthogonal regression. The procedure is applied to Landsat TM images over Nevada, Landsat ETM+ images over Morocco, and SPOT HRV images over Kenya. Results from this new automatic, combined MAD/orthogonal regression method, based on statistical analysis of test pixels not used in the actual normalization, compare favorably with results from normalization based on manually obtained time-invariant features.
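    The contrast between ordinary least squares and orthogonal (total least squares) regression for relative normalization of a single band can be sketched as follows. The invariant-pixel values are simulated here; selecting them with the MAD transformation is outside the scope of this illustration.

```python
# Sketch: OLS vs orthogonal regression for relative radiometric normalization of one band.
import numpy as np

def ols_fit(x, y):
    b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    return y.mean() - b * x.mean(), b                   # intercept, slope

def orthogonal_fit(x, y):
    # Total least squares slope, assuming equal error variance in x and y.
    sxx, syy = np.var(x), np.var(y)
    sxy = np.cov(x, y, bias=True)[0, 1]
    b = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return y.mean() - b * x.mean(), b

rng = np.random.default_rng(0)
ref = rng.uniform(50, 200, 500)                         # reference-image radiances at invariant pixels
target = 0.8 * ref + 10 + rng.normal(0, 5, ref.size)    # target image with a different gain/offset
a, b = orthogonal_fit(target, ref)
normalized_target = a + b * target                      # map the target image onto the reference scale
print("orthogonal fit:", a, b, " OLS fit:", *ols_fit(target, ref))
```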

    Min Max Normalization Based Data Perturbation Method for Privacy Protection

    Data mining systems contain large amounts of private and sensitive data such as healthcare, financial, and criminal records. These private and sensitive data cannot be shared with everyone, so privacy protection is required in data mining systems to avoid privacy leakage. Data perturbation is one of the best methods for privacy preservation; we use it to preserve both privacy and accuracy. In this method, individual data values are distorted before the data mining application. In this paper we present a data perturbation method based on the min-max normalization transformation. Privacy parameters are used to measure the degree of privacy protection, and a utility measure shows the performance of the data mining technique after data distortion. We performed experiments on a real-life dataset, and the results show that the min-max normalization based data perturbation method effectively protects confidential information while maintaining the performance of the data mining technique after data distortion.
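    A minimal sketch of the min-max normalization based distortion idea: confidential values are mapped into an arbitrary new range before release, hiding the raw magnitudes while preserving the relative ordering that the mining algorithm relies on. The target range and the example values are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch: min-max normalization used as a data perturbation before data mining.
import numpy as np

def min_max_perturb(x, new_min=0.0, new_max=1.0):
    """Rescale confidential values into [new_min, new_max]; order is preserved,
    original magnitudes are hidden."""
    x = np.asarray(x, dtype=float)
    scaled = (x - x.min()) / (x.max() - x.min())
    return scaled * (new_max - new_min) + new_min

salaries = np.array([32000.0, 45000.0, 58000.0, 120000.0])        # hypothetical sensitive values
print(min_max_perturb(salaries, new_min=10.0, new_max=20.0))      # distorted values released for mining
```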

    Data Preprocessing for Machine Learning Modules

    Data preprocessing is an essential step when building machine learning solutions. It significantly impacts the success of machine learning modules and the output of these algorithms. Typically, data preprocessing is made up of data sanitization, feature engineering, normalization, and transformation. This paper outlines the data preprocessing methodology implemented for a data-driven predictive maintenance solution. The project entails acquiring historical electrical data from industrial assets and creating a health index indicating each asset's remaining useful life. The solution is built using machine learning algorithms and requires several data processing steps to increase its accuracy and efficiency. The preprocessing measures implemented in this project are data sanitization, daylight savings transformation, feature encoding, and data normalization. The purpose and results of each of these steps are explained to highlight the importance of data preprocessing in machine learning projects.
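    As a rough illustration of the four steps named above, the following sketch applies them to a toy table. The column names, source time zone, and schema are assumptions for illustration, not the project's actual data.

```python
# Hypothetical sketch of the four preprocessing steps on a toy table of electrical readings.
import pandas as pd

df = pd.DataFrame({
    "timestamp": ["2023-03-11 01:00", "2023-03-12 03:00", "2023-03-12 03:00", None],
    "asset_type": ["transformer", "breaker", "breaker", "transformer"],
    "current": [10.2, 11.5, 11.5, 9.8],
})

# 1. Data sanitization: drop incomplete rows and exact duplicates.
df = df.dropna().drop_duplicates()

# 2. Daylight-savings transformation: localize timestamps and convert them to UTC.
df["timestamp"] = (pd.to_datetime(df["timestamp"])
                     .dt.tz_localize("America/Toronto")
                     .dt.tz_convert("UTC"))

# 3. Feature encoding: one-hot encode the categorical asset type.
df = pd.get_dummies(df, columns=["asset_type"])

# 4. Normalization: min-max scale the numeric measurement.
df["current"] = (df["current"] - df["current"].min()) / (df["current"].max() - df["current"].min())
print(df)
```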

    Connection problem for the sine-Gordon/Painlevé III tau function and irregular conformal blocks

    The short-distance expansion of the tau function of the radial sine-Gordon/Painlevé III equation is given by a convergent series which involves irregular c=1 conformal blocks and possesses certain periodicity properties with respect to monodromy data. The long-distance irregular expansion exhibits a similar periodicity with respect to a different pair of coordinates on the monodromy manifold. This observation is used to conjecture an exact expression for the connection constant providing relative normalization of the two series. Up to an elementary prefactor, it is given by the generating function of the canonical transformation between the two sets of coordinates. Comment: 18 pages, 1 figure.

    Cross-Platform Normalization of Microarray and Rna-Seq Data for Machine Learning Applications

    Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exists in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language.
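    For intuition, the following sketch expresses the general distribution-matching idea with a simple quantile mapping in Python. It is not the published TDM algorithm (which is distributed as an R package); it only illustrates mapping one platform's values onto the distribution of another, with simulated data standing in for real microarray and RNA-seq matrices.

```python
# Illustration only: map a target (e.g. RNA-seq) matrix onto the empirical distribution
# of a reference (e.g. microarray training) matrix via quantile mapping. NOT the TDM algorithm.
import numpy as np

def match_distribution(target, reference):
    """Replace each target value with the reference value at the same empirical quantile."""
    q = np.argsort(np.argsort(target.ravel())) / (target.size - 1)   # empirical quantile of each value
    return np.quantile(reference.ravel(), q).reshape(target.shape)

rng = np.random.default_rng(1)
microarray = rng.lognormal(mean=6, sigma=1, size=(100, 20))   # stand-in for legacy training data
rnaseq = rng.negative_binomial(5, 0.01, size=(100, 20))       # stand-in for counts from the new platform
rnaseq_transformed = match_distribution(rnaseq.astype(float), microarray)
print(rnaseq_transformed[:2])
```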