2 research outputs found

    Towards Enhanced Diagnosis of Diseases using Statistical Analysis of Genomic Copy Number Data

    Genomic copy number data are a rich source of information about the biological systems they are collected from. They can be used for the diagnosis of various diseases by identifying the locations and extent of aberrations in DNA sequences. However, copy number data are often contaminated with measurement noise, which drastically degrades their quality and usefulness. The objective of this project is to apply statistical filtering and fault detection techniques to improve the accuracy of disease diagnosis by enhancing the accuracy with which the locations of such aberrations are determined. These techniques include multiscale wavelet-based filtering and hypothesis-testing-based fault detection. The filtering techniques include Mean Filtering (MF), Exponentially Weighted Moving Average (EWMA), Standard Multiscale Filtering (SMF), and Boundary Corrected Translation Invariant filtering (BCTI). The fault detection techniques include the Shewhart chart, EWMA, and the Generalized Likelihood Ratio (GLR). The performance of these techniques is illustrated using Monte Carlo simulations and through application to real copy number data. In the Monte Carlo simulations, the non-linear filtering techniques outperformed the linear ones, with BCTI achieving the lowest error: at an SNR of 1, BCTI had an average mean squared error of 2.34%, whereas mean filtering had the highest error at 5.24%. Among the fault detection techniques, GLR had the lowest missed detection rate of 1.88% at a fixed false alarm rate of around 4%; at around the same false alarm rate, the Shewhart chart had the highest missed detection rate of 67.4%. These techniques were also applied to real genomic copy number data sets, including breast cancer (MPE600) and colorectal cancer (SW837) cell lines.
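To make the workflow concrete, the following is a minimal sketch, not the paper's implementation, of two of the named techniques: EWMA filtering of a noisy copy number profile, and Shewhart-style detection using control limits estimated from an aberration-free segment. The smoothing parameter, block positions, and noise level are illustrative assumptions.

```python
import numpy as np

def ewma_filter(x, lam=0.3):
    """EWMA filter (one of the linear techniques named in the abstract).
    lam is a hypothetical smoothing parameter, not taken from the paper."""
    y = np.empty(len(x), dtype=float)
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = lam * x[i] + (1 - lam) * y[i - 1]
    return y

def shewhart_limits(samples, k=3.0):
    """Shewhart chart control limits: mean +/- k standard deviations."""
    mu, sigma = samples.mean(), samples.std(ddof=1)
    return mu - k * sigma, mu + k * sigma

# Synthetic copy number profile: baseline 0 with a +1 aberration over
# probes 100-149, plus unit-variance Gaussian noise (SNR ~ 1).
rng = np.random.default_rng(0)
signal = np.zeros(300)
signal[100:150] = 1.0
noisy = signal + rng.normal(0.0, 1.0, size=300)

smoothed = ewma_filter(noisy)
# Estimate control limits from a segment known to be aberration-free.
lo, hi = shewhart_limits(noisy[:100])
flagged = (smoothed < lo) | (smoothed > hi)
```

Probes whose smoothed value leaves the control band are flagged as candidate aberrations; the GLR and multiscale variants studied in the project replace these simple limits with likelihood-ratio and wavelet-domain decisions.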

    Corrected Interval Multiscale Analysis (CIMSA) for the Decomposition and Reconstruction of Interval Data

    Multi-Scale Analysis (MSA) is a powerful tool in process systems engineering and has been used in many applications, such as fault detection and filtering. In this paper, the extension of MSA to interval data is studied. Unlike single-valued data, interval data use bounds to denote the uncertainty within data points; data aggregation can be used to convert a set of single-valued data into a smaller set of interval data. The literature on MSA of interval data is sparse, and its use in process engineering has not been documented. Therefore, three methods of handling interval data are studied: an interval arithmetic (IA) method, a center and radii (CR) method, and an upper and lower (UL) bound method. The main drawback identified when working with intervals is interval inflation (over-estimation); the inflation caused by applying MSA to interval data is described in detail, and new algorithms to correct for the over-estimation are proposed. With the over-estimation corrected, all three methods performed equally well in decomposing and reconstructing the signals. The interval MSA algorithms developed were then used to filter noisy interval data, where CIMSA-CR (the center and radii method) performed best among the three methods. The optimum depth of decomposition and the shape of features in the input signal were also studied to understand how they affect filtering performance.
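As an illustration of the ingredients the abstract names, here is a minimal sketch, under assumed conventions, of blockwise aggregation of single-valued data into intervals, conversion to the center-and-radii (CR) representation, and a single Haar-style approximation step applied separately to centers and radii. The block size and signal are illustrative, and the paper's corrected (inflation-free) algorithms are not reproduced here.

```python
import numpy as np

def aggregate_to_intervals(x, block=4):
    """Convert single-valued data into interval data by blockwise
    aggregation: each interval is the [min, max] of one block.
    Block size 4 is an illustrative choice."""
    x = np.asarray(x, dtype=float)
    x = x[: len(x) // block * block].reshape(-1, block)
    return np.stack([x.min(axis=1), x.max(axis=1)], axis=1)

def to_center_radius(iv):
    """Center-and-radii (CR) representation of [lower, upper] intervals."""
    return (iv[:, 0] + iv[:, 1]) / 2, (iv[:, 1] - iv[:, 0]) / 2

def haar_step_cr(center, radius):
    """One Haar approximation step (pairwise averaging) applied to
    centers and radii separately -- a sketch of the CR route through
    the multiscale decomposition."""
    c = (center[0::2] + center[1::2]) / 2
    r = (radius[0::2] + radius[1::2]) / 2
    return c, r

rng = np.random.default_rng(1)
x = np.sin(np.linspace(0, np.pi, 32)) + 0.1 * rng.normal(size=32)
iv = aggregate_to_intervals(x)      # 8 intervals of [min, max]
c, r = to_center_radius(iv)
c1, r1 = haar_step_cr(c, r)         # coarser scale: 4 intervals
```

Operating on centers and radii avoids the dependency problem that arises when the same interval bounds enter an interval-arithmetic expression more than once, which is one source of the inflation the paper corrects for.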