2 research outputs found

    Towards Enhanced Diagnosis of Diseases using Statistical Analysis of Genomic Copy Number Data

    Get PDF
    Genomic copy number data are a rich source of information about the biological systems they are collected from. They can be used for the diagnoses of various diseases by identifying the locations and extent of aberrations in DNA sequences. However, copy number data are often contaminated with measurement noise which drastically affects the quality and usefulness of the data. The objective of this project is to apply some of the statistical filtering and fault detection techniques to improve the accuracy of diagnosis of diseases by enhancing the accuracy of determining the locations of such aberrations. Some of these techniques include multiscale wavelet-based filtering and hypothesis testing based fault detection. The filtering techniques include Mean Filtering (MF), Exponentially Weighted Moving Average (EWMA), Standard Multiscale Filtering (SMF) and Boundary Corrected Translation Invariant filtering (BCTI). The fault detection techniques include the Shewhart chart, EWMA and Generalized Likelihood Ratio (GLR). The performance of these techniques is illustrated using Monte Carlo simulations and through their application on real copy number data. Based on the Monte Carlo simulations, the non-linear filtering techniques performed better than the linear techniques, with BCTI performing with the least error . At an SNR of 1, BCTI technique had an average mean squared error of 2.34% whereas mean filtering technique had the highest error of 5.24%. As for the fault detection techniques, GLR had the lowest missed detection rate of 1.88% at a fixed false alarm rate of around 4%. At around the same false alarm rate, the Shewhart chart had the highest missed detection of 67.4%. Furthermore, these techniques were applied on real genomic copy number data sets. These included data from breast cancer cell lines (MPE600) and colorectal cancer cell lines (SW837)
    corecore