395 research outputs found

    Towards Enhanced Diagnosis of Diseases using Statistical Analysis of Genomic Copy Number Data

    Genomic copy number data are a rich source of information about the biological systems they are collected from. They can be used to diagnose various diseases by identifying the locations and extent of aberrations in DNA sequences. However, copy number data are often contaminated with measurement noise, which drastically affects the quality and usefulness of the data. The objective of this project is to apply statistical filtering and fault detection techniques to improve the accuracy of disease diagnosis by locating such aberrations more accurately. These techniques include multiscale wavelet-based filtering and hypothesis-testing-based fault detection. The filtering techniques include Mean Filtering (MF), Exponentially Weighted Moving Average (EWMA), Standard Multiscale Filtering (SMF) and Boundary Corrected Translation Invariant filtering (BCTI). The fault detection techniques include the Shewhart chart, EWMA and Generalized Likelihood Ratio (GLR). The performance of these techniques is illustrated using Monte Carlo simulations and through their application to real copy number data. Based on the Monte Carlo simulations, the non-linear filtering techniques performed better than the linear techniques, with BCTI achieving the lowest error. At an SNR of 1, the BCTI technique had an average mean squared error of 2.34%, whereas the mean filtering technique had the highest error of 5.24%. As for the fault detection techniques, GLR had the lowest missed detection rate of 1.88% at a fixed false alarm rate of around 4%. At around the same false alarm rate, the Shewhart chart had the highest missed detection rate of 67.4%. Furthermore, these techniques were applied to real genomic copy number data sets. These included data from breast cancer cell lines (MPE600) and colorectal cancer cell lines (SW837)
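
To make two of the techniques named in this abstract concrete, here is a minimal sketch of an EWMA filter and a Shewhart chart applied to a synthetic signal. It is not the authors' implementation: the function names, the smoothing weight `lam`, the 3-sigma limit and the simulated mean shift are all assumptions chosen for illustration.

```python
import random
import statistics


def ewma_filter(x, lam=0.3):
    """EWMA smoothing: z[t] = lam * x[t] + (1 - lam) * z[t-1], with z[-1] = x[0]."""
    smoothed = []
    prev = x[0]
    for value in x:
        prev = lam * value + (1 - lam) * prev
        smoothed.append(prev)
    return smoothed


def shewhart_alarms(x, mean, sigma, k=3.0):
    """Return the indices of samples lying more than k*sigma from the mean."""
    return [i for i, value in enumerate(x) if abs(value - mean) > k * sigma]


# Synthetic data (assumption): unit-variance Gaussian noise with a
# hypothetical aberration, a mean shift of +5 over samples 100..149.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]
signal = [v + (5.0 if 100 <= i < 150 else 0.0) for i, v in enumerate(noise)]

# Estimate the in-control mean and spread from the first 100 samples.
mu = statistics.fmean(signal[:100])
sd = statistics.stdev(signal[:100])

alarms = shewhart_alarms(signal, mu, sd)
smoothed = ewma_filter(signal)
```

The control limits are estimated from a stretch of data assumed to be aberration-free; in practice that choice, and the value of `lam`, would need tuning against the noise level of the data.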

    A new class of multiscale lattice cell (MLC) models for spatio-temporal evolutionary image representation

    Spatio-temporal evolutionary (STE) images are a class of complex dynamical systems that evolve over both space and time. With increased interest in the investigation of nonlinear complex phenomena, especially spatio-temporal behaviour governed by evolutionary laws that depend on both spatial and temporal dimensions, there has been an increased need to investigate model identification methods for this class of complex systems. Compared with purely temporal processes, the identification of spatio-temporal models from observed images is much more difficult. Starting from the assumption that no a priori information about the true model is available, only observed data, this study introduces a new class of multiscale lattice cell (MLC) models to represent the rules of the associated spatio-temporal evolutionary system. An application to a chemical reaction exhibiting spatio-temporal evolutionary behaviour is investigated to demonstrate the new modelling framework

    MUSIC: identification of enriched regions in ChIP-Seq experiments using a mappability-corrected multiscale signal processing framework

    We present MUSIC, a signal processing approach for identification of enriched regions in ChIP-Seq data, available at music.gersteinlab.org. MUSIC first filters the ChIP-Seq read-depth signal for systematic noise from non-uniform mappability, which fragments enriched regions. Then it performs a multiscale decomposition, using median filtering, identifying enriched regions at multiple length scales. This is useful given the wide range of scales probed in ChIP-Seq assays. MUSIC performs favorably in terms of accuracy and reproducibility compared with other methods. In particular, analysis of RNA polymerase II data reveals a clear distinction between the stalled and elongating forms of the polymerase. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-014-0474-3) contains supplementary material, which is available to authorized users
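
The idea of median filtering at multiple length scales can be sketched as follows. This is a generic illustration, not MUSIC's algorithm: the function names, the window sizes and the toy read-depth signal are assumptions, and no mappability correction is attempted.

```python
import statistics


def median_filter(x, half_width):
    """Sliding median over 2*half_width + 1 samples; windows are truncated at the edges."""
    n = len(x)
    return [
        statistics.median(x[max(0, i - half_width): min(n, i + half_width + 1)])
        for i in range(n)
    ]


def multiscale_median(x, half_widths=(1, 2, 4, 8)):
    """Smooth the same signal at several window sizes, keyed by half-width."""
    return {w: median_filter(x, w) for w in half_widths}


# Toy read-depth signal (assumption): one isolated spike and one broad plateau.
depth = [0] * 10 + [9] + [0] * 10 + [5] * 12 + [0] * 10
scales = multiscale_median(depth)
```

In this toy signal the single-sample spike disappears even at the smallest window, while the 12-sample plateau is still visible at its centre under the widest window, which is the sense in which different scales pick out features of different widths.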

    Generalised additive multiscale wavelet models constructed using particle swarm optimisation and mutual information for spatio-temporal evolutionary system representation

    A new class of generalised additive multiscale wavelet models (GAMWMs) is introduced for high dimensional spatio-temporal evolutionary (STE) system identification. A novel two-stage hybrid learning scheme is developed for constructing such an additive wavelet model. In the first stage, a new orthogonal projection pursuit (OPP) method, implemented using a particle swarm optimisation (PSO) algorithm, is proposed for successively augmenting an initial coarse wavelet model, where relevant parameters of the associated wavelets are optimised using a particle swarm optimiser. The resultant network model, obtained in the first stage, may however be a redundant model. In the second stage, a forward orthogonal regression (FOR) algorithm, implemented using a mutual information method, is then applied to refine and improve the initially constructed wavelet model. The proposed two-stage hybrid method can generally produce a parsimonious wavelet model, together with a list of wavelet functions ranked according to the capability of each wavelet to represent the total variance in the desired system output signal. The proposed new modelling framework is applied to real observed images of a chemical reaction exhibiting spatio-temporal evolutionary behaviour, and the associated identification results show that the new modelling framework is applicable and effective for handling high dimensional identification problems of spatio-temporal evolution systems

    Improved Shewhart Chart Using Multiscale Representation

    Most univariate process monitoring techniques operate under three main assumptions: that the process residuals being evaluated are Gaussian, are independent, and contain a moderate level of noise. The performance of the conventional Shewhart chart, for example, is adversely affected when these assumptions are violated. Multiscale wavelet-based representation is a powerful data analysis tool that can help better satisfy these assumptions, i.e., decorrelate autocorrelated data, separate noise from features, and transform the data to better follow a Gaussian distribution at multiple scales. This research focused on developing an algorithm to extend the conventional Shewhart chart using multiscale representation to enhance its performance. On simulated synthetic data, the developed multiscale Shewhart chart performed better (with lower missed detection and false alarm rates) than the conventional Shewhart chart. The developed multiscale Shewhart chart was also applied to two real-world applications, simulated distillation column data and genomic copy number data, to illustrate the advantage of using the multiscale Shewhart chart for process monitoring over the conventional one
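
One simple way to combine a multiscale representation with a Shewhart chart is to monitor a coarse-scale approximation of the data instead of the raw samples. The sketch below does this with repeated pairwise averaging (an unnormalised Haar approximation). It is an assumption-laden illustration, not the algorithm developed in this work: `haar_smooth`, `control_limits`, `multiscale_shewhart`, the choice of level and the 3-sigma limits are all hypothetical.

```python
import random
import statistics


def haar_smooth(x, level):
    """Average adjacent pairs `level` times; each pass halves the length."""
    out = list(x)
    for _ in range(level):
        out = [(out[i] + out[i + 1]) / 2 for i in range(0, len(out) - 1, 2)]
    return out


def control_limits(training, k=3.0):
    """Shewhart limits: mean minus and plus k sample standard deviations."""
    m = statistics.fmean(training)
    s = statistics.stdev(training)
    return m - k * s, m + k * s


def multiscale_shewhart(training, monitored, level=2, k=3.0):
    """Flag coarse-scale samples of `monitored` outside limits learned from `training`."""
    lo, hi = control_limits(haar_smooth(training, level), k)
    coarse = haar_smooth(monitored, level)
    return [i for i, v in enumerate(coarse) if v < lo or v > hi]


# Synthetic example (assumption): in-control Gaussian training data and a
# monitored series whose second half carries a mean shift of +2.
random.seed(1)
training = [random.gauss(0.0, 1.0) for _ in range(256)]
monitored = [random.gauss(0.0, 1.0) + (2.0 if i >= 32 else 0.0) for i in range(64)]
alarm_blocks = multiscale_shewhart(training, monitored, level=2)
```

Averaging over blocks of four samples roughly halves the noise standard deviation for independent noise, which is why a shift that is hard to see against the raw limits can stand out at the coarser scale. Each reported index refers to a block of `2**level` original samples.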

    Geometric deep learning: going beyond Euclidean data

    Many scientific fields study data with an underlying structure that is a non-Euclidean space. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging, regulatory networks in genetics, and meshed surfaces in computer graphics. In many applications, such geometric data are large and complex (in the case of social networks, on the scale of billions), and are natural targets for machine learning techniques. In particular, we would like to use deep neural networks, which have recently proven to be powerful tools for a broad range of problems from computer vision, natural language processing, and audio analysis. However, these tools have been most successful on data with an underlying Euclidean or grid-like structure, and in cases where the invariances of these structures are built into networks used to model them. Geometric deep learning is an umbrella term for emerging techniques attempting to generalize (structured) deep neural models to non-Euclidean domains such as graphs and manifolds. The purpose of this paper is to overview different examples of geometric deep learning problems and present available solutions, key difficulties, applications, and future research directions in this nascent field

    Tiling array data analysis: a multiscale approach using wavelets

    Background: Tiling array data is hard to interpret due to noise. The wavelet transformation is a widely used technique in signal processing for elucidating the true signal from noisy data. Consequently, we attempted to denoise representative tiling array datasets for ChIP-chip experiments using wavelets. In doing this, we used specific wavelet basis functions, Coiflets, since their triangular shape closely resembles the expected profiles of true ChIP-chip peaks. Results: In our wavelet-transformed data, we observed that noise tends to be confined to small scales while the useful signal-of-interest spans multiple large scales. We were also able to show that wavelet coefficients due to non-specific cross-hybridization follow a log-normal distribution, and we used this fact in developing a thresholding procedure. In particular, wavelets allow one to set an unambiguous, absolute threshold, which has been hard to define in ChIP-chip experiments. One can set this threshold by requiring a similar confidence level at different length-scales of the transformed signal. We applied our algorithm to a number of representative ChIP-chip data sets, including those of Pol II and histone modifications, which have a diverse distribution of length-scales of biochemical activity, including some broad peaks. Conclusions: Finally, we benchmarked our method in comparison to other approaches for scoring ChIP-chip data using spike-ins on the ENCODE Nimblegen tiling array. This comparison demonstrated excellent performance, with wavelets getting the best overall score.
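
The general pattern of wavelet denoising by thresholding can be sketched as: transform, zero the small detail coefficients, and invert. The example below deliberately substitutes the Haar wavelet for the Coiflets used in the paper and takes the threshold as a plain user-supplied number, so it does not reproduce the authors' log-normal-based thresholding procedure. All function names are hypothetical.

```python
import math


def haar_dwt(x):
    """Full orthonormal Haar transform; len(x) must be a power of two.

    Returns (approx, details), with detail levels ordered fine to coarse.
    """
    s = math.sqrt(2)
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        next_approx = [(approx[i] + approx[i + 1]) / s for i in pairs]
        details.append([(approx[i] - approx[i + 1]) / s for i in pairs])
        approx = next_approx
    return approx, details


def haar_idwt(approx, details):
    """Invert haar_dwt, rebuilding the signal from coarse to fine."""
    s = math.sqrt(2)
    current = list(approx)
    for level in reversed(details):
        rebuilt = []
        for a, d in zip(current, level):
            rebuilt.extend([(a + d) / s, (a - d) / s])
        current = rebuilt
    return current


def denoise(x, threshold):
    """Hard thresholding: zero every detail coefficient with magnitude <= threshold."""
    approx, details = haar_dwt(x)
    kept = [[c if abs(c) > threshold else 0.0 for c in level] for level in details]
    return haar_idwt(approx, kept)


# Toy signal (assumption): a step with small alternating noise on top.
noisy = [0.1, -0.1, 0.1, -0.1, 4.1, 3.9, 4.1, 3.9]
clean = denoise(noisy, threshold=0.5)
```

Here the small fine-scale coefficients produced by the alternating noise are zeroed, while the large coarse-scale coefficient carrying the step survives. A library such as PyWavelets would be the more usual choice when Coiflets or other wavelet families are needed.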