
    A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data

    DNA methylation is an important epigenetic event that affects gene expression during development and in various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given the methylation intensities of all these probes, it is necessary to determine which of them are most representative of the gene-centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that uses different classification methods to compute gene-centric DNA methylation from probe-level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of the selected probes on their mRNA expression levels and found that K-Nearest Neighbors classification combined with sequential forward selection performed better than the other algorithms on all metrics. We also observed that the transcriptional activities of certain genes were more sensitive to DNA methylation changes than those of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
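The combination named in this abstract, sequential forward selection wrapped around a k-NN classifier, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the leave-one-out scoring, the value `k=3`, the stopping rule, the binarised expression labels, and the toy probe values are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training rows."""
    # Squared Euclidean distance is sufficient for ranking neighbours.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, features, k):
    """Leave-one-out accuracy of k-NN restricted to the given probe columns."""
    sub = [[row[j] for j in features] for row in X]
    hits = 0
    for i in range(len(sub)):
        train_X = sub[:i] + sub[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, sub[i], k) == y[i]
    return hits / len(sub)


def sequential_forward_selection(X, y, k=3, max_features=None):
    """Greedily add the probe that most improves accuracy; stop when none does."""
    n_features = len(X[0])
    max_features = max_features or n_features
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        scored = [(loo_accuracy(X, y, selected + [j], k), j) for j in candidates]
        # Highest score wins; ties go to the lowest probe index.
        score, j = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break
        selected.append(j)
        best_score = score
    return selected, best_score


# Toy data (assumed): 8 samples x 3 probes for one gene; only probe 1
# tracks the binarised expression label (0 = low, 1 = high).
X = [
    [0.5, 0.10, 0.3], [0.4, 0.15, 0.7], [0.6, 0.20, 0.3], [0.5, 0.12, 0.7],
    [0.5, 0.80, 0.3], [0.6, 0.85, 0.7], [0.4, 0.90, 0.3], [0.5, 0.82, 0.7],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, score = sequential_forward_selection(X, y, k=3)
print(selected, score)
```

On this toy input the search keeps only the informative probe (index 1) and stops, because adding either noise probe cannot raise the leave-one-out accuracy further. A real analysis would use held-out data and the study's own evaluation metrics rather than this simplified score.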

    Two-leak isolation in water distribution networks based on k-NN and linear discriminant classifiers

    In this paper, the two-simultaneous-leak isolation problem in water distribution networks is addressed. The methodology relies on optimal sensor placement together with a leak location strategy using two well-known classifiers: k-NN and discriminant analysis. First, zone segmentation of the water distribution network is proposed, aiming to reduce the computational cost of evaluating all possible combinations of two-leak scenarios. Each zone is composed of at least two consecutive nodes, which means that the number of zones is at most half the number of nodes. With this segmentation, the leak identification task is to locate the zones where the pair of leaks is occurring. To quantify the degree of uncertainty, a relaxation node criterion is used. The simulation results show that the outcomes are accurate in most cases when using the one-relaxation-node and two-relaxation-node criteria. The APC was funded by Tecnológico de Monterrey. Peer reviewed. Postprint (published version).
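The effect of zone segmentation on the number of two-leak scenarios can be checked with a short sketch. It assumes the nodes are given as an ordered list in which neighbours are consecutive, that zones hold two nodes with any leftover node merged into the last zone, and that both leaks may fall in the same zone. None of these details are stated in the abstract, and the function name is hypothetical.

```python
from math import comb


def segment_into_zones(nodes, zone_size=2):
    """Group consecutive nodes into zones; a short remainder joins the last zone."""
    zones = [nodes[i:i + zone_size] for i in range(0, len(nodes), zone_size)]
    if len(zones) > 1 and len(zones[-1]) < zone_size:
        zones[-2].extend(zones.pop())
    return zones


# Hypothetical network with 11 nodes labelled 1..11.
nodes = list(range(1, 12))
zones = segment_into_zones(nodes)

# Two-leak scenarios at node level: unordered pairs of distinct nodes.
node_pairs = comb(len(nodes), 2)

# Two-leak scenarios at zone level: pairs of distinct zones, plus the
# cases where both leaks lie in the same zone (an assumption).
zone_pairs = comb(len(zones), 2) + len(zones)

print(len(zones), node_pairs, zone_pairs)
```

With 11 nodes this gives 5 zones, which respects the bound of at most half the number of nodes, and the classifier has to distinguish 15 zone-level scenarios instead of 55 node-level ones.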