54 research outputs found

    SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model

    Background: Discrimination of transcription factor binding sites (TFBS) from background sequences plays a key role in computational motif discovery. Current clustering-based algorithms employ a homogeneous model, which assumes that motifs and background signals can be characterized equivalently. This assumption is limiting because the two kinds of sequence signal have distinct properties.
    Results: This paper develops a Self-Organizing Map (SOM) based clustering algorithm for extracting binding sites in DNA sequences. Our framework is based on a novel intra-node soft competitive procedure to achieve maximum discrimination of motifs from background signals in datasets. The intra-node competition is based on an adaptive weighting technique on two different signal models to better represent these two classes of signals. Using several real and artificial datasets, we compared our proposed method with several motif discovery tools. Compared with SOMBRERO, a state-of-the-art SOM-based motif discovery tool, our algorithm achieves significant improvements in average precision (about 27%) on the real datasets without compromising sensitivity. Our method also compared favourably with other motif discovery tools.
    Conclusions: Motif discovery with a model-based clustering framework should consider using a heterogeneous model to represent the two classes of signals in DNA sequences. Such a heterogeneous model can achieve better signal discrimination than a homogeneous model.

    MLP Neural Networks using Octave NN Package

    This tutorial gives an overview of how to construct a multilayer backpropagation neural network using the Octave Neural Network package. Exercises are provided.

    SOMIX: Motifs Discovery in Gene Regulatory Sequences Using Self-Organizing Maps

    We present a clustering algorithm called Self-organizing Map Neural Network with mixed signals discrimination (SOMIX) to discover binding sites in a set of regulatory regions. Our framework integrates a novel intra-node soft competitive procedure in each node model to achieve maximum discrimination of motifs from background signals. The intra-node competition is based on an adaptive weighting technique on two different signal models: a position-specific scoring matrix and a Markov chain. Simulations on real and artificial datasets showed that SOMIX could achieve a significant performance improvement in sensitivity and specificity over SOMBRERO, a well-known SOM-based motif discovery tool. SOMIX was also found to be promising when compared against other popular motif discovery tools.

    Ensemble Prediction of Enhancers Associated Marks Using K-mer Feature

    Computational Discovery of Motifs Using Hierarchical Clustering Techniques

    Discovery of motifs plays a key role in understanding gene regulation in organisms. Existing tools for motif discovery show weaknesses in reliability and scalability, so more advanced algorithms for this problem would be useful. This paper aims to develop data mining techniques for discovering motifs. We propose a mismatch-based hierarchical clustering algorithm that employs three heuristic rules for classifying clusters and a post-processing step for ranking and refining the clusters. The algorithm is evaluated and compared with existing tools on two sets of DNA sequences. Results demonstrate that the proposed techniques outperform MEME, AlignACE and SOMBRERO on most of the test datasets.

    Realization of Generalized RBF Network

    Neural classifiers have been widely used in many application areas. This paper describes a generalized neural classifier based on the radial basis function network. The contributions of this work are: i) an improvement on the standard radial basis function network architecture, ii) a new cost function for classification, iii) a feature subset selection algorithm for hidden units, and iv) optimization of the neural classifier using a genetic algorithm with the new cost function. Comparative studies of the proposed neural classifier on a protein classification problem are given.

    MISCORE: Mismatch-Based Matrix Similarity Scores for DNA Motif Detection

    Two concepts are central to existing computational approaches for detecting or discovering motifs in DNA sequences: the motif model and the similarity score. One such motif model, the position frequency matrix (PFM), has been widely employed to search for putative motifs. Detection and discovery of motifs can be done by comparing k-mers with a motif model, or by clustering k-mers according to some criteria. In the past, information content based similarity scores have been widely used in searching tools. In this paper, we present a mismatch-based matrix similarity score (namely, MISCORE) for motif searching and discovery. The proposed MISCORE can be biologically interpreted as an evolutionary metric for predicting whether a k-mer is a motif member. Weighting factors, which are meaningful for biological data mining practice, are introduced in the MISCORE. The effectiveness of the MISCORE is investigated by exploring its separability, recognizability and robustness. It is compared with three well-known information content based matrix similarity scores, and results show that our MISCORE works well.
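As a rough illustration of the two concepts named in this abstract, the Python sketch below builds a position frequency matrix from aligned sites and scores a k-mer by its average mismatch against that matrix. The scoring rule is an assumption made for illustration and is not the published MISCORE formula; the function names `build_pfm` and `mismatch_score` and the example sites are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"

def build_pfm(sites):
    """Return one dict per position mapping each base to its relative frequency."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        pfm.append({b: counts[b] / len(sites) for b in ALPHABET})
    return pfm

def mismatch_score(kmer, pfm):
    """Average, over positions, of the fraction of sites that disagree with the k-mer.

    Illustrative only (not the MISCORE definition): 0.0 means the k-mer agrees
    with every site at every position, 1.0 means it agrees with none.
    """
    if len(kmer) != len(pfm):
        raise ValueError("k-mer length must equal the motif model length")
    return sum(1.0 - col.get(b, 0.0) for b, col in zip(kmer, pfm)) / len(pfm)

# Hypothetical aligned binding sites, used only to exercise the functions.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pfm = build_pfm(sites)
consensus_score = mismatch_score("TATAAT", pfm)
background_score = mismatch_score("GGGGGG", pfm)
```

With a score of this kind, a lower value marks a k-mer as closer to the motif model, so a cut-off on the score separates candidate motif members from background k-mers.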

    Optimization of MISCORE-based Motif Identification Systems

    Identification of motifs in DNA sequences using classification techniques is one of the computational approaches to discovering novel binding sites. In previous work [16], we proposed a simple and effective method for motif detection using a single crisp rule governed by a mismatch-based matrix similarity score (MISCORE). In this paper, we consider the problem of finding a suitable motif cut-off value for MISCORE-based motif identification systems using a cost-sensitivity metric. We use phylogenetic footprinting data to estimate the parameters of the cost function. We also extend the MISCORE to include entropy, which weights each position of the motif model so as to minimize the false positive rate. Performance is evaluated on artificial and real DNA sequences. The results demonstrate the feasibility and usefulness of our proposed approach for model-based cut-off value estimation.

    Improved H3K27ac Histone Mark Prediction using K-mer Proximity Feature

    Prediction of gene regulatory elements (enhancers) is computationally challenging because the features associated with them are poorly understood. Several histone marks are known to be associated with enhancer locations and have been used successfully to predict the approximate locations of thousands of enhancers. The k-mer (a short contiguous run of k nucleotides) is one of the most commonly engineered features from histone sequences for machine learning tasks. However, a large k-mer feature set (i.e. 5 ≤ k ≤ 7) is usually needed to perform well, and no domain knowledge is used. In this study we propose the k-mer proximity feature, which is domain dependent, to represent H3K27ac histone enrichment in DNA sequences. This feature represents the spatial content of DNA sequences. We compare the performance of the proximity and k-mer features for H3K27ac mark prediction, and the results indicate that the proposed feature gives higher prediction accuracy. These findings support the view that the proximity feature is a more distinguishing feature of DNA sequences with histone modification enrichment.
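To make the baseline k-mer feature concrete, here is a minimal Python sketch of a plain k-mer count vector. It shows only the generic k-mer feature the abstract refers to; the proposed proximity feature is not defined in the abstract and is not reproduced here. The function name `kmer_counts` and the example sequence are hypothetical.

```python
from itertools import product

def kmer_counts(seq, k):
    """Count overlapping k-mers over the ACGT alphabet.

    Returns (vocab, vec): vocab lists all 4**k k-mers in lexicographic order,
    and vec[i] is the number of occurrences of vocab[i] in seq.
    Windows containing other characters (e.g. N) are skipped.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    vec = [0] * len(vocab)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        j = index.get(seq[start:start + k])
        if j is not None:
            vec[j] += 1
    return vocab, vec

# Hypothetical sequence, used only to exercise the function.
vocab, vec = kmer_counts("ACGTAC", 2)
```

The vector has 4^k entries, so the range 5 ≤ k ≤ 7 mentioned above corresponds to between 1,024 and 16,384 features per sequence.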

    Potential Perils of Biological Sequence Visualization using Sequence Logo

    The characteristics of a sequence motif are commonly visualized using a sequence logo. This paper describes a user study aimed at evaluating the effectiveness of the sequence logo as an evaluation metric for motif prediction tools. We also investigate the nature of confirmation biases in the use of sequence logos for reporting results in publications. While sequence logos have been widely used for visualizing sequence motifs over the past 20 years, no study has reported on their effectiveness and possible misuse in decision making. We conducted a paper-and-pencil test to determine the effectiveness of sequence logos in some of their common usages. A survey study was also performed to investigate sequence logos’ learnability. We found large mismatches between users’ perception and the actual quality of motifs when sequence logos were used as an evaluation metric. Therefore, evaluations of motif prediction tools based on sequence logos have to be interpreted cautiously. Our results also suggest that there is still room for improvement in the current sequence logo’s layout design.
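For readers unfamiliar with what a sequence logo encodes, the Python sketch below computes the letter heights of a conventional DNA logo: each column's information content is 2 bits minus its Shannon entropy, and each letter's height is its frequency times that information content. This is the standard logo definition without the small-sample correction, not anything specific to the study above; the function name `logo_heights` and the example sites are hypothetical.

```python
import math
from collections import Counter

def logo_heights(sites):
    """Return, per column, a dict mapping each observed base to its logo height in bits."""
    n = len(sites)
    columns = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        freqs = {b: counts[b] / n for b in "ACGT" if counts[b] > 0}
        # Shannon entropy of the column; bases with zero frequency contribute nothing.
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = 2.0 - entropy  # log2(4) = 2 bits is the maximum for DNA
        columns.append({b: f * info for b, f in freqs.items()})
    return columns

# Hypothetical aligned sites: column 0 is fully conserved, column 1 is uniform.
cols = logo_heights(["AC", "AG", "AT", "AA"])
```

A fully conserved column yields a single 2-bit letter, while a uniform column yields letters of zero height, which is why weakly conserved positions almost vanish from a logo.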