6 research outputs found

    GANN: Genetic algorithm neural networks for the detection of conserved combinations of features in DNA

    BACKGROUND: The multitude of motif detection algorithms developed to date have largely focused on the detection of patterns in primary sequence. Since sequence-dependent DNA structure and flexibility may also play a role in protein-DNA interactions, the simultaneous exploration of sequence- and structure-based hypotheses about the composition of binding sites and the ordering of features in a regulatory region should be considered as well. The consideration of structural features requires the development of new detection tools that can deal with data types other than primary sequence. RESULTS: GANN (available at ) is a machine learning tool for the detection of conserved features in DNA. The software suite contains programs to extract different regions of genomic DNA from flat files and convert these sequences to indices that reflect sequence and structural composition or the presence of specific protein binding sites. The machine learning component allows the classification of different types of sequences based on subsamples of these indices, and can identify the best combinations of indices and machine learning architecture for sequence discrimination. Another key feature of GANN is the replicated splitting of data into training and test sets, and the implementation of negative controls. In validation experiments, GANN successfully merged important sequence and structural features to yield good predictive models for synthetic and real regulatory regions. CONCLUSION: GANN is a flexible tool that can search through large sets of sequence and structural feature combinations to identify those that best characterize a set of sequences

    Assessing Computational Methods of Cis-Regulatory Module Prediction

    Computational methods attempting to identify instances of cis-regulatory modules (CRMs) in the genome face a challenging problem of searching for potentially interacting transcription factor binding sites while knowledge of the specific interactions involved remains limited. Without a comprehensive comparison of their performance, the reliability and accuracy of these tools remain unclear. Faced with a large number of different tools that address this problem, we summarized and categorized them based on search strategy and input data requirements. Twelve representative methods were chosen and applied to predict CRMs from the Drosophila CRM database REDfly, and across the human ENCODE regions. Our results show that the optimal choice of method varies depending on species and composition of the sequences in question. When discriminating CRMs from non-coding regions, those methods considering evolutionary conservation have a stronger predictive power than methods designed to be run on a single genome. Different CRM representations and search strategies rely on different CRM properties, and different methods can complement one another. For example, some favour homotypical clusters of binding sites, while others perform best on short CRMs. Furthermore, most methods appear to be sensitive to the composition and structure of the genome to which they are applied. We analyze the principal features that distinguish the methods that performed well, identify weaknesses leading to poor performance, and provide a guide for users. We also propose key considerations for the development and evaluation of future CRM-prediction methods.

    Genome wide prediction of HNF4α functional binding sites by the use of local and global sequence context

    An application of machine learning algorithms enables prediction of the functional context of transcription factor binding sites in the human genome

    Automatic annotation of genomic regulatory sequences by searching for composite clusters

    A new method was developed for revealing composite clusters of cis-elements in promoters of eukaryotic genes that are functionally related or coexpressed. A software system, “ClusterScan”, has been created that makes it possible: (i) to train the system on representative samples of promoters to reveal cis-elements that tend to cluster; (ii) to train the system on a number of samples of functionally related promoters to identify functionally coupled transcription factors; (iii) to search for these clusters in genomic sequences in order to identify and functionally characterize regulatory regions in the genome. A number of training samples from different functional and structural groups of promoters were analysed. A search for composite clusters in human chromosomes 21 and 22 revealed a number of interesting examples. Finally, a decision tree system was constructed to classify promoters of several functionally related gene groups. The decision tree system makes it possible to identify new promoters and computationally predict their possible function.

    Application of quantitative tools for the epidemiological study of zoonoses

    In this doctoral thesis, the usefulness of applying quantitative analysis techniques to the study of the epidemiology of diseases relevant to animal health and public health was explored. Most epidemiological studies are based on classifying the individuals of a population as infected or non-infected according to the results of a diagnostic test. However, the reliability of that test is often not evaluated beforehand, or the evaluation takes place under circumstances that do not later apply to the population of interest. In this thesis, Bayesian statistics were applied in two very different scenarios in order to estimate the reliability of ante-mortem diagnostic tests. First, the reliability of the traditional diagnostic tests for leishmaniosis, the indirect immunofluorescence antibody test (IFAT) and nested PCR, was evaluated in rabbits and hares, which were recently identified as competent reservoirs of the parasite L. infantum and for which no prior information existed..