17,171 research outputs found

    Transcription factor binding site identification using the Self-Organizing Map

    MOTIVATION: The automatic identification of over-represented motifs present in a collection of sequences continues to be a challenging problem in computational biology. Many existing approaches to motif identification do not always find the relevant biological motifs, or find only a subset of the occurrences of a motif. In this paper, we propose a self-organizing map of position weight matrices as an alternative method for motif discovery. The advantage of this approach is that it can be used to simultaneously characterize every feature present in the data set, thus lessening the chance that weaker signals will be missed. Features identified are ranked in terms of over-representation relative to a background model. RESULTS: We present an implementation of this approach, named SOMBRERO, which is capable of discovering multiple distinct motifs present in a single data set. We demonstrate the advantages of our approach on various data sets and SOMBRERO’s improved performance over two popular motif-finding programs, MEME and AlignACE. AVAILABILITY: SOMBRERO is available free of charge from http://bioinf.nuigalway.ie/sombrero
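The abstract above relies on position weight matrices (PWMs) and on ranking features against a background model. The following is a minimal sketch of those two ideas only, not SOMBRERO's implementation: the function names, the toy binding sites, the pseudocount, and the uniform background are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Estimate per-column base probabilities from equal-length aligned sites."""
    pwm = []
    for i in range(len(sites[0])):
        # Start every base at the pseudocount so no probability is zero.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm

def log_odds(pwm, kmer, background=None):
    """Sum of log2(P_motif / P_background) over the positions of a k-mer."""
    # Assumption: a uniform background unless the caller supplies one.
    background = background or {b: 0.25 for b in BASES}
    return sum(math.log2(col[c] / background[c]) for col, c in zip(pwm, kmer))

def best_window(pwm, sequence):
    """Return the window of the sequence with the highest log-odds score."""
    k = len(pwm)
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    return max(windows, key=lambda w: log_odds(pwm, w))

# Hypothetical aligned sites, invented for this example.
toy_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
pwm = build_pwm(toy_sites)
print(best_window(pwm, "GGCGTATAATGCCG"))
```

In a SOM of PWMs, each map node would hold one such matrix and be updated from the sequence windows assigned to it; that training loop is left out here.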

    SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model

    BACKGROUND: Discrimination of transcription factor binding sites (TFBS) from background sequences plays a key role in computational motif discovery. Current clustering-based algorithms employ a homogeneous model, which assumes that motifs and background signals can be equivalently characterized. This assumption is limiting because the two kinds of sequence signal have distinct properties. RESULTS: This paper develops a Self-Organizing Map (SOM) based clustering algorithm for extracting binding sites in DNA sequences. Our framework is based on a novel intra-node soft competitive procedure to achieve maximum discrimination of motifs from background signals in datasets. The intra-node competition is based on an adaptive weighting technique on two different signal models to better represent these two classes of signals. Using several real and artificial datasets, we compared our proposed method with several motif discovery tools. Compared to SOMBRERO, a state-of-the-art SOM based motif discovery tool, our algorithm achieves significant improvements in the average precision rates (about 27%) on the real datasets without compromising its sensitivity. Our method also performed favourably against other motif discovery tools. CONCLUSIONS: Motif discovery with a model-based clustering framework should consider using a heterogeneous model to represent the two classes of signals in DNA sequences. Such a heterogeneous model can achieve better signal discrimination than the homogeneous model.

    On the use of algorithms to discover motifs in DNA sequences

    Many approaches are currently devoted to finding DNA motifs in nucleotide sequences. However, this task remains challenging for specialists because gene regulatory mechanisms are difficult to understand in depth, especially when analyzing binding sites in DNA. These sites, or specific nucleotide sequences, are known to be responsible for transcription processes. Thus, this work aims at providing an updated overview of strategies developed to discover meaningful motifs in DNA-related sequences and, in particular, their attempts to find relevant binding sites. From all existing approaches, this work is focused on dictionary, ensemble, and artificial intelligence-based algorithms since they represent the classical and the leading ones, respectively. Ministerio de Ciencia y Tecnología TIN2007-68084-C-00; Junta de Andalucia P07-TIC-02611

    Regulation of polarised growth in fungi

    Polarised growth in fungi occurs through the delivery of secretory vesicles along tracks formed by cytoskeletal elements to specific sites on the cell surface, where they dock with a multiprotein structure called the exocyst before fusing with the plasma membrane. The budding yeast, Saccharomyces cerevisiae, has provided a useful model to investigate the mechanisms involved and their control. Cortical markers, provided by bud site selection pathways during budding, the septin ring during cytokinesis or the stimulation of the pheromone response receptors during mating, act through upstream signalling pathways to localise Cdc24, the GEF for the rho family GTPase, Cdc42. Cdc42 in its GTP-bound form activates a multiprotein complex called the polarisome, which nucleates actin cables along which the secretory vesicles are transported to the cell surface. Hyphae can elongate at a rate orders of magnitude faster than the extension of a yeast bud, so understanding hyphal growth will require substantial modification of the yeast paradigm. The rapid rate of hyphal growth is driven by a structure called the Spitzenkörper, which is located just behind the growing tip and is rich in secretory vesicles. It is thought that secretory vesicles are delivered to the apical region where they accumulate in the Spitzenkörper. The Spitzenkörper then acts as a vesicle supply centre from which vesicles exit in all directions, but because of its proximity, the tip receives a greater concentration of vesicles per unit area than subapical regions. There are no obvious equivalents to the bud site selection pathway to provide a spatial landmark for polarised growth in hyphae. However, an emerging model is the way that the site of polarised growth in the fission yeast, Schizosaccharomyces pombe, is marked by delivery of the kelch repeat protein, Tea1, along microtubules. The relationship of the Spitzenkörper to the polarisome and the mechanisms that promote its formation are key questions that form the focus of current research.

    An approach for the identification of targets specific to bone metastasis using cancer genes interactome and gene ontology analysis

    Metastasis is one of the most enigmatic aspects of cancer pathogenesis and is a major cause of cancer-associated mortality. Secondary bone cancer (SBC) is a complex disease caused by metastasis of tumor cells from their primary site and is characterized by intricate interplay of molecular interactions. Identification of targets for multifactorial diseases such as SBC, the most frequent complication of breast and prostate cancers, is a challenge. Towards achieving our aim of identification of targets specific to SBC, we constructed a 'Cancer Genes Network', a representative protein interactome of cancer genes. Using graph theoretical methods, we obtained a set of key genes that are relevant for generic mechanisms of cancers and have a role in biological essentiality. We also compiled a curated dataset of 391 SBC genes from published literature which serves as a basis of ontological correlates of secondary bone cancer. Building on these results, we implement a strategy based on generic cancer genes, SBC genes and gene ontology enrichment method, to obtain a set of targets that are specific to bone metastasis. Through this study, we present an approach for probing one of the major complications in cancers, namely, metastasis. The results on genes that play generic roles in cancer phenotype, obtained by network analysis of 'Cancer Genes Network', have broader implications in understanding the role of molecular regulators in mechanisms of cancers. Specifically, our study provides a set of potential targets that are of ontological and regulatory relevance to secondary bone cancer. Comment: 54 pages (19 pages main text; 11 Figures; 26 pages of supplementary information). Revised after critical reviews. Accepted for Publication in PLoS ON

    Plasmodium knowlesi Genome Sequences from Clinical Isolates Reveal Extensive Genomic Dimorphism.

    Plasmodium knowlesi is a newly described zoonosis that causes malaria in the human population that can be severe and fatal. The study of P. knowlesi parasites from human clinical isolates is relatively new and, in order to obtain maximum information from patient sample collections, we explored the possibility of generating P. knowlesi genome sequences from archived clinical isolates. Our patient sample collection consisted of frozen whole blood samples that contained excessive human DNA contamination and, in that form, were not suitable for parasite genome sequencing. We developed a method to reduce the amount of human DNA in the thawed blood samples in preparation for high throughput parasite genome sequencing using Illumina HiSeq and MiSeq sequencing platforms. Seven of fifteen samples processed had sufficiently pure P. knowlesi DNA for whole genome sequencing. The reads were mapped to the P. knowlesi H strain reference genome and an average mapping of 90% was obtained. Genes with low coverage were removed leaving 4623 genes for subsequent analyses. Previously we identified a DNA sequence dimorphism on a small fragment of the P. knowlesi normocyte binding protein xa gene on chromosome 14. We used the genome data to assemble full-length Pknbpxa sequences and discovered that the dimorphism extended along the gene. An in-house algorithm was developed to detect SNP sites co-associating with the dimorphism. More than half of the P. knowlesi genome was dimorphic, involving genes on all chromosomes and suggesting that two distinct types of P. knowlesi infect the human population in Sarawak, Malaysian Borneo. We use P. knowlesi clinical samples to demonstrate that Plasmodium DNA from archived patient samples can produce high quality genome data. We show that analyses, of even small numbers of difficult clinical malaria isolates, can generate comprehensive genomic information that will improve our understanding of malaria parasite diversity and pathobiology

    Chromatin accessibility reveals insights into androgen receptor activation and transcriptional specificity

    BACKGROUND: Epigenetic mechanisms such as chromatin accessibility impact transcription factor binding to DNA and transcriptional specificity. The androgen receptor (AR), a master regulator of the male phenotype and prostate cancer pathogenesis, acts primarily through ligand-activated transcription of target genes. Although several determinants of AR transcriptional specificity have been elucidated, our understanding of the interplay between chromatin accessibility and AR function remains incomplete. RESULTS: We used deep sequencing to assess chromatin structure via DNase I hypersensitivity and mRNA abundance, and paired these datasets with three independent AR ChIP-seq datasets. Our analysis revealed qualitative and quantitative differences in chromatin accessibility that corresponded to both AR binding and an enrichment of motifs for potential collaborating factors, one of which was identified as SP1. These quantitative differences were significantly associated with AR-regulated mRNA transcription across the genome. Base-pair resolution of the DNase I cleavage profile revealed three distinct footprinting patterns associated with the AR-DNA interaction, suggesting multiple modes of AR interaction with the genome. CONCLUSIONS: In contrast with other DNA-binding factors, AR binding to the genome does not only target regions that are accessible to DNase I cleavage prior to hormone induction. AR binding is invariably associated with an increase in chromatin accessibility and, consequently, changes in gene expression. Furthermore, we present the first in vivo evidence that a significant fraction of AR binds only to half of the full AR DNA motif. These findings indicate a dynamic quantitative relationship between chromatin structure and AR-DNA binding that impacts AR transcriptional specificity

    Characterization of Genetic Signal Sequences with Batch-Learning SOM

    Kohonen's SOM, an unsupervised clustering algorithm, is an effective tool for clustering and visualizing high-dimensional complex data on a single map. We previously modified the conventional SOM for genome informatics, making the learning process and resulting map independent of the order of data input on the basis of Batch Learning SOM (BL-SOM). We generated BL-SOMs for tetra- and pentanucleotide frequencies in 300,000 10-kb sequences from 13 eukaryotes for which almost complete genomic sequences are available. BL-SOM recognized species-specific characteristics of oligonucleotide frequencies in most 10-kb sequences, permitting species-specific classification of sequences without any information regarding the species. We next constructed BL-SOMs with tetra- and pentanucleotide frequencies in 37,086 full-length mouse cDNA sequences. With BL-SOM we also analyzed occurrence patterns of the oligonucleotides that are thought to be involved in transcriptional regulation on the human genome.
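The input to a map like the one described above is an oligonucleotide frequency vector for each sequence. The following is a minimal sketch of computing such a vector for tetranucleotides (k = 4). The function name and the handling of ambiguous bases are assumptions made for illustration, and it does not reproduce the authors' preprocessing.

```python
from itertools import product

def kmer_frequencies(sequence, k=4):
    """Return relative frequencies of all 4**k DNA words, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Assumption: windows containing ambiguous bases such as N are skipped.
        if word in counts:
            counts[word] += 1
            total += 1
    # An all-zero vector is returned when no valid window exists.
    return [counts[w] / total if total else 0.0 for w in kmers]

# Toy sequence, invented for this example; a real input would be a 10-kb fragment.
vec = kmer_frequencies("ACGTACGTNACGT")
print(len(vec))
```

Each sequence becomes a 256-dimensional vector for k = 4 (1,024 dimensions for k = 5), which is the kind of high-dimensional input a SOM projects onto a two-dimensional map.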

    SWI/SNF-like chromatin remodeling factor Fun30 supports point centromere function in S. cerevisiae

    Budding yeast centromeres are sequence-defined point centromeres and are, unlike in many other organisms, not embedded in heterochromatin. Here we show that Fun30, a poorly understood SWI/SNF-like chromatin remodeling factor conserved in humans, promotes point centromere function through the formation of correct chromatin architecture at centromeres. Our determination of the genome-wide binding and nucleosome positioning properties of Fun30 shows that this enzyme is consistently enriched over centromeres and that a majority of CENs show Fun30-dependent changes in flanking nucleosome position and/or CEN core micrococcal nuclease accessibility. Fun30 deletion leads to defects in histone variant Htz1 occupancy genome-wide, including at and around most centromeres. FUN30 genetically interacts with CSE4, coding for the centromere-specific variant of histone H3, and counteracts the detrimental effect of transcription through centromeres on chromosome segregation and suppresses transcriptional noise over centromere CEN3. Previous work has shown a requirement for fission yeast and mammalian homologs of Fun30 in heterochromatin assembly. As centromeres in budding yeast are not embedded in heterochromatin, our findings indicate a direct role of Fun30 in centromere chromatin by promoting correct chromatin architecture