106 research outputs found

    An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs

    Background: Transcription factors (TFs) control transcription by binding to specific regions of DNA called transcription factor binding sites (TFBSs). The identification of TFBSs is a crucial problem in computational biology and includes the subtask of predicting the location of known TFBS motifs in a given DNA sequence. It has previously been shown that, when scoring matches to known TFBS motifs, interdependencies between positions within a motif should be taken into account. However, this remains a challenging task owing to the fact that sequences similar to those of known TFBSs can occur by chance with a relatively high frequency. Here we present a new method for matching sequences to TFBS motifs based on intuitionistic fuzzy sets (IFS) theory, an approach that has been shown to be particularly appropriate for tackling problems that embody a high degree of uncertainty. Results: We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to prevent overestimation of less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence, but also taking into account the individual importance of each single base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. Our method provides excellent results when dealing with both synthetic and real data, outperforming the sensitivity and the specificity of two existing methods in all the experiments we performed. Conclusions: The results show that SCintuit improves the prediction quality for TFs of the existing approaches without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. In this study the reliability of the IFS theory for motif discovery tasks is proven.
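For orientation, the kind of sequence-motif scoring the abstract refers to can be illustrated with the standard position weight matrix (PWM) log-odds score. This is a minimal sketch of the independent-position baseline only; it is not SCintuit, it does not model dependencies between positions, and it does not use intuitionistic fuzzy sets. The function names, the pseudocount, and the uniform background frequency are assumptions made for illustration.

```python
import math

def log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length binding sites.

    Assumes a uniform background (0.25 per base) and an additive
    pseudocount; both are illustrative choices, not from the paper.
    """
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix

def best_match(sequence, matrix):
    """Return (score, start) of the highest-scoring window.

    Each position contributes independently to the score; the sequence
    must contain only the uppercase letters A, C, G and T.
    """
    width = len(matrix)
    return max(
        (sum(matrix[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    )

# Toy example with three hypothetical aligned sites.
pwm = log_odds_matrix(["TATA", "TATA", "TATT"])
score, start = best_match("GGTATAGG", pwm)
```

Because every column is scored on its own, a weakly conserved position contributes as freely as a strongly conserved one, which is the kind of overestimation the abstract says SCintuit is designed to prevent.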

    4DXpress: a database for cross-species expression pattern comparisons

    In the major animal model species like mouse, fish or fly, detailed spatial information on gene expression over time can be acquired through whole mount in situ hybridization experiments. In these species, expression patterns of many genes have been studied and data has been integrated into dedicated model organism databases like ZFIN for zebrafish, MEPD for medaka, BDGP for Drosophila or GXD for mouse. However, a central repository that allows users to query and compare gene expression patterns across different species has not yet been established. Therefore, we have integrated expression patterns for zebrafish, Drosophila, medaka and mouse into a central public repository called 4DXpress (expression database in four dimensions). Users can query anatomy ontology-based expression annotations across species and quickly jump from one gene to the orthologues in other species. Genes are linked to public microarray data in ArrayExpress. We have mapped developmental stages between the species to be able to compare developmental time phases. We store the largest collection of gene expression patterns available to date in an individual resource, reflecting 16 505 annotated genes. 4DXpress will be an invaluable tool for developmental as well as for computational biologists interested in gene regulation and evolution. 4DXpress is available at http://ani.embl.de/4DXpress

    Bio++: Efficient Extensible Libraries and Tools for Computational Molecular Evolution

    Efficient algorithms and programs for the analysis of the ever-growing amount of biological sequence data are strongly needed in the genomics era. The pace at which new data and methodologies are generated calls for the use of pre-existing, optimized yet extensible code, typically distributed as libraries or packages. This motivated the Bio++ project, aiming at developing a set of C++ libraries for sequence analysis, phylogenetics, population genetics, and molecular evolution. The main attractiveness of Bio++ is the extensibility and reusability of its components through its object-oriented design, without compromising the computational efficiency of the underlying methods. We present here the second major release of the libraries, which provides an extended set of classes and methods. These extensions notably provide built-in access to sequence databases and new data structures for handling and manipulating sequences from the omics era, such as multiple genome alignments and sequencing read libraries. More complex models of sequence evolution, such as mixture models and generic n-tuple alphabets, are also included.

    Metadata matters: access to image data in the real world

    Data sharing is important in the biological sciences to prevent duplication of effort, to promote scientific integrity, and to facilitate and disseminate scientific discovery. Sharing requires centralized repositories, and submission to and utility of these resources require common data formats. This is particularly challenging for multidimensional microscopy image data, which are acquired from a variety of platforms with a myriad of proprietary file formats (PFFs). In this paper, we describe an open standard format that we have developed for microscopy image data. We call on the community to use open image data standards and to insist that all imaging platforms support these file formats. This will build the foundation for an open image data repository.

    Reticulated origin of domesticated emmer wheat supports a dynamic model for the emergence of agriculture in the fertile crescent

    We used supernetworks with datasets of nuclear gene sequences and novel markers detecting retrotransposon insertions in ribosomal DNA loci to reassess the evolutionary relationships among tetraploid wheats. We show that domesticated emmer has a reticulated genetic ancestry, sharing phylogenetic signals with wild populations from all parts of the wild range. The extent of the genetic reticulation cannot be explained by post-domestication gene flow between cultivated emmer and wild plants, and the phylogenetic relationships among tetraploid wheats are incompatible with simple linear descent of the domesticates from a single wild population. A more parsimonious explanation of the data is that domesticated emmer originates from a hybridized population of different wild lineages. The observed diversity and reticulation patterns indicate that wild emmer evolved in the southern Levant, and that the wild emmer populations in south-eastern Turkey and the Zagros Mountains are relatively recent reticulate descendants of a subset of the Levantine wild populations. Based on our results we propose a new model for the emergence of domesticated emmer. During a pre-domestication period, diverse wild populations were collected from a large area west of the Euphrates and cultivated in mixed stands. Within these cultivated stands, hybridization gave rise to lineages displaying reticulated genealogical relationships with their ancestral populations. Gradual movement of early farmers out of the Levant introduced the pre-domesticated reticulated lineages to the northern and eastern parts of the Fertile Crescent, giving rise to the local wild populations but also facilitating fixation of domestication traits. Our model is consistent with the protracted and dispersed transition to agriculture indicated by the archaeobotanical evidence, and also with previous genetic data affiliating domesticated emmer with the wild populations in southeast Turkey. 
    Unlike other protracted models, we assume that humans played an intuitive role throughout the process. Funding: Natural Environment Research Council [NE/E015948/1]; Slovak Research and Development Agency [APVV-0661-10, APVV-0197-10].

    Evolution of Synonymous Codon Usage in Neurospora tetrasperma and Neurospora discreta

    Neurospora comprises a primary model system for the study of fungal genetics and biology. In spite of this, little is known about genome evolution in Neurospora. For example, the evolution of synonymous codon usage is largely unknown in this genus. In the present investigation, we conducted a comprehensive analysis of synonymous codon usage and its relationship to gene expression and gene length (GL) in Neurospora tetrasperma and Neurospora discreta. For our analysis, we examined codon usage among 2,079 genes per organism and assessed gene expression using large-scale expressed sequence tag (EST) data sets (279,323 and 453,559 ESTs for N. tetrasperma and N. discreta, respectively). Data on relative synonymous codon usage revealed 24 codons (and two putative codons) that are more frequently used in genes with high than with low expression and thus were defined as optimal codons. Although codon-usage bias was highly correlated with gene expression, it was independent of selectively neutral base composition (introns), thus demonstrating that translational selection drives synonymous codon usage in these genomes. We also report that GL (coding sequences [CDS]) was inversely associated with optimal codon usage at each gene expression level, with highly expressed short genes having the greatest frequency of optimal codons. Optimal codon frequency was moderately higher in N. tetrasperma than in N. discreta, which might be due to variation in selective pressures and/or mating systems.
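The relative synonymous codon usage (RSCU) statistic mentioned in the abstract is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes it under the standard genetic code. It is an illustration of the usual definition, not the authors' pipeline; the function name and the toy input are assumptions.

```python
from collections import Counter

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def rscu(cds_sequences):
    """Return {codon: RSCU} for in-frame coding sequences (DNA alphabet).

    RSCU = codon count * number of synonymous codons / total count for
    that amino acid. Stop codons and single-codon amino acids (Met, Trp)
    are skipped, as are amino acids that never occur in the input.
    """
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # ignore codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, amino_acid in CODON_TABLE.items():
        families.setdefault(amino_acid, []).append(codon)

    values = {}
    for amino_acid, codons in families.items():
        if amino_acid == "*" or len(codons) == 1:
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values

# Toy input: four alanine codons, three GCT and one GCC.
example = rscu(["GCTGCTGCTGCC"])
```

In the toy input, GCT receives an RSCU of 3.0 and GCC of 1.0, while the unused alanine codons GCA and GCG receive 0.0. Comparing such values between sets of highly and lowly expressed genes is one way to look for codons favoured at high expression, in the spirit of the optimal codons the abstract describes.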

    Combining Computational Prediction of Cis-Regulatory Elements with a New Enhancer Assay to Efficiently Label Neuronal Structures in the Medaka Fish

    The developing vertebrate nervous system contains a remarkable array of neural cells organized into complex, evolutionarily conserved structures. The labeling of living cells in these structures is key for the understanding of brain development and function, yet the generation of stable lines expressing reporter genes in specific spatio-temporal patterns remains a limiting step. In this study we present a fast and reliable pipeline to efficiently generate a set of stable lines expressing a reporter gene in multiple neuronal structures in the developing nervous system in medaka. The pipeline combines both the accurate computational genome-wide prediction of neuronal specific cis-regulatory modules (CRMs) and a newly developed experimental setup to rapidly obtain transgenic lines in a cost-effective and highly reproducible manner. 95% of the CRMs tested in our experimental setup show enhancer activity in various and numerous neuronal structures belonging to all major brain subdivisions. This pipeline represents a significant step towards the dissection of embryonic neuronal development in vertebrates.

    Population structure and genetic bottleneck in sweet cherry estimated with SSRs and the gametophytic self-incompatibility locus

    Background: Domestication and breeding involve the selection of particular phenotypes, limiting the genomic diversity of the population and creating a bottleneck. These effects can be precisely estimated when the location of domestication is established. Few analyses have focused on understanding the genetic consequences of domestication and breeding in fruit trees. In this study, we aimed to analyse genetic structure and changes in the diversity in sweet cherry (Prunus avium L.). Results: Three subgroups were detected in sweet cherry, with one group of landraces genetically very close to the analysed wild cherry population. A limited number of SSR markers displayed deviations from the frequencies expected under neutrality. After the removal of these markers from the analysis, a very limited bottleneck was detected between wild cherries and sweet cherry landraces, with a much more pronounced bottleneck between sweet cherry landraces and modern sweet cherry varieties. The loss of diversity between wild cherries and sweet cherry landraces at the S-locus was more significant than that for microsatellites. Particularly high levels of differentiation were observed for some S-alleles. Conclusions: Several domestication events may have happened in sweet cherry and/or intense gene flow from local wild cherry was probably maintained along the evolutionary history of the species. A marked bottleneck due to breeding was detected, with all markers, in the modern sweet cherry gene pool. The microsatellites did not detect the bottleneck due to domestication in the analysed sample. The vegetative propagation specific to some fruit trees may account for the differences in diversity observed at the S-locus. Our study provides insights into domestication events of cherry but requires confirmation on a larger sampling scheme for both sweet cherry landraces and wild cherry.

    The Cell Cycle Regulated Transcriptome of Trypanosoma brucei

    Progression of the eukaryotic cell cycle requires the regulation of hundreds of genes to ensure that they are expressed at the required times. Integral to cell cycle progression in yeast and animal cells are temporally controlled, progressive waves of transcription mediated by cell cycle-regulated transcription factors. However, in the kinetoplastids, a group of early-branching eukaryotes including many important pathogens, transcriptional regulation is almost completely absent, raising questions about the extent of cell-cycle regulation in these organisms and the mechanisms whereby regulation is achieved. Here, we analyse gene expression over the Trypanosoma brucei cell cycle, measuring changes in mRNA abundance on a transcriptome-wide scale. We developed a “double-cut” elutriation procedure to select unperturbed, highly synchronous cell populations from log-phase cultures, and compared this to synchronization by starvation. Transcriptome profiling over the cell cycle revealed the regulation of at least 430 genes. While only a minority were homologous to known cell cycle regulated transcripts in yeast or human, their functions correlated with the cellular processes occurring at the time of peak expression. We searched for potential target sites of RNA-binding proteins in these transcripts, which might earmark them for selective degradation or stabilization. Over-represented sequence motifs were found in several co-regulated transcript groups and were conserved in other kinetoplastids. Furthermore, we found evidence for cell-cycle regulation of a flagellar protein regulon with a highly conserved sequence motif, bearing similarity to consensus PUF-protein binding motifs. RNA sequence motifs that are functional in cell-cycle regulation were more widespread than previously expected and conserved within kinetoplastids. These findings highlight the central importance of post-transcriptional regulation in the proliferation of parasitic kinetoplastids.