
    jPREdictor: a versatile tool for the prediction of cis-regulatory elements

    Gene regulation is the process through which an organism effects spatial and temporal differences in gene expression levels. Knowledge of cis-regulatory elements as key players in gene regulation is indispensable for understanding both gene regulation and the development of organisms. Here we present the tool jPREdictor for the fast and versatile prediction of cis-regulatory elements on a genome-wide scale. The prediction is based on clusters of individual motifs and any combination of these into multi-motifs with selectable minimal and maximal distances. Individual motifs can be of heterogeneous classes, such as simple sequence motifs or position-specific scoring matrices. Cluster scores are weighted occurrences of multi-motifs, where the weights are derived from positive and negative training sets. We illustrate the flexibility of the jPREdictor with a new prediction of Polycomb/Trithorax Response Elements in Drosophila melanogaster. jPREdictor is available as a graphical user interface for online use and for download at
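
The scoring idea described above (cluster scores as weighted occurrences of multi-motifs) can be illustrated with a minimal Python sketch. It is not the jPREdictor implementation: the function names are hypothetical, motifs are restricted to exact strings, multi-motifs are restricted to pairs, and the weight is assumed to be a log-ratio of pseudocounted occurrence rates in the positive and negative training sets. The abstract only states that weights are derived from both sets.

```python
import math

def find_occurrences(seq, motif):
    """Start positions of all (possibly overlapping) exact matches of motif."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]

def count_pair(seq, motif_a, motif_b, min_dist, max_dist):
    """Count motif pairs whose start-to-start distance lies in [min_dist, max_dist]."""
    starts_a = find_occurrences(seq, motif_a)
    starts_b = find_occurrences(seq, motif_b)
    count = 0
    for a in starts_a:
        for b in starts_b:
            if motif_a == motif_b and b <= a:
                continue  # count each same-motif pair only once
            if min_dist <= abs(b - a) <= max_dist:
                count += 1
    return count

def train_weights(positives, negatives, pairs, min_dist, max_dist, pseudocount=1.0):
    """ASSUMPTION: weight = log ratio of pseudocounted per-base pair rates."""
    pos_len = sum(len(s) for s in positives)
    neg_len = sum(len(s) for s in negatives)
    weights = {}
    for a, b in pairs:
        pos = sum(count_pair(s, a, b, min_dist, max_dist) for s in positives)
        neg = sum(count_pair(s, a, b, min_dist, max_dist) for s in negatives)
        pos_rate = (pos + pseudocount) / pos_len
        neg_rate = (neg + pseudocount) / neg_len
        weights[(a, b)] = math.log(pos_rate / neg_rate)
    return weights

def cluster_score(window, weights, min_dist, max_dist):
    """Sum of pair weights times the number of pair occurrences in the window."""
    return sum(w * count_pair(window, a, b, min_dist, max_dist)
               for (a, b), w in weights.items())

# Toy data, invented for illustration only.
positives = ["GAGATTCATT", "CATTTGAGAA"]
negatives = ["TTTTTTTTTT", "GAGATTTTTT"]
pairs = [("GAGA", "CAT")]
weights = train_weights(positives, negatives, pairs, min_dist=1, max_dist=10)
score = cluster_score("GAGATTCATT", weights, min_dist=1, max_dist=10)
```

A genome-wide scan would slide a fixed-size window along each chromosome and report windows whose score exceeds a chosen cutoff; how that cutoff is set is outside this sketch.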

    RNAhybrid: microRNA target prediction easy, fast and flexible

    In the elucidation of the microRNA regulatory network, knowledge of potential targets is of highest importance. Among existing target prediction methods, RNAhybrid [M. Rehmsmeier, P. Steffen, M. Höchsmann and R. Giegerich (2004) RNA, 10, 1507–1517] is unique in offering a flexible online prediction. Recently, some useful features have been added, among these the possibility to disallow G:U base pairs in the seed region, and a seed-match speed-up, which accelerates the program by a factor of 8. In addition, the program can now be used as a webservice for remote calls from user-implemented programs. We demonstrate RNAhybrid's flexibility with the prediction of a non-canonical target site for Caenorhabditis elegans miR-241 in the 3′-untranslated region of lin-39. RNAhybrid is available at

    MOCCA: a flexible suite for modelling DNA sequence motif occurrence combinatorics

    Background: Cis-regulatory elements (CREs) are DNA sequence segments that regulate gene expression. Among CREs are promoters, enhancers, Boundary Elements (BEs) and Polycomb Response Elements (PREs), all of which are enriched in specific sequence motifs that form particular occurrence landscapes. We have recently introduced a hierarchical machine learning approach (SVM-MOCCA) in which Support Vector Machines (SVMs) are applied on the level of individual motif occurrences, modelling local sequence composition, and then combined for the prediction of whole regulatory elements. We used SVM-MOCCA to predict PREs in Drosophila and found that it was superior to other methods. However, we did not publish a polished implementation of SVM-MOCCA, which could be useful for other researchers, and we only tested SVM-MOCCA with IUPAC motifs and PREs. Results: We here present an expanded suite for modelling CRE sequences in terms of motif occurrence combinatorics: Motif Occurrence Combinatorics Classification Algorithms (MOCCA). MOCCA contains efficient implementations of several modelling methods, including SVM-MOCCA, and a new method, RF-MOCCA, a Random Forest derivative of SVM-MOCCA. We used SVM-MOCCA and RF-MOCCA to model Drosophila PREs and BEs in cross-validation experiments, making this the first study to model PREs with Random Forests and the first study that applies the hierarchical MOCCA approach to the prediction of BEs. Both models significantly improve generalization to PREs and boundary elements beyond that of previous methods (including 4-spectrum and motif occurrence frequency Support Vector Machines and Random Forests), with RF-MOCCA yielding the best results. Conclusion: MOCCA is a flexible and powerful suite of tools for the motif-based modelling of CRE sequences in terms of motif composition. MOCCA can be applied to any new CRE modelling problem where motifs have been identified. MOCCA supports IUPAC and Position Weight Matrix (PWM) motifs. For ease of use, MOCCA implements generation of negative training data, and additionally a mode that requires only that the user specifies positives, motifs and a genome. MOCCA is licensed under the MIT license and is available on GitHub at https://github.com/bjornbredesen/MOCCA.

    Complete probabilistic analysis of RNA shapes

    BACKGROUND: Soon after the first algorithms for RNA folding became available, it was recognised that the prediction of only one energetically optimal structure is insufficient to achieve reliable results. An in-depth analysis of the folding space as a whole appeared necessary to deduce the structural properties of a given RNA molecule reliably. Folding space analysis comprises various methods such as suboptimal folding, computation of base pair probabilities, sampling procedures and abstract shape analysis. Common to many approaches is the idea of partitioning the folding space into classes of structures, for which certain properties can be derived. RESULTS: In this paper we extend the approach of abstract shape analysis. We show how to compute the accumulated probabilities of all structures that share the same shape. While this implies a complete (non-heuristic) analysis of the folding space, the computational effort depends only on the size of the shape space, which is much smaller. This approach has been integrated into the tool RNAshapes, and we apply it to various RNAs. CONCLUSION: Analyses of conformational switches show the existence of two shapes with probabilities approximately [Formula: see text] vs. [Formula: see text] , whereas the analysis of a microRNA precursor reveals one shape with a probability near to 1.0. Furthermore, it is shown that a shape can outperform an energetically more favourable one by achieving a higher probability. From these results, and the fact that we use a complete and exact analysis of the folding space, we conclude that this approach opens up new and promising routes for investigating and understanding RNA secondary structure
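
The notion of an accumulated shape probability can be made concrete with a small Python sketch. It assumes that each structure is weighted by its Boltzmann factor exp(-E/RT), and it works by explicit enumeration over a hand-made list of structures. That is only an illustration: the abstract states that the tool's computational effort depends on the size of the shape space, so it clearly does not enumerate structures this way. The function name, the energies and the shape strings below are all invented for the example.

```python
import math
from collections import defaultdict

# Gas constant (kcal/(mol*K)) times temperature (37 degrees Celsius in kelvin).
RT = 0.0019872 * 310.15

def shape_probabilities(structures, rt=RT):
    """Accumulate Boltzmann weights per shape and normalise.

    structures: iterable of (shape_label, free_energy_in_kcal_per_mol).
    Returns a dict mapping each shape to the summed probability of its structures.
    """
    weights = defaultdict(float)
    for shape, energy in structures:
        weights[shape] += math.exp(-energy / rt)
    partition_sum = sum(weights.values())
    return {shape: w / partition_sum for shape, w in weights.items()}

# Hypothetical folding space: one very stable hairpin-like structure and
# three slightly less stable structures that share a two-hairpin shape.
toy_structures = [
    ("[]", -10.0),
    ("[][]", -9.8),
    ("[][]", -9.7),
    ("[][]", -9.6),
]
probs = shape_probabilities(toy_structures)
```

In this toy data the shape "[][]" ends up more probable than "[]" although "[]" contains the single lowest-energy structure, which mirrors the abstract's remark that a shape can outperform an energetically more favourable one.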

    Gnocis: An integrated system for interactive and reproducible analysis and modelling of cis-regulatory elements in Python 3

    Gene expression is regulated through cis-regulatory elements (CREs), among which are promoters, enhancers, Polycomb/Trithorax Response Elements (PREs), silencers and insulators. Computational prediction of CREs can be achieved using a variety of statistical and machine learning methods combined with different feature space formulations. Although Python packages for DNA sequence feature sets and for machine learning are available, no existing package facilitates the combination of DNA sequence feature sets with machine learning methods for the genome-wide prediction of candidate CREs. We here present Gnocis, a Python package that streamlines the analysis and the modelling of CRE sequences by providing extensible APIs and implementing the glue required for combining feature sets and models for genome-wide prediction. Gnocis implements a variety of base feature sets, including motif pair occurrence frequencies and the k-spectrum mismatch kernel. It integrates with Scikit-learn and TensorFlow for state-of-the-art machine learning. Gnocis additionally implements a broad suite of tools for the handling and preparation of sequence, region and curve data, which can be useful for general DNA bioinformatics in Python. We also present Deep-MOCCA, a neural network architecture inspired by SVM-MOCCA that achieves moderate to high generalization without prior motif knowledge. To demonstrate the use of Gnocis, we applied multiple machine learning methods to the modelling of D. melanogaster PREs, including a Convolutional Neural Network (CNN), making this the first study to model PREs with CNNs. The models are readily adapted to new CRE modelling problems and to other organisms. In order to produce a high-performance, compiled package for Python 3, we implemented Gnocis in Cython. Gnocis can be installed from PyPI by running ‘pip install gnocis’.

    mkESA: enhanced suffix array construction tool

    Summary: We introduce the tool mkESA, an open source program for constructing enhanced suffix arrays (ESAs), striving for low memory consumption, yet high practical speed. mkESA is a user-friendly program written in portable C99, based on a parallelized version of the Deep-Shallow suffix array construction algorithm, which is known for its high speed and small memory usage. The tool handles large FASTA files with multiple sequences, and computes suffix arrays and various additional tables, such as the LCP table (longest common prefix) or the inverse suffix array, from given sequence data
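
To show what the tables named above contain, here is a minimal Python sketch that builds a suffix array, its inverse and the LCP table for a short string. It is not mkESA's method: the suffix array is built by naive sorting of suffixes instead of the Deep-Shallow algorithm, and the LCP table uses the linear-time scan of Kasai et al., chosen here only because it is short. All function names are hypothetical.

```python
def suffix_array(text):
    """Start positions of all suffixes in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def inverse_suffix_array(sa):
    """inv[i] is the rank of the suffix starting at position i."""
    inv = [0] * len(sa)
    for rank, start in enumerate(sa):
        inv[start] = rank
    return inv

def lcp_table(text, sa):
    """lcp[r] = length of the longest common prefix of the suffixes at
    ranks r-1 and r; lcp[0] is 0 by convention (Kasai et al. scan)."""
    n = len(text)
    inv = inverse_suffix_array(sa)
    lcp = [0] * n
    h = 0  # common prefix length carried over from the previous suffix
    for i in range(n):
        r = inv[i]
        if r > 0:
            j = sa[r - 1]  # suffix preceding suffix i in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[r] = h
            if h > 0:
                h -= 1  # dropping the first character loses at most one match
        else:
            h = 0
    return lcp

text = "banana"
sa = suffix_array(text)
inv = inverse_suffix_array(sa)
lcp = lcp_table(text, sa)
```

The naive sort is quadratic or worse in the worst case and copies every suffix, which is exactly the memory and time cost that dedicated construction algorithms avoid on genome-sized input.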

    Significance of the Hybrid Capture method for the diagnosis and therapy of precursor and early stages of cervical carcinoma

    Human papillomaviruses (HPV) are the most common sexually transmitted viruses. Prior contact with HPV is a necessary condition for the development of cervical carcinoma. HPV DNA is detected in up to 99.7% of all cervical carcinomas. The Hybrid Capture test is a signal-amplifying hybridization assay for the qualitative detection of HPV DNA. The value of the method was assessed in a high-risk cohort of 664 patients of the Universitätsfrauenklinik Münster. Findings on HPV prevalence, cytology, histology and immunohistochemistry from this cohort were included in the study. The retrospectively collected data point to the great potential benefit of the Hybrid Capture method in early cancer detection. For a final and conclusive statement on integrating the HPV test into current cytological cancer screening, however, the results of the still ongoing prospective studies on primary HPV screening must be awaited.

    Polyploidization increases meiotic recombination frequency in Arabidopsis

    Background: Polyploidization is the multiplication of the whole chromosome complement and has occurred frequently in vascular plants. Maintenance of stable polyploid state over generations requires special mechanisms to control pairing and distribution of more than two homologous chromosomes during meiosis. Since a minimal number of crossover events is essential for correct chromosome segregation, we investigated whether polyploidy has an influence on the frequency of meiotic recombination. Results: Using two genetically linked transgenes providing seed-specific fluorescence, we compared a high number of progeny from diploid and tetraploid Arabidopsis plants. We show that rates of meiotic recombination in reciprocal crosses of genetically identical diploid and autotetraploid Arabidopsis plants were significantly higher in tetraploids compared to diploids. Although male and female gametogenesis differ substantially in meiotic recombination frequency, both rates were equally increased in tetraploids. To investigate whether multivalent formation in autotetraploids was responsible for the increased recombination rates, we also performed corresponding experiments with allotetraploid plants showing strict bivalent pairing. We found similarly increased rates in auto- and allotetraploids, suggesting that the ploidy effect is independent of chromosome pairing configurations. Conclusions: The evolutionary success of polyploid plants in nature and under domestication has been attributed to buffering of mutations and sub- and neo-functionalization of duplicated genes. Should the data described here be representative for polyploid plants, enhanced meiotic recombination, and the resulting rapid creation of genetic diversity, could have also contributed to their prevalence.

    A novel approach to remote homology detection: jumping alignments

    Spang R, Rehmsmeier M, Stoye J. A novel approach to remote homology detection: jumping alignments. Journal of Computational Biology. 2002;9(5):747-760. We describe a new algorithm for protein classification and the detection of remote homologs. The rationale is to exploit both vertical and horizontal information of a multiple alignment in a well-balanced manner. This is in contrast to established methods such as profiles and profile hidden Markov models, which focus on vertical information as they model the columns of the alignment independently, and to family pairwise search, which focuses on horizontal information as it treats given sequences separately. In our setting, we want to select from a given database of "candidate sequences" those proteins that belong to a given superfamily. In order to do so, each candidate sequence is separately tested against a multiple alignment of the known members of the superfamily by means of a new jumping alignment algorithm. This algorithm is an extension of the Smith-Waterman algorithm and computes a local alignment of a single sequence and a multiple alignment. In contrast to traditional methods, however, this alignment is not based on a summary of the individual columns of the multiple alignment. Rather, the candidate sequence is at each position aligned to one sequence of the multiple alignment, called the "reference sequence". In addition, the reference sequence may change within the alignment, while each such jump is penalized. To evaluate the discriminative quality of the jumping alignment algorithm, we compare it to profiles, profile hidden Markov models, and family pairwise search on a subset of the SCOP database of protein domains. The discriminative quality is assessed by median false positive counts (med-FP-counts). For moderate med-FP-counts, the number of successful searches with our method is considerably higher than with the competing methods.
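
The recurrence behind such a jumping alignment can be sketched in Python as a Smith-Waterman-style dynamic program with one extra dimension for the current reference sequence. This is a simplified illustration under stated assumptions, not the published algorithm: scores are plain match/mismatch values instead of a substitution matrix, gaps are penalised linearly, jumps are only allowed on diagonal (residue-to-residue) steps, and gap characters in the alignment rows are treated like any other mismatching symbol. The function name and all parameter values are hypothetical.

```python
def jumping_alignment_score(candidate, alignment, match=2, mismatch=-1, gap=2, jump=3):
    """Best local score of `candidate` against the rows of `alignment`.

    alignment: list of equal-length strings (rows of a multiple alignment).
    At each column the candidate is scored against one reference row;
    switching to another row costs `jump`.
    """
    n_rows = len(alignment)
    n_cols = len(alignment[0])
    # prev[j][k]: best score of a local alignment ending at the previous
    # candidate position and column j, with row k as reference sequence.
    prev = [[0] * n_rows for _ in range(n_cols + 1)]
    best = 0
    for i in range(1, len(candidate) + 1):
        cur = [[0] * n_rows for _ in range(n_cols + 1)]
        for j in range(1, n_cols + 1):
            best_any_row = max(prev[j - 1])  # best predecessor over all rows
            for k in range(n_rows):
                sub = match if candidate[i - 1] == alignment[k][j - 1] else mismatch
                # Stay on row k, or jump here from the best row and pay the penalty.
                diag = max(prev[j - 1][k], best_any_row - jump) + sub
                up = prev[j][k] - gap         # candidate residue left unaligned
                left = cur[j - 1][k] - gap    # alignment column skipped
                cur[j][k] = max(0, diag, up, left)
                best = max(best, cur[j][k])
        prev = cur
    return best

# Toy example: the candidate follows row 0 for four columns, then row 1.
rows = ["ACGTCCCC",
        "GGGGTTGA"]
score = jumping_alignment_score("ACGTTTGA", rows)
```

With a single row the recurrence reduces to ordinary local alignment with linear gap costs, and a very large jump penalty forces the alignment to stay on one reference sequence.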