12 research outputs found

    Patrocles: a database of polymorphic miRNA-mediated gene regulation in vertebrates

    The Patrocles database (http://www.patrocles.org/) compiles DNA sequence polymorphisms (DSPs) that are predicted to perturb miRNA-mediated gene regulation. Distinctive features include: (i) the coverage of seven vertebrate species in its present release, with more to be added as information becomes available, (ii) the coverage of the three compartments involved in the silencing process (i.e. targets, miRNA precursors and silencing machinery), (iii) contextual information that enables users to prioritize candidate ‘Patrocles DSPs’, including graphical information on miRNA-target coexpression and the eQTL effect of genotype on target expression levels, (iv) the inclusion of Copy Number Variants and eQTL information that affect miRNA precursors as well as genes encoding components of the silencing machinery and (v) a tool (Patrocles finder) that allows the user to determine whether her favorite DSP may perturb miRNA-mediated gene regulation of custom target sequences. To support the biological relevance of Patrocles' content, we searched for signatures of selection acting on ‘Patrocles single nucleotide polymorphisms (pSNPs)’ in humans and mice. As expected, we found a strong signature of purifying selection not only against SNPs that destroy conserved target sites but also against SNPs that create novel, illegitimate target sites, which is reminiscent of the Texel mutation in sheep.

    Trimming the complexity of Ranking by Pairwise Comparison

    In computer science research, and more specifically in bioinformatics, databases keep growing. This can be an issue when answering questions that require algorithms whose running time is polynomial but superlinear in the number of objects in the database, the number of attributes, or the number of labels associated with each object. This is the case for the Ranking by Pairwise Comparison (RPC) algorithm. This algorithm builds a model that predicts the label preference for a given object, but it requires on the order of N*(N-1)/2 computations for N labels, because a pairwise comparator model is needed for each possible pair of labels. Our hypothesis is that a significant part of the set of comparators often contains redundancy and/or noise, so that trimming the set could be beneficial. We implemented several methods, from the simplest one, which merely chooses a set of T comparators (T < N*(N-1)/2) at random, to a more complex approach based on partially randomized greedy search. This thesis gives a detailed overview of the context we are working in, provides the reader with the required background, describes existing preference learning algorithms including RPC, investigates possible trimming methods and their accuracy, and concludes on the relevance and robustness of the trimming approximation. After implementing and executing the procedure, we saw that using between N/2 and 2N comparators was sufficient to match the original RPC algorithm, as long as a smart trimming method is used, and that the trimmed version sometimes even outperforms it on noisy datasets. Also, comparing the use of base models in regression mode vs. classification mode showed that models built in regression mode may be more robust when using the original RPC.
    We thus empirically show that, in the particular case of RPC, reducing the complexity of the method gives similar or better results, which means that problems that could not be addressed by this algorithm, or at least not in an acceptable period of time, now can be. We also found that regression mode often makes RPC more robust with respect to its base learner parameters, meaning that the search for optimal settings, which can also be time-consuming, is less difficult. Yet research on this topic is not over, and we could think of different means to further improve the RPC algorithm or investigate other innovative approaches, which are discussed in the future work section. Also, the trimming method is not limited to RPC and could be applied to other algorithms that aggregate information provided by a set of models, e.g. the wide range of ensemble models used in machine learning.
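
    The simplest trimming method described above, choosing T comparators at random out of the N*(N-1)/2 possible label pairs, can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names, the seed parameter and the toy value of N are assumptions, and the partially randomized greedy search is not shown.

```python
import random
from itertools import combinations


def all_label_pairs(n_labels):
    """Return every unordered pair of label indices: N*(N-1)/2 pairs."""
    return list(combinations(range(n_labels), 2))


def random_trim(n_labels, t, seed=0):
    """Pick T label pairs uniformly at random, without replacement.

    Each retained pair stands for one pairwise comparator to train.
    """
    pairs = all_label_pairs(n_labels)
    if not 0 < t < len(pairs):
        raise ValueError("t must satisfy 0 < t < N*(N-1)/2")
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    return rng.sample(pairs, t)


n = 10
full = all_label_pairs(n)
kept = random_trim(n, t=2 * n)  # 2N comparators, the upper end quoted above
print(len(full), len(kept))     # prints "45 20"
```

    With N = 10 the full RPC model needs 45 comparators, while a budget of 2N keeps 20 of them.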

    Using Class-probability Models instead of Hard Classifiers as Base Learners in the Ranking by Pairwise Comparison Algorithm

    In the field of Preference Learning, the Ranking by Pairwise Comparison algorithm (RPC) consists of using the learning sample to derive pairwise comparators for each possible pair of class labels, and then aggregating the predictions of the whole set of pairwise comparators for a given object in order to produce a global ranking of the class labels. In its standard form, RPC uses hard binary classifiers assigning an integer (0/1) score to each class involved in a pairwise comparison. In the present work, we compare this setting with a modified version of RPC, where soft binary class-probability models replace the binary classifiers. To this end, we compare ensembles of extremely randomized class-probability estimation trees with ensembles of extremely randomized classification trees. We empirically show that both approaches lead to equivalent results in terms of Spearman’s rho value when using the optimal settings of their meta-parameters. However, we also show that in the context of small and noisy datasets (e.g. with partial ranking information) the use of class-probability models is more robust with respect to variations of its meta-parameter values than the hard classifier ensembles. This suggests that using (soft) class-probability comparators is a sensible option in the context of RPC approaches.
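
    The aggregation step, and the difference between hard (0/1) and soft (probability) comparator scores, can be illustrated with a small sketch. It assumes a hypothetical `comparator(i, j)` callable returning the estimated probability that label i is preferred to label j; the toy probabilities are made up, and the tree ensembles used in the paper are not reproduced here.

```python
from itertools import combinations


def rank_labels(n_labels, comparator, soft=True):
    """Aggregate pairwise comparator outputs into a label ranking.

    comparator(i, j) is assumed to return P(label i preferred to label j)
    for the object being ranked. With soft=False that probability is
    thresholded at 0.5, mimicking a hard 0/1 classifier.
    """
    scores = [0.0] * n_labels
    for i, j in combinations(range(n_labels), 2):
        p = comparator(i, j)
        if not soft:
            p = 1.0 if p >= 0.5 else 0.0
        scores[i] += p
        scores[j] += 1.0 - p
    # Highest aggregated score first.
    return sorted(range(n_labels), key=lambda k: scores[k], reverse=True)


# Toy probabilities for three labels (made-up values).
toy = {(0, 1): 0.55, (0, 2): 0.10, (1, 2): 0.45}
print(rank_labels(3, lambda i, j: toy[(i, j)], soft=True))
print(rank_labels(3, lambda i, j: toy[(i, j)], soft=False))
```

    In this toy case the two modes disagree on the order of labels 0 and 1, because thresholding discards how close the 0.55 and 0.45 estimates are to a tie.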

    Comparator selection for RPC with many labels

    The Ranking by Pairwise Comparison algorithm (RPC) is a well-established label ranking method. However, its complexity is O(N²) in the number N of labels. We present algorithms for selecting, before model construction, a subset of comparators of size O(N), to reduce the computational complexity without loss in accuracy.

    PREDetector 2.0: Online and Enhanced Version of the Prokaryotic Regulatory Elements Detector Tool

    In an era in which huge numbers of microbial genomes are being released in databases, it becomes increasingly important to rapidly mine genes as well as predict the regulatory networks that control their expression. To this end, we have developed an improved and online version of the PREDetector software aimed at identifying putative transcription factor-binding sites (TFBS) in bacterial genomes. The original philosophy of PREDetector 1.0 is maintained, i.e. to allow users to freely set the DNA-motif screening parameters, and to provide a statistical means to estimate the reliability of the prediction output. This new version offers an interactive table as well as graphics to dynamically alter the main screening parameters with automatic update of the list of identified putative TFBS. PREDetector 2.0 also has the following additional options: (i) access to genome sequences from different databases, (ii) access to weight matrices from public repositories, (iii) visualization of the predicted hits in their genomic context, (iv) grouping of hits identified in the same upstream region, (v) possibility to store the performed jobs, and (vi) automated export of the results in various formats. PREDetector 2.0 is available at http://predetector.fsc.ulg.ac.be/

    PREDetector : Prokaryotic Regulatory Element Detector

    Background: In the post-genomic era, in silico predictions of regulatory networks are considered a powerful approach to decipher and understand biological pathways within prokaryotic cells. The emergence of programs based on position weight matrices has facilitated access to this approach. However, a tool that automatically estimates the reliability of the predictions and allows users to extend predictions to genomic regions generally regarded as having no regulatory function was still in high demand. Result: Here, we introduce PREDetector, a tool developed for predicting regulons of DNA-binding proteins in prokaryotic genomes that (i) automatically predicts, scores and positions potential binding sites and their respective target genes, (ii) includes the downstream co-regulated genes, (iii) extends the predictions to coding sequences and terminator regions, (iv) saves private matrices and allows predictions in other genomes, and (v) provides an easy way to estimate the reliability of the predictions. Conclusion: We present, with PREDetector, an accurate prokaryotic regulon prediction tool that maximally answers biologists’ requests. PREDetector can be downloaded freely at http://www.montefiore.ulg.ac.be/~hiard/predetectorfr.htm
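
    As background for the position weight matrix approach mentioned above, a generic scan can be sketched as follows. This is not PREDetector's scoring scheme, which the abstract does not describe: the log-odds formula, pseudocount, uniform background, cutoff and example sequences are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from aligned binding sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, cutoff):
    """Return (position, site, score) for every window scoring >= cutoff."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[k][base] for k, base in enumerate(window))
        if score >= cutoff:
            hits.append((start, window, round(score, 2)))
    return hits


# Made-up aligned sites and sequence, for illustration only.
known_sites = ["TGACA", "TGACA", "TGTCA", "TGACT"]
pwm = build_pwm(known_sites)
print(scan("CCTGACAGGTGTCACC", pwm, cutoff=4.0))
```

    Raising or lowering `cutoff` trades sensitivity against the number of spurious hits, which is the kind of screening parameter a user would want to adjust.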

    Compiling polymorphic miRNA-target interactions: the Patrocles database.

    Using positional cloning, we have recently identified the mutation responsible for the muscular phenotype of the Texel sheep. It is located in the 3’UTR of the GDF8 gene - a known developmental repressor of muscle growth - and creates an illegitimate target site for miRNA expressed in the same tissue. This causes miRNA-mediated translation inhibition of mutant GDF8 transcripts, which leads to muscle hypertrophy. We followed up on this finding by searching for common polymorphisms and mutations that affect either (i) RNAi silencing machinery components, (ii) miRNA precursors or (iii) target sites. These might likewise alter miRNA-target interaction and could be responsible for substantial differences in gene expression level. They have been compiled in a public database (“Patrocles”: www.patrocles.org), where they are classified into (i) DNA sequence polymorphisms (DSP) affecting the silencing machinery, (ii) DSP affecting miRNA structure or expression and (iii) DSP affecting miRNA target sites. DSP from the last category were organized in four classes: destroying a target site conserved between mammals (DC), destroying a non-conserved target site (DNC), creating a non-conserved target site (CNC), or shifting a target site (S). To aid in the identification of the most relevant DSP (such as those where a target site is created in an antitarget gene), we have quantified the level of coexpression for all miRNA-gene pairs. Analysis of the numbers of Patrocles-DSP as well as their allelic frequency distribution indicates that a substantial proportion of them undergo purifying selection. The signature of selection was most pronounced for the DC class but was significant for the DNC and CNC classes as well, suggesting that a significant proportion of non-conserved targets is truly functional. The Patrocles database allowed for the selection of DSP that are likely to affect gene function and possibly disease susceptibility.
    The effect of these DSP is being studied both in vitro and in vivo. In conclusion, Patrocles-DSP could be widespread and underlie an appreciable amount of phenotypic variation, including common disease susceptibility.
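
    The four target-site classes (DC, DNC, CNC, S) can be illustrated with a toy classifier for a single-nucleotide substitution. This is a simplified sketch, not the Patrocles pipeline: it assumes exact matching of one site motif in a 3'UTR string, and it takes conservation as a caller-supplied flag because the abstract does not say how conservation is determined. The sequences and the motif are illustrative.

```python
def site_positions(utr, motif):
    """Start positions of every (possibly overlapping) motif occurrence."""
    return {i for i in range(len(utr) - len(motif) + 1)
            if utr[i:i + len(motif)] == motif}


def classify_snp(utr, pos, alt, motif, conserved=False):
    """Label the effect of substituting `alt` at `pos` on sites for `motif`.

    Returns 'DC', 'DNC', 'CNC', 'S' or None, following the four classes
    named in the abstract. `conserved` is a hypothetical flag supplied by
    the caller.
    """
    mutant = utr[:pos] + alt + utr[pos + 1:]
    before = site_positions(utr, motif)
    after = site_positions(mutant, motif)
    lost, gained = before - after, after - before
    if lost and gained:
        return "S"      # a site disappears and another appears nearby
    if lost:
        return "DC" if conserved else "DNC"
    if gained:
        return "CNC"
    return None         # the substitution leaves the sites unchanged


motif = "CTACCTC"       # illustrative 7-nucleotide site motif
print(classify_snp("GGCTACCTCAGG", 5, "G", motif, conserved=True))
print(classify_snp("GGCTAGCTCAGG", 5, "C", motif))
```

    The first call destroys an existing site and the second creates one, the situation that recalls the Texel mutation described above.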

    Detection of micro-RNA/gene interactions involved in angiogenesis using machine learning techniques

    Motivation: Angiogenesis is the process responsible for the growth of new blood vessels from existing ones. It is also associated with the development of cancer, as tumors need to be irrigated by blood vessels to grow. New cancer therapies are emerging that exploit angiogenesis inhibitors, also called angiostatic agents, to asphyxiate and starve the tumors. Better understanding the regulatory mechanisms that control angiogenesis is thus fundamental. Recently, short non-coding RNA molecules, called micro-RNAs, have been discovered that are involved in post-transcriptional regulation of gene expression. These molecules bind to messenger RNAs following the base pairing rules, preventing them from being translated into proteins and/or tagging them for degradation. The main goal of this work is to use computational approaches to identify micro-RNAs involved in angiogenesis. Method: In order to identify genes involved in angiogenesis, bovine endothelial cells were treated with a known angiogenesis inhibitor [1], prolactin 16K, and their gene expression profile was compared to the profile of untreated cells. The genes were then divided into three classes: up-regulated, down-regulated, and unaffected genes. The 3'UTR regions of these genes were then analysed by machine learning techniques. Different approaches were considered. First, we described each gene by a vector of motif counts in its 3'UTR region and used machine learning techniques to rank the motifs according to their relevance for separating the genes into the different classes. We considered successively motifs corresponding to the seeds of known micro-RNAs and also all possible motifs of a given length. To rank the motifs, we compared ensembles of decision trees and linear support vector machines. Second, we considered an approach called Segment and Combine that was proposed in [2].
    Finally, we also carried out an exhaustive search of all motifs of a given length that satisfy some constraints on specificity and coverage with respect to a given gene category. Results: The ability of the different approaches to identify relevant motifs was first assessed on genes predicted to be the target of some known miRNAs. In this simple setting, most methods were able to identify the micro-RNA seed. The results obtained on the genes regulated by prolactin 16K are also very encouraging. We were able to identify one micro-RNA already known to play a role in angiogenesis, and several motifs are predicted by different approaches as highly specific to up- or down-regulation by prolactin 16K. Their relationship with known micro-RNAs is certainly worth exploring. Conclusion: Machine learning approaches are promising techniques for the identification of micro-RNA/gene interactions. Future work will concern the application of the same kind of techniques on promoters for the identification of transcription factor binding sites.
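
    The first representation described in the Method, a vector of motif counts per 3'UTR over all possible motifs of a given length, can be sketched as follows. The function names and toy sequences are assumptions, and the sketch stops at the feature matrix: the ranking of motifs by tree ensembles or linear support vector machines is not shown.

```python
from collections import Counter
from itertools import product


def kmer_counts(utr, k):
    """Count every overlapping k-mer in one 3'UTR sequence."""
    return Counter(utr[i:i + k] for i in range(len(utr) - k + 1))


def count_matrix(utrs, k):
    """Describe each gene by its vector of counts over all 4**k motifs."""
    motifs = ["".join(p) for p in product("ACGT", repeat=k)]
    rows = []
    for utr in utrs:
        counts = kmer_counts(utr, k)
        rows.append([counts[m] for m in motifs])
    return motifs, rows


# Made-up sequences, for illustration only.
motifs, rows = count_matrix(["ACGTACGT", "TTTTACGA"], k=3)
print(len(motifs), rows[0][motifs.index("ACG")], rows[1][motifs.index("TTT")])
# prints "64 2 2"
```

    Each row can then be paired with the gene's class (up-regulated, down-regulated or unaffected) and passed to any learner that reports feature relevance.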