4 research outputs found
Investigating Power and Limitations of Ensemble Motif Finders Using Metapredictor CE3
Ensemble methods represent a relatively new approach to motif discovery that combines the results returned by "third-party" finders with the aim of achieving a better accuracy than that obtained by the single tools. Besides the choice of the external finders, another crucial element for the success of an ensemble method is the particular strategy adopted to combine the finders' results, a.k.a. learning function.
Results appeared in the literature seem to suggest that ensemble methods can provide noticeable improvements over the quality of the most popular tools available for motif discovery.
With the goal of better understanding potentials and limitations of ensemble methods, we developed a general software architecture whose major feature is the flexibility with respect to the crucial aspects of ensemble methods mentioned above. The architecture provides facilities for the easy addition of virtually any third-party tool for motif discovery whose code is publicly available, and for the definition of new learning functions. We present a prototype implementation of our architecture, called CE3 (Customizable and Easily Extensible Ensemble).
Using CE3, and available ensemble methods, we performed experiments with three well-known datasets. The results presented here are varied. On the one hand, they confirm that ensemble methods cannot be just considered as the universal remedy for "in-silico" motif discovery. On the other hand, we found some encouraging regularities that may help to find a general set up for CE3 (and other ensemble methods as well) able to guarantee substantial improvements over single finders in a systematic way
CMStalker: a combinatorial tool for composite motif discovery
Controlling the differential expression of many thousands different genes at any given time is a fundamental task of metazoan organisms and this complex orchestration is controlled by the so-called regulatory genome encoding complex regulatory networks: several Transcription Factors bind to precise DNA regions, so to perform in a cooperative manner a specific regulation task for nearby genes. The in silico prediction of these binding sites is still an open problem, notwithstanding continuous progress and activity in the last two decades. In this paper we describe a new efficient combinatorial approach to the problem of detecting sets of cooperating binding sites in promoter sequences, given in input a database of Transcription Factor Binding Sites encoded as Position Weight Matrices. We present CMStalker, a software tool for composite motif discovery which embodies a new approach that combines a constraint satisfaction formulation with a parameter relaxation technique to explore efficiently the space of possible solutions. Extensive experiments with twelve data sets and eleven state-of-the-art tools are reported, showing an average value of the correlation coefficient of 0.54 (against a value 0.41 of the closest competitor). This improvements in output quality due to CMStalker is statistically significant
CE^3: Customizable and Easily Extensible Ensemble Tool for Motif Discovery
Ensemble methods (or simply ensembles) for motif discovery
represent a relatively new approach to improve the accuracy of standalone motif finders. In particular, the accuracy of an ensemble is determined by the included finders and the strategy (learning rule) used to combine the results returned by the latter, making these choices crucial for the ensemble success. In this research we propose a general architecture for ensembles, called CE3, which is meant to be extensible and customizable for what concerns external tools inclusion and learning rule. Using CE3 the user will be able to “simulate” existing ensembles and possibly incorporate newly proposed tools (and learning functions) with the aim at improving the ensemble’s prediction accuracy. Preliminary experiments performed with a prototype implementation of CE3 led to interesting insights and a critical analysis of the potentials and limitations of currently available ensembles
CMF: a Combinatorial Tool to Find Composite Motifs
Controlling the differential expression of many thousands
genes at any given time is a fundamental task of metazoan organisms and this complex orchestration is controlled by the so-called regulatory genome encoding complex regulatory networks. Cis-Regulatory Modules are fundamental units of such networks. To detect Cis-Regulatory Modules \u201cin-silico\u201d a key step is the discovery of recurrent clusters of DNA binding sites for sets of cooperating Transcription Factors. Composite
motif is the term often adopted to refer to these clusters of sites. In this paper we describe CMF, a new efficient combinatorial method for the problem of detecting composite motifs, given in input a description of the binding affinities for a set of transcription factors. Testing with known benchmark data, we attain statistically significant better performance against nine state-of-the-art competing methods