    Transcription Factor-DNA Binding Via Machine Learning Ensembles

    We present ensemble methods in a machine learning (ML) framework combining predictions from five known motif/binding site exploration algorithms. For a given TF, the ensemble starts with position weight matrices (PWMs) for the motif, collected from the component algorithms. Using dimension reduction, we identify significant PWM-based subspaces for analysis. Within each subspace a machine classifier is built for identifying the TF's gene (promoter) targets (Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool. Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string) feature PWM-based subspaces that stand out in identifying gene targets. We approach Problem 3 (binding sites) with a novel machine learning approach that uses promoter string features and ML importance scores in a classification algorithm locating binding sites across the genome. For target gene identification, this method improves performance (measured by the F1 score) by about 10 percentage points over (a) the motif scanning method and (b) the coexpression-based association method. The top motif outperformed the five component algorithms as well as two other common algorithms (BEST and DEME). For identifying individual binding sites on a benchmark cross-species database (Tompa et al., 2005), we match the best performer without much human intervention. The method also improved performance on mammalian TFs. The ensemble can integrate orthogonal information from different weak learners (potentially using entirely different types of features) into a machine learner that can perform consistently better for more TFs. The TF gene target identification component (Problem 1 above) is useful in constructing a transcriptional regulatory network from known TF-target associations. The ensemble is easily extendable to include more tools as well as future PWM-based information. Comment: 33 pages.
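
    As a point of reference for the motif scanning baseline mentioned above, the following is a minimal sketch of scoring promoter windows against a single PWM with log-odds scores. It is not the authors' ensemble. The matrix values, the uniform background frequency, and the function names `window_score` and `best_hit` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 4-position motif: per-position base probabilities.
# These values are invented for illustration, not taken from the paper.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background frequency for every base


def window_score(window, pwm=PWM, background=BACKGROUND):
    """Sum of per-position log2 odds (motif probability vs. background)."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, window))


def best_hit(sequence, pwm=PWM):
    """Slide the PWM along the sequence; return (position, score) of the best window."""
    width = len(pwm)
    hits = [
        (i, window_score(sequence[i:i + width], pwm))
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits, key=lambda hit: hit[1])


position, score = best_hit("TTAGCTTT")
print(position, round(score, 2))
```

    In this toy sequence the window starting at index 2 ("AGCT") matches the consensus of the hypothetical matrix, so it receives the highest score. A target-gene classifier of the kind described in the abstract would use many such PWM-derived features rather than a single best-hit score.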

    Zeroing in on violent recidivism among released prisoners

    Guest Editorial: Systems Biology, the Second Time Around

    Susceptibility of the southern house mosquito, Culex quinquefasciatus, in East Baton Rouge Parish to larval insecticides

    Mosquito control districts in Louisiana focus their efforts on Culex quinquefasciatus, the primary vector of West Nile virus in the southern United States, with rigorous larvicide treatments. However, the development of resistant populations of Cx. quinquefasciatus in response to extensive insecticide application has been demonstrated repeatedly. Examining changes in insecticide susceptibility and larvicide efficacy in real-world scenarios can help inform mosquito control districts as to whether or not their treatments are killing mosquitoes. We hypothesized that frequent larvicide applications for the control of mosquitoes in East Baton Rouge Parish had lowered susceptibility of wild Cx. quinquefasciatus to insecticides, and that treatment in real-world septic water conditions negatively impacts larvicide efficacy. Larvicide susceptibility and efficacy in septic water were measured using the larvicides Bacillus sphaericus, spinosad, and temephos. Culex quinquefasciatus populations were sampled from sites in three parishes where frequencies of insecticide applications varied, and frequencies of resistance and efficacy were measured relative to a susceptible reference colony. Five-fold resistance to the organophosphate temephos was detected at one site in East Baton Rouge Parish in the spring of 2016, which increased to ten-fold resistance by the end of the mosquito season. Activities of esterases were found to be elevated in wild, temephos-resistant mosquitoes, indicating the potential role of these enzymes as a mechanism of resistance. Water quality did not appear to play a significant role in the efficacy of the larvicides used in this study. The results of this study provide a baseline of comparison for future measurements of susceptibility in Cx. quinquefasciatus in Louisiana, and may help inform local mosquito control districts as to the effectiveness and sustainability of their insecticide programs.

    Wake Vortex Inverse Model User's Guide

    NorthWest Research Associates (NWRA) has developed an inverse model for inverting landing aircraft vortex data. The data used for the inversion are the time evolution of the lateral transport position and vertical position of both the port and starboard vortices. The inverse model performs iterative forward model runs using various estimates of vortex parameters, vertical crosswind profiles, and vortex circulation as a function of wake age. Forward model predictions of lateral transport and altitude are then compared with the observed data. Differences between the data and model predictions guide the choice of vortex parameter values, crosswind profile and circulation evolution in the next iteration. Iterations are performed until a user-defined criterion is satisfied. Currently, the inverse model is set to stop when the improvement in the rms deviation between the data and model predictions is less than 1 percent for two consecutive iterations. The forward model used in this inverse model is a modified version of the Shear-APA model. A detailed description of this forward model, the inverse model, and its validation are presented in a different report (Lai, Mellman, Robins, and Delisi, 2007). This document is a User's Guide for the Wake Vortex Inverse Model. Section 2 presents an overview of the inverse model program. Execution of the inverse model is described in Section 3. When executing the inverse model, a user is requested to provide the name of an input file which contains the inverse model parameters, the various datasets, and directories needed for the inversion. A detailed description of the list of parameters in the inversion input file is presented in Section 4. A user has an option to save the inversion results of each lidar track in a mat-file (a condensed data file in Matlab format). These saved mat-files can be used for post-inversion analysis. A description of the contents of the saved files is given in Section 5. An example of an inversion input file, with preferred parameter values, is given in Appendix A. An example of the plot generated at a normal completion of the inversion is shown in Appendix B.
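
    The stopping rule described above (improvement in rms deviation below 1 percent for two consecutive iterations) can be sketched as follows. This is only an illustration under stated assumptions: the improvement is taken to be relative to the previous iteration's rms value, the names `relative_improvement`, `should_stop`, and `run_inversion` are hypothetical, and `toy_step` is a stand-in for one forward model run plus parameter update, not the Shear-APA model.

```python
def relative_improvement(previous_rms, current_rms):
    """Fractional drop in rms deviation from one iteration to the next."""
    if previous_rms == 0:
        return 0.0  # already a perfect fit; no further improvement possible
    return (previous_rms - current_rms) / previous_rms


def should_stop(rms_history, threshold=0.01, consecutive=2):
    """True when the last `consecutive` improvements are all below `threshold`."""
    if len(rms_history) < consecutive + 1:
        return False
    recent = rms_history[-(consecutive + 1):]
    return all(
        relative_improvement(prev, cur) < threshold
        for prev, cur in zip(recent, recent[1:])
    )


def run_inversion(step, initial_rms, max_iterations=100):
    """Iterate `step` (hypothetical: one forward run and update) until the rule fires."""
    history = [initial_rms]
    for _ in range(max_iterations):
        history.append(step(history[-1]))
        if should_stop(history):
            break
    return history


def toy_step(rms):
    """Toy stand-in: the misfit halves its distance to a floor of 1.0 each iteration."""
    return 1.0 + 0.5 * (rms - 1.0)


history = run_inversion(toy_step, initial_rms=9.0)
print(len(history), history[-1])
```

    The `max_iterations` cap is an added safeguard for the sketch and is not mentioned in the abstract.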

    Phenotypic connections in surprising places

    Connections have been revealed between very different human diseases using phenotype associations in other species.

    The society of genes: networks of functional links between genes from comparative genomics

    BACKGROUND: Comparative genomics provides at least three methods beyond traditional sequence similarity for identifying functional links between genes: the examination of common phylogenetic distributions, the analysis of conserved proximity along the chromosomes of multiple genomes, and observations of fusions of genes into a multidomain gene in another organism. We have previously generated the links according to each of these methods individually for 43 known microbial genomes. Here we combine these results to construct networks of functional associations. RESULTS: We show that the functional networks obtained by applying these methods have different topologies and that the information they provide is largely additive. In particular, the combined networks of functional links contain an average of 57% of an organism's complete genetic complement, uncover substantial portions of known pathways, and suggest the function of previously unannotated genes. In addition, the combined networks are qualitatively different from the networks obtained using individual methods. They have a dominant cluster that contains approximately 80%-90% of the genes, independent of genome size, and the dominant clusters show the small-world behavior expected of a biological system, with global connectivity that is nearly random, and local properties that are highly ordered. CONCLUSIONS: When the information on functional linkage provided by three emerging computational methods is combined, the integrated network uncovers large numbers of conserved pathways and identifies clusters of functionally related genes. It therefore shows considerable utility and promise as a tool for understanding genomic structure, and for guiding high-throughput experimental investigations.
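
    The idea of merging link sets from several methods and then looking for a dominant cluster can be illustrated with a small sketch. The gene names and links below are invented, the function names `combine_links` and `connected_components` are hypothetical, and a plain union of undirected links is assumed; the abstract does not state how the authors actually merged or weighted the three sources.

```python
from collections import defaultdict


def combine_links(*link_sets):
    """Union of undirected gene-gene links; (a, b) and (b, a) count once."""
    combined = set()
    for links in link_sets:
        for a, b in links:
            if a != b:  # ignore self-links
                combined.add(frozenset((a, b)))
    return combined


def connected_components(links):
    """Group linked genes into clusters with a depth-first search."""
    adjacency = defaultdict(set)
    for link in links:
        a, b = tuple(link)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Invented toy links standing in for the three comparative-genomics methods.
phylogenetic = {("g1", "g2"), ("g3", "g4")}
proximity = {("g2", "g3"), ("g5", "g6")}
fusion = {("g2", "g1"), ("g4", "g7")}

combined = combine_links(phylogenetic, proximity, fusion)
clusters = connected_components(combined)
dominant = max(clusters, key=len)
linked_genes = set().union(*clusters)
print(sorted(dominant), len(dominant) / len(linked_genes))
```

    In this toy case no single method connects more than two genes, while the union yields one cluster of five genes, which mirrors on a small scale the additive behavior the abstract reports.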