    A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

    The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across 4 genomics datasets and find the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods establish a balance between ensemble diversity and performance. Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013 International Conference on Data Minin

    Kernel methods in genomics and computational biology

    Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and strong modularity that makes them suitable to a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension, to process non-vectorial data, and the natural framework they provide to integrate heterogeneous data are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future

    (Re)-conciliation of genetics and genomics approaches for cotton fiber quality improvement

    The integration of genomics and plant breeding is driven by the increasing availability of sequence resources and by technological developments. The simultaneous measurement of the expression of thousands of genes is possible, and comparisons between contrasting genotypes and/or biological states, as well as within segregating populations, have become feasible. In genetical genomics, the merger of genetics and genomics, gene expression profiles are quantitatively assessed within a segregating population, and expression quantitative trait loci (eQTL) can be mapped like classical QTLs. Methods and examples of applications related to genetical genomics will be reviewed, with emphasis on hybridisation-based (microarray) and PCR-based (cDNA-AFLP) techniques. Despite the complexity of the molecular mechanisms underlying its development, the cotton fiber has become a primary subject of study. Several maps, including QTL maps, have been published, structural and metabolic genes related to fiber initiation or elongation have been identified, and several large EST projects have been developed. In this context, the applicability of a genetical genomics approach for the study of cotton fiber quality will be discussed. A new cooperative project with CIRAD, Bayer CropScience and CSIRO, supported by the French National Agency for Research (ANR), was initiated in 2007. The project aims at the genetic and genomic dissection of fiber quality using an interspecific Gossypium hirsutum X G. barbadense RIL population. Classical QTL mapping of fiber properties will be undertaken using data from different locations, and eQTLs will be detected using both microarray and cDNA-AFLP population-wide profiling. (Author's abstract)

    Genomics and synthetic biology as a viable option to intensify sustainable use of biodiversity

    The Amazon basin is an area of mega-biodiversity. Different models have been proposed^1-8^ for the establishment of an effective conservation policy, increasing sustainability and adding value for biodiversity. Currently, a broad spectrum of technologies from genomics to synthetic biology is available, and these permit the collection, manipulation and effective evaluation of countless organisms, metabolic pathways and molecules that exist as potential products of a large, biodiverse ecosystem. The use of genomics and synthetic biology may constitute an important tool and a viable option for the prospection, evaluation and manipulation of biodiversity, as advocated, and may also be useful for developing methods for sustainable use and the production of novel molecules

    Genomics knowledge and attitudes among European public health professionals. Results of a cross-sectional survey

    Background: The international public health (PH) community is debating the opportunity to incorporate genomic technologies into PH practice. A survey was conducted to assess attitudes of the European Public Health Association (EUPHA) members towards their role in the implementation of public health genomics (PHG), and their knowledge and attitudes towards genetic testing and the delivery of genetic services. Methods: EUPHA members were invited via monthly newsletter and e-mail to take part in an online survey from February 2017 to January 2018. A descriptive analysis of knowledge and attitudes was conducted, along with a univariate and multivariate analysis of their determinants. Results: Five hundred and two people completed the questionnaire; 17.9% were involved in PHG activities. Only 28.9% correctly identified all medical conditions for which there is (or is not) evidence for implementing genetic testing; over 60% thought that investing in genomics may divert economic resources from social and environmental determinants of health. The majority agreed that PH professionals may play different roles in incorporating genomics into their activities. Better knowledge was associated with positive attitudes towards the use of genetic testing and the delivery of genetic services in PH (OR = 1.48; 95% CI 1.01–2.18). Conclusions: Our study revealed quite positive attitudes, but also a need to increase awareness of genomics among European PH professionals. Those directly involved in PHG activities tend to have a more positive attitude and better knowledge; however, gaps are also evident in this group, suggesting the need to harmonize practice and encourage greater exchange of knowledge among professionals

    Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering

    Comparing large covariance matrices has important applications in modern genomics, where scientists are often interested in understanding whether relationships (e.g., dependencies or co-regulations) among a large number of genes vary between different biological states. We propose a computationally fast procedure for testing the equality of two large covariance matrices when the dimensions of the covariance matrices are much larger than the sample sizes. A distinguishing feature of the new procedure is that it imposes no structural assumptions on the unknown covariance matrices. Hence the test is robust with respect to various complex dependence structures that frequently arise in genomics. We prove that the proposed procedure is asymptotically valid under weak moment conditions. As an interesting application, we derive a new gene clustering algorithm which shares the same nice property of avoiding restrictive structural assumptions for high-dimensional genomics data. Using an asthma gene expression dataset, we illustrate how the new test helps compare the covariance matrices of the genes across different gene sets/pathways between the disease group and the control group, and how the gene clustering algorithm provides new insights on the way gene clustering patterns differ between the two groups. The proposed methods have been implemented in an R package, HDtest, which is available on CRAN. Comment: The original title, dated back to May 2015, is "Bootstrap Tests on High Dimensional Covariance Matrices with Applications to Understanding Gene Clustering

    Regularized Partial Least Squares with an Application to NMR Spectroscopy

    High-dimensional data common in genomics, proteomics, and chemometrics often contains complicated correlation structures. Recently, partial least squares (PLS) and Sparse PLS methods have gained attention in these areas as dimension reduction techniques in the context of supervised data analysis. We introduce a framework for Regularized PLS by solving a relaxation of the SIMPLS optimization problem with penalties on the PLS loadings vectors. Our approach enjoys many advantages including flexibility, general penalties, easy interpretation of results, and fast computation in high-dimensional settings. We also outline extensions of our methods leading to novel methods for Non-negative PLS and Generalized PLS, an adaption of PLS for structured data. We demonstrate the utility of our methods through simulations and a case study on proton Nuclear Magnetic Resonance (NMR) spectroscopy data

    GreenPhylDB: A Gene Family Database for plant functional Genomics

    With the increasing number of genomes being sequenced, a major objective is to transfer accurate annotation from characterised proteins to uncharacterised sequences. Consequently, comparative genomics has become a common and efficient strategy in functional genomics. The release of various annotated plant genomes, such as O. sativa and A. thaliana, has made it possible to set up comprehensive lists of gene families defined by automated methods. However, as for gene sequences, manual curation of gene families is an important requirement that has to be undertaken. GreenPhylDB comprises protein sequences of 12 fully sequenced plant species that were grouped into homeomorphic families using similarity-based methods. Clusters are finally processed by phylogenetic analysis to infer orthologs and paralogs, which will be particularly helpful for studying genome evolution. Beforehand, each cluster has to be curated (i.e. properly named and classified) using different sources of information. A web interface for the curation of plant gene families was developed for that purpose. This interface, accessible on GreenPhylDB (http://greenphyl.cirad.fr), centralizes external references (e.g. InterPro, KEGG, Swiss-Prot, PIRSF, PubMed) related to all gene members of the clusters and shows statistics and automatic analyses. We believe that this synthetic view of the data available for a gene cluster, combined with basic guidelines, is an efficient way to provide a reliable method for gene family annotation