A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics
The combination of multiple classifiers using ensemble methods is
increasingly important for making progress in a variety of difficult prediction
problems. We present a comparative analysis of several ensemble methods through
two case studies in genomics, namely the prediction of genetic interactions and
protein functions, to demonstrate their efficacy on real-world datasets and
draw useful conclusions about their behavior. These methods include simple
aggregation, meta-learning, cluster-based meta-learning, and ensemble selection
using heterogeneous classifiers trained on resampled data to improve the
diversity of their predictions. We present a detailed analysis of these methods
across 4 genomics datasets and find the best of these methods offer
statistically significant improvements over the state of the art in their
respective domains. In addition, we establish a novel connection between
ensemble selection and meta-learning, demonstrating how both of these disparate
methods establish a balance between ensemble diversity and performance.
Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013
International Conference on Data Minin
Kernel methods in genomics and computational biology
Support vector machines and kernel methods are increasingly popular in
genomics and computational biology, due to their good performance in real-world
applications and strong modularity that makes them suitable to a wide range of
problems, from the classification of tumors to the automatic annotation of
proteins. Their ability to work in high dimension, to process non-vectorial
data, and the natural framework they provide to integrate heterogeneous data
are particularly relevant to various problems arising in computational biology.
In this chapter we survey some of the most prominent applications published so
far, highlighting the particular developments in kernel methods triggered by
problems in biology, and mention a few promising research directions likely to
expand in the future
(Re)-conciliation of genetics and genomics approaches for cotton fiber quality improvement
The integration of genomics and plant breeding is driven by the increasing availability of sequence resources and by technological developments. The simultaneous measurement of the expression of thousands of genes is possible, and comparisons between contrasting genotypes and/or biological states, as well as within segregating populations, have become feasible. In genetical genomics, the merger of genetics and genomics, gene expression profiles are quantitatively assessed within a segregating population, and expression quantitative trait loci (eQTL) can be mapped like classical QTLs. Methods and examples of applications related to genetical genomics will be reviewed, with emphasis on hybridisation-based (microarray) and PCR-based (cDNA-AFLP) techniques. Despite the complexity of the molecular mechanisms underlying its development, the cotton fiber has become a trait of primary interest. Several maps, including QTL maps, have been published, structural and metabolic genes related to fiber initiation or elongation have been identified, and several large EST projects have been developed. In this context, the applicability of a genetical genomics approach for the study of cotton fiber quality will be discussed. A new cooperative project with CIRAD, Bayer CropScience and CSIRO, supported by the French National Agency for Research (ANR), was initiated in 2007. The project aims at the genetic and genomic dissection of fiber quality using an interspecific Gossypium hirsutum X G. barbadense RIL population. Classical QTL mapping of fiber properties will be undertaken using data from different locations, and eQTLs will be detected using both microarray and cDNA-AFLP population-wide profiling. (Author's abstract
Genomics and synthetic biology as a viable option to intensify sustainable use of biodiversity
The Amazon basin is an area of mega-biodiversity. Different models have been proposed^1-8^ for the establishment of an effective conservation policy, increasing sustainability and adding value for biodiversity. Currently, a broad spectrum of technologies from genomics to synthetic biology is available, and these permit the collection, manipulation and effective evaluation of countless organisms, metabolic pathways and molecules that exist as potential products of a large, biodiverse ecosystem. The use of genomics and synthetic biology may constitute an important tool and a viable option for the prospecting, evaluation and manipulation of biodiversity, as advocated, and may also be useful for developing methods for sustainable use and the production of novel molecules
Genomics knowledge and attitudes among European public health professionals. Results of a cross-sectional survey
Background: The international public health (PH) community is debating the opportunity to incorporate genomic technologies into PH practice. A survey was conducted to assess attitudes of the European Public Health Association (EUPHA) members towards their role in the implementation of public health genomics (PHG), and their knowledge and attitudes towards genetic testing and the delivery of genetic services. Methods: EUPHA members were invited via monthly newsletter and e-mail to take part in an online survey from February 2017 to January 2018. A descriptive analysis of knowledge and attitudes was conducted, along with a univariate and multivariate analysis of their determinants. Results: Five hundred and two people completed the questionnaire; 17.9% were involved in PHG activities. Only 28.9% correctly identified all medical conditions for which there is (or is not) evidence for implementing genetic testing; over 60% thought that investing in genomics may divert economic resources from social and environmental determinants of health. The majority agreed that PH professionals may play different roles in incorporating genomics into their activities. Better knowledge was associated with positive attitudes towards the use of genetic testing and the delivery of genetic services in PH (OR = 1.48; 95% CI 1.01–2.18). Conclusions: Our study revealed quite positive attitudes, but also a need to increase awareness of genomics among European PH professionals. Those directly involved in PHG activities tend to have a more positive attitude and better knowledge; however, gaps are also evident in this group, suggesting the need to harmonize practice and encourage greater exchange of knowledge among professionals
Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering
Comparing large covariance matrices has important applications in modern
genomics, where scientists are often interested in understanding whether
relationships (e.g., dependencies or co-regulations) among a large number of
genes vary between different biological states. We propose a computationally
fast procedure for testing the equality of two large covariance matrices when
the dimensions of the covariance matrices are much larger than the sample
sizes. A distinguishing feature of the new procedure is that it imposes no
structural assumptions on the unknown covariance matrices. Hence the test is
robust with respect to various complex dependence structures that frequently
arise in genomics. We prove that the proposed procedure is asymptotically valid
under weak moment conditions. As an interesting application, we derive a new
gene clustering algorithm which shares the same nice property of avoiding
restrictive structural assumptions for high-dimensional genomics data. Using an
asthma gene expression dataset, we illustrate how the new test helps compare
the covariance matrices of the genes across different gene sets/pathways
between the disease group and the control group, and how the gene clustering
algorithm provides new insights on the way gene clustering patterns differ
between the two groups. The proposed methods have been implemented in the
R package HDtest, which is available on CRAN.
Comment: The original title dated back to May 2015 is "Bootstrap Tests on High
Dimensional Covariance Matrices with Applications to Understanding Gene
Clustering
Regularized Partial Least Squares with an Application to NMR Spectroscopy
High-dimensional data common in genomics, proteomics, and chemometrics often
contains complicated correlation structures. Recently, partial least squares
(PLS) and Sparse PLS methods have gained attention in these areas as dimension
reduction techniques in the context of supervised data analysis. We introduce a
framework for Regularized PLS by solving a relaxation of the SIMPLS
optimization problem with penalties on the PLS loadings vectors. Our approach
enjoys many advantages including flexibility, general penalties, easy
interpretation of results, and fast computation in high-dimensional settings.
We also outline extensions of our methods leading to novel methods for
Non-negative PLS and Generalized PLS, an adaption of PLS for structured data.
We demonstrate the utility of our methods through simulations and a case study
on proton Nuclear Magnetic Resonance (NMR) spectroscopy data
GreenPhylDB: A Gene Family Database for plant functional Genomics
With the increasing number of genomes being sequenced, a major objective is to transfer accurate annotation from characterised proteins to uncharacterised sequences. Consequently, comparative genomics has become a common and efficient strategy in functional genomics. The release of various annotated plant genomes, such as _O. sativa_ and _A. thaliana_, has allowed comprehensive lists of gene families, defined by automated methods, to be set up. However, as with gene sequences, manual curation of gene families remains an important requirement. GreenPhylDB comprises protein sequences of 12 fully sequenced plant species that were grouped into homeomorphic families using similarity-based methods. Clusters are finally processed by phylogenetic analysis to infer orthologs and paralogs, which will be particularly helpful for studying genome evolution. Beforehand, each cluster has to be curated (i.e. properly named and classified) using different sources of information. A web interface for the curation of plant gene families was developed for that purpose. This interface, accessible on GreenPhylDB (http://greenphyl.cirad.fr), centralizes external references (e.g. InterPro, KEGG, Swiss-Prot, PIRSF, Pubmed) related to all gene members of the clusters and shows statistics and automatic analysis. We believe that this synthetic view of the data available for a gene cluster, combined with basic guidelines, is an efficient way to provide a reliable method for gene family annotation
