
    A Gene Ontology Tutorial in Python.

    This chapter is a tutorial on using Gene Ontology resources in the Python programming language. This entails querying the Gene Ontology graph, retrieving Gene Ontology annotations, performing gene enrichment analyses, and computing basic semantic similarity between GO terms. An interactive version of the tutorial, including solutions, is available at http://gohandbook.org
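
    To make the ideas of querying the ontology graph and computing a basic semantic similarity concrete, here is a minimal sketch in plain Python. It does not use the tutorial's own code or any GO library: the term IDs and the `PARENTS` graph are made up for illustration, and the Jaccard index over ancestor sets is only one simple similarity measure, assumed here for brevity.

```python
# Toy "is_a" graph: each term maps to the set of its direct parents.
# The term IDs are hypothetical and do not correspond to real GO terms.
PARENTS = {
    "T:root": set(),
    "T:a": {"T:root"},
    "T:b": {"T:root"},
    "T:c": {"T:a"},
    "T:d": {"T:a", "T:b"},
}


def ancestors(term, parents=PARENTS):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def jaccard_similarity(term1, term2, parents=PARENTS):
    """Shared ancestors divided by all ancestors of either term."""
    a1 = ancestors(term1, parents)
    a2 = ancestors(term2, parents)
    return len(a1 & a2) / len(a1 | a2)
```

    In this toy graph, `T:c` and `T:d` share the ancestors `T:a` and `T:root` out of five terms in total, so their similarity is 2/5.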

    Phylogenetic profiling: how much input data is enough?

    Phylogenetic profiling is a well-established approach for predicting gene function based on patterns of gene presence and absence across species. Most recent developments have focused on methodological improvements, but relatively little is known about the effect of input data size on the quality of predictions. In this work, we ask: how many genomes and functional annotations need to be considered for phylogenetic profiling to be effective? Phylogenetic profiling generally benefits from an increased amount of input data. However, by decomposing this improvement in predictive accuracy into the contributions of additional genomes and of additional annotations, we observed diminishing returns in adding more than ∼100 genomes, whereas increasing the number of annotations remained strongly beneficial throughout. We also observed that maximising phylogenetic diversity within a clade of interest improves predictive accuracy, but the effect is small compared to changes in the number of genomes under comparison. Finally, we show that these findings hold in light of the Open World Assumption, which posits that functional annotation databases are inherently incomplete. All the tools and data used in this work are available for reuse from http://lab.dessimoz.org/14_phylprof. Scripts used to analyse the data are available on request from the authors.
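
    The core idea of phylogenetic profiling, comparing presence and absence patterns across species, can be sketched as follows. This is an illustrative toy and not the authors' pipeline: the gene names, the five-species profiles, and the choice of the Jaccard index as the similarity measure are all assumptions.

```python
# Each profile records presence (1) or absence (0) of a gene in five
# hypothetical species. Gene names and values are made up for illustration.
PROFILES = {
    "geneA": (1, 1, 0, 1, 0),
    "geneB": (1, 1, 0, 1, 1),
    "geneC": (0, 0, 1, 0, 1),
}


def profile_similarity(p, q):
    """Jaccard index over the species in which at least one gene is present."""
    both = sum(1 for x, y in zip(p, q) if x and y)
    either = sum(1 for x, y in zip(p, q) if x or y)
    return both / either if either else 0.0


def most_similar(query, profiles=PROFILES):
    """Return the other gene whose profile is most similar to the query's."""
    others = [gene for gene in profiles if gene != query]
    return max(
        others,
        key=lambda gene: profile_similarity(profiles[query], profiles[gene]),
    )
```

    A function prediction would then transfer the annotations of the most similar gene to the query gene. Adding genomes lengthens each profile, while adding annotations enlarges the set of genes from which a function can be transferred.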

    Phylogenetic assessment of alignments reveals neglected tree signal in gaps

    Tree-based tests of alignment methods enable the evaluation of the effect of gap placement on the inference of phylogenetic relationships

    Alignments with non-overlapping moves, inversions and tandem duplications in O(n^4) time

    Sequence alignment is a central problem in bioinformatics. The classical dynamic programming algorithm aligns two sequences by optimizing over possible insertions, deletions and substitutions. However, other evolutionary events can be observed, such as inversions, tandem duplications or moves (transpositions). It has been established that the extension of the problem to move operations is NP-complete. Previous work has shown that an extension restricted to non-overlapping inversions can be solved in O(n^3) with a restricted scoring scheme. In this paper, we show that the alignment problem extended to non-overlapping moves can be solved in O(n^5) for general scoring schemes, O(n^4 log n) for concave scoring schemes and O(n^4) for restricted scoring schemes. Furthermore, we show that the alignment problem extended to non-overlapping moves, inversions and tandem duplications can be solved with the same time complexities. Finally, an example of an alignment with non-overlapping moves is provided.
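
    For reference, the classical dynamic programming baseline mentioned in the abstract can be sketched as below. This covers only substitutions, insertions and deletions and runs in O(nm) time; it is not the paper's extended algorithm for moves, inversions or tandem duplications. The linear gap penalty and the default scores (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of s and t with a linear gap penalty."""
    n, m = len(s), len(t)
    # score[i][j] holds the best score for aligning s[:i] with t[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # s[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # t[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align s[i-1] with t[j-1]
                score[i - 1][j] + gap,               # gap in t
                score[i][j - 1] + gap,               # gap in s
            )
    return score[n][m]
```

    With these default scores, aligning `"ACGT"` against `"AGT"` gives three matches and one gap, for a total score of 2.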

    Covariance of maximum likelihood evolutionary distances between sequences aligned pairwise

    The estimation of a distance between two biological sequences is a fundamental process in molecular evolution. It is usually performed by maximum likelihood (ML) on characters aligned either pairwise or jointly in a multiple sequence alignment (MSA). Estimators for the covariance of pairs from an MSA are known, but we are not aware of any solution for cases of pairs aligned independently. In large-scale analyses, it may be too costly to compute MSAs every time distances must be compared, and therefore a covariance estimator for distances estimated from pairs aligned independently is desirable. Knowledge of covariances improves any process that compares or combines distances, such as in generalized least-squares phylogenetic tree building, orthology inference, or lateral gene transfer detection

    A case for hornblende dominated fractionation of arc magmas: the Chelan Complex (Washington Cascades)

    Amphibole fractionation in the deep roots of subduction-related magmatic arcs is a fundamental process for the generation of the continental crust. Field relations and geochemical data of exposed lower crustal igneous rocks can be used to better constrain these processes. The Chelan Complex in the western U.S. forms the lowest level of a 40-km thick exposed crustal section of the North Cascades and is composed of olivine websterite, pyroxenite, hornblendite, and dominantly of hornblende gabbro and tonalite. Magmatic breccias, comb layers and intrusive contacts suggest that the Chelan Complex was built by igneous processes. Phase equilibria, textural observations and mineral chemistry yield emplacement pressures of ∼1.0 GPa followed by isobaric cooling to 700°C. The widespread occurrence of idiomorphic hornblende and interstitial plagioclase, together with the lack of Eu anomalies in bulk rock compositions, indicates that the differentiation is largely dominated by amphibole. Major and trace element modeling constrained by field observations and bulk chemistry demonstrates that peraluminous tonalite could be derived by removing successively 3% of olivine websterite, 12% of pyroxene hornblendite, 33% of pyroxene hornblendite, 19% of gabbros, 15% of diorite and 2% tonalite. Peraluminous tonalites with high Sr/Y, which are associated worldwide with active margin settings, can be derived from a parental basaltic melt by crystal fractionation at high pressure, provided that amphibole dominates the fractionation process. Crustal assimilation during fractionation is thus not required to generate peraluminous tonalite.

    Effects of neuromuscular-focused prevention programmes in adolescent female athletes: literature review and meta-analysis

    In sport, adolescent female athletes are particularly at risk of injury because of their high level of exposure during a stage of major physiological change. Neuromuscular-focused warm-ups during training and competition appear to be an optimal approach for reducing the injury rate. The aim of our review is to evaluate the effect of neuromuscular-focused prevention programmes on the risk of lower-limb injury in adolescent female athletes.

    Benchmarking gene ontology function predictions using negative annotations.

    With the ever-increasing number and diversity of sequenced species, the challenge to characterize genes with functional information is even more important. In most species, this characterization almost entirely relies on automated electronic methods. As such, it is critical to benchmark the various methods. The Critical Assessment of protein Function Annotation algorithms (CAFA) series of community experiments provide the most comprehensive benchmark, with a time-delayed analysis leveraging newly curated experimentally supported annotations. However, the definition of a false positive in CAFA has not fully accounted for the open world assumption (OWA), leading to a systematic underestimation of precision. The main reason for this limitation is the relative paucity of negative experimental annotations. This article introduces a new, OWA-compliant, benchmark based on a balanced test set of positive and negative annotations. The negative annotations are derived from expert-curated annotations of protein families on phylogenetic trees. This approach results in a large increase in the average information content of negative annotations. The benchmark has been tested using the naïve and BLAST baseline methods, as well as two orthology-based methods. This new benchmark could complement existing ones in future CAFA experiments. All data, as well as code used for analysis, is available from https://lab.dessimoz.org/20_not. Supplementary data are available at Bioinformatics online
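
    The effect of the open world assumption on precision can be illustrated with a small sketch. It is not the CAFA metric or the article's benchmark code: the protein and term IDs are made up, and the two functions are simplified assumptions contrasting a closed-world count, where every unconfirmed prediction is a false positive, with a count that treats only predictions contradicted by a negative annotation as false positives.

```python
# Hypothetical (protein, term) pairs, made up for illustration.
PREDICTIONS = {("P1", "T:a"), ("P1", "T:b"), ("P2", "T:a"), ("P2", "T:c")}
POSITIVES = {("P1", "T:a"), ("P2", "T:a")}  # experimentally supported
NEGATIVES = {("P1", "T:b")}                 # explicit negative annotations


def closed_world_precision(predictions, positives):
    """Treat every prediction without a positive annotation as false."""
    if not predictions:
        return 0.0
    return len(predictions & positives) / len(predictions)


def owa_precision(predictions, positives, negatives):
    """Count as false positives only predictions matching a negative annotation."""
    true_pos = len(predictions & positives)
    false_pos = len(predictions & negatives)
    judged = true_pos + false_pos
    return true_pos / judged if judged else 0.0
```

    In this toy example the closed-world precision is 2/4, while the open-world variant gives 2/3, because the prediction `("P2", "T:c")` has neither a positive nor a negative annotation and is left unjudged.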

    A putative origin of the insect chemosensory receptor superfamily in the last common eukaryotic ancestor

    The insect chemosensory repertoires of Odorant Receptors (ORs) and Gustatory Receptors (GRs) together represent one of the largest families of ligand-gated ion channels. Previous analyses have identified homologous 'Gustatory Receptor-Like (GRL)' proteins across Animalia, but the evolutionary origin of this novel class of ion channels is unknown. We describe a survey of unicellular eukaryotic genomes for GRLs, identifying several candidates in fungi, protists and algae that contain many structural features characteristic of animal GRLs. The existence of these proteins in unicellular eukaryotes, together with ab initio protein structure predictions, provides evidence for homology between GRLs and a family of uncharacterized plant proteins containing the DUF3537 domain. Together, our analyses suggest an origin of this protein superfamily in the last common eukaryotic ancestor.