220 research outputs found
Causal graphical models in systems genetics: A unified framework for joint inference of causal network and genetic architecture for correlated phenotypes
Causal inference approaches in systems genetics exploit quantitative trait
loci (QTL) genotypes to infer causal relationships among phenotypes. The
genetic architecture of each phenotype may be complex, and poorly estimated
genetic architectures may compromise the inference of causal relationships
among phenotypes. Existing methods assume QTLs are known or inferred without
regard to the phenotype network structure. In this paper we develop a
QTL-driven phenotype network method (QTLnet) to jointly infer a causal
phenotype network and associated genetic architecture for sets of correlated
phenotypes. Randomization of alleles during meiosis and the unidirectional
influence of genotype on phenotype allow the inference of QTLs causal to
phenotypes. Causal relationships among phenotypes can be inferred using these
QTL nodes, enabling us to distinguish among phenotype networks that would
otherwise be distribution equivalent. We jointly model phenotypes and QTLs
using homogeneous conditional Gaussian regression models, and we derive a
graphical criterion for distribution equivalence. We validate the QTLnet
approach in a simulation study. Finally, we illustrate with simulated data and
a real example how QTLnet can be used to infer both direct and indirect effects
of QTLs and phenotypes that co-map to a genomic region. Comment: Published at
http://dx.doi.org/10.1214/09-AOAS288 in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org)
Quantitative measures for the management and comparison of annotated genomes
Background: The ever-increasing number of sequenced and annotated genomes has made management of their annotations a significant undertaking, especially for large eukaryotic genomes containing many thousands of genes. Typically, changes in gene and transcript numbers are used to summarize changes from release to release, but these measures say nothing about changes to individual annotations, nor do they provide any means to identify annotations in need of manual review. Results: In response, we have developed a suite of quantitative measures to better characterize changes to a genome's annotations between releases, and to prioritize problematic annotations for manual review. We have applied these measures to the annotations of five eukaryotic genomes over multiple releases: H. sapiens, M. musculus, D. melanogaster, A. gambiae, and C. elegans. Conclusion: Our results provide the first detailed, historical overview of how these genomes' annotations have changed over the years, and demonstrate the usefulness of these measures for genome annotation management.
The Sequence Ontology: a tool for the unification of genome annotations.
The Sequence Ontology (SO) is a structured controlled vocabulary for the parts of a genomic annotation. SO provides a common set of terms and definitions that will facilitate the exchange, analysis and management of genomic data. Because SO treats part-whole relationships rigorously, data described with it can become substrates for automated reasoning, and instances of sequence features described by the SO can be subjected to a group of logical operations termed extensional mereology operators. Rights: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are
A standard variation file format for human genome sequences
Here we describe the Genome Variation Format (GVF) and the 10Gen dataset. GVF, an extension of Generic Feature Format version 3 (GFF3), is a simple tab-delimited format for DNA variant files, which uses Sequence Ontology to describe genome variation data. The 10Gen dataset, ten human genomes in GVF format, is freely available for community analysis from the Sequence Ontology website and from an Amazon elastic block storage (EBS) snapshot for use in Amazon's EC2 cloud computing environment
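As a rough illustration of what a tab-delimited, GFF3-style variant record looks like to a program, the sketch below parses one feature line into its nine GFF3 columns and splits the attribute column into key/value pairs. The function name `parse_feature_line`, the `Feature` type, and the example record (including the source name `example_caller` and the ID `var_1`) are hypothetical and are not taken from the 10Gen dataset; the sketch also ignores pragma and comment lines, so it should not be read as a complete GVF reader.

```python
from typing import NamedTuple
from urllib.parse import unquote


class Feature(NamedTuple):
    """One GFF3-style feature line (nine tab-separated columns)."""
    seqid: str
    source: str
    type: str        # a Sequence Ontology term, e.g. "SNV"
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: dict  # attribute tag -> list of values


def parse_feature_line(line: str) -> Feature:
    """Parse a single feature line; raise ValueError if it is malformed."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")

    attributes = {}
    # Column 9 holds semicolon-separated tag=value pairs;
    # a tag may carry several comma-separated values.
    for pair in fields[8].split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]

    return Feature(
        fields[0], fields[1], fields[2],
        int(fields[3]), int(fields[4]),
        fields[5], fields[6], fields[7],
        attributes,
    )


# Hypothetical record, for illustration only.
example = (
    "chr16\texample_caller\tSNV\t49291141\t49291141\t.\t+\t.\t"
    "ID=var_1;Variant_seq=A,G;Reference_seq=G"
)
feature = parse_feature_line(example)
```

Keeping attribute values as lists means a multi-valued tag and a single-valued tag are read the same way by downstream code.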
Venom insulins of cone snails diversify rapidly and track prey taxa
A specialized insulin was recently found in the venom of a fish-hunting cone snail, Conus geographus. Here we show that many worm-hunting and snail-hunting cones also express venom insulins, and that this novel gene family has diversified explosively. Cone snails express a highly conserved insulin in their nerve ring; presumably this conventional signaling insulin is finely tuned to the Conus insulin receptor, which also evolves very slowly. By contrast, the venom insulins diverge rapidly, apparently in response to biotic interactions with prey and also possibly the cones’ own predators and competitors. Thus, the inwardly directed signaling insulins appear to experience predominantly purifying selection to target an internal receptor that seldom changes, while the outwardly directed venom insulins frequently experience directional selection to target heterospecific insulin receptors in a changing mix of prey, predators and competitors. Prey insulin receptors may often be constrained in ways that prevent their evolutionary escape from targeted venom insulins, if amino-acid substitutions that result in escape also degrade the receptor’s signaling functions
Transcriptomic profiling reveals extraordinary diversity of venom peptides in unexplored predatory gastropods of the genus Clavus
Predatory gastropods of the superfamily Conoidea number over 12,000 living species. The evolutionary success of this lineage can be explained by the ability of conoideans to produce complex venoms for hunting, defense and competitive interactions. Whereas venoms of cone snails (family Conidae) have become increasingly well studied, the venoms of most other conoidean lineages remain largely uncharacterized. In the present study we present the venom gland transcriptomes of two species of the genus Clavus that belong to the family Drilliidae. Venom gland transcriptomes of two specimens of Clavus canalicularis and two specimens of Cv. davidgilmouri were analyzed, leading to the identification of a total of 1,176 putative venom peptide toxins (drillipeptides). Based on the combined evidence of secretion signal sequence identity, entire precursor similarity search (BLAST), and orthology inference, putative Clavus toxins were assigned to 158 different gene families. The majority of identified transcripts comprise signal, pro-, mature peptide, and post- regions, with a typically short (< 50 amino acids) and cysteine-rich mature peptide region. Thus drillipeptides are structurally similar to conotoxins. However, convincing homology with known groups of Conus toxins was only detected for very few toxin families. Among these are Clavus counterparts of Conus venom insulins (drillinsulins), porins (drilliporins), and highly diversified lectins (drillilectins). The short size of most drillipeptides and their structural similarity to conotoxins were unexpected, given that most related conoidean gastropod families (Terebridae and Turridae) possess longer mature peptide regions. Our findings indicate that, similar to conotoxins, drillipeptides may represent a valuable resource for future pharmacological exploration
The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae.
A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceed those of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp). Incremental improvements in sequencing and assembly technologies are in part responsible for the higher quality reference genome, but it may also be due to a slightly lower exact repeat content in Douglas-fir vs. pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences in the size of the NDH-complex gene family and genes underlying the functional basis of shade tolerance/intolerance were observed. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms
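For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch computes it under the common definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions; this is not the procedure used by the Douglas-fir authors, whose exact method the abstract does not describe.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")

    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2

    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half
        # of the total assembly length is the N50.
        if running >= half_total:
            return length


# Toy contig lengths (in bp), for illustration only.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)  # total is 32; 8 + 8 reaches 16, so N50 is 8
```

The same function applies to scaffolds: passing scaffold lengths instead of contig lengths gives the scaffold N50.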
Large-Scale Trends in the Evolution of Gene Structures within 11 Animal Genomes
We have used the annotations of six animal genomes (Homo sapiens, Mus musculus, Ciona intestinalis, Drosophila melanogaster, Anopheles gambiae, and Caenorhabditis elegans) together with the sequences of five unannotated Drosophila genomes to survey changes in protein sequence and gene structure over a variety of timescales—from the less than 5 million years since the divergence of D. simulans and D. melanogaster to the more than 500 million years that have elapsed since the Cambrian explosion. To do so, we have developed a new open-source software library called CGL (for “Comparative Genomics Library”). Our results demonstrate that change in intron–exon structure is gradual, clock-like, and largely independent of coding-sequence evolution. This means that genome annotations can be used in new ways to inform, corroborate, and test conclusions drawn from comparative genomics analyses that are based upon protein and nucleotide sequence similarities