A complete tool set for molecular QTL discovery and analysis
Population scale studies combining genetic information with molecular phenotypes (for example, gene expression) have become a standard to dissect the effects of genetic variants onto organismal phenotypes. These kinds of data sets require powerful, fast and versatile methods able to discover molecular Quantitative Trait Loci (molQTL). Here we propose such a solution, QTLtools, a modular framework that contains multiple new and well-established methods to prepare the data, to discover proximal and distal molQTLs and, finally, to integrate them with GWAS variants and functional annotations of the genome. We demonstrate its utility by performing a complete expression QTL study in a few easy-to-perform steps. QTLtools is open source and available at https://qtltools.github.io/qtltools/.
Iron Age and Anglo-Saxon genomes from East England reveal British migration history
British population history has been shaped by a series of immigrations, including the early Anglo-Saxon migrations after 400 CE. It remains an open question how these events affected the genetic composition of the current British population. Here, we present whole-genome sequences from 10 individuals excavated close to Cambridge in the East of England, ranging from the late Iron Age to the middle Anglo-Saxon period. By analysing shared rare variants with hundreds of modern samples from Britain and Europe, we estimate that on average the contemporary East English population derives 38% of its ancestry from Anglo-Saxon migrations. We gain further insight with a new method, rarecoal, which infers population history and identifies fine-scale genetic ancestry from rare variants. Using rarecoal we find that the Anglo-Saxon samples are closely related to modern Dutch and Danish populations, while the Iron Age samples share ancestors with multiple Northern European populations including Britain.
Inference of population splits and mixtures from genome-wide allele frequency data
Many aspects of the historical relationships between populations in a species
are reflected in genetic data. Inferring these relationships from genetic data,
however, remains a challenging task. In this paper, we present a statistical
model for inferring the patterns of population splits and mixtures in multiple
populations. In this model, the sampled populations in a species are related to
their common ancestor through a graph of ancestral populations. Using
genome-wide allele frequency data and a Gaussian approximation to genetic
drift, we infer the structure of this graph. We applied this method to a set of
55 human populations and a set of 82 dog breeds and wild canids. In both
species, we show that a simple bifurcating tree does not fully describe the
data; in contrast, we infer many migration events. While some of the migration
events that we find have been detected previously, many have not. For example,
in the human data we infer that Cambodians trace approximately 16% of their
ancestry to a population ancestral to other extant East Asian populations. In
the dog data, we infer that both the boxer and basenji trace a considerable
fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to
domestication, and that East Asian toy breeds (the Shih Tzu and the Pekingese)
result from admixture between modern toy breeds and "ancient" Asian breeds.
Software implementing the model described here, called TreeMix, is available at
http://treemix.googlecode.com. Comment: 28 pages, 6 figures in main text. Attached supplement is 22 pages, 15
figures. This is an updated version of the preprint available at
http://precedings.nature.com/documents/6956/version/
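The Gaussian approximation to genetic drift mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes a commonly used form in which a descendant allele frequency equals the ancestral frequency plus normal noise with variance c * p * (1 - p), where c is a drift parameter. The function names and parameter values are hypothetical, and this is not the TreeMix implementation.

```python
import random


def drift_variance(p, c):
    """Variance of the allele-frequency change under the Gaussian drift
    approximation (assumed form: c * p * (1 - p))."""
    return c * p * (1.0 - p)


def drifted_frequency(p, c, rng):
    """Draw one descendant allele frequency from the ancestral frequency p.

    The draw is clipped to [0, 1] because a Gaussian can overshoot the
    valid range when p is close to a boundary.
    """
    x = rng.gauss(p, drift_variance(p, c) ** 0.5)
    return min(1.0, max(0.0, x))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    ancestral = 0.3         # hypothetical ancestral allele frequency
    for c in (0.01, 0.05, 0.2):  # hypothetical drift amounts
        draws = [drifted_frequency(ancestral, c, rng) for _ in range(5)]
        print(c, [round(d, 3) for d in draws])
```

Larger values of c spread the descendant frequencies further from the ancestral value, which is the signal a graph-fitting method would use to estimate branch lengths.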
Profiling allele-specific gene expression in brains from individuals with autism spectrum disorder reveals preferential minor allele usage.
One fundamental but understudied mechanism of gene regulation in disease is allele-specific expression (ASE), the preferential expression of one allele. We leveraged RNA-sequencing data from human brain to assess ASE in autism spectrum disorder (ASD). When ASE is observed in ASD, the allele with lower population frequency (minor allele) is preferentially more highly expressed than the major allele, opposite to the canonical pattern. Importantly, genes showing ASE in ASD are enriched in those downregulated in ASD postmortem brains and in genes harboring de novo mutations in ASD. Two regions, 14q32 and 15q11, containing all known orphan C/D box small nucleolar RNAs (snoRNAs), are particularly enriched in shifts to higher minor allele expression. We demonstrate that this allele shifting enhances snoRNA-targeted splicing changes in ASD-related target genes in idiopathic ASD and 15q11-q13 duplication syndrome. Together, these results implicate allelic imbalance and dysregulation of orphan C/D box snoRNAs in ASD pathogenesis.
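As an illustration of how allele-specific expression is often assessed from RNA-sequencing read counts, the sketch below applies an exact two-sided binomial test to the reads carrying each allele at one heterozygous site. The abstract does not say which test the authors used, so the binomial test, the function name and the example counts are all assumptions.

```python
from math import comb


def ase_binomial_pvalue(ref_reads, alt_reads, p_null=0.5):
    """Exact two-sided binomial p-value for allelic imbalance at one site.

    Under the null hypothesis each read comes from the reference allele
    with probability p_null (0.5 means balanced expression). Intended for
    modest read depths; very large counts may underflow in this simple form.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    probs = [comb(n, k) * p_null ** k * (1.0 - p_null) ** (n - k)
             for k in range(n + 1)]
    observed = probs[ref_reads]
    # Sum every outcome that is at most as likely as the observed one.
    pval = sum(q for q in probs if q <= observed * (1.0 + 1e-9))
    return min(1.0, pval)


if __name__ == "__main__":
    # Hypothetical read counts: (reference allele, alternative allele).
    for counts in [(10, 10), (18, 2), (20, 0)]:
        print(counts, ase_binomial_pvalue(*counts))
```

In practice such tests are usually combined with corrections for reference mapping bias and multiple testing, which this sketch leaves out.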
Compression of Structured High-Throughput Sequencing Data
Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state-of-the-art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% of the size of a BAM file with perfect data fidelity. Compared to the previous compression state of the art, these methods reduce dataset size by more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated in a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays. National Center for Research Resources (U.S.) (Grant UL1 RR024996); Leukemia & Lymphoma Society of America (Translational Research Program Grant LLS 6304-11); National Institute of Mental Health (U.S.) (R01 MH086883)
Scans for signatures of selection in Russian cattle breed genomes reveal new candidate genes for environmental adaptation and acclimation
Domestication and selective breeding have resulted in over 1000 extant cattle breeds. Many of these breeds do not excel in important traits but are adapted to local environments. These adaptations are a valuable source of genetic material for efforts to improve commercial breeds. As a step toward this goal we identified candidate regions under selection in the genomes of nine Russian native cattle breeds adapted to survive in harsh climates. After comparing our data to other breeds of European and Asian origins we found known and novel candidate genes that could potentially be related to domestication, economically important traits and environmental adaptations in cattle. The Russian cattle breed genomes contained regions under putative selection with genes that may be related to adaptations to harsh environments (e.g., AQP5, RAD50, and RETREG1). We found genomic signatures of selective sweeps near key genes related to economically important traits, such as milk production (e.g., DGAT1, ABCG2), growth (e.g., XKR4), and reproduction (e.g., CSF2). Our data point to candidate genes which should be included in future studies attempting to identify genes to improve the extant breeds and facilitate generation of commercial breeds that fit better into the environments of Russia and other countries with similar climates.
Whole-genome sequencing for an enhanced understanding of genetic variation among South Africans
The Southern African Human Genome Programme is a national initiative that aspires to
unlock the unique genetic character of southern African populations for a better understanding
of human genetic diversity. In this pilot study the Southern African Human Genome
Programme characterizes the genomes of 24 individuals (8 Coloured and 16 black southeastern
Bantu-speakers) using deep whole-genome sequencing. A total of ~16 million unique
variants are identified. Despite the shallow time depth since divergence between the two
main southeastern Bantu-speaking groups (Nguni and Sotho-Tswana), principal component
analysis and structure analysis reveal significant (p < 10^-6) differentiation, and FST analysis
identifies regions with high divergence. The Coloured individuals show evidence of varying
proportions of admixture with Khoesan, Bantu-speakers, Europeans, and populations from the
Indian sub-continent. Whole-genome sequencing data reveal extensive genomic diversity,
increasing our understanding of the complex and region-specific history of African populations
and highlighting its potential impact on biomedical research and genetic susceptibility to
disease.
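The FST analysis mentioned in the abstract above measures how much allele frequencies differ between groups. A minimal Python sketch for one biallelic variant and two populations follows. It assumes Wright's definition with equal population weights and no sample-size correction. The abstract does not state which estimator the study used, and the function name and frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's FST for one biallelic variant in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    FST = (H_T - H_S) / H_T, where H_S is the mean within-population
    expected heterozygosity and H_T is the expected heterozygosity of the
    pooled population.
    """
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # variant is monomorphic overall, so no differentiation
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical allele-frequency pairs for illustration only.
    for pair in [(0.5, 0.5), (0.2, 0.8), (0.0, 1.0)]:
        print(pair, round(fst_two_populations(*pair), 3))
```

Genome scans for highly diverged regions typically average such per-variant values over sliding windows and use estimators that correct for finite sample size.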
Cis and Trans Effects of Human Genomic Variants on Gene Expression
This work was funded by the Louis-Jeantet Foundation (http://www.jeantet.ch/), the European Research Council (Grant ID: 260927 http://erc.europa.eu/), the Swiss National Foundation (Grant ID: 130342 http://www.snf.ch), NCCR Frontiers In Genetics (http://www.frontiers-in-genetics.org), the UK Medical Research Council (http://www.mrc.ac.uk) and the Wellcome Trust (Grant ID: 092731).
Genome sequencing of the extinct Eurasian wild aurochs, Bos primigenius, illuminates the phylogeography and evolution of cattle
Background
Domestication of the now-extinct wild aurochs, Bos primigenius, gave rise to the two major domestic extant cattle taxa, B. taurus and B. indicus. While previous genetic studies have shed some light on the evolutionary relationships between European aurochs and modern cattle, important questions remain unanswered, including the phylogenetic status of aurochs, whether gene flow from aurochs into early domestic populations occurred, and which genomic regions were subject to selection processes during and after domestication. Here, we address these questions using whole-genome sequencing data generated from an approximately 6,750-year-old British aurochs bone and genome sequence data from 81 additional cattle plus genome-wide single nucleotide polymorphism data from a diverse panel of 1,225 modern animals.
Results
Phylogenomic analyses place the aurochs as a distinct outgroup to the domestic B. taurus lineage, supporting the predominant Near Eastern origin of European cattle. Conversely, traditional British and Irish breeds share more genetic variants with this aurochs specimen than other European populations, supporting localized gene flow from aurochs into the ancestors of modern British and Irish cattle, perhaps through purposeful restocking by early herders in Britain. Finally, the functions of genes showing evidence for positive selection in B. taurus are enriched for neurobiology, growth, metabolism and immunobiology, suggesting that these biological processes have been important in the domestication of cattle.
Conclusions
This work provides important new information regarding the origins and functional evolution of modern cattle, revealing that the interface between early European domestic populations and wild aurochs was significantly more complex than previously thought.
Genome-wide analyses for personality traits identify six genomic loci and show correlations with psychiatric disorders
Personality is influenced by genetic and environmental factors [1]
and associated with mental health. However, the underlying
genetic determinants are largely unknown. We identified six
genetic loci, including five novel loci [2,3], significantly associated
with personality traits in a meta-analysis of genome-wide
association studies (N = 123,132–260,861). Of these genome-wide
significant loci, extraversion was associated with variants
in WSCD2 and near PCDH15, and neuroticism with variants
on chromosome 8p23.1 and in L3MBTL2. We performed a
principal component analysis to extract major dimensions
underlying genetic variations among five personality traits
and six psychiatric disorders (N = 5,422–18,759). The first
genetic dimension separated personality traits and psychiatric
disorders, except that neuroticism and openness to experience
were clustered with the disorders. High genetic correlations
were found between extraversion and attention-deficit–hyperactivity
disorder (ADHD) and between openness and
schizophrenia and bipolar disorder. The second genetic
dimension was closely aligned with extraversion–introversion
and grouped neuroticism with internalizing psychopathology
(e.g., depression or anxiety).
