Hidden Semi Markov Models for Multiple Observation Sequences: The mhsmm Package for R
This paper describes the R package mhsmm which implements estimation and prediction methods for hidden Markov and semi-Markov models for multiple observation sequences. Such techniques are of interest when observed data is thought to be dependent on some unobserved (or hidden) state. Hidden Markov models only allow a geometrically distributed sojourn time in a given state, while hidden semi-Markov models extend this by allowing an arbitrary sojourn distribution. We demonstrate the software with simulation examples and an application involving the modelling of the ovarian cycle of dairy cows.
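The contrast between geometric and arbitrary sojourn distributions can be made concrete with a short simulation. The sketch below is written in Python purely for illustration and is not the mhsmm API (mhsmm is an R package); the function names, the self-transition probability of 0.8 and the duration distribution are hypothetical. In a hidden Markov model, a state with self-transition probability p is left after exactly d steps with probability P(D = d) = p^(d-1)(1 - p). The mean sojourn is therefore 1/(1 - p), and the most likely duration is always a single step. A hidden semi-Markov model instead draws D directly from any chosen distribution.

```python
import random


def hmm_sojourn_pmf(p_stay: float, d: int) -> float:
    """P(D = d) for a hidden Markov state with self-transition probability p_stay.

    The chain stays for d - 1 further steps and then leaves, so the
    sojourn time D is geometric: p_stay**(d - 1) * (1 - p_stay).
    """
    if d < 1:
        return 0.0
    return p_stay ** (d - 1) * (1.0 - p_stay)


def simulate_hmm_sojourn(p_stay: float, rng: random.Random) -> int:
    """Simulate one HMM sojourn by repeatedly applying the self-transition."""
    d = 1
    while rng.random() < p_stay:
        d += 1
    return d


def simulate_hsmm_sojourn(sojourn_pmf: dict[int, float], rng: random.Random) -> int:
    """In a hidden semi-Markov model the duration is drawn directly from an
    arbitrary distribution (here a user-supplied pmf) instead of arising
    from repeated self-transitions."""
    durations = list(sojourn_pmf)
    weights = [sojourn_pmf[d] for d in durations]
    return rng.choices(durations, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    p_stay = 0.8  # hypothetical self-transition probability
    draws = [simulate_hmm_sojourn(p_stay, rng) for _ in range(100_000)]
    print("HMM mean sojourn:", sum(draws) / len(draws))  # close to 1 / (1 - 0.8) = 5
    # Hypothetical HSMM duration pmf concentrated on 19-21 steps. No geometric
    # law can produce it, because a geometric pmf always peaks at d = 1.
    hsmm_pmf = {19: 0.25, 20: 0.5, 21: 0.25}
    print("HSMM draw:", simulate_hsmm_sojourn(hsmm_pmf, rng))
```

The HSMM sampler takes the duration distribution as an input. This is the extra freedom the abstract describes: any sojourn law, including one peaked far from a single step, can be plugged in.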
Multicohort analysis of the maternal age effect on recombination
Several studies have reported that the number of crossovers increases with maternal age in humans, but others have found the opposite. Resolving the true effect has implications for understanding the maternal age effect on aneuploidies. Here, we revisit this question in the largest sample to date using single nucleotide polymorphism (SNP)-chip data, comprising over 6,000 meioses from nine cohorts. We develop and fit a hierarchical model to allow for differences between cohorts and between mothers. We estimate that over 10 years, the expected number of maternal crossovers increases by 2.1% (95% credible interval (0.98%, 3.3%)). Our results are not consistent with the larger positive and negative effects previously reported in smaller cohorts. We see heterogeneity between cohorts that is likely due to chance effects in smaller samples, or possibly to confounders, emphasizing that care should be taken when interpreting results from any specific cohort about the effect of maternal age on recombination.
Pyruvate metabolism controls chromatin remodeling during CD4+ T cell activation
Upon antigen-specific T cell receptor (TCR) engagement, human CD4+ T cells proliferate and differentiate, a process associated with rapid transcriptional changes and metabolic reprogramming. Here, we show that the generation of extramitochondrial pyruvate is an important step for acetyl-CoA production and subsequent H3K27ac-mediated remodeling of histone acetylation. Histone modification, transcriptomic, and carbon tracing analyses of pyruvate dehydrogenase (PDH)-deficient T cells identify PDH-dependent acetyl-CoA generation as a rate-limiting step during T cell activation. Furthermore, T cell activation results in the nuclear translocation of PDH and its association with both the p300 acetyltransferase and histone H3K27ac. These data support the tight integration of metabolic and histone-modifying enzymes, allowing metabolic reprogramming to fuel CD4+ T cell activation. Targeting this pathway may provide a therapeutic approach to specifically regulate antigen-driven T cell activation.
Genome-wide association study of REM sleep behavior disorder identifies polygenic risk and brain expression effects
Rapid eye movement (REM) sleep behavior disorder (RBD), the enactment of dreams during REM sleep, is an early clinical symptom of alpha-synucleinopathies and defines a more severe subtype. The genetic background of RBD and its underlying mechanisms are not well understood. Here, we perform a genome-wide association study of RBD, identifying five RBD risk loci near SNCA, GBA, TMEM175, INPP5F, and SCARB2. Expression analyses highlight SNCA-AS1 and potentially SCARB2 differential expression in different brain regions in RBD, with SNCA-AS1 further supported by colocalization analyses. Polygenic risk score, pathway analysis, and genetic correlations provide further insights into RBD genetics, highlighting RBD as a unique alpha-synucleinopathy subpopulation that will allow future early intervention.
A General Approach for Haplotype Phasing across the Full Spectrum of Relatedness
Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally 'unrelated' individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations. In particular, when large amounts of IBD sharing are present, SHAPEIT2 infers close to perfect haplotypes. Based on these results, we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First, SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors and the detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation, which will be of interest to researchers in the fields of human, animal and plant genetics.
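The switch error rate used above to compare phasing methods is commonly defined as the fraction of adjacent heterozygous sites whose relative phase is wrong in the inferred haplotypes. The following is a minimal Python sketch of that common definition. It assumes 0/1/2 genotype coding and 0/1 allele coding for one haplotype of the pair. The function and argument names are hypothetical and are not part of SHAPEIT2 or duoHMM.

```python
def switch_error_rate(genotypes: list[int], true_hap: list[int], inferred_hap: list[int]) -> float:
    """Fraction of adjacent heterozygous-site pairs whose phase is switched.

    genotypes: 0/1/2 allele counts; sites coded 1 are heterozygous.
    true_hap, inferred_hap: alleles (0/1) on one haplotype of the pair. At a
    heterozygous site the other haplotype carries the complementary allele,
    so one haplotype determines the phase.
    """
    het = [i for i, g in enumerate(genotypes) if g == 1]
    if len(het) < 2:
        raise ValueError("need at least two heterozygous sites")
    # True where the inferred haplotype matches the true one at a het site.
    agrees = [true_hap[i] == inferred_hap[i] for i in het]
    # A switch error occurs wherever agreement flips between consecutive het
    # sites, so flipping the whole haplotype is not penalised.
    switches = sum(a != b for a, b in zip(agrees, agrees[1:]))
    return switches / (len(het) - 1)


if __name__ == "__main__":
    geno = [1, 1, 0, 1, 1]
    truth = [0, 1, 0, 0, 1]
    guess = [0, 1, 0, 1, 0]  # phase flips between the 2nd and 3rd het sites
    print(switch_error_rate(geno, truth, guess))  # 1 switch over 3 adjacent pairs -> 1/3
```

Because only changes in agreement between neighbouring heterozygous sites are counted, a haplotype pair that is perfectly phased but reported with its two copies swapped scores zero, as it should.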
Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics
Background: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging.
Results: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community.
Conclusion: The H3ABioNet workflows have been implemented to offer ease of use for the end user and high levels of reproducibility and portability, all while following modern, state-of-the-art bioinformatics data processing protocols. The H3ABioNet workflows will serve the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity, assist genomics research within Africa and increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.
The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease
Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants. This effort yielded hundreds of low-frequency (<5%) and rare (<1%) variants with a strong impact on blood cell phenotypes. Our data highlight general properties of the allelic architecture of complex traits, including the proportion of the heritable component of each blood trait explained by the polygenic signal across different genome regulatory domains. Finally, through Mendelian randomization, we provide evidence of shared genetic pathways linking blood cell indices with complex pathologies, including autoimmune diseases, schizophrenia, and coronary heart disease, and evidence suggesting that previously reported population associations between blood cell indices and cardiovascular disease may be non-causal.
Multi-ancestry genome-wide association meta-analysis of Parkinson’s disease
Although over 90 independent risk variants have been identified for Parkinson’s disease (PD) using genome-wide association studies, most studies have been performed in just one population at a time. Here we performed a large-scale multi-ancestry meta-analysis of Parkinson’s disease with 49,049 cases, 18,785 proxy cases and 2,458,063 controls including individuals of European, East Asian, Latin American and African ancestry. In a meta-analysis, we identified 78 independent genome-wide significant loci, including 12 potentially novel loci (MTF2, PIK3CA, ADD1, SYBU, IRS2, USP8, PIGL, FASN, MYLK2, USP25, EP300 and PPP6R2), and fine-mapped 6 putative causal variants at 6 known PD loci. By combining our results with publicly available eQTL data, we identified 25 putative risk genes in these novel loci whose expression is associated with PD risk. This work lays the groundwork for future efforts aimed at identifying PD loci in non-European populations.