98 research outputs found

    RNA-seq data analysis pipeline

    The large number of genome-wide gene expression experiments carried out by biologists has created the need for a pipeline to process and analyse RNA-seq data. Such a pipeline consists of different computational tools and input data types, which makes developing an integrated pipeline a challenging task, although an integrated pipeline would make data analysis and the interpretation of results much easier for researchers. A customized pipeline, on the other hand, is easier to implement but requires the user to be familiar with all of the computational tools that the pipeline consists of. The aim of this thesis was to describe in detail how a typical RNA-seq data analysis pipeline is created and run. The results suggest that the integrated pipeline iRAP still needs further development. The results also give a better understanding of the functions of the different tools and of their potential improvements.

    Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche.

    Age at menarche is a marker of timing of puberty in females. It varies widely between individuals, is a heritable trait and is associated with risks for obesity, type 2 diabetes, cardiovascular disease, breast cancer and all-cause mortality. Studies of rare human disorders of puberty and animal models point to a complex hypothalamic-pituitary-hormonal regulation, but the mechanisms that determine pubertal timing and underlie its links to disease risk remain unclear. Here, using genome-wide and custom-genotyping arrays in up to 182,416 women of European descent from 57 studies, we found robust evidence (P < 5 × 10⁻⁸) for 123 signals at 106 genomic loci associated with age at menarche. Many loci were associated with other pubertal traits in both sexes, and there was substantial overlap with genes implicated in body mass index and various diseases, including rare disorders of puberty. Menarche signals were enriched in imprinted regions, with three loci (DLK1-WDR25, MKRN3-MAGEL2 and KCNK9) demonstrating parent-of-origin-specific associations concordant with known parental expression patterns. Pathway analyses implicated nuclear hormone receptors, particularly retinoic acid and γ-aminobutyric acid-B2 receptor signalling, among novel mechanisms that regulate pubertal timing in humans. Our findings suggest a genetic architecture involving at least hundreds of common variants in the coordinated timing of the pubertal transition.

    Hundreds of variants clustered in genomic loci and biological pathways affect human height

    Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). 
    Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
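The variance figures quoted in the height abstract can be related to each other with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes that "share of heritable variation" means the share of phenotypic variation divided by heritability, and the variable names are hypothetical.

```python
# Illustrative arithmetic only. The 10%, 16% and 20% figures are the
# approximate values quoted in the abstract; everything else is an assumption.
explained_now = 0.10        # phenotypic variance explained by identified variants
explained_projected = 0.16  # projected figure once similar-effect variants are found
share_of_heritable = 0.20   # the same projection, as a share of heritable variance

# Assumption: share_of_heritable = explained_projected / heritability
implied_heritability = explained_projected / share_of_heritable

# Under the same assumption, express the current 10% as a share of heritable variance.
current_share_of_heritable = explained_now / implied_heritability

print(f"implied heritability: {implied_heritability:.2f}")
print(f"current share of heritable variance: {current_share_of_heritable:.3f}")
```

Under that assumption the quoted figures imply a heritability of roughly 0.8, so the variants identified so far would account for about one eighth of the heritable variation in height.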

    Rare coding variants and X-linked loci associated with age at menarche.

    More than 100 loci have been identified for age at menarche by genome-wide association studies; however, collectively these explain only ∼3% of the trait variance. Here we test two overlooked sources of variation in 192,974 European ancestry women: low-frequency protein-coding variants and X-chromosome variants. Five missense/nonsense variants (in ALMS1/LAMB2/TNRC6A/TACR3/PRKAG1) are associated with age at menarche (minor allele frequencies 0.08-4.6%; effect sizes 0.08-1.25 years per allele; P < 5 × 10⁻⁸). In addition, we identify common X-chromosome loci at IGSF1 (rs762080, P = 9.4 × 10⁻¹³) and FAAH2 (rs5914101, P = 4.9 × 10⁻¹⁰). Highlighted genes implicate cellular energy homeostasis, post-transcriptional gene silencing and fatty-acid amide signalling. A frequently reported mutation in TACR3 for idiopathic hypogonadotropic hypogonadism (p.W275X) is associated with 1.25-year-later menarche (P = 2.8 × 10⁻¹¹), illustrating the utility of population studies to estimate the penetrance of reportedly pathogenic mutations. Collectively, these novel variants explain ∼0.5% variance, indicating that these overlooked sources of variation do not substantially explain the 'missing heritability' of this complex trait. UK sponsors (see article for overseas ones): This work made use of data and samples generated by the 1958 Birth Cohort (NCDS). Access to these resources was enabled via the 58READIE Project funded by Wellcome Trust and Medical Research Council (grant numbers WT095219MA and G1001799). A full list of the financial, institutional and personal contributions to the development of the 1958 Birth Cohort Biomedical resource is available at http://www2.le.ac.uk/projects/birthcohort. Genotyping was undertaken as part of the Wellcome Trust Case-Control Consortium (WTCCC) under Wellcome Trust award 076113, and a full list of the investigators who contributed to the generation of the data is available at www.wtccc.org.uk ...
    The Fenland Study is funded by the Wellcome Trust and the Medical Research Council, as well as by the Support for Science Funding programme and CamStrad. ... SIBS - CRUK ref: C1287/A8459 SEARCH - CRUK ref: A490/A10124 EMBRACE is supported by Cancer Research UK Grants C1287/A10118, C1287/A16563 and C1287/A17523. Genotyping was supported by Cancer Research - UK grant C12292/A11174D and C8197/A16565. Gareth Evans and Fiona Lalloo are supported by an NIHR grant to the Biomedical Research Centre, Manchester. The Investigators at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust are supported by an NIHR grant to the Biomedical Research Centre at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust. Ros Eeles and Elizabeth Bancroft are supported by Cancer Research UK Grant C5047/A8385. ... Generation Scotland - Scottish Executive Health Department, Chief Scientist Office, grant number CZD/16/6. Exome array genotyping for GS:SFHS was funded by the Medical Research Council UK. 23andMe - This work was supported in part by NIH Award 2R44HG006981-02 from the National Human Genome Research Institute. This is the final version of the article. It first appeared from NPG via http://dx.doi.org/10.1038/ncomms875

    Platelet-Related Variants Identified by Exomechip Meta-analysis in 157,293 Individuals

    Platelet production, maintenance, and clearance are tightly controlled processes indicative of platelets' important roles in hemostasis and thrombosis. Platelets are common targets for primary and secondary prevention of several conditions. They are monitored clinically by complete blood counts, specifically with measurements of platelet count (PLT) and mean platelet volume (MPV). Identifying genetic effects on PLT and MPV can provide mechanistic insights into platelet biology and their role in disease. Therefore, we formed the Blood Cell Consortium (BCX) to perform a large-scale meta-analysis of Exomechip association results for PLT and MPV in 157,293 and 57,617 individuals, respectively. Using the low-frequency/rare coding variant-enriched Exomechip genotyping array, we sought to identify genetic variants associated with PLT and MPV. In addition to confirming 47 known PLT and 20 known MPV associations, we identified 32 PLT and 18 MPV associations not previously observed in the literature across the allele frequency spectrum, including rare large effect (FCER1A), low-frequency (IQGAP2, MAP1A, LY75), and common (ZMIZ2, SMG6, PEAR1, ARFGAP3/PACSIN2) variants. Several variants associated with PLT/MPV (PEAR1, MRVI1, PTGES3) were also associated with platelet reactivity. In concurrent BCX analyses, there was overlap of platelet-associated variants with red (MAP1A, TMPRSS6, ZMIZ2) and white (PEAR1, ZMIZ2, LY75) blood cell traits, suggesting common regulatory pathways with shared genetic architecture among these hematopoietic lineages. Our large-scale Exomechip analyses identified previously undocumented associations with platelet traits and further indicate that several complex quantitative hematological, lipid, and cardiovascular traits share genetic factors.

    Integration of genome-wide association studies with biological knowledge identifies six novel genes related to kidney function

    In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphisms (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10⁻⁹) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication and overall P-values of 4.5 × 10⁻⁴ to 2.2 × 10⁻⁷. Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.

    The genetic basis of endometriosis and comorbidity with other pain and inflammatory conditions

    Endometriosis is a common condition associated with debilitating pelvic pain and infertility. A genome-wide association study meta-analysis, including 60,674 cases and 701,926 controls of European and East Asian descent, identified 42 genome-wide significant loci comprising 49 distinct association signals. Effect sizes were largest for stage 3/4 disease, driven by ovarian endometriosis. Identified signals explained up to 5.01% of disease variance and regulated expression or methylation of genes in endometrium and blood, many of which were associated with pain perception/maintenance (SRP14/BMF, GDAP1, MLLT10, BSN and NGF). We observed significant genetic correlations between endometriosis and 11 pain conditions, including migraine, back and multisite chronic pain (MCP), as well as inflammatory conditions, including asthma and osteoarthritis. Multitrait genetic analyses identified substantial sharing of variants associated with endometriosis and MCP/migraine. Targeted investigations of genetically regulated mechanisms shared between endometriosis and other pain conditions are needed to aid the development of new treatments and facilitate early symptomatic intervention

    Multi-ancestry sleep-by-SNP interaction analysis in 126,926 individuals reveals lipid loci stratified by sleep duration.

    Both short and long sleep are associated with an adverse lipid profile, likely through different biological pathways. To elucidate the biology of sleep-associated adverse lipid profile, we conduct multi-ancestry genome-wide sleep-SNP interaction analyses on three lipid traits (HDL-c, LDL-c and triglycerides). In the total study sample (discovery + replication) of 126,926 individuals from 5 different ancestry groups, when considering either long or short total sleep time interactions in joint analyses, we identify 49 previously unreported lipid loci, and 10 additional previously unreported lipid loci in a restricted sample of European-ancestry cohorts. In addition, we identify new gene-sleep interactions for known lipid loci such as LPL and PCSK9. The previously unreported lipid loci have a modest explained variance in lipid levels: most notably, gene-short-sleep interactions explain 4.25% of the variance in triglyceride level. Collectively, these findings contribute to our understanding of the biological mechanisms involved in sleep-associated adverse lipid profiles.

    Genome-wide association and functional follow-up reveals new loci for kidney function

    Chronic kidney disease (CKD) is an important public health problem with a genetic component. We performed genome-wide association studies in up to 130,600 European ancestry participants overall, and stratified for key CKD risk factors. We uncovered 6 new loci in association with estimated glomerular filtration rate (eGFR), the primary clinical measure of CKD, in or near MPPED2, DDX1, SLC47A1, CDK12, CASP9, and INO80. Morpholino knockdown of mpped2 and casp9 in zebrafish embryos revealed podocyte and tubular abnormalities with altered dextran clearance, suggesting a role for these genes in renal function. By providing new insights into genes that regulate renal function, these results could further our understanding of the pathogenesis of CKD