36 research outputs found
Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web Services
We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis including initial data retrieval from remote sequencing centers or storage (via the Globus file transfer system); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner (on Amazon EC2); and efficient scheduling of these pipelines over many processors (via the HTCondor scheduler). The system allows biomedical researchers to perform rapid analysis of large NGS datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads.
Dissecting the Shared Genetic Architecture of Suicide Attempt, Psychiatric Disorders, and Known Risk Factors
Background Suicide is a leading cause of death worldwide, and nonfatal suicide attempts, which occur far more frequently, are a major source of disability and social and economic burden. Both have substantial genetic etiology, which is partially shared and partially distinct from that of related psychiatric disorders. Methods We conducted a genome-wide association study (GWAS) of 29,782 suicide attempt (SA) cases and 519,961 controls in the International Suicide Genetics Consortium (ISGC). The GWAS of SA was conditioned on psychiatric disorders using GWAS summary statistics via multitrait-based conditional and joint analysis, to remove genetic effects on SA mediated by psychiatric disorders. We investigated the shared and divergent genetic architectures of SA, psychiatric disorders, and other known risk factors. Results Two loci reached genome-wide significance for SA: the major histocompatibility complex and an intergenic locus on chromosome 7, the latter of which remained associated with SA after conditioning on psychiatric disorders and replicated in an independent cohort from the Million Veteran Program. This locus has been implicated in risk-taking behavior, smoking, and insomnia. SA showed strong genetic correlation with psychiatric disorders, particularly major depression, and also with smoking, pain, risk-taking behavior, sleep disturbances, lower educational attainment, reproductive traits, lower socioeconomic status, and poorer general health. After conditioning on psychiatric disorders, the genetic correlations between SA and psychiatric disorders decreased, whereas those with nonpsychiatric traits remained largely unchanged. Conclusions Our results identify a risk locus that contributes more strongly to SA than other phenotypes and suggest a shared underlying biology between SA and known risk factors that is not mediated by psychiatric disorders.
The Case for Optimized Edge-Centric Tractography at Scale.
The anatomic validity of structural connectomes remains a significant uncertainty in neuroimaging. Edge-centric tractography reconstructs streamlines in bundles between each pair of cortical or subcortical regions. Although edge bundles provide a stronger anatomic embedding than traditional connectomes, calculating them for each region-pair requires exponentially greater computation. We observe that major speedup can be achieved by reducing the number of streamlines used by probabilistic tractography algorithms. To ensure this does not degrade connectome quality, we calculate the identifiability of edge-centric connectomes between test and re-test sessions as a proxy for information content. We find that running PROBTRACKX2 with as few as 1 streamline per voxel per region-pair has no significant impact on identifiability. Variation in identifiability caused by streamline count is overshadowed by variation due to subject demographics. This finding even holds true in an entirely different tractography algorithm using MRTrix. Incidentally, we observe that Jaccard similarity is more effective than Pearson correlation in calculating identifiability for our subject population.
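The abstract above does not give a formula for identifiability. As a rough illustration only, the sketch below assumes a common definition: mean within-subject similarity between test and re-test connectomes minus mean between-subject similarity. The function names, the binarization threshold used for Jaccard similarity, and the toy data are all hypothetical and not taken from the paper.

```python
import numpy as np

def pearson_similarity(a, b):
    """Pearson correlation between two flattened connectomes."""
    return float(np.corrcoef(a, b)[0, 1])

def jaccard_similarity(a, b, threshold=0.0):
    """Jaccard index of the sets of edges whose weight exceeds `threshold`."""
    a_on, b_on = a > threshold, b > threshold
    union = np.logical_or(a_on, b_on).sum()
    if union == 0:
        return 1.0  # assumption: two empty edge sets count as identical
    return float(np.logical_and(a_on, b_on).sum() / union)

def identifiability(test, retest, similarity):
    """Assumed definition: mean within-subject minus mean between-subject similarity.

    test, retest: arrays of shape (n_subjects, n_edges), same subject order.
    """
    n = test.shape[0]
    sim = np.array([[similarity(test[i], retest[j]) for j in range(n)]
                    for i in range(n)])
    within = np.trace(sim) / n
    between = (sim.sum() - np.trace(sim)) / (n * (n - 1))
    return float(within - between)

# Hypothetical toy data: 3 subjects, 4 edges, identical test and re-test sessions.
session_a = np.array([[1, 1, 0, 0],
                      [0, 1, 1, 0],
                      [0, 0, 1, 1]], dtype=float)
session_b = session_a.copy()

jaccard_score = identifiability(session_a, session_b, jaccard_similarity)
pearson_score = identifiability(session_a, session_b, pearson_similarity)
```

Passing the similarity measure as a parameter keeps the comparison between Jaccard and Pearson to a one-argument change, which mirrors the kind of comparison the abstract reports.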
FAIR for AI: An interdisciplinary and international community building perspective
A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles was proposed in 2016 as a prerequisite for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding principles have been re-interpreted or extended to include the software, tools, algorithms, and workflows that produce data. FAIR principles are now being adapted in the context of AI models and datasets. Here, we present the perspectives, vision, and experiences of researchers from different countries, disciplines, and backgrounds who are leading the definition and adoption of FAIR principles in their communities of practice, and discuss outcomes that may result from pursuing and incentivizing FAIR AI research. The material for this report builds on the FAIR for AI Workshop held at Argonne National Laboratory on June 7, 2022.
Genome-wide association study identifies four pan-ancestry loci for suicidal ideation in the Million Veterans Program
Suicidal ideation (SI) often precedes and predicts suicide attempt and death, is the most common suicidal phenotype and is over-represented in veterans. The genetic architecture of SI in the absence of suicide attempt (SA) is unknown, yet believed to have distinct and overlapping risk with other suicidal behaviors. We performed the first GWAS of SI without SA in the Million Veteran Program (MVP), identifying 99,814 SI cases from electronic health records without a history of SA or suicide death (SD) and 512,567 controls without SI, SA or SD. GWAS was performed separately in the four largest ancestry groups, controlling for sex, age and genetic substructure. Ancestry-specific results were combined via meta-analysis to identify pan-ancestry loci. Four genome-wide significant (GWS) loci were identified in the pan-ancestry meta-analysis with loci on chromosomes 6 and 9 associated with suicide attempt in an independent sample. Pan-ancestry gene-based analysis identified GWS associations with DRD2, DCC, FBXL19, BCL7C, CTF1, ANKK1, and EXD3. Gene-set analysis implicated synaptic and startle response pathways (q's<0.05). European ancestry (EA) analysis identified GWS loci on chromosomes 6 and 9, as well as GWS gene associations in EXD3, DRD2, and DCC. No other ancestry-specific GWS results were identified, underscoring the need to increase representation of diverse individuals. The genetic correlation of SI and SA within MVP was high (rG = 0.87; p = 1.09e-50), as well as with post-traumatic stress disorder (PTSD; rG = 0.78; p = 1.98e-95) and major depressive disorder (MDD; rG = 0.78; p = 8.33e-83). Conditional analysis on PTSD and MDD attenuated most pan-ancestry and EA GWS signals for SI without SA to nominal significance, with the exception of EXD3 which remained GWS. Our novel findings support a polygenic and complex architecture for SI without SA which is largely shared with SA and overlaps with psychiatric conditions frequently comorbid with suicidal behaviors.
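The abstract above says ancestry-specific results were combined via meta-analysis but does not name the method. As an illustration only, the sketch below assumes an inverse-variance-weighted fixed-effect model, one common way to pool per-group effect estimates for a single variant. The function name and all input numbers are hypothetical and are not results from the study.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    betas: per-group effect estimates (e.g. log odds ratios).
    ses:   per-group standard errors of those estimates.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]          # weight = 1 / variance
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p

# Hypothetical per-ancestry estimates for a single variant (four groups).
pooled_beta, pooled_se, z, p = fixed_effect_meta(
    betas=[0.10, 0.20, 0.10, 0.20],
    ses=[0.10, 0.10, 0.10, 0.10],
)
```

Groups with smaller standard errors (typically larger samples) dominate the pooled estimate under this weighting, which is one reason a pan-ancestry signal can be driven largely by the largest ancestry group.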
A genome-wide association study of suicide attempts in the million veterans program identifies evidence of pan-ancestry and ancestry-specific risk loci
To identify pan-ancestry and ancestry-specific loci associated with attempting suicide among veterans, we conducted a genome-wide association study (GWAS) of suicide attempts within a large, multi-ancestry cohort of U.S. veterans enrolled in the Million Veterans Program (MVP). Cases were defined as veterans with a documented history of suicide attempts in the electronic health record (EHR; N = 14,089) and controls were defined as veterans with no documented history of suicidal thoughts or behaviors in the EHR (N = 395,064). GWAS was performed separately in each ancestry group, controlling for sex, age and genetic substructure. Pan-ancestry risk loci were identified through meta-analysis and included two genome-wide significant loci on chromosomes 20 (p = 3.64 × 10^…) and 1 (p = 3.69 × 10^…). A strong pan-ancestry signal at the Dopamine Receptor D2 locus (p = 1.77 × 10^…) was also identified and subsequently replicated in a large, independent international civilian cohort (p = 7.97 × 10^…). Additionally, ancestry-specific genome-wide significant loci were also detected in African-Americans, European-Americans, Asian-Americans, and Hispanic-Americans. Pathway analyses suggested over-representation of many biological pathways with high clinical significance, including oxytocin signaling, glutamatergic synapse, cortisol synthesis and secretion, dopaminergic synapse, and circadian rhythm. These findings confirm that the genetic architecture underlying suicide attempt risk is complex and includes both pan-ancestry and ancestry-specific risk loci. Moreover, pathway analyses suggested many commonly impacted biological pathways that could inform development of improved therapeutics for suicide prevention.