318 research outputs found
Recommended from our members
Identification of autoimmune gene signatures in autism
The role of the immune system in neuropsychiatric diseases, including autism spectrum disorder (ASD), has long been hypothesized. This hypothesis has mainly been supported by family cohort studies and the immunological abnormalities found in ASD patients, but had limited findings in genetic association testing. Two cross-disorder genetic association tests were performed on the genome-wide data sets of ASD and six autoimmune disorders. In the polygenic score test, we examined whether ASD risk alleles with low effect sizes work collectively in specific autoimmune disorders and show significant association statistics. In the genetic variation score test, we tested whether allele-specific associations between ASD and autoimmune disorders can be found using nominally significant single-nucleotide polymorphisms. In both tests, we found that ASD is probabilistically linked to ankylosing spondylitis (AS) and multiple sclerosis (MS). Association coefficients showed that ASD and AS were positively associated, meaning that autism susceptibility alleles may have a similar collective effect in AS. The association coefficients were negative between ASD and MS. Significant associations between ASD and two autoimmune disorders were identified. This genetic association supports the idea that specific immunological abnormalities may underlie the etiology of autism, at least in a number of cases
Human disease classification in the postgenomic era: A complex systems approach to human pathobiology
Contemporary classification of human disease derives from observational correlation between pathological analysis and clinical syndromes. Characterizing disease in this way established a nosology that has served clinicians well to the current time, and depends on observational skills and simple laboratory tools to define the syndromic phenotype. Yet, this time-honored diagnostic strategy has significant shortcomings that reflect both a lack of sensitivity in identifying preclinical disease, and a lack of specificity in defining disease unequivocally. In this paper, we focus on the latter limitation, viewing it as a reflection both of the different clinical presentations of many diseases (variable phenotypic expression), and of the excessive reliance on Cartesian reductionism in establishing diagnoses. The purpose of this perspective is to provide a logical basis for a new approach to classifying human disease that uses conventional reductionism and incorporates the non-reductionist approach of systems biomedicine
Making sense out of massive data by going beyond differential expression
With the rapid growth of publicly available high-throughput transcriptomic data, there is increasing recognition that large sets of such data can be mined to better understand disease states and mechanisms. Prior gene expression analyses, both large and small, have been dichotomous in nature, in which phenotypes are compared using clearly defined controls. Such approaches may require arbitrary decisions about what are considered “normal” phenotypes, and what each phenotype should be compared to. Instead, we adopt a holistic approach in which we characterize phenotypes in the context of a myriad of tissues and diseases. We introduce scalable methods that associate expression patterns to phenotypes in order both to assign phenotype labels to new expression samples and to select phenotypically meaningful gene signatures. By using a nonparametric statistical approach, we identify signatures that are more precise than those from existing approaches and accurately reveal biological processes that are hidden in case vs. control studies. Employing a comprehensive perspective on expression, we show how metastasized tumor samples localize in the vicinity of the primary site counterparts and are overenriched for those phenotype labels. We find that our approach provides insights into the biological processes that underlie differences between tissues and diseases beyond those identified by traditional differential expression analyses. Finally, we provide an online resource (http://concordia.csail.mit.edu) for mapping users’ gene expression samples onto the expression landscape of tissue and disease
Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services
Objective
Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies.
Methods
We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We utilized Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail (http://doi.org/10.5281/zenodo.2646680) for analysis and exploration of genomic variants dataset.
Results
Comparative evaluation of numerous cloud-based cluster configurations demonstrate a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics.
Conclusions
We present a timely piece for one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost
Peripheral blood gene expression signature differentiates children with autism from unaffected siblings
Autism spectrum disorder (ASD) is one of the most prevalent neurodevelopmental disorders with high heritability, yet a majority of genetic contribution to pathophysiology is not known. Siblings of individuals with ASD are at increased risk for ASD and autistic traits, but the genetic contribution for simplex families is estimated to be less when compared to multiplex families. To explore the genomic (dis-) similarity between proband and unaffected sibling in simplex families, we used genome-wide gene expression profiles of blood from 20 proband-unaffected sibling pairs and 18 unrelated control individuals. The global gene expression profiles of unaffected siblings were more similar to those from probands as they shared genetic and environmental background. A total of 189 genes were significantly differentially expressed between proband-sib pairs (nominal p < 0.01) after controlling for age, sex, and family effects. Probands and siblings were distinguished into two groups by cluster analysis with these genes. Overall, unaffected siblings were equally distant from the centroid of probands and from that of unrelated controls with the differentially expressed genes. Interestingly, five of 20 siblings had gene expression profiles that were more similar to unrelated controls than to their matched probands. In summary, we found a set of genes that distinguished probands from the unaffected siblings, and a subgroup of unaffected siblings who were more similar to probands. The pathways that characterized probands compared to siblings using peripheral blood gene expression profiles were the up-regulation of ribosomal, spliceosomal, and mitochondrial pathways, and the down-regulation of neuroreceptor-ligand, immune response and calcium signaling pathways. Further integrative study with structural genetic variations such as de novo mutations, rare variants, and copy number variations would clarify whether these transcriptomic changes are structural or environmental in origin.Simons Foundatio
Recommended from our members
HD CAGnome: A Search Tool for Huntingtin CAG Repeat Length-Correlated Genes
Background: The length of the huntingtin (HTT) CAG repeat is strongly correlated with both age at onset of Huntington’s disease (HD) symptoms and age at death of HD patients. Dichotomous analysis comparing HD to controls is widely used to study the effects of HTT CAG repeat expansion. However, a potentially more powerful approach is a continuous analysis strategy that takes advantage of all of the different CAG lengths, to capture effects that are expected to be critical to HD pathogenesis. Methodology/Principal Findings We used continuous and dichotomous approaches to analyze microarray gene expression data from 107 human control and HD lymphoblastoid cell lines. Of all probes found to be significant in a continuous analysis by CAG length, only 21.4% were so identified by a dichotomous comparison of HD versus controls. Moreover, of probes significant by dichotomous analysis, only 33.2% were also significant in the continuous analysis. Simulations revealed that the dichotomous approach would require substantially more than 107 samples to either detect 80% of the CAG-length correlated changes revealed by continuous analysis or to reduce the rate of significant differences that are not CAG length-correlated to 20% (n = 133 or n = 206, respectively). Given the superior power of the continuous approach, we calculated the correlation structure between HTT CAG repeat lengths and gene expression levels and created a freely available searchable website, “HD CAGnome,” that allows users to examine continuous relationships between HTT CAG and expression levels of ∼20,000 human genes. Conclusions/Significance: Our results reveal limitations of dichotomous approaches compared to the power of continuous analysis to study a disease where human genotype-phenotype relationships strongly support a role for a continuum of CAG length-dependent changes. The compendium of HTT CAG length-gene expression level relationships found at the HD CAGnome now provides convenient routes for discovery of candidates influenced by the HD mutation
Integration of heterogeneous expression data sets extends the role of the retinol pathway in diabetes and insulin resistance
Motivation: Type 2 diabetes is a chronic metabolic disease that involves both environmental and genetic factors. To understand the genetics of type 2 diabetes and insulin resistance, the DIabetes Genome Anatomy Project (DGAP) was launched to profile gene expression in a variety of related animal models and human subjects. We asked whether these heterogeneous models can be integrated to provide consistent and robust biological insights into the biology of insulin resistance
Integration of heterogeneous expression data sets extends the role of the retinol pathway in diabetes and insulin resistance
Motivation: Type 2 diabetes is a chronic metabolic disease that involves both environmental and genetic factors. To understand the genetics of type 2 diabetes and insulin resistance, the DIabetes Genome Anatomy Project (DGAP) was launched to profile gene expression in a variety of related animal models and human subjects. We asked whether these heterogeneous models can be integrated to provide consistent and robust biological insights into the biology of insulin resistance
A Practical Platform for Blood Biomarker Study by Using Global Gene Expression Profiling of Peripheral Whole Blood
Background: Although microarray technology has become the most common method for studying global gene expression, a plethora of technical factors across the experiment contribute to the variable of genome gene expression profiling using peripheral whole blood. A practical platform needs to be established in order to obtain reliable and reproducible data to meet clinical requirements for biomarker study. Methods and Findings: We applied peripheral whole blood samples with globin reduction and performed genome-wide transcriptome analysis using Illumina BeadChips. Real-time PCR was subsequently used to evaluate the quality of array data and elucidate the mode in which hemoglobin interferes in gene expression profiling. We demonstrated that, when applied in the context of standard microarray processing procedures, globin reduction results in a consistent and significant increase in the quality of beadarray data. When compared to their pre-globin reduction counterparts, post-globin reduction samples show improved detection statistics, lowered variance and increased sensitivity. More importantly, gender gene separation is remarkably clearer in post-globin reduction samples than in pre-globin reduction samples. Our study suggests that the poor data obtained from pre-globin reduction samples is the result of the high concentration of hemoglobin derived from red blood cells either interfering with target mRNA binding or giving the pseudo binding background signal. Conclusion: We therefore recommend the combination of performing globin mRNA reduction in peripheral whole blood samples and hybridizing on Illumina BeadChips as the practical approach for biomarker study
- …