20 research outputs found

    Improving Risk Factor Identification of Human Complex Traits in Omics Data

    With recent advances in various high throughput technologies, the rise of omics data offers a promise of personalized health care with its potential to expand both the depth and the breadth of the identification of risk factors that are associated with human complex traits. In genomics, the introduction of repeated measures and the increased sequencing depth provide an opportunity for deeper investigation of disease dynamics for patients. In transcriptomics, high throughput single-cell assays provide cellular level gene expression depicting cell-to-cell heterogeneity. The cell-level resolution of gene expression data has created opportunities to advance our understanding of cell function, disease pathogenesis, and treatment response for more precise therapeutic development. Along with these advances come the challenges posed by increasingly complicated data sets. In genomics, as repeated measures of phenotypes are crucial for understanding the onset of disease and its temporal pattern, longitudinal designs of omics data and phenotypes are being increasingly introduced. However, current statistical tests for longitudinal outcomes, especially for binary outcomes, depend heavily on the correct specification of the phenotype model. As many diseases are rare, efficient designs are commonly applied in epidemiological studies to recruit more cases. Despite the enhanced efficiency in the study sample, this non-random ascertainment sampling can be a major source of model misspecification that may lead to inflated type I error and/or power loss in the association analysis. In transcriptomics, the analysis of single-cell RNA-seq data faces particular challenges due to low library size, high noise level, and prevalent dropout events. The purpose of this dissertation is to provide the methodological foundation to tackle the aforementioned challenges.
We first propose a set of retrospective association tests for the identification of genetic loci associated with longitudinal binary traits. These tests are robust to different types of phenotype model misspecification and to ascertainment sampling designs, which are common in longitudinal cohorts. We then extend these retrospective tests to variant-set tests for rare genetic variants, which have low detection power, by incorporating the variance component test and burden test into the retrospective test framework. Finally, we present a novel gene graph-based imputation method that imputes dropout events in single-cell transcriptomic data to recover the true gene expression level by borrowing information from adjacent genes in the gene graph.

    Predicting In Vivo Anti-Hepatofibrotic Drug Efficacy Based on In Vitro High-Content Analysis

    Background/Aims: Many anti-fibrotic drugs with high in vitro efficacies fail to produce significant effects in vivo. The aim of this work is to use a statistical approach to design a numerical predictor that correlates better with in vivo outcomes. Methods: High-content analysis (HCA) was performed with 49 drugs on hepatic stellate cells (HSCs) LX-2 stained with 10 fibrotic markers. ~0.3 billion feature values from all cells in >150,000 images were quantified to reflect the drug effects. A systematic literature search on the in vivo effects of all 49 drugs on hepatofibrotic rats yielded 28 papers with histological scores. The in vivo and in vitro datasets were used to compute a single efficacy predictor (Epredict). Results: We used in vivo data from one context (CCl4 rats with drug treatments) to optimize the computation of Epredict. This optimized relationship was independently validated using in vivo data from two different contexts (treatment of DMN rats and prevention of CCl4 induction). A linear in vitro-in vivo correlation was consistently observed in all three contexts. We used Epredict values to cluster drugs according to efficacy and found that high-efficacy drugs tended to target proliferation, apoptosis and contractility of HSCs. Conclusions: The Epredict statistic, based on a prioritized combination of in vitro features, provides a better correlation between in vitro and in vivo drug response than any of the traditional in vitro markers considered.
    Institute of Bioengineering and Nanotechnology (Singapore); Singapore. Biomedical Research Council; Singapore. Agency for Science, Technology and Research; Singapore-MIT Alliance for Research and Technology Center (C-185-000-033-531); Janssen Cilag (R-185-000-182-592); Singapore-MIT Alliance Computational and Systems Biology Flagship Project (C-382-641-001-091); Mechanobiology Institute, Singapore (R-714-001-003-271)

    G2S3: A gene graph-based imputation method for single-cell RNA sequencing data.

    Single-cell RNA sequencing technology provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses in single-cell transcriptomic studies. We propose a new method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and ten existing imputation methods to eight single-cell transcriptomic datasets and compared their performance. Our results demonstrated that G2S3 has superior overall performance in recovering gene expression, identifying cell subtypes, reconstructing cell trajectories, identifying differentially expressed genes, and recovering gene regulatory and correlation relationships. Moreover, G2S3 is computationally efficient for imputation in large-scale single-cell transcriptomic datasets.
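    The general idea of borrowing information from adjacent genes in a gene graph can be illustrated with a toy sketch. This is not the G2S3 algorithm: the abstract does not describe how the graph is learned or how information is propagated, so the sketch assumes a simple k-nearest-neighbour correlation graph and a one-step weighted average. The function names `build_gene_graph` and `impute`, and the parameters `k` and `alpha`, are hypothetical.

```python
import numpy as np

def build_gene_graph(X, k=1):
    """Toy gene graph (assumption, not G2S3): link each gene to its k most
    positively correlated genes. X is a genes x cells matrix.
    Returns a row-normalized adjacency matrix (each row sums to 1)."""
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(X)
    corr = np.nan_to_num(corr, nan=0.0)      # constant genes get zero correlation
    np.fill_diagonal(corr, -np.inf)          # a gene is not its own neighbour
    n = X.shape[0]
    W = np.zeros((n, n))
    for g in range(n):
        nbrs = np.argsort(corr[g])[-k:]      # indices of the k largest correlations
        W[g, nbrs] = np.clip(corr[g, nbrs], 0.0, None)  # keep positive weights only
    W = np.maximum(W, W.T)                   # make the graph undirected
    isolated = W.sum(axis=1) == 0
    W[isolated, isolated] = 1.0              # isolated genes keep their own values
    return W / W.sum(axis=1, keepdims=True)

def impute(X, W, alpha=0.5):
    """Blend each gene's profile with the weighted average of its neighbours.
    Note that every entry is smoothed, not only the zeros."""
    return (1 - alpha) * X + alpha * (W @ X)

# Three genes x four cells; gene 0 has a dropout (zero) in the second cell.
X = np.array([[5.0, 0.0, 4.0, 6.0],
              [5.0, 3.0, 4.0, 6.0],
              [1.0, 9.0, 2.0, 0.0]])
W = build_gene_graph(X, k=1)
X_imputed = impute(X, W, alpha=0.5)
```

    In this toy example the zero in gene 0 is pulled toward the value of its correlated neighbour, gene 1, while gene 2, which has no positively correlated neighbour, is left unchanged.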

    High Level of GMFG Correlated to Poor Clinical Outcome and Promoted Cell Migration and Invasion through EMT Pathway in Triple-Negative Breast Cancer

    Triple-negative breast cancer (TNBC) has a very poor prognosis owing to the lack of established targeted treatment options. Glia maturation factor γ (GMFG), a novel ADF/cofilin superfamily protein, has been reported to be differentially expressed in tumors, but its expression level in TNBC remains unknown. Whether GMFG correlates with TNBC prognosis is also unclear. In this study, data from the Cancer Genome Atlas (TCGA), Clinical Proteomic Tumor Analysis Consortium (CPTAC), Human Protein Atlas (HPA), and Genotype-Tissue Expression (GTEx) databases were used to analyze GMFG expression across cancer types and its correlation with clinical factors. Gene Set Cancer Analysis (GSCA) and Gene Set Enrichment Analysis (GSEA) were also used to analyze the functional differences between the different expression levels and predict the downstream pathways. GMFG expression in breast cancer tissues, and its related biological functions, were further analyzed by immunohistochemistry (IHC), immunoblotting, RNAi, and functional assays. We found that GMFG is highly expressed in TNBC, and that this higher expression correlated with a poorer prognosis in both TCGA data and collected TNBC specimens. GMFG expression was also related to TNBC patients’ clinicopathological features, especially histological grade and axillary lymph node metastasis. In vitro, GMFG siRNA inhibited cell migration and invasion through the EMT pathway. The above data indicate that high expression of GMFG in TNBC is related to malignancy and that GMFG could be a biomarker for the detection of TNBC metastasis.

    GDGTs-based quantitative reconstruction of water level changes and precipitation at Daye Lake, Qinling Mountains (central-east China), over the past 2000 years

    Alpine lakes are natural rain gauges, and reconstructing changes in their water level is a key to understanding the regional hydrological environment, climate change and vegetation evolution. Precipitation in the northern and the southern parts of the eastern monsoon region of China exhibits a centennial scale inverse relationship over the past 2000 years; however, there is substantial uncertainty regarding the temporal range of this dipolar pattern. In order to better understand this north-south pattern of precipitation variation and its driving mechanism, we analyzed isoGDGTs biomarker compounds in a sediment core from alpine Daye Lake, in the Qinling Mountains, in the north-south climatic transition zone of eastern China. Measurements of %Cren were used to reconstruct changes in lake level over the past 2000 years. The results show that, from 240 to 1300 CE, prior to the Little Ice Age, the lake level changes were consistent with the precipitation record for the northern part of eastern China, with the lake reaching its highest level of 25 ± 7.17 m at 555 CE; subsequently, the lake fell to its lowest level of 12 ± 7.17 m at 1030 CE. During the Little Ice Age, the water level maintained an increasing trend, especially during the last three centuries, when it remained above 20 ± 7.17 m, which is consistent with the precipitation record from southern China. The results indicate that the climatically transitional Qinling region has a complex history of climate change. During the early part of the record (240-1300 CE), the level of Daye Lake and the East Asian summer monsoon precipitation were in phase, controlled mainly by the strength of the East Asian summer monsoon. In contrast, since the Little Ice Age (1300 CE to the present), under the influence of ENSO, the westward extension and southward retreat of the West Pacific Subtropical High caused the rain belt to shift southward, decreasing the water vapor supply to the Qinling Mountains.
The ascent of moisture-bearing air over the Qinling Mountains resulted in orographic rainfall, while weaker evaporation during the Little Ice Age also contributed to the continued rise of the level of Daye Lake. The abundant precipitation in the Qinling region during the Little Ice Age provided water resources to sustain human activities in the downstream Weihe Plain, but was also a major cause of flooding. (C) 2021 Elsevier Ltd. All rights reserved.

    Lake eutrophication in northeast China induced by the recession of the East Asian summer monsoon

    Lakes are one of the most important freshwater resources on Earth and they provide a wide range of ecosystem services. However, due to rapid economic development and the intensification of human activities, many lakes have become eutrophic, which may threaten their status as water resources. Human activities have played a significant role in lake eutrophication, but whether this role is independent of, or coupled with, natural climate change requires further study. We selected Dali Lake, a large lake affected by human activity within the ancient warfare borders, to clarify the ecological response of a lake to climate change and human activity. We used analyses of sedimentary n-alkanes and AMS C-14 dating to reconstruct the paleolimnological evolution of Dali Lake since 15 cal kyr BP, and specifically to assess the timing and causes of eutrophication. The results show that the short-chain n-alkanes (C17-19-alkanes) in Dali Lake are mainly produced by bacteria and algae within the lake, and that the sedimentary absolute abundance of short-chain n-alkanes (A(17-19)-alkanes) can be used as a proxy for assessing the ecological status of the lake. The ecological status of Dali Lake was the most stable during the early to middle Holocene, when the East Asian summer monsoon was strong, but bacterial and algal outbreaks occurred during three episodes of a weakened summer monsoon (the Older Dryas, the Younger Dryas, and the Common Era), when the lake experienced different degrees of eutrophication. During the recession of the East Asian summer monsoon, the weakening of precipitation recharge of the lake led to a reduction in lake area and an increase in nutrient concentrations in the lake water, while aeolian dust input was an additional nutrient source, leading to bacterial and algal outbreaks.
During the Common Era, lake eutrophication occurred in the context of both summer monsoon recession and enhanced human activities, but their combined effects did not lead to more intense lake eutrophication than was caused by monsoon recession during the Younger Dryas. We conclude that, although human activities have enhanced the eutrophication of Dali Lake, the reduction in lake size due to monsoon recession and the resulting increase in the salinity and nutrient concentration of the lake water, combined with increased aeolian inputs, were a more important trigger of lake eutrophication. (C) 2022 Elsevier Ltd. All rights reserved.