22 research outputs found

    Improving Risk Factor Identification of Human Complex Traits in Omics Data

    With recent advances in various high throughput technologies, the rise of omics data offers a promise of personalized health care with its potential to expand both the depth and the breadth of the identification of risk factors that are associated with human complex traits. In genomics, the introduction of repeated measures and the increased sequencing depth provide an opportunity for deeper investigation of disease dynamics for patients. In transcriptomics, high throughput single-cell assays provide cellular level gene expression depicting cell-to-cell heterogeneity. The cell-level resolution of gene expression data offers opportunities to improve our understanding of cell function, disease pathogenesis, and treatment response for more precise therapeutic development. Along with these advances come the challenges posed by increasingly complicated data sets. In genomics, as repeated measures of phenotypes are crucial for understanding the onset of disease and its temporal pattern, longitudinal designs of omics data and phenotypes are being increasingly introduced. However, current statistical tests for longitudinal outcomes, especially for binary outcomes, depend heavily on the correct specification of the phenotype model. As many diseases are rare, efficient designs are commonly applied in epidemiological studies to recruit more cases. Despite the enhanced efficiency in the study sample, this non-random ascertainment sampling can be a major source of model misspecification that may lead to inflated type I error and/or power loss in the association analysis. In transcriptomics, the analysis of single-cell RNA-seq data faces particular challenges due to low library size, high noise level, and prevalent dropout events. The purpose of this dissertation is to provide the methodological foundation to tackle the aforementioned challenges.
We first propose a set of retrospective association tests for the identification of genetic loci associated with longitudinal binary traits. These tests are robust to different types of phenotype model misspecification and to ascertainment sampling designs, which are common in longitudinal cohorts. We then extend these retrospective tests to variant-set tests for rare genetic variants, which have low detection power, by incorporating the variance component test and burden test into the retrospective test framework. Finally, we present a novel gene-graph based imputation method that imputes dropout events in single-cell transcriptomic data to recover the true gene expression level by borrowing information from adjacent genes in the gene graph.

    High-throughput functional analysis of autism genes in zebrafish identifies convergence in dopaminergic and neuroimmune pathways

    Advancing from gene discovery in autism spectrum disorders (ASDs) to the identification of biologically relevant mechanisms remains a central challenge. Here, we perform parallel in vivo functional analysis of 10 ASD genes at the behavioral, structural, and circuit levels in zebrafish mutants, revealing both unique and overlapping effects of gene loss of function. Whole-brain mapping identifies the forebrain and cerebellum as the most significant contributors to brain size differences, while regions involved in sensory-motor control, particularly dopaminergic regions, are associated with altered baseline brain activity. Finally, we show a global increase in microglia resulting from ASD gene loss of function in select mutants, implicating neuroimmune dysfunction as a key pathway relevant to ASD biology

    Predicting In Vivo Anti-Hepatofibrotic Drug Efficacy Based on In Vitro High-Content Analysis

    Background/Aims: Many anti-fibrotic drugs with high in vitro efficacies fail to produce significant effects in vivo. The aim of this work is to use a statistical approach to design a numerical predictor that correlates better with in vivo outcomes. Methods: High-content analysis (HCA) was performed with 49 drugs on hepatic stellate cells (HSCs) LX-2 stained with 10 fibrotic markers. ~0.3 billion feature values from all cells in >150,000 images were quantified to reflect the drug effects. A systematic literature search on the in vivo effects of all 49 drugs on hepatofibrotic rats yielded 28 papers with histological scores. The in vivo and in vitro datasets were used to compute a single efficacy predictor (Epredict). Results: We used in vivo data from one context (CCl4 rats with drug treatments) to optimize the computation of Epredict. This optimized relationship was independently validated using in vivo data from two different contexts (treatment of DMN rats and prevention of CCl4 induction). A linear in vitro-in vivo correlation was consistently observed in all three contexts. We used Epredict values to cluster drugs according to efficacy and found that high-efficacy drugs tended to target proliferation, apoptosis and contractility of HSCs. Conclusions: The Epredict statistic, based on a prioritized combination of in vitro features, provides a better correlation between in vitro and in vivo drug response than any of the traditional in vitro markers considered.
    Funding: Institute of Bioengineering and Nanotechnology (Singapore); Singapore. Biomedical Research Council; Singapore. Agency for Science, Technology and Research; Singapore-MIT Alliance for Research and Technology Center (C-185-000-033-531); Janssen Cilag (R-185-000-182-592); Singapore-MIT Alliance Computational and Systems Biology Flagship Project (C-382-641-001-091); Mechanobiology Institute, Singapore (R-714-001-003-271

    G2S3: A gene graph-based imputation method for single-cell RNA sequencing data.

    Single-cell RNA sequencing technology provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses in single-cell transcriptomic studies. We propose a new method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and ten existing imputation methods to eight single-cell transcriptomic datasets and compared their performance. Our results demonstrated that G2S3 has superior overall performance in recovering gene expression, identifying cell subtypes, reconstructing cell trajectories, identifying differentially expressed genes, and recovering gene regulatory and correlation relationships. Moreover, G2S3 is computationally efficient for imputation in large-scale single-cell transcriptomic datasets

    High Level of GMFG Correlated to Poor Clinical Outcome and Promoted Cell Migration and Invasion through EMT Pathway in Triple-Negative Breast Cancer

    Triple-negative breast cancer (TNBC) has a very poor prognosis due to the lack of established targeted treatment options. Glia maturation factor γ (GMFG), a novel ADF/cofilin superfamily protein, has been reported to be differentially expressed in tumors, but its expression level in TNBC remains unknown. Whether GMFG correlates with TNBC prognosis is also unclear. In this study, data from the Cancer Genome Atlas (TCGA), Clinical Proteomic Tumor Analysis Consortium (CPTAC), Human Protein Atlas (HPA), and Genotype-Tissue Expression (GTEx) databases were used to analyze the expression of GMFG in pan-cancer and its correlation with clinical factors. Gene Set Cancer Analysis (GSCA) and Gene Set Enrichment Analysis (GSEA) were also used to analyze the functional differences between the different expression levels and predict the downstream pathways. GMFG expression in breast cancer tissues, and its related biological functions, were further analyzed by immunohistochemistry (IHC), immunoblotting, RNAi, and functional assays. We found that TNBC has a high expression of GMFG, and that this higher expression was correlated with a poorer prognosis in TCGA and in collected TNBC specimens. GMFG was also related to TNBC patients’ clinicopathological data, especially histological grade and axillary lymph node metastasis. In vitro, GMFG siRNA inhibited cell migration and invasion through the EMT pathway. The above data indicate that high expression of GMFG in TNBC is related to malignancy and that GMFG could be a biomarker for the detection of TNBC metastasis.

    GDGTs-based quantitative reconstruction of water level changes and precipitation at Daye Lake, Qinling Mountains (central-east China), over the past 2000 years

    Alpine lakes are natural rain gauges, and reconstructing changes in their water level is a key to understanding the regional hydrological environment, climate change and vegetation evolution. Precipitation in the northern and the southern parts of the eastern monsoon region of China exhibits a centennial scale inverse relationship over the past 2000 years; however, there is substantial uncertainty regarding the temporal range of this dipolar pattern. In order to better understand this north-south pattern of precipitation variation and its driving mechanism, we analyzed isoGDGTs biomarker compounds in a sediment core from alpine Daye Lake, in the Qinling Mountains, in the north-south climatic transition zone of eastern China. Measurements of %Cren were used to reconstruct changes in lake level over the past 2000 years. The results show that, from 240 to 1300 CE, prior to the Little Ice Age, the lake level changes were consistent with the precipitation record for the northern part of eastern China, with the lake reaching its highest level of 25 ± 7.17 m at 555 CE; subsequently, the lake fell to its lowest level of 12 ± 7.17 m at 1030 CE. During the Little Ice Age, the water level maintained an increasing trend, especially during the last three centuries, when it remained above 20 ± 7.17 m, which is consistent with the precipitation record from southern China. The results indicate that the climatically transitional Qinling region has a complex history of climate change. During the early part of the record (240-1300 CE), the level of Daye Lake and the East Asian summer monsoon precipitation were in phase, controlled mainly by the strength of the East Asian summer monsoon. In contrast, since the Little Ice Age (1300 CE to the present), under the influence of ENSO, the westward extension and southward retreat of the West Pacific Subtropical High caused the rain belt to shift southward, decreasing the water vapor supply to the Qinling Mountains.
The ascent of moisture-bearing air over the Qinling Mountains resulted in orographic rainfall, while the weakening of evaporation during the Little Ice Age reduced the evaporation of water vapor and also contributed to the continued rise of the level of Daye Lake. The abundant precipitation in the Qinling region during the Little Ice Age provided water resources to sustain human activities in the downstream Weihe Plain, but was also a major cause of flooding. (C) 2021 Elsevier Ltd. All rights reserved

    Exploratory Study to Identify Radiomics Classifiers for Lung Cancer Histology

    BACKGROUND: Radiomics can quantify tumor phenotypic characteristics non-invasively by applying feature algorithms to medical imaging data. In this study of lung cancer patients, we investigated the association between radiomic features and the tumor histologic subtypes (adenocarcinoma and squamous cell carcinoma). Furthermore, in order to predict histologic subtypes, we employed machine-learning methods and independently evaluated their prediction performance. METHODS: Two independent radiomic cohorts with a combined size of 350 patients were included in our analysis. A total of 440 radiomic features were extracted from the segmented tumor volumes of pretreatment CT images. These radiomic features quantify tumor phenotypic characteristics on medical images using tumor shape and size, intensity statistics, and texture. Univariate analysis was performed to assess each feature's association with the histological subtypes. In our multivariate analysis, we investigated 24 feature selection methods and 3 classification methods for histology prediction. Multivariate models were trained on the training cohort and their performance was evaluated on the independent validation cohort using the area under the ROC curve (AUC). Histology was determined from surgical specimens. RESULTS: In our univariate analysis, we observed that fifty-three radiomic features were significantly associated with tumor histology. In multivariate analysis, the feature selection method ReliefF and its variants showed higher prediction accuracy than other methods. We found that the naive Bayes classifier outperformed other classifiers and achieved the highest AUC (0.72; p-value = 2.3 × 10^-7) with five features: Stats_min, Wavelet_HLL_rlgl_lowGrayLevelRunEmphasis, Wavelet_HHL_stats_median, Wavelet_HLL_stats_skewness, and Wavelet_HLH_glcm_clusShade.
CONCLUSION: Histological subtypes can influence the choice of a treatment/therapy for lung cancer patients. We observed that radiomic features show significant association with the lung tumor histology. Moreover, radiomics-based multivariate classifiers were independently validated for the prediction of histological subtypes. Despite achieving lower than optimal prediction accuracy (AUC 0.72), our analysis highlights the potential of non-invasive and cost-effective radiomics for precision medicine. Further research in this direction could lead us to optimal performance and therefore to clinical applicability, which could enhance the efficiency and efficacy of cancer care.

    Diagnostic accuracy of risk assessment and fecal immunochemical test in colorectal cancer screening: Results from a population‐based program and meta‐analysis

    Background: Fecal immunochemical test (FIT) is a commonly used initial test for colorectal cancer (CRC) screening. Parallel use of FIT with risk assessment (RA) could improve the detection of non‐bleeding lesions, but at the expense of compromising specificity. In this study, we evaluated the accuracy of FIT and/or RA in the Shanghai CRC screening program, and systematically reviewed the relevant evaluations worldwide. Methods: RA and 2‐specimen FIT were used in parallel in the Shanghai screening program, followed by a colonoscopy among those with positive results. Sensitivity, specificity, detection rate of CRC, positive predictive value (PPV), and other measures with their 95% confidence intervals were calculated for each type of test and several assumed combined tests. We further searched PubMed, Embase, Web of Science, and Cochrane Library for relevant studies published in English up to January 5, 2022. Results: By the end of 2019, a total of 1,901,360 participants of the screening program completed 3,045,108 tests, with 1,901,360 first‐time tests and 1,143,748 subsequent tests. Parallel use of RA and 2‐specimen FIT achieved a sensitivity of 0.78 (0.77–0.80), a specificity of 0.78 (0.78–0.78), PPV of 0.89% (0.86–0.92), and a detection rate of 1.99 (1.93–2.05) for CRC per 1000 among participants enrolled in the first screening round, and performed similarly among those who participated several times. A meta‐analysis of 103 published observational studies demonstrated a higher sensitivity [0.76 (0.36, 0.94)] but a much lower specificity [0.59 (0.28, 0.85)] of parallel use of RA and FIT for detecting CRC in average‐risk populations than in our subjects. One‐specimen FIT, the most commonly used initial test, had a pooled specificity comparable to the Shanghai screening program (0.92 vs. 0.91), but a much higher pooled sensitivity (0.76 vs. 0.57).
Conclusion: Our results indicate the limitations of FIT alone as an initial screening test for CRC in Chinese populations, and highlight the higher sensitivity of parallel use of RA and FIT. Attempts should be made to optimize RA to improve the effectiveness of screening in these populations.
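
The accuracy measures named in the abstract above (sensitivity, specificity, PPV, detection rate per 1000) follow the standard 2x2 definitions, and a parallel test is conventionally counted as positive when either component test is positive. The following is a minimal sketch of those textbook definitions, not the study's own code. The class and function names are hypothetical, the counts are invented purely for illustration, and the detection rate is assumed here to be confirmed cases detected per 1000 people screened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a 2x2 table of screening result versus confirmed diagnosis."""
    true_pos: int
    false_pos: int
    false_neg: int
    true_neg: int


def parallel_positive(fit_positive: bool, ra_positive: bool) -> bool:
    """Parallel use: the combined result is positive if either test is positive."""
    return fit_positive or ra_positive


def accuracy_measures(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV, and detection rate per 1000 screened."""
    total = c.true_pos + c.false_pos + c.false_neg + c.true_neg
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        # Assumption: detected cases per 1000 people screened.
        "detection_per_1000": 1000 * c.true_pos / total,
    }


# Hypothetical counts for illustration only; these are NOT data from the study.
example = ConfusionCounts(true_pos=80, false_pos=1920, false_neg=20, true_neg=7980)
measures = accuracy_measures(example)
```

Because the parallel rule flags anyone positive on either test, it can only add positives relative to a single test, which is why it tends to raise sensitivity while lowering specificity.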