80 research outputs found

    Methods for High Dimensional Inferences With Applications in Genomics

    In this dissertation, I have developed several high dimensional inference and computational methods motivated by problems in genomics studies. It consists of two parts. The first part is motivated by the analysis of data from genome-wide association studies (GWAS), where I have developed an optimal false discovery rate (FDR) controlling method for high dimensional dependent data. For short-range dependent data, I have shown that the marginal plug-in procedure is optimal in controlling the FDR while minimizing the false non-discovery rate (FNR). When applied to the neuroblastoma GWAS data, this procedure identified six more disease-associated variants than previous p-value based procedures such as the Benjamini and Hochberg procedure. I have further investigated the statistical issue of sparse signal recovery in the setting of GWAS and developed a rigorous procedure for sample size and power analysis in the framework of FDR and FNR for GWAS. In addition, I have characterized the almost complete discovery boundary in terms of signal strength and non-null proportion and developed a procedure to achieve almost complete recovery of the signals. The second part of my dissertation was motivated by gene regulation network construction based on genetical genomics (eQTL) data. I have developed a sparse high dimensional multivariate regression model for studying the conditional independence relationships among a set of genes after adjusting for possible genetic effects, as well as the genetic architecture that influences gene expression. I have developed a covariate adjusted precision matrix estimation method (CAPME), which can be easily implemented by linear programming. Asymptotic convergence rates and sign consistency are established for the estimators of the regression coefficients and the precision matrix. Numerical performance of the estimator was investigated using both simulated and real data sets. 
Simulation results have shown that the CAPME resulted in great improvements in both estimation and graph structure selection. I have applied the CAPME to the analysis of a yeast eQTL data set in order to identify the gene regulatory network among a set of genes in the MAPK signaling pathway. Finally, I have also developed the R software package CAPME based on my dissertation work.

    Implementation of quasi-least squares With the R package qlspack

    Quasi-least squares (QLS) is an alternative method for estimating the correlation parameters within the framework of generalized estimating equations (GEE) that has two main advantages over the moment estimates that are typically applied for GEE: (1) it guarantees a consistent estimate of the correlation parameter and a positive definite estimated correlation matrix, for several correlation structures; and (2) it allows for easier implementation of some correlation structures that have not yet been implemented in the framework of GEE. Furthermore, because QLS is a method in the framework of GEE, existing software can be employed within the QLS algorithm for estimation of the correlation and regression parameters. In this manuscript we describe and demonstrate the user-written package qlspack, which allows for implementation of QLS in R. Our package qlspack calls the geepack package (Yan, 2002; Halekoh et al., 2006) to update the estimate of the regression parameter at the current QLS estimate of the correlation parameter; hence, geepack functions for standard error estimation can be used after running qlspack.

    False Discovery Rate Control for High Dimensional Dependent Data with an Application to Large-Scale Genetic Association Studies

    Large-scale genetic association studies are increasingly utilized for identifying novel susceptibility variants for complex traits, but there is little consensus on analysis methods for such data. The most commonly used methods include single SNP analysis or haplotype analysis with Bonferroni correction for multiple comparisons. Since the SNPs in typical GWAS are often in linkage disequilibrium (LD), at least locally, Bonferroni correction for multiple comparisons often leads to conservative error control and therefore lower statistical power. Motivated by the analysis of data from genetic association studies, we consider the problem of false discovery rate (FDR) control under the high dimensional multivariate normal model. Using the compound decision rule framework, we develop an optimal joint oracle procedure and propose to use a marginal procedure to approximate the optimal joint procedure. We show that the marginal plug-in procedure is asymptotically optimal under mild conditions. Our results indicate that the multiple testing procedure developed under the independent model is not only valid but also asymptotically optimal for high dimensional multivariate normal data under some weak dependency. We evaluate various procedures using simulation studies and demonstrate the application of the proposed procedure to a genome-wide association study of neuroblastoma (NB). The proposed procedure identified a few more genetic variants that are potentially associated with NB than the standard p-value-based FDR controlling procedure.
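
For readers unfamiliar with the "standard p-value-based FDR controlling procedure" used as the baseline above, the following is a minimal sketch of the Benjamini and Hochberg step-up rule. It is not the marginal plug-in procedure proposed in the abstract; the function name `benjamini_hochberg` and the example p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Illustrative sketch of the Benjamini-Hochberg step-up rule; the name
    and signature are assumptions, not taken from the listed work.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten SNPs.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example, alpha=0.05))
```

Because the rule is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank-specific threshold, as long as some larger-ranked p-value meets its threshold.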

    Genetic variants in the calcium signaling pathway genes are associated with cutaneous melanoma-specific survival

    Remodeling or deregulation of the calcium signaling pathway is a relevant hallmark of cancer, including cutaneous melanoma (CM). In this study, using data from a published genome-wide association study (GWAS) from The University of Texas M.D. Anderson Cancer Center, we assessed the role of 41,377 common single-nucleotide polymorphisms (SNPs) of 167 calcium signaling pathway genes in CM survival. We used another GWAS from Harvard University as the validation dataset. In the single-locus analysis, 1830 SNPs were found to be significantly associated with CM-specific survival (CMSS; P ≤ 0.050 and false-positive report probability ≤ 0.2), of which 9 SNPs were validated in the Harvard study (P ≤ 0.050). Among these, three independent SNPs (i.e. PDE1A rs6750552 T>C, ITPR1 rs6785564 A>G and RYR3 rs2596191 C>A) had a predictive role in CMSS, with meta-analysis-derived hazard ratios of 1.52 (95% confidence interval = 1.19–1.94, P = 7.21 × 10⁻⁴), 0.49 (0.33–0.73, 3.94 × 10⁻⁴) and 0.67 (0.53–0.86, 0.0017), respectively. Patients with an increasing number of protective genotypes had remarkably improved CMSS. Additional expression quantitative trait loci analysis showed that these genotypes were also significantly associated with mRNA expression levels of the genes. Taken together, these results may help us to identify prospective biomarkers in the calcium signaling pathway for CM prognosis.

    A retrospective comparative study on the diagnostic efficacy and the complications: between CassiII rotational core biopsy and core needle biopsy

    Accurate pathologic diagnosis and molecular classification of breast mass biopsy tissue is important for individualizing (neo)adjuvant systemic therapy for invasive breast cancer. The CassiII rotational core biopsy system is a novel biopsy technique with a guide needle and a “stick-freeze” technology. This study comprehensively assessed the two techniques, including the concordance rates of diagnosis and biomarker status between CassiII and core needle biopsy. Estrogen receptor (ER), progesterone receptor (PgR), human epidermal growth factor receptor 2 (HER2), and Ki67 were analyzed through immunohistochemistry. In total, 655 patients with breast cancer who underwent surgery after biopsy at Sir Run Run Shaw Hospital between January 2019 and December 2021 were evaluated. The concordance rates (CRs) with malignant surgical specimens were significantly higher for CassiII needle biopsy than for core needle biopsy. Moreover, CassiII needle biopsy showed about a 20% improvement in sensitivity and about a 5% improvement in positive predictive value compared to core needle biopsy. Age and tumor size were identified as risk factors for pathological inconsistencies with core needle biopsy. For CassiII needle biopsy, however, only tumor diameter was associated with such inconsistencies. The CRs of ER, PgR, HER2, and Ki67 using the CassiII needle were 98.08% (kappa, 0.941; p<.001), 90.77% (kappa, 0.812; p<.001), 69.62% (kappa, 0.482; p<.001), and 86.92% (kappa, 0.552; p<.001), respectively. Post-biopsy complications with CassiII needle biopsy were also collected. The complications of CassiII needle biopsy, including chest stuffiness, pain and subcutaneous ecchymosis, are not rare. The underlying mechanism of subcutaneous congestion or hematoma after CassiII needle biopsy might be the larger needle diameter and the effect of temperature on coagulation function. 
In summary, CassiII needle biopsy is age-independent and has better accuracy than core needle biopsy (CNB) for distinguishing carcinoma in situ from invasive carcinoma.
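
The abstract above reports both a concordance rate and a kappa value for each biomarker. As a rough illustration of how the two differ, here is a minimal sketch of Cohen's kappa for two paired sets of categorical calls. The function name `cohen_kappa` and the sample data are hypothetical, and the abstract does not state which kappa variant the authors used.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    Illustrative sketch only; assumes the chance agreement is below 1,
    otherwise the denominator is zero.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of pairs with identical labels
    # (this is the plain concordance rate).
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from the marginal label frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical biopsy vs. surgical-specimen calls (1 = positive, 0 = negative).
biopsy = [1, 1, 0, 0]
surgery = [1, 1, 0, 0]
print(cohen_kappa(biopsy, surgery))
```

Kappa discounts the agreement that would arise by chance, which is why a marker can show a fairly high concordance rate and still a moderate kappa when one category dominates.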

    Impacts of groundwater depth on regional scale soil gleyization under changing climate in the Poyang Lake Basin, China

    This manuscript version is made available under the CC-BY-NC-ND 4.0 license: http://creativecommons.org/licenses/by-nc-nd/4.0/ which permits use, distribution and reproduction in any medium, provided the original work is properly cited. This author accepted manuscript is made available following a 24 month embargo from the date of publication (November 2018) in accordance with the publisher’s archiving policy. Various natural and anthropogenic factors affect the formation of gleyed soil. It is a major challenge to identify the key hazard factors and evaluate the dynamic evolutionary process of soil gleyization at a regional scale under future climate change. This study addressed this complex challenge based on regional groundwater modelling for a typical agricultural region located in the Ganjiang River Delta (GRD) of the Poyang Lake Basin, China. We first carried out in-situ soil sampling analysis and column experiments under different water depths to examine the statistical relationship between groundwater depth (GD) and gleyization indexes including active reducing substance, ferrous iron content, and redox potential. Subsequently, a three-dimensional groundwater flow numerical model for the GRD was established to evaluate the impacts of the historical average level and future climate change on vadose saturation and soil gleyization (averaged over 2016–2050) in the irrigated farmland. Three climate change scenarios associated with carbon dioxide emission (A1B, A2, and B1) were predicted by the ECHAM5 global circulation model published in the IPCC Assessment Report (2007). The ECHAM5 outputs were applied to quantify the variation of groundwater level and to identify the potential maximum gleyed zones affected by the changes in meteorological and hydrological conditions. 
The results of this study indicate that GD is an indirect indicator for predicting the gradation of soil gleyization at the regional scale, and that the GRD will suffer considerable soil gleyization by 2050 due to fluctuations of the water table induced by future climate changes. Compared with the annual average condition, the climate scenario B1 will probably exacerbate soil gleyization, with an 8.8% increase in total gleyed area in the GRD. On average, the highly gleyed areas will increase by 29.7 km², mainly in the riverside area, and the medium-slightly gleyed area will increase by 19.2 km² in the middle region. This work was partially supported by the National Key R&D Program of China (No. 2016YFC0402800), the National Natural Science Foundation of China (Nos. 41772254, 41502226, and 41402198), and the Fundamental Research Funds for the Central Universities (No. 2018B18714). We are grateful to Jiangxi Institute of Survey and Design, which provided the detailed hydrogeological data of PLB for establishing the three-dimensional groundwater flow model. Yun Yang gratefully acknowledges financial support from China Scholarship Council (CSC No. 201706715023) during the visit to National Centre for Groundwater Research and Training (NCGRT), Australia. Behzad Ataie-Ashtiani and Craig T. Simmons acknowledge support from the National Centre for Groundwater Research and Training, Australia.

    Duke EQAPOL CMV Intervention Flow Cytometry Data

    This data set was collected from the External Quality Assurance Program Oversight Laboratory (EQAPOL) proficiency program by the Duke Immune Profiling Core (DIPC). It comes from 11 healthy volunteers who provided blood samples for flow cytometric intra-cellular cytokine staining (ICS) experiments. Blood samples from each individual were used as a negative control (“Costim”) or treated with a peptide mixture from the immunodominant cytomegalovirus (CMV) pp65 protein. Each sample contains approximately 200,000 cells, for which 11 attributes (protein markers for discriminating between basic, maturational and functional T cell subsets) have been measured.