49 research outputs found
Time-dependent Cross-ratio Estimation for Bivariate Failure Times.
In the analysis of bivariate correlated failure time data, it is important to measure the strength of association among the correlated failure times. One commonly used measure is the cross-ratio. In the literature, the functional form of cross-ratio is rather restrictive, often assumed to be constant, piecewise constant or completely determined by specific model assumption for the joint survival function, e.g. copula model. In this dissertation, we focus on estimating the cross-ratio as a smooth function of bivariate times in various settings without imposing any model assumption for the joint survival function.
Motivated by Cox's partial likelihood idea, we propose in the first chapter a novel parametric estimator for the cross-ratio that is a flexible polynomial function of both survival times. We show that the estimates of cross-ratio regression coefficients are consistent and asymptotically normal. The performance of the proposed technique in finite samples is examined using simulation studies. The proposed method is applied to the Australian twin data for the estimation of dependence of the risk for appendicitis between twin pairs.
In the second chapter, we extend our model to accommodate covariates. Motivated by the Tremin study, we propose a multiplicative model for covariate effects. When the covariate is discrete, we modify our estimator from chapter one by grouping subjects with the same covariate value into strata. When the covariate is continuous, we propose using kernel smoothing applied to the estimating equations. The estimates of regression coefficients are shown to be consistent and asymptotically normal. Numerical studies are conducted for both discrete and continuous covariates.
In observational follow-up studies, delayed-entry observations are common. In an AIDS incubation cohort study, for example, the lag time between HIV infection and death is left-truncated by the lag time between HIV infection and the beginning of the study if the patient was infected before the study began. Ignoring left truncation yields biased estimates. In the third chapter, we adjust our model to handle left truncation by modifying the risk set and the relevant indicators. We show that the estimates of cross-ratio regression coefficients are consistent and asymptotically normal. Numerical studies are conducted.
Ph.D., Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/89616/1/hutianle_1.pd
Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond
We consider vertical logistic regression (VLR) trained with mini-batch
gradient descent -- a setting which has attracted growing interest among
industries and proven to be useful in a wide range of applications including
finance and medical research. We provide a comprehensive and rigorous privacy
analysis of VLR in a class of open-source Federated Learning frameworks, where
the protocols might differ between one another, yet a procedure of obtaining
local gradients is implicitly shared. We first consider the honest-but-curious
threat model, in which the detailed implementation of protocol is neglected and
only the shared procedure is assumed, which we abstract as an oracle. We find
that even under this general setting, single-dimension feature and label can
still be recovered from the other party under suitable constraints of batch
size, thus demonstrating the potential vulnerability of all frameworks
following the same philosophy. Then we look into a popular instantiation of the
protocol based on Homomorphic Encryption (HE). We propose an active attack that
significantly weakens the constraints on batch size in the previous analysis via
generating and compressing auxiliary ciphertext. To address the privacy leakage
within the HE-based protocol, we develop a simple-yet-effective countermeasure
based on Differential Privacy (DP), and provide both utility and privacy
guarantees for the updated algorithm. Finally, we empirically verify the
effectiveness of our attack and defense on benchmark datasets. Altogether, our
findings suggest that all vertical federated learning frameworks that solely
depend on HE might contain severe privacy risks, and DP, which has already
demonstrated its power in horizontal federated learning, can also play a
crucial role in the vertical setting, especially when coupled with HE or secure
multi-party computation (MPC) techniques
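
The abstract above does not spell out the DP-based countermeasure, so the following is only a generic sketch of the Gaussian mechanism applied to a gradient vector (clip the L2 norm, then add Gaussian noise), not the authors' algorithm. The names `clip_gradient`, `privatize_gradient`, `max_norm`, and `noise_multiplier` are hypothetical, and no particular privacy budget is claimed for any parameter setting.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def privatize_gradient(grad, max_norm, noise_multiplier, rng):
    """Clip the gradient, then add zero-mean Gaussian noise to each coordinate.

    The noise standard deviation is noise_multiplier * max_norm, the usual
    parameterization of the Gaussian mechanism (an assumption here, not a
    detail taken from the paper).
    """
    clipped = clip_gradient(grad, max_norm)
    sigma = noise_multiplier * max_norm
    return [g + rng.gauss(0.0, sigma) for g in clipped]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    print(privatize_gradient([3.0, 4.0], max_norm=1.0, noise_multiplier=0.5, rng=rng))
```

Clipping bounds how much any single batch can move the shared gradient, which is what lets the added noise mask individual contributions; a larger `noise_multiplier` trades utility for privacy.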
All-Inorganic Perovskite Solar Cells With Both High Open-Circuit Voltage and Stability
Metal halide perovskite solar cells based on all-inorganic CsPbBr3 have attracted considerable attention recently, owing to their high open-circuit voltage and good stability. However, the fabrication of CsPbBr3 films by the one-step method is limited by the poor solubility of cesium precursors in organic solvents. Here, we successfully fabricated CsPbBr3 film solar cells by employing colloidal nanocrystals. The effects of process parameters, including purification times, annealing temperatures, and spin-coating times, on film morphology, optical spectra, and device performance are investigated in detail. The highest power conversion efficiency of 4.57% has been achieved, based on a large open-circuit voltage of 1.45 V and a large short-circuit current of 9.41 mA cm−2. The large open-circuit voltage results from the reduced non-radiative energy loss channels and defect states, while the large short-circuit current is related to the high conductivity induced by the removal of organic ligands and the increased nanocrystal electronic coupling. Furthermore, the unencapsulated device shows excellent stability in air, suggesting enormous potential for developing high open-circuit-voltage photovoltaic devices with high stability in the future
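
As a rough consistency check on the figures quoted above, the fill factor implied by the reported efficiency, open-circuit voltage, and short-circuit current can be backed out from the standard relation PCE = Voc × Jsc × FF / Pin. The abstract does not state the illumination intensity, so the sketch below assumes the conventional 100 mW cm−2 (AM1.5G); the function name `implied_fill_factor` is hypothetical.

```python
def implied_fill_factor(pce_percent, voc_volts, jsc_ma_per_cm2, pin_mw_per_cm2=100.0):
    """Back out the fill factor from PCE = Voc * Jsc * FF / Pin.

    pin_mw_per_cm2 defaults to 100 mW/cm^2, an assumed illumination
    intensity that is not stated in the abstract.
    """
    # Maximum output power density in mW/cm^2.
    max_power = pce_percent / 100.0 * pin_mw_per_cm2
    # V * mA/cm^2 = mW/cm^2, so the ratio is dimensionless.
    return max_power / (voc_volts * jsc_ma_per_cm2)


if __name__ == "__main__":
    ff = implied_fill_factor(4.57, 1.45, 9.41)
    print(f"implied fill factor: {ff:.3f}")
```

Under that assumption the implied fill factor is roughly one third, which is what ties the 4.57% efficiency to the quoted voltage and current.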
Understanding LLMs: A Comprehensive Overview from Training to Inference
The introduction of ChatGPT has led to a significant increase in the
utilization of Large Language Models (LLMs) for addressing downstream tasks.
There's an increasing focus on cost-efficient training and deployment within
this context. Low-cost training and deployment of LLMs represent the future
development trend. This paper reviews the evolution of large language model
training techniques and inference deployment technologies aligned with this
emerging trend. The discussion on training includes various aspects, including
data preprocessing, training architecture, pre-training tasks, parallel
training, and relevant content related to model fine-tuning. On the inference
side, the paper covers topics such as model compression, parallel computation,
memory scheduling, and structural optimization. It also explores LLMs'
utilization and provides insights into their future development. Comment: 30 pages, 6 figures
Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes
Genome-wide association (GWA) studies have identified multiple new genomic loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D) [1-11]. Established associations to common and rare variants explain only a small proportion of the heritability of T2D. As previously published analyses had limited power to discover loci at which common alleles have modest effects, we performed meta-analysis of three T2D GWA scans encompassing 10,128 individuals of European descent and ~2.2 million SNPs (directly genotyped and imputed). Replication testing was performed in an independent sample with an effective sample size of up to 53,975. At least six new loci with robust evidence for association were detected, including the JAZF1 (p=5.0×10−14), CDC123/CAMK1D (p=1.2×10−10), TSPAN8/LGR5 (p=1.1×10−9), THADA (p=1.1×10−9), ADAMTS9 (p=1.2×10−8), and NOTCH2 (p=4.1×10−8) gene regions. The large number of loci with relatively small effects indicates the value of large discovery and follow-up samples in identifying additional clues about the inherited basis of T2D
A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants
Identifying the genetic variants that increase the risk of type 2 diabetes (T2D) in humans has been a formidable challenge. Adopting a genome-wide association strategy, we genotyped 1161 Finnish T2D cases and 1174 Finnish normal glucose-tolerant (NGT) controls with >315,000 single-nucleotide polymorphisms (SNPs) and imputed genotypes for an additional >2 million autosomal SNPs. We carried out association analysis with these SNPs to identify genetic variants that predispose to T2D, compared our T2D association results with the results of two similar studies, and genotyped 80 SNPs in an additional 1215 Finnish T2D cases and 1258 Finnish NGT controls. We identify T2D-associated variants in an intergenic region of chromosome 11p12, contribute to the identification of T2D-associated variants near the genes IGF2BP2 and CDKAL1 and the region of CDKN2A and CDKN2B, and confirm that variants near TCF7L2, SLC30A8, HHEX, FTO, PPARG, and KCNJ11 are associated with T2D risk. This brings the number of T2D loci now confidently identified to at least 10
Proportional cross-ratio model
Cross-ratio is an important local measure of the strength of dependence among correlated failure times. If a covariate is available, it may be of scientific interest to understand how the cross-ratio varies with the covariate as well as time components. Motivated by the Tremin study, where the dependence between age at a marker event reflecting early lengthening of menstrual cycles and age at menopause may be affected by age at menarche, we propose a proportional cross-ratio model through a baseline cross-ratio function and a multiplicative covariate effect. Assuming a parametric model for the baseline cross-ratio, we generalize the pseudo-partial likelihood approach of Hu et al. (Biometrika 98:341-354, 2011) to the joint estimation of the baseline cross-ratio and the covariate effect. We show that the proposed parameter estimator is consistent and asymptotically normal. The performance of the proposed technique in finite samples is examined using simulation studies. In addition, the proposed method is applied to the Tremin study for the dependence between age at a marker event and age at menopause adjusting for age at menarche. The method is also applied to the Australian twin data for the estimation of zygosity effect on cross-ratio for age at appendicitis between twin pairs
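
For orientation, the cross-ratio referred to above is usually defined as a ratio of conditional hazards, or equivalently in terms of the joint survival function $S(t_1,t_2)=P(T_1>t_1,\,T_2>t_2)$. The second display is one common way to write a baseline function with a multiplicative covariate effect; the exponential link and the symbols $\theta_0$, $\beta$, and $z$ are assumptions made here for illustration and are not taken from the abstract.

```latex
% Cross-ratio for bivariate failure times (T_1, T_2)
\theta(t_1,t_2)
  = \frac{\lambda_1(t_1 \mid T_2 = t_2)}{\lambda_1(t_1 \mid T_2 > t_2)}
  = \frac{S(t_1,t_2)\,\dfrac{\partial^2 S(t_1,t_2)}{\partial t_1\,\partial t_2}}
         {\dfrac{\partial S(t_1,t_2)}{\partial t_1}\,
          \dfrac{\partial S(t_1,t_2)}{\partial t_2}}

% Assumed form of a proportional cross-ratio model with covariate z
\theta(t_1,t_2; z) = \theta_0(t_1,t_2)\,\exp(\beta z)
```

A value of $\theta(t_1,t_2)=1$ corresponds to local independence at $(t_1,t_2)$, and values above 1 indicate positive association.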
Risk Assessment in the Industry Chain of Industrialized Construction: A Chinese Case Study
The industry chain of industrialized construction is a key strategy for promoting the sustainable performance of China’s construction industry. Its risk identification is the fundamental step to promote the development of the industry chain. The study was conducted in two phases. The first phase included an extensive literature review and case study analysis to document 32 key factors affecting the process of the industry chain of industrialized construction. In the second phase, 22 key factors influencing the development of the industry chain of industrialized construction in Shandong Province were screened through data collection and expert consultation. A complex network of industrialized construction risk associations (CNICRA) was developed to assess these risks by considering the interrelationship among risks, network nodes, and network edges, and the comprehensive degree indicators for improving the model’s accuracy and resolution. The results show that enterprise collaboration level is the most important factor in the industry chain of industrialized construction. The industrialized system is the most transmittable factor of risk. This study investigated a list of risks in the industrialization of construction, optimized a complex network of risk association, and provided theoretical support for risk management of the industry chain of industrialized construction and understanding of risk response strategies for decision makers
Research on Multi-Optimal Project of Outlet Guide Vanes of Nuclear Grade Axial Flow Fan Based on Sensitivity Analysis
Nuclear grade axial flow fans are widely used in nuclear power plants for ventilation and heat dissipation and have the advantages of high efficiency and high flow rates. A nuclear grade axial flow fan with OGVs (outlet guide vanes) can recover the kinetic energy of the dynamic impeller outlet winding to increase the ventilator pressure, thus improving the ventilator efficiency; therefore, the OGVs play an essential role in the performance of the axial flow fan. Based on accurate numerical simulations, an MRGP approximation model was developed to analyse the factors affecting the OGVs duct and optimise the guide vane structure, combined with the Sobol method for sensitivity analysis. The experiments and numerical simulations show that the total pressure of the optimised model increases by 154 Pa, and the noise decreases by 4.1 dB. The multi-objective optimisation method using the parametric approach and combining it with the MRGP model is highly reliable. It provides a key design direction for optimising nuclear grade axial flow fans