138 research outputs found

    Semiparametric and nonparametric methods in data mining and statistical learning with applications in public health surveillance and personalized medicine

    The field of statistical learning has been growing rapidly over the past few decades, with a diverse range of applications. In this dissertation, we develop methodology mainly using semiparametric and nonparametric statistical learning techniques for the areas of public health surveillance and personalized medicine. Surveillance, providing early warning for impending emergencies, is a key function of public health. In Chapter 2, we propose a semiparametric spatiotemporal method to model spatiotemporal lattice data via a local linear fitting combined with day-of-week effects, in which both spatial and temporal information are taken into account. Detection of abnormal events is carried out using an ARMA time series technique for residuals combined with a resampling approach to determine the threshold for significance. We conduct simulations to assess the performance of the proposed method. The method is also illustrated using data on daily asthma admissions to North Carolina emergency departments between 2006 and 2007. There is increasing interest in personalized medicine: the idea of tailoring treatment for each individual to optimize patient outcome. In Chapter 3, we focus on the single-decision setup. We show that estimating such an optimal treatment rule is equivalent to a classification problem where each subject is weighted in proportion to his or her clinical outcome, although the true class labels (the treatment group that is optimal for each patient) are unknown in the training set. We then propose a new approach based on the support vector machine framework from computer science. We show the resulting estimator of the treatment rule is consistent, and further derive fairly accurate convergence rates for this estimator. The performance of the proposed approach is demonstrated via simulation studies and an analysis of chronic depression data.
    It is not uncommon that the best clinical strategies may require adaptation over time. In Chapter 4, we therefore generalize the outcome weighted learning method to the multi-decision setup. The aim is to find dynamic treatment regimes, that is, customized sequential decision rules for individual patients which can adapt over time to the evolving illness, so as to maximize the long-term health outcome. Inspired by the intrinsic idea in dynamic programming, we conduct outcome weighted learning for each stage backwards through time. We further introduce an iterative procedure which can improve the performance of the algorithm. The methods are evaluated by simulation studies and an analysis of a smoking cessation data set.
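
The single-decision idea in the abstract above (treatment-rule estimation recast as classification with outcome-proportional weights) can be illustrated with a small sketch. This is not the dissertation's implementation: the linear rule, the sub-gradient optimiser, the division by a known propensity of 0.5, the toy data, and every name (`fit_owl_linear`, `rule`) are assumptions made for illustration only.

```python
import random

def fit_owl_linear(xs, treatments, outcomes, propensity,
                   lr=0.05, epochs=300, lam=0.01):
    """Minimise an outcome-weighted hinge loss for f(x) = b0 + b1 * x.

    Assumes treatments are coded -1/+1, outcomes are positive, and the
    randomisation probability (propensity) is known and constant.
    """
    n = len(xs)
    # Each subject is weighted in proportion to the observed outcome.
    weights = [r / propensity for r in outcomes]
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        g0, g1 = 0.0, 2.0 * lam * b1  # gradient of the L2 penalty on b1
        for x, a, w in zip(xs, treatments, weights):
            if a * (b0 + b1 * x) < 1.0:  # hinge term is active
                g0 -= w * a / n
                g1 -= w * a * x / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

def rule(b0, b1, x):
    """Estimated treatment rule: recommend +1 or -1 for covariate x."""
    return 1 if b0 + b1 * x > 0 else -1

# Hypothetical toy trial: treatment +1 helps when x > 0, -1 when x < 0.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(400)]
treatments = [rng.choice([-1, 1]) for _ in xs]
outcomes = [2.0 + a * x + rng.gauss(0, 0.1) for x, a in zip(xs, treatments)]

b0, b1 = fit_owl_linear(xs, treatments, outcomes, propensity=0.5)
```

In this toy setup the received treatment serves as the class label, and patients who did well under it pull the decision boundary toward recommending that treatment for similar patients. A kernel SVM solver would replace the linear sub-gradient loop in a more faithful version.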

    Acid-Triggered Self-Assembled Egg White Protein-Coated Gold Nanoclusters for Selective Fluorescent Detection of Fe3+, NO2-, and Cysteine

    Herein, we present a simple and economical synthesis for the first multianalyte probe able to selectively quantify the concentrations of Fe3+, NO2-, and cysteine. It comprises H+-triggered self-assembled gold nanoclusters (AuNCs@EW/H+, AuEHs), showing enhanced red fluorescence at 640 nm. The AuEH is a good fluorescent nanosensor for Fe3+ and NO2- with detection limits of 1.40 and 2.82 nM, respectively. Iron detection, through fluorescence quenching, occurs because of nanocluster aggregation elicited by the complexation of Fe3+ with amino acids on the surface of AuEH; nitrite detection likely proceeds through fluorescence quenching via the disassembly of the nanoclusters following irreversible oxidation by nitrite. This selectivity is good enough that it can be used to quantify the nitrite concentration in commercially available processed meat. Cysteine detection occurs through the restoration of fluorescence of iron-quenched samples; similar molecules including homocysteine and glutathione are unable to restore fluorescence, showing the specificity of the interaction. Applications, including as a detecting ink and as a biocompatible probe, show promise because of the lack of observable toxicity of the AuEHs, demonstrating their promise as specific and sensitive biosensors

    Inverse Regression Estimation for Censored Data

    An inverse regression methodology for assessing predictor performance in the censored data setup is developed along with inference procedures and a computational algorithm. The technique developed here allows for conditioning on the unobserved failure time along with a weighting mechanism that accounts for the censoring. The implementation is nonparametric and computationally fast. This provides an efficient methodological tool that can be used especially in cases where the usual modeling assumptions are not applicable to the data under consideration. It can also be a good diagnostic tool that can be used in the model selection process. We have provided theoretical justification of consistency and asymptotic normality of the methodology. Simulation studies and two data analyses are provided to illustrate the practical utility of the procedure

    Sample size estimation in educational intervention trials with subgroup heterogeneity in only one arm

    We present closed form sample size and power formulas motivated by the study of a psycho-social intervention in which the experimental group has the intervention delivered in teaching subgroups while the control group receives usual care. This situation is different from the usual clustered randomized trial since subgroup heterogeneity only exists in one arm. We take this modification into consideration and present formulas for the situation in which we compare a continuous outcome at both a single point in time and longitudinally over time. In addition, we present the optimal combination of parameters such as the number of subgroups and number of time points for minimizing sample size and maximizing power subject to constraints such as the maximum number of measurements that can be taken (i.e. a proxy for cost)
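
To make the one-arm heterogeneity concrete, here is a rough sketch of a single-time-point sample size calculation. It is not the paper's closed-form formula: it applies the textbook design effect 1 + (m - 1) * rho to the intervention arm only, and it assumes equal arm sizes, equal total variance in both arms, and a two-sided z-test. The function name `n_per_arm` and all parameter names are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, m, rho, alpha=0.05, power=0.80):
    """Approximate subjects per arm when only the intervention arm is clustered.

    delta: difference in means to detect
    sigma: outcome standard deviation (assumed equal in both arms)
    m:     subjects per teaching subgroup in the intervention arm
    rho:   intraclass correlation within subgroups
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Control arm contributes variance factor 1; the clustered arm is
    # inflated by the design effect 1 + (m - 1) * rho.
    variance_factor = 1.0 + (1.0 + (m - 1) * rho)
    return math.ceil(z ** 2 * sigma ** 2 * variance_factor / delta ** 2)

n_unclustered = n_per_arm(delta=0.5, sigma=1.0, m=10, rho=0.0)
n_clustered = n_per_arm(delta=0.5, sigma=1.0, m=10, rho=0.05)
```

With these illustrative inputs the requirement rises from 63 to 77 subjects per arm once an intraclass correlation of 0.05 is introduced in the intervention arm; in practice the intervention-arm count would be rounded up to a whole number of subgroups.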

    Carbon dots for specific “off-on” sensing of Co2+ and EDTA for in vivo bioimaging

    Fluorescent carbon dots (CDs) were hydrothermally synthesized from a mixture of frozen tofu, ethylenediamine and phosphoric acid in an efficient 64% yield. The resulting CDs exhibit good water solubility, low cytotoxicity, high stability, and excellent biocompatibility. The CDs selectively and sensitively detect Co2+ through fluorescent quenching with a detection limit of 58 nM. Fluorescence can be restored through the introduction of EDTA, and this phenomenon can be used to quantify EDTA in solution with a detection limit of 98 nM. As both analytes are detected by the same CD platform, this is an “off-on” fluorescence sensor for Co2+ and EDTA. The technique's robustness for real-world samples was illustrated by quantifying cobalt in tap water and EDTA in contact lens solution. The CDs were also evaluated for in vivo imaging as they show low cytotoxicity and excellent cellular uptake. In a zebrafish model, the CDs are rapidly adsorbed from the intestine into the liver, and are essentially cleared from the body in 24 h with no appreciable bioaccumulation. Their simple and efficient synthesis, combined with excellent physical and chemical performance, renders these CDs attractive candidates for theranostic applications in targeted “smart” drug delivery and bioimaging.

    RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking

    In various natural language processing tasks, passage retrieval and passage re-ranking are two key procedures in finding and ranking relevant information. Since both procedures contribute to the final performance, it is important to jointly optimize them in order to achieve mutual improvement. In this paper, we propose a novel joint training approach for dense passage retrieval and passage re-ranking. A major contribution is that we introduce dynamic listwise distillation, where we design a unified listwise training approach for both the retriever and the re-ranker. During the dynamic distillation, the retriever and the re-ranker can be adaptively improved according to each other's relevance information. We also propose a hybrid data augmentation strategy to construct diverse training instances for the listwise training approach. Extensive experiments show the effectiveness of our approach on both MSMARCO and Natural Questions datasets. Our code is available at https://github.com/PaddlePaddle/RocketQA. Comment: EMNLP 202
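
One plausible way to picture listwise distillation between a retriever and a re-ranker is to normalise each model's scores over the same candidate list and penalise the divergence between the two distributions. The sketch below assumes softmax normalisation and a KL divergence taken from the retriever's distribution to the re-ranker's; the abstract does not spell out these choices, and the function names and scores are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw relevance scores for one candidate list into a distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_kl(retriever_scores, reranker_scores):
    """KL divergence between the two models' listwise relevance distributions.

    Both score lists must refer to the same passages in the same order.
    """
    p = softmax(retriever_scores)
    q = softmax(reranker_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical scores for four candidate passages of one query.
retriever_scores = [2.1, 0.3, -0.5, 1.0]
reranker_scores = [3.0, -1.0, -2.0, 2.5]
loss = listwise_kl(retriever_scores, reranker_scores)
```

The loss is zero when both models rank the list with identical distributions and grows as they disagree. In a joint training setup, gradients of such a term could flow to both models rather than to a student only, which is one reading of "dynamic" in the abstract.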