23 research outputs found

    A Bayesian test for excess zeros in a zero-inflated power series distribution

    Power series distributions form a useful subclass of one-parameter discrete exponential families suitable for modeling count data. A zero-inflated power series distribution is a mixture of a power series distribution and a degenerate distribution at zero, with a mixing probability $p$ for the degenerate distribution. This distribution is useful for modeling count data that may have extra zeros. One question is whether the mixture model can be reduced to the power series portion, corresponding to $p=0$, or whether there are so many zeros in the data that zero inflation relative to the pure power series distribution must be included in the model, i.e., $p\geq0$. The problem is difficult partially because $p=0$ is a boundary point. Here, we present a Bayesian test for this problem based on recognizing that the parameter space can be expanded to allow $p$ to be negative. Negative values of $p$ are inconsistent with the interpretation of $p$ as a mixing probability; however, they index distributions that are physically and probabilistically meaningful. We compare our Bayesian solution to two standard frequentist testing procedures and find that using a posterior probability as a test statistic has slightly higher power on the most important ranges of the sample size $n$ and parameter values than the score test and likelihood ratio test in simulations. Our method also performs well on three real data sets. Comment: Published at http://dx.doi.org/10.1214/193940307000000068 in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org
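As a rough illustration of the expanded parameter space described in this abstract, the sketch below assumes the Poisson distribution as the power series member (the abstract does not single out one family). The function names `poisson_pmf`, `min_mixing_weight`, and `zip_pmf` are hypothetical and not taken from the paper. It shows that the mixing weight can go below zero, down to the point where the probability of a zero count reaches exactly zero, and still yield a valid distribution.

```python
import math


def poisson_pmf(y: int, theta: float) -> float:
    """Poisson probability mass function, used here as the assumed base family."""
    return math.exp(-theta) * theta ** y / math.factorial(y)


def min_mixing_weight(theta: float) -> float:
    """Smallest weight p for which P(Y = 0) = p + (1 - p) * f(0) stays non-negative."""
    f0 = poisson_pmf(0, theta)
    return -f0 / (1.0 - f0)


def zip_pmf(y: int, theta: float, p: float) -> float:
    """Zero-inflated (p > 0) or zero-deflated (p < 0) Poisson pmf."""
    if p > 1.0 or p < min_mixing_weight(theta):
        raise ValueError("p lies outside the range that gives a valid distribution")
    base = (1.0 - p) * poisson_pmf(y, theta)
    # Only the zero count receives the extra (possibly negative) mass p.
    return p + base if y == 0 else base
```

With `theta = 2.0`, a weight such as `p = -0.1` is admissible: the probabilities still sum to one, and there are simply fewer zeros than the plain Poisson would produce.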

    End-to-end Privacy Preserving Training and Inference for Air Pollution Forecasting with Data from Rival Fleets

    Privacy-preserving machine learning (PPML) promises to train machine learning (ML) models by combining data spread across multiple data silos. Theoretically, secure multiparty computation (MPC) allows multiple data owners to train models on their joint data without revealing the data to each other. However, the prior implementations of this secure training using MPC have three limitations: they have only been evaluated on CNNs, and LSTMs have been ignored; fixed-point approximations have affected training accuracies compared to training in floating point; and due to significant latency overheads of secure training via MPC, its relevance for practical tasks with streaming data remains unclear. The motivation of this work is to report our experience of addressing the practical problem of secure training and inference of models for urban sensing problems, e.g., traffic congestion estimation or air pollution monitoring in large cities, where data can be contributed by rival fleet companies while balancing the privacy-accuracy trade-offs using MPC-based techniques. Our first contribution is to design a custom ML model for this task that can be efficiently trained with MPC within a desirable latency. In particular, we design a GCN-LSTM and securely train it on time-series sensor data for accurate forecasting, within 7 minutes per epoch. As our second contribution, we build an end-to-end system of private training and inference that provably matches the training accuracy of cleartext ML training. This work is the first to securely train a model with LSTM cells. Third, this trained model is kept secret-shared between the fleet companies and allows clients to make sensitive queries to this model while carefully handling potentially invalid queries. Our custom protocols allow clients to query predictions from privately trained models in milliseconds, all the while maintaining accuracy and cryptographic security
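To make the notions of "secret-shared" values and fixed-point approximation more concrete, here is a minimal sketch. It assumes additive secret sharing over the ring of integers modulo 2^64 with 16 fractional bits; the abstract does not state which sharing scheme or precision the authors use, so these choices are illustrative only. All names (`MODULUS`, `SCALE`, `encode`, `share`, and so on) are hypothetical and this is not the paper's protocol.

```python
import secrets

MODULUS = 2 ** 64   # assumed ring size for the shares
SCALE = 2 ** 16     # assumed fixed-point scale (16 fractional bits)


def encode(x: float) -> int:
    """Map a real number to a fixed-point ring element."""
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    """Map a ring element back to a real number (upper half of the ring is negative)."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(value: int, n_parties: int) -> list[int]:
    """Split a ring element into n random-looking shares that sum to it."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recombine all shares; any strict subset reveals nothing about the value."""
    return sum(shares) % MODULUS


def add_shared(a: list[int], b: list[int]) -> list[int]:
    """Each party adds its own two shares locally, with no communication."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]
```

For example, sharing the encodings of 12.5 and -4.25 among three parties, adding the shares party by party, and reconstructing gives 8.25. Multiplication and non-linear activations need interactive protocols and are where fixed-point rounding error typically enters; they are outside the scope of this sketch.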

    Global disparities in surgeons’ workloads, academic engagement and rest periods: the on-calL shIft fOr geNEral SurgeonS (LIONESS) study

    The workload of general surgeons is multifaceted, encompassing not only surgical procedures but also a myriad of other responsibilities. From April to May 2023, we conducted a CHERRIES-compliant internet-based survey analyzing clinical practice, academic engagement, and post-on-call rest. The questionnaire featured six sections with 35 questions. Statistical analysis used Chi-square tests, ANOVA, and logistic regression (SPSS® v. 28). The survey received a total of 1,046 responses (65.4%). Over 78.0% of responders came from Europe, and 65.1% came from a general surgery unit; 92.8% of European and 87.5% of North American respondents were involved in research, compared to 71.7% in Africa. Europe led in publishing research studies (6.6 ± 8.6 yearly). Teaching involvement was high in North America (100%) and Africa (91.7%). Surgeons reported an average of 6.7 ± 4.9 on-call shifts per month, with European and North American surgeons experiencing 6.5 ± 4.9 and 7.8 ± 4.1 on-calls monthly, respectively. African surgeons had the highest on-call frequency (8.7 ± 6.1). Post-on-call, only 35.1% of respondents received a day off. Europeans were most likely (40%) to have a day off, while African surgeons were least likely (6.7%). In the adjusted multivariable analysis, HDI (Human Development Index) (aOR 1.993), hospital capacity > 400 beds (aOR 2.423), working in a specialty surgery unit (aOR 2.087), and taking the on-call in-house (aOR 5.446) significantly predicted the likelihood of having a day off after an on-call shift. Our study revealed critical insights into the disparities in workload, access to research, and professional opportunities for surgeons across different continents, underscored by the HDI

    GNN-based end-to-end reconstruction in the CMS Phase 2 High-Granularity Calorimeter

    We present the current stage of research progress towards a one-pass, completely Machine Learning (ML) based imaging calorimeter reconstruction. The model used is based on Graph Neural Networks (GNNs) and directly analyzes the hits in each HGCAL endcap. The ML algorithm is trained to predict clusters of hits originating from the same incident particle by labeling the hits with the same cluster index. We impose simple criteria to assess whether the hits associated as a cluster by the prediction are matched to those hits resulting from any particular individual incident particles. The algorithm is studied by simulating two tau leptons in each of the two HGCAL endcaps, where each tau may decay according to its measured standard model branching probabilities. The simulation includes the material interaction of the tau decay products which may create additional particles incident upon the calorimeter. Using this varied multiparticle environment we can investigate the application of this reconstruction technique and begin to characterize energy containment and performance.

    Amplified fragment length polymorphism and metabolomic profiles of hairy roots of Psoralea corylifolia L.

    A reproducible protocol for the establishment of hairy root cultures of Psoralea corylifolia L. was developed using Agrobacterium rhizogenes strain ATCC 15834. The hairy root clones exhibited typical sigmoid growth curves. Genomic and metabolomic profiles of the hairy root clones and of an untransformed control were analysed. Hairy root clones Ps I and Ps II showed significant differences in their amplified fragment length polymorphism (AFLP) profiles compared to the control, besides exhibiting Ri T-DNA-specific bands. These results indicate the stable integration of Ri T-DNA into the genomes of these clones. Further, the variations observed between clones in the AFLP profiles suggest variable lengths and an independent nature of the Ri T-DNA integrations into their genomes. An isoflavonoid, formononetin, and its glycoside were present only in the hairy root clones and were absent in the untransformed control. Variations observed in the metabolite profiles of these clones may be attributed to the random T-DNA integrations and the associated changes they cause in the recipient genomes. GC/MS analyses revealed the production of three and six clone-specific compounds in Ps I and Ps II, respectively, suggesting that the clones are dissimilar in their secondary metabolism. HPLC/UV-MS analyses disclosed substantial increases in the total isoflavonoids produced in Ps I (184%) and Ps II (94%) compared to the untransformed control. Graphical abstract: Hairy root cultures of Psoralea corylifolia were developed. AFLP and metabolomic profiles showed striking variations between the clones. An isoflavonoid, formononetin, and its glycoside were identified for the first time from hairy root cultures of P. corylifolia