A Bayesian test for excess zeros in a zero-inflated power series distribution
Power series distributions form a useful subclass of one-parameter discrete
exponential families suitable for modeling count data. A zero-inflated power
series distribution is a mixture of a power series distribution and a
degenerate distribution at zero, with a mixing probability p for the
degenerate distribution. This distribution is useful for modeling count data
that may have extra zeros. One question is whether the mixture model can be
reduced to the power series portion, corresponding to p = 0, or whether there
are so many zeros in the data that zero inflation relative to the pure power
series distribution must be included in the model, i.e., p > 0. The problem
is difficult partially because p = 0 is a boundary point. Here, we present a
Bayesian test for this problem based on recognizing that the parameter space
can be expanded to allow p to be negative. Negative values of p are
inconsistent with the interpretation of p as a mixing probability; however,
they index distributions that are physically and probabilistically meaningful.
We compare our Bayesian solution to two standard frequentist testing procedures
and find that using a posterior probability as a test statistic has slightly
higher power on the most important ranges of the sample size and parameter
values than the score test and likelihood ratio test in simulations. Our method
also performs well on three real data sets. Comment: Published at http://dx.doi.org/10.1214/193940307000000068 in the IMS
Collections (http://www.imstat.org/publications/imscollections.htm) by the
Institute of Mathematical Statistics (http://www.imstat.org)
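To make the idea of a negative mixing weight concrete, here is a minimal sketch that uses the Poisson distribution as one example of a power series distribution. The function name `zip_pmf`, the symbol `p` for the mixing weight, and the choice of Poisson are assumptions made for illustration; the abstract does not specify them, and this is not the authors' test procedure.

```python
import math

def zip_pmf(k, lam, p):
    """Zero-inflated Poisson pmf with the mixing weight p allowed to be negative.

    P(Y = 0) = p + (1 - p) * f(0),  P(Y = k) = (1 - p) * f(k) for k >= 1,
    where f is the Poisson(lam) pmf. The pmf stays valid as long as
    P(Y = 0) >= 0, i.e. p >= -f(0) / (1 - f(0)).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    f0 = math.exp(-lam)
    lower = -f0 / (1.0 - f0)  # smallest p that keeps P(Y = 0) non-negative
    if not (lower <= p <= 1.0):
        raise ValueError(f"p must lie in [{lower:.4f}, 1]")
    base = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return p + (1.0 - p) * base
    return (1.0 - p) * base

# p < 0 gives fewer zeros than plain Poisson, p = 0 recovers Poisson,
# and p > 0 gives zero inflation.
for p in (-0.1, 0.0, 0.3):
    print(p, zip_pmf(0, 2.0, p))
```

In this sketch p = 0 sits in the interior of the admissible range rather than on its boundary, which is the property the abstract relies on.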
End-to-end Privacy Preserving Training and Inference for Air Pollution Forecasting with Data from Rival Fleets
Privacy-preserving machine learning (PPML) promises to train
machine learning (ML) models by combining data spread across
multiple data silos. Theoretically, secure multiparty computation
(MPC) allows multiple data owners to train models on their joint
data without revealing the data to each other. However, the prior
implementations of this secure training using MPC have three limitations: they have only been evaluated on CNNs, and LSTMs have
been ignored; fixed point approximations have affected training
accuracies compared to training in floating point; and due to significant latency overheads of secure training via MPC, its relevance
for practical tasks with streaming data remains unclear.
The motivation of this work is to report our experience of addressing the practical problem of secure training and inference
of models for urban sensing problems, e.g., traffic congestion estimation, or air pollution monitoring in large cities, where data
can be contributed by rival fleet companies while balancing the
privacy-accuracy trade-offs using MPC-based techniques.
Our first contribution is to design a custom ML model for this
task that can be efficiently trained with MPC within a desirable
latency. In particular, we design a GCN-LSTM and securely train
it on time-series sensor data for accurate forecasting, within 7
minutes per epoch. As our second contribution, we build an end-to-end system of private training and inference that provably matches
the training accuracy of cleartext ML training. This work is the first
to securely train a model with LSTM cells. Third, this trained model
is kept secret-shared between the fleet companies and allows clients
to make sensitive queries to this model while carefully handling
potentially invalid queries. Our custom protocols allow clients to
query predictions from privately trained models in milliseconds,
all the while maintaining accuracy and cryptographic security.
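The abstract mentions secret-shared models and fixed-point approximations without describing the protocols. As a rough illustration of those two building blocks only, the sketch below shows generic two-party additive secret sharing over a 64-bit ring with a fixed-point encoding. The ring size, the 16 fractional bits, and all function names are assumptions chosen for this example; it is not the system described in the paper and it omits everything needed for secure multiplication, truncation, or malicious security.

```python
import secrets

MOD = 2 ** 64        # assumed ring size for the shares
FRAC_BITS = 16       # assumed number of fractional bits in the encoding

def encode(x):
    """Map a real number to a fixed-point element of the ring."""
    return round(x * (1 << FRAC_BITS)) % MOD

def decode(v):
    """Map a ring element back to a real number (upper half is negative)."""
    if v >= MOD // 2:
        v -= MOD
    return v / (1 << FRAC_BITS)

def share(v, n=2):
    """Split v into n additive shares that sum to v modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n - 1)]
    shares.append((v - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    """Recombine additive shares into the shared value."""
    return sum(shares) % MOD

def add_shares(a, b):
    """Each party adds its own shares locally; no communication is needed."""
    return [(x + y) % MOD for x, y in zip(a, b)]

a = share(encode(12.5))
b = share(encode(-3.25))
print(decode(reconstruct(add_shares(a, b))))
```

Each individual share is uniformly random on its own, so neither party learns the other's input from the share it holds; the value only appears when the shares are combined.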
Global disparities in surgeons’ workloads, academic engagement and rest periods: the on-calL shIft fOr geNEral SurgeonS (LIONESS) study
The workload of general surgeons is multifaceted, encompassing not only surgical procedures but also a myriad of other responsibilities. From April to May 2023, we conducted a CHERRIES-compliant internet-based survey analyzing clinical practice, academic engagement, and post-on-call rest. The questionnaire featured six sections with 35 questions. Statistical analysis used Chi-square tests, ANOVA, and logistic regression (SPSS® v. 28). The survey received a total of 1,046 responses (65.4%). Over 78.0% of respondents came from Europe, and 65.1% came from a general surgery unit; 92.8% of European and 87.5% of North American respondents were involved in research, compared to 71.7% in Africa. Europe led in publishing research studies (6.6 ± 8.6 yearly). Teaching involvement was high in North America (100%) and Africa (91.7%). Surgeons reported an average of 6.7 ± 4.9 on-call shifts per month, with European and North American surgeons experiencing 6.5 ± 4.9 and 7.8 ± 4.1 on-calls monthly, respectively. African surgeons had the highest on-call frequency (8.7 ± 6.1). Post-on-call, only 35.1% of respondents received a day off. Europeans were most likely (40%) to have a day off, while African surgeons were least likely (6.7%). On adjusted multivariable analysis, HDI (Human Development Index) (aOR 1.993), hospital capacity > 400 beds (aOR 2.423), working in a specialty surgery unit (aOR 2.087), and taking the on-call in-house (aOR 5.446) significantly predicted the likelihood of having a day off after an on-call shift. Our study revealed critical insights into the disparities in workload, access to research, and professional opportunities for surgeons across different continents, underscored by the HDI
GNN-based end-to-end reconstruction in the CMS Phase 2 High-Granularity Calorimeter
We present the current stage of research progress towards a one-pass, completely Machine Learning (ML) based imaging calorimeter reconstruction. The model used is based on Graph Neural Networks (GNNs) and directly analyzes the hits in each HGCAL endcap. The ML algorithm is trained to predict clusters of hits originating from the same incident particle by labeling the hits with the same cluster index. We impose simple criteria to assess whether the hits associated as a cluster by the prediction are matched to those hits resulting from any particular individual incident particles. The algorithm is studied by simulating two tau leptons in each of the two HGCAL endcaps, where each tau may decay according to its measured standard model branching probabilities. The simulation includes the material interaction of the tau decay products which may create additional particles incident upon the calorimeter. Using this varied multiparticle environment we can investigate the application of this reconstruction technique and begin to characterize energy containment and performance.
Amplified fragment length polymorphism and metabolomic profiles of hairy roots of Psoralea corylifolia L.
A reproducible protocol for establishment of hairy root cultures of Psoralea corylifolia L. was developed using Agrobacterium rhizogenes strain ATCC 15834. The hairy root clones exhibited typical sigmoid growth curves. Genomic and metabolomic profiles of hairy root clones along with that of untransformed control were analysed. Hairy root clones, Ps I and Ps II, showed significant differences in their amplified fragment length polymorphism (AFLP) profiles as compared to that of control, besides exhibiting Ri T-DNA-specific bands. These results indicate the stable integration of Ri T-DNA into the genomes of these clones. Further, the variations observed between clones in the AFLP profiles suggest the variable lengths and independent nature of Ri T-DNA integrations into their genomes. An isoflavonoid, formononetin, and its glycoside were present only in the hairy root clones while they were absent in the untransformed control. Variations observed in the metabolite profiles of these clones may be attributed to the random T-DNA integrations and associated changes caused by them in the recipient genomes. GC/MS analyses revealed the production of three and six clone-specific compounds in Ps I and Ps II, respectively, suggesting that the clones are dissimilar in their secondary metabolism. HPLC/UV-MS analyses disclosed substantial increases in the total isoflavonoids produced in Ps I (184%) and Ps II (94%) compared to untransformed control. Graphical abstract: Hairy root cultures of Psoralea corylifolia were developed. AFLP and metabolomic profiles showed striking variations between the clones. An isoflavonoid, formononetin, and its glycoside were identified for the first time from hairy root cultures of P. corylifolia.