Toward Understanding Generative Data Augmentation
Generative data augmentation, which scales datasets by obtaining fake labeled
examples from a trained conditional generative model, boosts classification
performance in various learning tasks including (semi-)supervised learning,
few-shot learning, and adversarially robust learning. However, little work has
theoretically investigated the effect of generative data augmentation. To fill
this gap, we establish a general stability bound in this non-independent and
identically distributed (non-i.i.d.) setting, where the learned distribution
depends on the original train set and generally differs from the true
distribution. Our theoretical result includes the divergence between the
learned distribution and the true distribution. It shows that generative data
augmentation can enjoy a faster learning rate when the order of the divergence term
is $o(\max(\log(m)\beta_m, 1/\sqrt{m}))$, where $m$ is the train
set size and $\beta_m$ is the corresponding stability constant. We further
specify the learning setup to the Gaussian mixture model and generative
adversarial nets. We prove that in both cases, though generative data
augmentation does not enjoy a faster learning rate, it can improve the learning
guarantees at a constant level when the train set is small, which is
significant when severe overfitting occurs. Simulation results on the
Gaussian mixture model and empirical results on generative adversarial nets
support our theoretical conclusions. Our code is available at
https://github.com/ML-GSAI/Understanding-GDA. Comment: 39 pages
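To make the augmentation setup in the abstract above concrete, here is a minimal sketch, not the authors' code. It assumes a one-dimensional, two-class Gaussian mixture with unit variance. The function names and the values `mu=1.0`, `n_per_class=10`, and `n_fake=100` are hypothetical choices for illustration. The key point it shows is that the learned distribution is fitted on the small train set, so the fake samples depend on that set (the non-i.i.d. setting the abstract describes).

```python
import random
import statistics

def sample_mixture(n_per_class, mu, rng):
    """Draw labeled points: label y is -1 or +1 and x ~ N(y * mu, 1)."""
    return [(rng.gauss(y * mu, 1.0), y)
            for y in (-1, 1) for _ in range(n_per_class)]

def fit_conditional_gaussians(data):
    """Estimate a per-class mean and standard deviation from the train set."""
    params = {}
    for label in (-1, 1):
        xs = [x for x, y in data if y == label]
        params[label] = (statistics.fmean(xs), statistics.pstdev(xs))
    return params

def augment(data, params, n_fake, rng):
    """Append n_fake labeled samples drawn from the learned distribution."""
    fake = []
    for _ in range(n_fake):
        y = rng.choice((-1, 1))
        mean, std = params[y]
        fake.append((rng.gauss(mean, std), y))
    return data + fake

# Hypothetical sizes: a small real train set plus a larger synthetic one.
rng = random.Random(0)
train = sample_mixture(n_per_class=10, mu=1.0, rng=rng)
params = fit_conditional_gaussians(train)
augmented = augment(train, params, n_fake=100, rng=rng)
print(len(train), len(augmented))  # 20 120
```

Any classifier can then be trained on `augmented` instead of `train`; the sketch stops at building the augmented set.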
ANALYTIC METHODS USED IN REAL WORLD DATA BASED BIOMEDICAL RESEARCH- A SCOPING REVIEW
Background and Objective:
Real-world data (RWD) are data derived from multiple sources associated with real-world practice in a heterogeneous patient population. Interest in using RWD and real-world evidence (RWE) in biomedical research is growing, since RWE presents an opportunity to extend research beyond the typical limits of academia. However, the traditional statistical methods used in RWD analysis may introduce bias and undermine the credibility of RWE. To document which analytic methods have been used in RWD analysis, we conducted a sampled methodological review of methods used in EHR-based biomedical research.
Methods:
We developed an article database to document literature characteristics and analytical methods. We took a random sample of articles for detailed review. The primary outcome was the proportion of articles using RWD methods. Meta-regressions were used to examine trends in these proportions over time.
Results:
Of 88 papers reviewed in detail, 7 (8.0%) used the recommended Real-World Method (RWM). The proportions (and 95% confidence intervals) of publications in 2019 that reported using RWM, performing sensitivity analysis, and handling missing data were 11% (0, 26%), 17% (0, 34%), and 22% (3%, 41%), respectively. The sensitivity analysis showed that the proportion using RWM increased by 0.4% per year, although this slope was not statistically different from 0.
Conclusions: The proportion of EHR-based studies handling missing data, using RWM, or performing sensitivity analysis is disappointingly low. Although regulatory guidelines, books, and academic meetings during the study period recommended methods that should be used in RWD analysis, proper analytic methods remain underused in published studies.
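The headline figure above (7 of 88 papers, 8.0%) can be checked with a short calculation. The sketch below also attaches a normal-approximation (Wald) 95% interval clipped to [0, 1]. That interval method is an assumption made here for illustration, since the abstract does not say how its intervals were computed, and `wald_interval` is a hypothetical helper name.

```python
import math

def wald_interval(successes, total, z=1.96):
    """Return (proportion, lower, upper) using a normal approximation.

    The bounds are clipped to [0, 1]; z=1.96 corresponds to a 95% interval.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# 7 of the 88 papers reviewed in detail used the recommended method.
p, lower, upper = wald_interval(7, 88)
print(f"{p:.1%} ({lower:.1%}, {upper:.1%})")
```

For small samples such as the single-year subsets, a Wilson or exact interval would usually be a safer choice than the Wald form used in this sketch.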
FUNCTIONAL STUDIES OF NUSAP IN MICROTUBULE STABILITY, CHROMOSOME OSCILLATION AND MIDZONE FORMATION DURING MITOSIS
Ph.D (DOCTOR OF PHILOSOPHY)
Novel Criteria for When and How to Exit a COVID-19 Pandemic Lockdown
In the first month of 2020, severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), a novel coronavirus spreading quickly via human-to-human transmission, caused the coronavirus disease 2019 (COVID-19) pandemic. Italy installed a successful nationwide lockdown to mitigate the exponential increase of case numbers, as the basic reproduction number R0 reached 1 within 4 weeks. But is R0 really the relevant criterion for whether community spreading is under control? In most parts of the world, testing largely focused on symptomatic cases, and we thus hypothesized that the true number of infected cases and the relative testing capacity are better determinants to guide lockdown exit strategies. We employed the SEIR model to estimate the numbers of undocumented cases. As expected, the estimated numbers of all cases largely exceeded the reported ones in all Italian regions. Next, we took the numbers of reported and estimated cases per million of population and compared them with the respective numbers of tests. In Lombardy, the most affected region, testing capacity per reported new case was between two and eight most of the time, but testing capacity per estimated new case never reached four up to April 30. In contrast, Veneto's testing capacities per reported and per estimated new case were much less discrepant and were between four and 16 most of the time. As of April 30, Marche, Lazio, and other Italian regions had also come close to a ratio of 16 tests per newly estimated infection. Thus, the decision to exit a lockdown should be made at the level of the regions, based on the local testing capacity, which should reach 16 times the estimated true number of newly infected cases, as predicted.
On the origin of the split main sequences of the young massive cluster NGC 1856
The detection of split main sequences (MSs) associated with young clusters
($\lesssim 600$ Myr) has attracted considerable attention. A prevailing scenario is that
a bimodality of stellar rotation distribution drives the MS bifurcation.
Nevertheless, the origin of the stellar rotation dichotomy remains unclear.
Hypotheses involving tidally-locked binaries or blue straggler stars (BSSs) are
proposed to explain the observed split MSs. This work examines if the long-term
dynamical evolution of star clusters can produce the observed split MSs,
through high-performance $N$-body simulation. As a prototype example, the young
massive cluster NGC 1856 exhibits an apparent MS bifurcation. Our simulation
reports that at the age of NGC 1856, tidally-locked binaries are fully mixed
with single stars. This is consistent with the observation that there is no
significant spatial difference between blue MS and red MS stars. However, we
find that only high mass-ratio binaries can evolve to the tidally-locked phase
at the age of NGC 1856. These tidally-locked binaries will populate a much
redder sequence than the MS of single stars rather than a blue MS, which is
inconsistent with the hypothesis. The number of tidally-locked binaries cannot
account for the observation. Our simulation shows that BSSs produced by binary
interactions do populate the blue periphery in the color-magnitude diagram, and
their spatial distribution follows a pattern similar to that of single stars. However,
the number of BSSs does not fit the observation. Comment: 14 pages, 7 figures, 1 table; accepted for publication in Ap
Koopa: Learning Non-stationary Time Series Dynamics with Koopman Predictors
Real-world time series is characterized by intrinsic non-stationarity that
poses a principal challenge for deep forecasting models. While previous models
suffer from complicated series variations induced by changing temporal
distribution, we tackle non-stationary time series with modern Koopman theory
that fundamentally considers the underlying time-variant dynamics. Inspired by
Koopman theory of portraying complex dynamical systems, we disentangle
time-variant and time-invariant components from intricate non-stationary series
by Fourier Filter and design Koopman Predictor to advance respective dynamics
forward. Technically, we propose Koopa as a novel Koopman forecaster composed
of stackable blocks that learn hierarchical dynamics. Koopa seeks measurement
functions for Koopman embedding and utilizes Koopman operators as linear
portraits of implicit transition. To cope with time-variant dynamics that
exhibit strong locality, Koopa calculates context-aware operators in the
temporal neighborhood and is able to utilize incoming ground truth to scale up
forecast horizon. Besides, by integrating Koopman Predictors into deep residual
structure, we ravel out the binding reconstruction loss in previous Koopman
forecasters and achieve end-to-end forecasting objective optimization. Compared
with the state-of-the-art model, Koopa achieves competitive performance while
saving 77.3% of training time and 76.0% of memory.
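The disentangling step described above can be illustrated with a small frequency-domain split. This is a simplified sketch, not Koopa's implementation. It assumes the time-invariant part is the set of `top_k` largest-amplitude frequencies of a single series and treats the remainder as the time-variant part. The function names and the toy series are hypothetical, and a plain O(n²) DFT is used to keep the example free of dependencies.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform of a real sequence (O(n^2), for clarity)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spec):
    """Inverse DFT, keeping the real part of each reconstructed sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]

def fourier_split(series, top_k):
    """Split a series into dominant-frequency and residual components."""
    n = len(series)
    spec = dft(series)
    # Rank only non-negative frequencies, then keep each with its mirror bin
    # so that both components stay real-valued.
    ranked = sorted(range(n // 2 + 1), key=lambda k: abs(spec[k]), reverse=True)
    keep = set()
    for k in ranked[:top_k]:
        keep.update({k, (n - k) % n})
    dominant = [spec[k] if k in keep else 0j for k in range(n)]
    residual = [0j if k in keep else spec[k] for k in range(n)]
    return idft_real(dominant), idft_real(residual)

# Toy series: a strong periodic component plus a weaker one.
n = 32
series = [math.sin(2 * math.pi * 4 * t / n) + 0.1 * math.sin(2 * math.pi * 3 * t / n)
          for t in range(n)]
invariant, variant = fourier_split(series, top_k=1)
```

The two components sum back to the original series, so each can be passed to its own predictor and the forecasts added afterwards.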
A hybrid Decoder-DeepONet operator regression framework for unaligned observation data
Deep neural operators (DNOs) have been utilized to approximate nonlinear
mappings between function spaces. However, DNOs face the challenge of increased
dimensionality and computational cost associated with unaligned observation
data. In this study, we propose a hybrid Decoder-DeepONet operator regression
framework to handle unaligned data effectively. Additionally, we introduce a
Multi-Decoder-DeepONet, which utilizes an average field of training data as
input augmentation. The consistency of both frameworks with operator
approximation theory is established on the basis of the universal approximation
theorem. Two numerical experiments, a Darcy problem and the flow field around an
airfoil, are conducted to validate the efficiency and accuracy of the proposed
methods. Results illustrate the advantages of Decoder-DeepONet and
Multi-Decoder-DeepONet in handling unaligned observation data and showcase
their potential to improve prediction accuracy. Comment: 35 pages, 10 figures, 11 tables