Text generation for dataset augmentation in security classification tasks
Security classifiers, designed to detect malicious content in computer
systems and communications, can underperform when provided with insufficient
training data. In the security domain, it is often easy to find samples of the
negative (benign) class, and challenging to find enough samples of the positive
(malicious) class to train an effective classifier. This study evaluates the
application of natural language text generators to fill this data gap in
multiple security-related text classification tasks. We describe a variety of
previously unexamined language-model fine-tuning approaches for this purpose
and consider in particular the impact of disproportionate class imbalances in
the training set. Across our evaluation using three state-of-the-art
classifiers designed for offensive language detection, review fraud detection,
and SMS spam detection, we find that models trained with GPT-3 data
augmentation strategies outperform both models trained without augmentation and
models trained using basic data augmentation strategies already in common
usage. In particular, we find substantial benefits for GPT-3 data augmentation
strategies in situations with severe limitations on known positive-class
samples.
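The abstract does not say exactly how generated samples are merged into the training set. The following is a minimal sketch, assuming the goal is to top up the positive (malicious) class until it makes up a chosen fraction of the training data. `generate` is a hypothetical stand-in for a fine-tuned text generator such as GPT-3 (no real API is called), and `toy_generator` is only a placeholder so the sketch runs.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical generator signature: given positive-class seed texts and a count,
# return that many synthetic positive-class texts. In the study this role is
# played by a fine-tuned language model; here it is just a callable stand-in.
Generator = Callable[[List[str], int], List[str]]


def augment_minority_class(
    texts: List[str],
    labels: List[int],
    generate: Generator,
    target_positive_fraction: float = 0.5,
    positive_label: int = 1,
) -> Tuple[List[str], List[int]]:
    """Append synthetic positives until they reach the target fraction of the data."""
    if not 0.0 < target_positive_fraction < 1.0:
        raise ValueError("target_positive_fraction must be in (0, 1)")
    positives = [t for t, y in zip(texts, labels) if y == positive_label]
    n_negative = len(labels) - len(positives)
    # Solve p / (p + n) = f for p, rounding up to a whole number of samples.
    wanted = math.ceil(
        target_positive_fraction * n_negative / (1.0 - target_positive_fraction)
    )
    n_new = wanted - len(positives)
    # This sketch conditions generation on existing positives, so it does
    # nothing when there are none or when the class is already large enough.
    if n_new <= 0 or not positives:
        return list(texts), list(labels)
    synthetic = generate(positives, n_new)[:n_new]
    return list(texts) + synthetic, list(labels) + [positive_label] * len(synthetic)


def toy_generator(seeds: List[str], n: int) -> List[str]:
    """Placeholder 'generator' that cycles through the seeds; a real one would call a model."""
    return [f"{seeds[i % len(seeds)]} (variant {i})" for i in range(n)]
```

With 2 positives and 8 negatives and the default target of 0.5, the function appends 6 synthetic positives. Keeping the generator as a parameter makes it easy to swap a language-model generator for the basic augmentation baselines the abstract compares against.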
Jamming and Stress Propagation in Particulate Matter
We present simple models of particulate materials whose mechanical integrity
arises from a jamming process. We argue that such media are generically
"fragile", that is, they are unable to support certain types of incremental
loading without plastic rearrangement. In such models, fragility is naturally
linked to the marginal stability of force chain networks (granular skeletons)
within the material. Fragile matter exhibits novel mechanical responses that
may be relevant to both jammed colloids and cohesionless assemblies of poured,
rigid grains.
Comment: LaTeX, 3 figures, elsart.cls style file, 11 pages
Variational study of a dilute Bose condensate in a harmonic trap
A two-parameter trial condensate wave function is used to find an approximate
variational solution to the Gross-Pitaevskii equation for $N_0$ condensed
bosons in an isotropic harmonic trap with oscillator length $d_0$,
interacting through a repulsive two-body scattering length $a > 0$. The
dimensionless parameter $\mathcal{N}_0 \equiv N_0 a/d_0$ characterizes the effect
of the interparticle interactions, with $\mathcal{N}_0 \lesssim 1$ for an ideal gas and
$\mathcal{N}_0 \gg 1$ for a strongly interacting system (the Thomas-Fermi limit).
The trial function interpolates smoothly between these two limits, and the
three separate contributions (kinetic energy, trap potential energy, and
two-body interaction energy) to the variational condensate energy and the
condensate chemical potential are determined parametrically for any value of
$\mathcal{N}_0$, along with illustrative numerical values. The straightforward
generalization to an anisotropic harmonic trap is considered briefly.
Comment: 14 pages, RevTeX, submitted to Journal of Low Temperature Physics
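The abstract does not reproduce its two-parameter trial function, so as a hedged illustration of the same variational idea the following swaps in the simpler one-parameter Gaussian ansatz (a standard textbook choice, not the paper's). For $N$ bosons in an isotropic trap with oscillator length $d_0 = \sqrt{\hbar/m\omega}$ and contact interactions with scattering length $a$, a normalized Gaussian of width $R = b\,d_0$ gives the energy per particle

$E/(N\hbar\omega) = \tfrac{3}{4}\left(b^{-2} + b^{2}\right) + \dfrac{N a/d_0}{\sqrt{2\pi}\,b^{3}}$,

whose three terms are the kinetic, trap, and interaction contributions. The sketch minimizes this over $b$ with a golden-section search; the bracket and tolerance are assumed values.

```python
import math
from typing import Tuple


def gaussian_energy(width: float, interaction: float) -> float:
    """Energy per particle (units of hbar*omega) for a Gaussian of width R = width * d0.

    interaction is the dimensionless combination N*a/d0 (assumed >= 0).
    """
    kinetic = 0.75 / width**2
    trap = 0.75 * width**2
    contact = interaction / (math.sqrt(2.0 * math.pi) * width**3)
    return kinetic + trap + contact


def minimize_width(
    interaction: float, lo: float = 0.1, hi: float = 20.0, tol: float = 1e-10
) -> Tuple[float, float]:
    """Golden-section search for the optimal width; E(width) is unimodal for interaction >= 0."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    left, right = lo, hi
    x1 = right - inv_phi * (right - left)
    x2 = left + inv_phi * (right - left)
    while right - left > tol:
        # Keep the sub-interval that must contain the minimum.
        if gaussian_energy(x1, interaction) < gaussian_energy(x2, interaction):
            right = x2
        else:
            left = x1
        x1 = right - inv_phi * (right - left)
        x2 = left + inv_phi * (right - left)
    width = 0.5 * (left + right)
    return width, gaussian_energy(width, interaction)
```

For `interaction = 0.0` the search returns a width of 1 and an energy of 1.5 (to numerical precision), the ideal-gas oscillator ground state $\tfrac{3}{2}\hbar\omega$. Setting the derivative to zero gives $b^5 - b = 2(Na/d_0)/\sqrt{2\pi}$, so for strong interactions the width grows like $(Na/d_0)^{1/5}$, the same scaling as the Thomas-Fermi limit.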
A Lab-in-a-briefcase for rapid prostate specific antigen (PSA) screening from whole blood
We present a new concept for rapid and fully portable Prostate Specific Antigen (PSA) measurement, termed “Lab-in-a-Briefcase”, which integrates an affordable microfluidic ELISA platform utilising a melt-extruded fluoropolymer Micro Capillary Film (MCF) containing 10 capillaries of 200 μm internal diameter; a disposable multi-syringe aspirator (MSA) plus a sample tray pre-loaded with all required immunoassay reagents; and a portable film scanner for digital quantitation of the colorimetric signal. Each MSA can perform 10 replicate microfluidic immunoassays on each of 8 samples, allowing 80 measurements to be made in less than 15 minutes with semi-automated operation and no need for additional fluid-handling equipment. An assay was optimised to measure a clinically relevant PSA range of 0.9 to 60.0 ng/ml in 15 minutes, with intra-assay CVs of the order of 5% when read using a consumer flatbed film scanner. The PSA assay performance in the MSA remained robust in the presence of undiluted or 1:2 diluted human serum or whole blood, and the matrix effect could be overcome simply by extending sample incubation times. The PSA “Lab-in-a-Briefcase” is particularly suited to low-resource health settings where diagnostic labs and automated immunoassay systems are not accessible, because it allows PSA measurement outside the laboratory using affordable equipment.
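The abstract reports intra-assay CVs of about 5% from replicate readings (10 replicates per sample, 8 samples per MSA, 80 readings in total) but does not state the exact formula. Assuming the conventional definition, CV = 100 × sample standard deviation / mean of the replicates, a per-sample calculation might look like the sketch below; the dictionary layout for readings is a hypothetical choice, not the authors' data format.

```python
import statistics
from typing import Dict, List


def intra_assay_cv(replicates: List[float]) -> float:
    """Percent coefficient of variation: 100 * sample SD (n - 1) / mean."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(replicates)
    return 100.0 * statistics.stdev(replicates) / mean


def cv_per_sample(readings: Dict[str, List[float]]) -> Dict[str, float]:
    """Map each sample ID to the CV of its replicate readings (e.g. 10 per sample)."""
    return {sample: intra_assay_cv(values) for sample, values in readings.items()}
```

For example, `intra_assay_cv([9.0, 10.0, 11.0])` returns 10.0. One MSA run would correspond to a dictionary with 8 keys, each holding 10 replicate signal values.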
The Metropolis algorithm: A useful tool for epidemiologists
The Metropolis algorithm is a Markov chain Monte Carlo (MCMC) algorithm used
to simulate from parameter distributions of interest, such as generalized
linear model parameters. The "Metropolis step" is a keystone concept that
underlies classical and modern MCMC methods and facilitates simple analysis of
complex statistical models. Beyond Bayesian analysis, MCMC is useful for
generating uncertainty intervals, even under the common scenario in causal
inference in which the target parameter is not directly estimated by a single,
fitted statistical model. We demonstrate, with a worked example, pseudo-code,
and R code, the basic mechanics of the Metropolis algorithm. We use the
Metropolis algorithm to estimate the odds ratio and risk difference contrasting
the risk of childhood leukemia among those exposed to high- versus low-level
magnetic fields. This approach can be used for inference from Bayesian and
frequentist paradigms and, in small samples, offers advantages over
large-sample methods like the bootstrap.
Comment: 26 pages, 3 figures
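The abstract describes the mechanics of the Metropolis algorithm but does not reproduce its R code or data. As a hedged sketch only, here is a random-walk Metropolis sampler in Python with a symmetric Gaussian proposal, applied to a logistic model of a 2×2 exposure table with flat priors, from which the odds ratio and risk difference are read off the draws. The counts, step size, seed, and burn-in length are all hypothetical assumptions, not the study's leukemia data or settings.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def metropolis(
    log_target: Callable[[Sequence[float]], float],
    init: Sequence[float],
    step: float,
    n_iter: int,
    seed: int = 0,
) -> List[List[float]]:
    """Random-walk Metropolis sampler; returns one parameter vector per iteration."""
    rng = random.Random(seed)
    current = list(init)
    current_lp = log_target(current)
    draws: List[List[float]] = []
    for _ in range(n_iter):
        # Symmetric Gaussian proposal, so no Hastings correction is needed.
        proposal = [x + rng.gauss(0.0, step) for x in current]
        proposal_lp = log_target(proposal)
        # Metropolis step: accept with probability min(1, target ratio).
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        if math.log(u) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws.append(list(current))  # a rejection repeats the current state
    return draws


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def make_log_posterior(cases_exposed: int, n_exposed: int, cases_unexposed: int, n_unexposed: int):
    """Flat-prior log posterior for (b0, b1), where logit P(case) = b0 + b1 * exposed."""
    def log_post(theta: Sequence[float]) -> float:
        b0, b1 = theta
        exposed = cases_exposed * log_sigmoid(b0 + b1) + (n_exposed - cases_exposed) * log_sigmoid(-(b0 + b1))
        unexposed = cases_unexposed * log_sigmoid(b0) + (n_unexposed - cases_unexposed) * log_sigmoid(-b0)
        return exposed + unexposed
    return log_post


def interval(values: List[float]) -> Tuple[float, float, float]:
    """Median and a 95% equal-tailed interval from posterior draws."""
    s = sorted(values)

    def pick(q: float) -> float:
        return s[min(len(s) - 1, int(q * len(s)))]

    return pick(0.5), pick(0.025), pick(0.975)


if __name__ == "__main__":
    # Hypothetical counts (NOT the study's data): 30/100 exposed vs 15/100 unexposed cases.
    log_post = make_log_posterior(30, 100, 15, 100)
    draws = metropolis(log_post, init=[0.0, 0.0], step=0.3, n_iter=20000, seed=1)[2000:]  # drop burn-in
    odds_ratios = [math.exp(b1) for _, b1 in draws]
    risk_diffs = [math.exp(log_sigmoid(b0 + b1)) - math.exp(log_sigmoid(b0)) for b0, b1 in draws]
    print("OR median, 95% interval:", interval(odds_ratios))
    print("RD median, 95% interval:", interval(risk_diffs))
```

Because the odds ratio and risk difference are both computed from the same draws of $(b_0, b_1)$, any function of the parameters gets an interval without a separate delta-method or bootstrap calculation, which is the practical point the abstract makes about targets not directly estimated by one fitted model.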