Near-Optimal MNL Bandits Under Risk Criteria
We study MNL bandits, which are a variant of the traditional multi-armed
bandit problem, under risk criteria. Unlike ordinary expected revenue, risk
criteria are more general objectives widely used in industry and business. We
design algorithms for a broad class of risk criteria, including but not limited
to the well-known conditional value-at-risk, Sharpe ratio and entropy risk, and
prove that they suffer a near-optimal regret. As a complement, we also conduct
experiments with both synthetic and real data to show the empirical performance
of our proposed algorithms. Comment: AAAI202
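The three risk criteria named in the abstract above can be illustrated on a sample of revenues. This is a minimal sketch using common textbook definitions (lower-tail CVaR, mean over standard deviation, and the entropic risk measure); the function names and the exact definitions are assumptions for illustration and may differ from those used in the paper.

```python
import math

def cvar(rewards, alpha):
    """Empirical CVaR: mean of the worst alpha-fraction of rewards (lower tail)."""
    ordered = sorted(rewards)
    k = max(1, math.ceil(alpha * len(ordered)))  # number of tail samples
    return sum(ordered[:k]) / k

def sharpe_ratio(rewards):
    """Mean reward divided by its (population) standard deviation."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    if var == 0:
        raise ValueError("Sharpe ratio is undefined for zero variance")
    return mean / math.sqrt(var)

def entropic_risk(rewards, theta):
    """Entropic risk: -(1/theta) * log E[exp(-theta * R)], for theta > 0."""
    avg = sum(math.exp(-theta * r) for r in rewards) / len(rewards)
    return -math.log(avg) / theta

rewards = [1.0, 2.0, 3.0, 4.0]  # hypothetical revenue samples
print(cvar(rewards, 0.5), sharpe_ratio(rewards), entropic_risk(rewards, 1.0))
```

For a risk-averse parameter `theta > 0`, the entropic risk never exceeds the plain mean, which is one way to see that it penalizes variability.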
Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs
Large Language Models (LLMs) are now widely used in various applications,
making it crucial to align their ethical standards with human values. However,
recent jail-breaking methods demonstrate that this alignment can be undermined
using carefully constructed prompts. In our study, we reveal a new threat to
LLM alignment when a bad actor has access to the model's output logits, a
common feature in both open-source LLMs and many commercial LLM APIs (e.g.,
certain GPT models). It does not rely on crafting specific prompts. Instead, it
exploits the fact that even when an LLM rejects a toxic request, a harmful
response often hides deep in the output logits. By forcefully selecting
lower-ranked output tokens during the auto-regressive generation process at a
few critical output positions, we can compel the model to reveal these hidden
responses. We term this process model interrogation. This approach differs from
and outperforms jail-breaking methods, achieving 92% effectiveness compared to
62%, and is 10 to 20 times faster. The harmful content uncovered through our
method is more relevant, complete, and clear. Additionally, it can complement
jail-breaking strategies, which further boosts attack
performance. Our findings indicate that interrogation can extract toxic
knowledge even from models specifically designed for coding tasks.
Opening A Pandora's Box: Things You Should Know in the Era of Custom GPTs
The emergence of large language models (LLMs) has significantly accelerated
the development of a wide range of applications across various fields. There is
a growing trend in the construction of specialized platforms based on LLMs,
such as the newly introduced custom GPTs by OpenAI. While custom GPTs provide
various functionalities like web browsing and code execution, they also
introduce significant security threats. In this paper, we conduct a
comprehensive analysis of the security and privacy issues arising from the
custom GPT platform. Our systematic examination categorizes potential attack
scenarios into three threat models based on the role of the malicious actor,
and identifies critical data exchange channels in custom GPTs. Utilizing the
STRIDE threat modeling framework, we identify 26 potential attack vectors, with
19 being partially or fully validated in real-world settings. Our findings
emphasize the urgent need for robust security and privacy measures in the
custom GPT ecosystem, especially in light of the forthcoming launch of the
official GPT store by OpenAI.
WavMark: Watermarking for Audio Generation
Recent breakthroughs in zero-shot voice synthesis have enabled imitating a
speaker's voice using just a few seconds of recording while maintaining a high
level of realism. Alongside its potential benefits, this powerful technology
introduces notable risks, including voice fraud and speaker impersonation.
Unlike the conventional approach of solely relying on passive methods for
detecting synthetic data, watermarking presents a proactive and robust defence
mechanism against these looming risks. This paper introduces an innovative
audio watermarking framework that encodes up to 32 bits of watermark within a
mere 1-second audio snippet. The watermark is imperceptible to human senses and
exhibits strong resilience against various attacks. It can serve as an
effective identifier for synthesized voices and holds potential for broader
applications in audio copyright protection. Moreover, this framework boasts
high flexibility, allowing for the combination of multiple watermark segments
to achieve heightened robustness and expanded capacity. Utilizing 10 to
20-second audio as the host, our approach demonstrates an average Bit Error
Rate (BER) of 0.48% across ten common attacks, a remarkable reduction of over
2800% in BER compared to the state-of-the-art watermarking tool. See
https://aka.ms/wavmark for demos of our work.
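The Bit Error Rate reported above is the fraction of watermark bits that are decoded incorrectly. A minimal sketch of that metric follows; the helper name `bit_error_rate` and the example payload are hypothetical and are not part of the WavMark tooling.

```python
def bit_error_rate(sent_bits, decoded_bits):
    """Fraction of positions where the decoded watermark differs from the sent one."""
    if len(sent_bits) != len(decoded_bits):
        raise ValueError("bit sequences must have the same length")
    if not sent_bits:
        raise ValueError("bit sequences must be non-empty")
    errors = sum(a != b for a, b in zip(sent_bits, decoded_bits))
    return errors / len(sent_bits)

# Hypothetical 32-bit payload with a single bit flipped by an attack.
sent = [0, 1] * 16
decoded = list(sent)
decoded[5] ^= 1
print(f"BER = {bit_error_rate(sent, decoded):.4%}")
```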
Detecting Backdoors in Pre-trained Encoders
Self-supervised learning in computer vision trains on unlabeled data, such as
images or (image, text) pairs, to obtain an image encoder that learns
high-quality embeddings for input data. Emerging backdoor attacks towards
encoders expose crucial vulnerabilities of self-supervised learning, since
downstream classifiers (even further trained on clean data) may inherit
backdoor behaviors from encoders. Existing backdoor detection methods mainly
focus on supervised learning settings and cannot handle pre-trained encoders,
especially when input labels are not available. In this paper, we propose
DECREE, the first backdoor detection approach for pre-trained encoders,
requiring neither classifier headers nor input labels. We evaluate DECREE on
over 400 encoders trojaned under 3 paradigms. We show the effectiveness of our
method on image encoders pre-trained on ImageNet and OpenAI's CLIP 400 million
image-text pairs. Our method consistently has a high detection accuracy even if
we have only limited or no access to the pre-training dataset. Comment: Accepted at CVPR 2023. Code is available at
https://github.com/GiantSeaweed/DECRE
Solitary beam propagation in a nonlinear optical resonator enables high-efficiency pulse compression and mode self-cleaning
Generating intense ultrashort pulses with high-quality spatial modes is
crucial for ultrafast and strong-field science. This can be accomplished by
controlling propagation of femtosecond pulses under the influence of Kerr
nonlinearity and achieving stable propagation with high intensity. In this
work, we propose that the generation of spatial solitons in periodic layered
Kerr media can provide an optimum condition for supercontinuum generation and
pulse compression using multiple thin plates. Through both experimental and
theoretical investigations, we successfully identify these solitary modes and
reveal a universal relationship between the beam size and the critical
nonlinear phase. Space-time coupling is shown to strongly influence the
spectral, spatial and temporal profiles of femtosecond pulses. Taking advantage
of the unique characteristics of these solitary modes, we demonstrate single-stage
supercontinuum generation and compression of femtosecond pulses from initially
170 fs down to 22 fs with an efficiency of ~90%. We also provide evidence of
efficient mode self-cleaning, which suggests rich spatial-temporal
self-organization processes of laser beams in a nonlinear resonator.
On-Site Quantification and Infection Risk Assessment of Airborne SARS-CoV-2 Virus Via a Nanoplasmonic Bioaerosol Sensing System in Healthcare Settings
On-site quantification and early-stage infection risk assessment of airborne severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with high spatiotemporal resolution is a promising approach for mitigating the spread of the coronavirus disease 2019 (COVID-19) pandemic and informing life-saving decisions. Here, a condensation (hygroscopic growth)-assisted bioaerosol collection and plasmonic photothermal sensing (CAPS) system for on-site quantitative risk analysis of SARS-CoV-2 virus-laden aerosols is presented. The CAPS system provided rapid thermoplasmonic biosensing results after an aerosol-to-hydrosol sampling process in COVID-19-related environments including a hospital and a nursing home. The detection limit reached 0.25 copies/µL in the complex aerosol background without further purification. More importantly, the CAPS system enabled direct measurement of SARS-CoV-2 virus exposures with high spatiotemporal resolution. Measurement and feedback of the results to healthcare workers and patients via a QR code are completed within two hours. Based on a dose-response model, the plasmonic biosensing signal is used to calculate probabilities of SARS-CoV-2 infection risk and to estimate maximum exposure durations to an acceptable risk threshold in different environmental settings.
Semi-supervised affinity propagation based on density peaks
In view of the unsatisfactory clustering performance of the affinity propagation (AP) clustering algorithm on data sets with complex structures, this paper proposes a semi-supervised affinity propagation clustering algorithm based on density peaks (SAP-DP). The algorithm uses a new density peaks (DP) algorithm, which has the advantage of manifold clustering, together with the idea of semi-supervised learning: it builds pairwise constraints to adjust the similarity matrix and then executes AP clustering. The results of simulation experiments validated that the proposed algorithm has better clustering performance than conventional AP.
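The step of adjusting the similarity matrix with pairwise constraints can be sketched as follows. The abstract does not state the adjustment rule, so this example assumes a common convention from semi-supervised AP: similarities are negative squared Euclidean distances, must-link pairs are raised to the maximum similarity (0), and cannot-link pairs are set to a large negative penalty. The function names and the `penalty` value are hypothetical, and the density-peaks stage is not shown.

```python
def negative_sq_distance(points):
    """Similarity matrix s[i][j] = -||x_i - x_j||^2, as commonly used with AP."""
    return [
        [-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
        for p in points
    ]

def apply_pairwise_constraints(sim, must_link, cannot_link, penalty=-1e9):
    """Return a copy of sim adjusted by pairwise constraints (assumed rule)."""
    adjusted = [row[:] for row in sim]  # leave the input matrix untouched
    for i, j in must_link:
        adjusted[i][j] = adjusted[j][i] = 0.0      # maximum possible similarity
    for i, j in cannot_link:
        adjusted[i][j] = adjusted[j][i] = penalty  # effectively forbids pairing
    return adjusted

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]  # hypothetical data
sim = negative_sq_distance(points)
adjusted = apply_pairwise_constraints(sim, must_link=[(0, 2)], cannot_link=[(0, 1)])
```

The adjusted matrix would then be passed to an AP implementation that accepts precomputed similarities.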