120 research outputs found

    Near-Optimal MNL Bandits Under Risk Criteria

    We study MNL bandits, a variant of the traditional multi-armed bandit problem, under risk criteria. Unlike the ordinary expected revenue, risk criteria are more general objectives widely used in industry and business. We design algorithms for a broad class of risk criteria, including but not limited to the well-known conditional value-at-risk, Sharpe ratio, and entropy risk, and prove that they suffer near-optimal regret. As a complement, we also conduct experiments with both synthetic and real data to show the empirical performance of our proposed algorithms. Comment: AAAI202
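The three risk criteria named in the abstract above can be estimated from a sample of revenues. The sketch below uses common textbook definitions; the abstract does not state the paper's exact formulations, so the function names, the lower-tail convention for CVaR, and the parameters `alpha`, `risk_free`, and `theta` are all assumptions made for illustration.

```python
import math
import statistics


def cvar(samples, alpha):
    """Empirical conditional value-at-risk: mean of the worst alpha-fraction.

    Assumes higher revenue is better, so the "worst" outcomes are the
    smallest values (lower-tail convention).
    """
    ordered = sorted(samples)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def sharpe_ratio(samples, risk_free=0.0):
    """Mean excess revenue divided by the population standard deviation."""
    sd = statistics.pstdev(samples)
    if sd == 0:
        raise ValueError("Sharpe ratio is undefined for zero variance")
    return (statistics.fmean(samples) - risk_free) / sd


def entropic_risk(samples, theta):
    """Entropic risk: -(1/theta) * log E[exp(-theta * X)].

    With theta > 0 this penalizes variability, so the value never
    exceeds the sample mean.
    """
    moment = statistics.fmean([math.exp(-theta * x) for x in samples])
    return -math.log(moment) / theta


# Illustrative revenues only, not data from the paper.
revenues = [0.0, 0.0, 10.0, 10.0]
tail_mean = cvar(revenues, alpha=0.5)
ratio = sharpe_ratio(revenues)
risk_adjusted = entropic_risk(revenues, theta=1.0)
```

A risk-aware bandit would rank assortments by one of these quantities rather than by the plain sample mean.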

    Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs

    Large Language Models (LLMs) are now widely used in various applications, making it crucial to align their ethical standards with human values. However, recent jail-breaking methods demonstrate that this alignment can be undermined using carefully constructed prompts. In our study, we reveal a new threat to LLM alignment when a bad actor has access to the model's output logits, a common feature in both open-source LLMs and many commercial LLM APIs (e.g., certain GPT models). It does not rely on crafting specific prompts. Instead, it exploits the fact that even when an LLM rejects a toxic request, a harmful response often hides deep in the output logits. By forcefully selecting lower-ranked output tokens during the auto-regressive generation process at a few critical output positions, we can compel the model to reveal these hidden responses. We term this process model interrogation. This approach differs from and outperforms jail-breaking methods, achieving 92% effectiveness compared to 62%, and is 10 to 20 times faster. The harmful content uncovered through our method is more relevant, complete, and clear. Additionally, it can complement jail-breaking strategies, which further boosts attack performance. Our findings indicate that interrogation can extract toxic knowledge even from models specifically designed for coding tasks.

    Opening A Pandora's Box: Things You Should Know in the Era of Custom GPTs

    The emergence of large language models (LLMs) has significantly accelerated the development of a wide range of applications across various fields. There is a growing trend in the construction of specialized platforms based on LLMs, such as the newly introduced custom GPTs by OpenAI. While custom GPTs provide various functionalities like web browsing and code execution, they also introduce significant security threats. In this paper, we conduct a comprehensive analysis of the security and privacy issues arising from the custom GPT platform. Our systematic examination categorizes potential attack scenarios into three threat models based on the role of the malicious actor, and identifies critical data exchange channels in custom GPTs. Utilizing the STRIDE threat modeling framework, we identify 26 potential attack vectors, with 19 being partially or fully validated in real-world settings. Our findings emphasize the urgent need for robust security and privacy measures in the custom GPT ecosystem, especially in light of the forthcoming launch of the official GPT store by OpenAI

    WavMark: Watermarking for Audio Generation

    Recent breakthroughs in zero-shot voice synthesis have enabled imitating a speaker's voice using just a few seconds of recording while maintaining a high level of realism. Alongside its potential benefits, this powerful technology introduces notable risks, including voice fraud and speaker impersonation. Unlike the conventional approach of solely relying on passive methods for detecting synthetic data, watermarking presents a proactive and robust defence mechanism against these looming risks. This paper introduces an innovative audio watermarking framework that encodes up to 32 bits of watermark within a mere 1-second audio snippet. The watermark is imperceptible to human senses and exhibits strong resilience against various attacks. It can serve as an effective identifier for synthesized voices and holds potential for broader applications in audio copyright protection. Moreover, this framework boasts high flexibility, allowing for the combination of multiple watermark segments to achieve heightened robustness and expanded capacity. Utilizing 10 to 20-second audio as the host, our approach demonstrates an average Bit Error Rate (BER) of 0.48% across ten common attacks, a remarkable reduction of over 2800% in BER compared to the state-of-the-art watermarking tool. See https://aka.ms/wavmark for demos of our work.
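The Bit Error Rate quoted in the abstract above is the fraction of watermark bits decoded incorrectly. The sketch below computes it for a 32-bit payload and shows one simple way repeated segments could be combined. The abstract does not say how WavMark combines segments, so the per-bit majority vote and the function names are assumptions, not the authors' method.

```python
def bit_error_rate(embedded, decoded):
    """Fraction of positions where the decoded bit differs from the embedded bit."""
    if len(embedded) != len(decoded):
        raise ValueError("bit sequences must have the same length")
    errors = sum(a != b for a, b in zip(embedded, decoded))
    return errors / len(embedded)


def majority_vote(decodings):
    """Combine several decodings of the same payload by per-bit majority.

    Hypothetical combination rule; ties (possible with an even number of
    decodings) resolve to 0.
    """
    n = len(decodings)
    return [1 if 2 * sum(bits) > n else 0 for bits in zip(*decodings)]


# Illustrative 32-bit payload; each simulated decoding has one flipped bit.
payload = [1, 0] * 16
decodings = []
for flipped in (0, 1, 2):
    copy = payload[:]
    copy[flipped] ^= 1
    decodings.append(copy)

single_ber = bit_error_rate(payload, decodings[0])  # one error in 32 bits
combined_ber = bit_error_rate(payload, majority_vote(decodings))
```

In this toy case each decoding has a different corrupted bit, so the vote recovers the full payload even though every individual segment contains an error.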

    Detecting Backdoors in Pre-trained Encoders

    Self-supervised learning in computer vision trains on unlabeled data, such as images or (image, text) pairs, to obtain an image encoder that learns high-quality embeddings for input data. Emerging backdoor attacks towards encoders expose crucial vulnerabilities of self-supervised learning, since downstream classifiers (even further trained on clean data) may inherit backdoor behaviors from encoders. Existing backdoor detection methods mainly focus on supervised learning settings and cannot handle pre-trained encoders, especially when input labels are not available. In this paper, we propose DECREE, the first backdoor detection approach for pre-trained encoders, requiring neither classifier headers nor input labels. We evaluate DECREE on over 400 encoders trojaned under 3 paradigms. We show the effectiveness of our method on image encoders pre-trained on ImageNet and OpenAI's CLIP 400 million image-text pairs. Our method consistently has a high detection accuracy even if we have only limited or no access to the pre-training dataset. Comment: Accepted at CVPR 2023. Code is available at https://github.com/GiantSeaweed/DECRE

    Solitary beam propagation in a nonlinear optical resonator enables high-efficiency pulse compression and mode self-cleaning

    Generating intense ultrashort pulses with high-quality spatial modes is crucial for ultrafast and strong-field science. This can be accomplished by controlling the propagation of femtosecond pulses under the influence of Kerr nonlinearity and achieving stable propagation with high intensity. In this work, we propose that the generation of spatial solitons in periodic layered Kerr media can provide an optimum condition for supercontinuum generation and pulse compression using multiple thin plates. Through both experimental and theoretical investigations, we successfully identify these solitary modes and reveal a universal relationship between the beam size and the critical nonlinear phase. Space-time coupling is shown to strongly influence the spectral, spatial and temporal profiles of femtosecond pulses. Taking advantage of the unique characteristics of these solitary modes, we demonstrate single-stage supercontinuum generation and compression of femtosecond pulses from initially 170 fs down to 22 fs with an efficiency of ~90%. We also provide evidence of efficient mode self-cleaning, which suggests rich spatial-temporal self-organization processes of laser beams in a nonlinear resonator.

    On-Site Quantification and Infection Risk Assessment of Airborne SARS-CoV-2 Virus Via a Nanoplasmonic Bioaerosol Sensing System in Healthcare Settings

    On-site quantification and early-stage infection risk assessment of airborne severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with high spatiotemporal resolution is a promising approach for mitigating the spread of the coronavirus disease 2019 (COVID-19) pandemic and informing life-saving decisions. Here, a condensation (hygroscopic growth)-assisted bioaerosol collection and plasmonic photothermal sensing (CAPS) system for on-site quantitative risk analysis of SARS-CoV-2 virus-laden aerosols is presented. The CAPS system provided rapid thermoplasmonic biosensing results after an aerosol-to-hydrosol sampling process in COVID-19-related environments including a hospital and a nursing home. The detection limit reached 0.25 copies/µL in the complex aerosol background without further purification. More importantly, the CAPS system enabled direct measurement of SARS-CoV-2 virus exposures with high spatiotemporal resolution. Measurement and feedback of the results to healthcare workers and patients via a QR code are completed within two hours. Based on a dose-response model, the plasmonic biosensing signal is used to calculate probabilities of SARS-CoV-2 infection risk and to estimate maximum exposure durations to an acceptable risk threshold in different environmental settings.

    Semi-supervised affinity propagation based on density peaks

    Because the affinity propagation (AP) clustering algorithm performs unsatisfactorily on data sets with complex structure, this paper proposes a semi-supervised affinity propagation clustering algorithm based on density peaks (SAP-DP). The algorithm applies a new density peaks (DP) algorithm, which has the advantage of manifold clustering, together with the idea of semi-supervision: it builds pairwise constraints to adjust the similarity matrix and then executes AP clustering. The results of simulation experiments confirmed that the proposed algorithm achieves better clustering performance than conventional AP.
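The step "builds pairwise constraints to adjust the similarity matrix" in the abstract above can be illustrated with a small sketch. The abstract does not give the adjustment rule, so the code assumes a common convention from semi-supervised clustering: must-link pairs receive the maximum similarity (0 for negative squared distances) and cannot-link pairs receive a value below every existing entry. The function names and the constraint lists are hypothetical, and the density-peaks stage that would produce the constraints is not shown.

```python
def similarity_matrix(points):
    """Negative squared Euclidean distance, the usual AP similarity."""
    n = len(points)
    return [
        [-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) for j in range(n)]
        for i in range(n)
    ]


def apply_constraints(sim, must_link, cannot_link):
    """Return a copy of sim adjusted by pairwise constraints (assumed rule).

    must_link pairs   -> similarity 0.0 (the largest possible value here)
    cannot_link pairs -> one unit below the smallest original similarity
    """
    adjusted = [row[:] for row in sim]
    floor = min(min(row) for row in sim) - 1.0
    for i, j in must_link:
        adjusted[i][j] = adjusted[j][i] = 0.0
    for i, j in cannot_link:
        adjusted[i][j] = adjusted[j][i] = floor
    return adjusted


# Illustrative points and constraints only.
points = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
sim = similarity_matrix(points)
adjusted = apply_constraints(sim, must_link=[(0, 1)], cannot_link=[(0, 2)])
```

The adjusted matrix would then be passed to an AP implementation that accepts precomputed similarities.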