56 research outputs found

    Poisoning Retrieval Corpora by Injecting Adversarial Passages

    Dense retrievers have achieved state-of-the-art performance in various information retrieval tasks, but to what extent can they be safely deployed in real-world applications? In this work, we propose a novel attack for dense retrieval systems in which a malicious user generates a small number of adversarial passages by perturbing discrete tokens to maximize similarity with a provided set of training queries. When these adversarial passages are inserted into a large retrieval corpus, we show that this attack is highly effective in fooling these systems into retrieving them for queries that were not seen by the attacker. More surprisingly, these adversarial passages can directly generalize to out-of-domain queries and corpora with a high attack success rate: for instance, we find that 50 generated passages optimized on Natural Questions can mislead >94% of questions posed in financial documents or online forums. We also benchmark and compare a range of state-of-the-art dense retrievers, both unsupervised and supervised. Although different systems exhibit varying levels of vulnerability, we show they can all be successfully attacked by injecting up to 500 passages, a small fraction compared to a retrieval corpus of millions of passages. Comment: EMNLP 2023. Our code is available at https://github.com/princeton-nlp/corpus-poisonin

    Privacy Implications of Retrieval-Based Language Models

    Retrieval-based language models (LMs) have demonstrated improved interpretability, factuality, and adaptability compared to their parametric counterparts, by incorporating retrieved text from external datastores. While it is well known that parametric models are prone to leaking private data, it remains unclear how the addition of a retrieval datastore impacts model privacy. In this work, we present the first study of privacy risks in retrieval-based LMs, particularly kNN-LMs. Our goal is to explore the optimal design and training procedure in domains where privacy is of concern, aiming to strike a balance between utility and privacy. Crucially, we find that kNN-LMs are more susceptible to leaking private information from their private datastore than parametric models. We further explore mitigations of privacy risks. When privacy information is targeted and readily detected in the text, we find that a simple sanitization step would completely eliminate the risks, while decoupling query and key encoders achieves an even better utility-privacy trade-off. Otherwise, we consider strategies of mixing public and private data in both datastore and encoder training. While these methods offer modest improvements, they leave considerable room for future work. Together, our findings provide insights for practitioners to better understand and mitigate privacy risks in retrieval-based LMs. Our code is available at: https://github.com/Princeton-SysML/kNNLM_privacy

    Potential dopant in photocatalysis process for wastewater treatment-a review

    Pollution is now widespread, and water pollution in particular grows more severe each day. One source of water pollution is industry that uses dyes, whether excessively or not. Such wastewater therefore needs to be treated before it is released into rivers or the wider environment. This paper reviews wastewater treatment using dopants such as nitrogen and magnesium.

    Distinguishing T1-2 and T3a tumors of rectal cancer with texture analysis and functional MRI parameters

    PURPOSE: We aimed to investigate whether texture analysis and functional magnetic resonance imaging (fMRI) could differentiate rectal cancer pathological stages T1-2 (pT1-2) and T3a (pT3a). METHODS: Eighty-two rectal adenocarcinoma patients at stage pT1-2 and pT3a received T2 and fMRI examination before surgery. The latter included apparent diffusion coefficient (ADC) sequence, dynamic contrast enhancement (DCE) MRI, and intravoxel incoherent motion (IVIM) diffusion weighted imaging. Patients were grouped into early stage (pT1-2) and advanced stage (pT3a). The MRI accuracy in diagnosing rectal cancer before surgery was calculated. The differences in clinicopathological variables, quantitative parameters including ADC values, IVIM parameters (perfusion fraction [f], true diffusion coefficient [D], and pseudo-diffusion coefficient [D*]), DCE MRI parameters (transfer constant [Ktrans], reflux constant [Kep], and extravascular extracellular fractional volume [Ve]), and texture features were compared between the groups. Receiver operating characteristic (ROC) curves of texture features and fMRI parameters were generated to distinguish pT1-2 and pT3a tumors. Multivariate analysis was used to develop a predictive model and to find independent risk factors. The Hosmer–Lemeshow test was used to assess the fit of the model. The DeLong test was applied to compare the ROC curves of different features. Correlations of texture features and fMRI parameters with stage were calculated using r (Spearman’s rank correlation coefficient). RESULTS: The preoperative accuracy in differentiating pT1-2 from pT3a rectal cancer using MRI was 74.39%. Kep, Ve, and ADC showed significant differences between the groups. Kep and ADC showed negative correlation with stage. Ve correlated positively with stage. Twenty-five texture features from T2 images showed significant differences between groups, and S(0,2)SumOfSqs and WavEnLH_s_2 among these showed better performance, showing negative correlation with stage. The area under the curve (AUC) values of S(0,2)SumOfSqs, WavEnLH_s_2, ADC, Kep, and Ve were 0.721, 0.699, 0.690, 0.666, and 0.653, respectively. The multivariate analysis showed that S(0,2)SumOfSqs, WavEnLH_s_2, and ADC are risk factors for advanced tumors, and the logistic model built by Kep, Ve, S(0,2)SumOfSqs, WavEnLH_s_2, and ADC has an AUC, sensitivity, and specificity of 0.833, 88.5%, and 73.3%, respectively. The ROC curve of the model showed statistical significance between S(0,2)SumOfSqs, ADC, Kep, and Ve. The P value of the Hosmer–Lemeshow test was 0.65. CONCLUSION: S(0,2)SumOfSqs, WavEnLH_s_2, and ADC are risk factors for advanced rectal cancer, and the model built by Kep, Ve, S(0,2)SumOfSqs, WavEnLH_s_2, and ADC has better performance than using a single method. The application of the above combinations could be beneficial to patients’ accurate and individualized treatments

    Dividend regulation and cost stickiness: evidence from a quasi-natural experiment

    This paper aims to examine the effect of dividend regulation on cost stickiness (i.e. the asymmetric change in firm expense between sales increase and sales decrease) and explore the underlying mechanism. Based on the quasi-natural experiment of the Guideline for Dividend Policy of Listed Companies issued by the Shanghai Stock Exchange (SSE) in 2013, the authors employ a difference-in-difference model to investigate the impact of dividend regulation on cost stickiness. The authors find that the cost stickiness of treatment group firms has decreased significantly when compared with control group firms after the dividend regulation. Moreover, this effect is more pronounced among firms in lower marketization regions, in lower competition industries and those with less analyst coverage and lower cash flow levels. Further analyses show that dividend regulation reduces the cost stickiness of firms by mitigating agency problems. Finally, the conclusion holds after several robust tests, including controlling for firm fixed effect, propensity score matching (PSM), placebo test and reconstruction of expense variable. This paper confirms that dividend regulation serves an important role in corporate governance, which reduces firms' agency costs and thereby decreases cost stickiness. The conclusions shed light on the dividend policies of listed companies and capital market regulation in the future

    Realization of broadband polarization-insensitive negative refraction using water-based metamaterial

    We propose a water-based metamaterial to realize broadband polarization-insensitive negative refraction. The designed metamaterial exhibits multiple resonances over a broad frequency region and displays negative permittivity and permeability simultaneously, giving a broadband negative refractive index. Simulated results show that two separate wide bands of negative refractive index are formed at 12.5–22.7 GHz and 26.2–28.0 GHz, with relative bandwidths of 58.0% and 6.7%, respectively. In addition, a beam shifting simulation is carried out to verify the effective refractive index retrieved from the scattering parameters, and the results calculated from the beam shifting simulation agree well with the retrieved effective refractive indices. Finally, a microwave measurement is performed to examine the simulated and calculated results, and the three sets of results (simulation, calculation, and measurement) are consistent with each other. The design using water-based metamaterial provides an alternative approach to realize broadband negative refraction
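
The relative bandwidths quoted in this abstract can be checked with a short calculation. The sketch below assumes the common definition of relative (fractional) bandwidth, the band width divided by the band's center frequency; the abstract does not state which definition the authors used, and the function name `relative_bandwidth` is only illustrative.

```python
def relative_bandwidth(f_low_ghz, f_high_ghz):
    """Band width divided by center frequency, in percent (assumed definition)."""
    center = (f_low_ghz + f_high_ghz) / 2.0
    return 100.0 * (f_high_ghz - f_low_ghz) / center


# Band edges as quoted in the abstract, in GHz.
bands = [(12.5, 22.7), (26.2, 28.0)]
percentages = [relative_bandwidth(lo, hi) for lo, hi in bands]

for (lo, hi), pct in zip(bands, percentages):
    print(f"{lo}-{hi} GHz: {pct:.1f}%")
```

Under this assumption the first band reproduces the quoted 58.0%. The second comes out near 6.6% from the rounded band edges, so the quoted 6.7% presumably reflects unrounded edge frequencies or a slightly different definition.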

    Effect on Microstructure and Mechanical Properties of Microwave-Assisted Sintered H13 Steel Powder with Different Vanadium Contents

    The present work demonstrated the first-ever preparation of block specimens by the microwave sintering of H13 alloy powder. Varying proportions of vanadium powder (1.5%, 2.5%, 3.5%, 4.5%, and 5.5% on a mass basis) were added to H13 mold steel and these mixtures were sintered using microwaves. X-ray fluorescence spectroscopy was employed to determine the compositions of the resulting specimens and vanadium percentages of 1.56%, 2.04%, 3.10%, 4.06%, and 4.20% were determined. These results demonstrate a clear trend, with significantly lower vanadium amounts than expected based on the nominal values at higher vanadium loadings. Different samples were also found to exhibit different degrees of ablation, and this effect was related to the presence of voids in the materials. The surface compositions of these specimens were examined by laser-induced breakdown spectroscopy and were found to be relatively uniform. The microstructures as well as the hardness properties of the materials were assessed. Microwave sintering of 100 g specimens at 1300 °C for 10 min generated samples with hardness values ranging from 205 HV (at the lowest vanadium content) to 175.2 HV (at the highest vanadium content). The room-temperature wear behavior of samples prepared by microwave sintering H13 die steel with different vanadium contents was also studied. The results showed that 1.5% vanadium content is the best mass ratio
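
The gap between nominal and measured vanadium content in this abstract can be made concrete with simple arithmetic. The sketch below uses only the percentages quoted above; the variable names and the "retention" ratio (measured divided by nominal) are illustrative choices, not quantities defined by the authors.

```python
# Nominal and XRF-measured vanadium contents (mass %), as quoted in the abstract.
nominal = [1.5, 2.5, 3.5, 4.5, 5.5]
measured = [1.56, 2.04, 3.10, 4.06, 4.20]

# Ratio of measured to nominal content, and the absolute shortfall in mass %.
retention = [m / n for m, n in zip(measured, nominal)]
shortfall = [n - m for m, n in zip(measured, nominal)]

for n, m, r, s in zip(nominal, measured, retention, shortfall):
    print(f"nominal {n:.1f}%  measured {m:.2f}%  ratio {r:.2f}  shortfall {s:+.2f}")
```

On these numbers the largest shortfall, and the lowest measured-to-nominal ratio, occurs at the highest nominal loading of 5.5%.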