
    Inferring the location of neurons within an artificial network from their activity

    Inferring the connectivity of biological neural networks from neural activation data is an open problem. We propose that the analogous problem in artificial neural networks is more amenable to study and may illuminate the biological case. Here, we study the specific problem of assigning artificial neurons to locations in a network of known architecture, specifically the LeNet image classifier. We evaluate a supervised learning approach based on features derived from the eigenvectors of the activation correlation matrix. Experiments showed that, to support accurate localisation, an image dataset should fully activate the network and contain minimal confounding correlations. No single image dataset resulted in perfect assignment; however, perfect assignment was achieved by concatenating features from multiple image datasets.
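
    A minimal sketch of the pipeline this abstract describes (not the authors' code): record activations over an image set, take each neuron's loadings on the leading eigenvectors of the activation correlation matrix as its feature vector, and fit a supervised classifier that assigns neurons to locations. The activation data, location labels, number of eigenvectors, and classifier choice below are all placeholder assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    activations = rng.standard_normal((1000, 84))  # placeholder (n_images, n_neurons) activations
    locations = rng.integers(0, 10, size=84)       # placeholder ground-truth location labels

    # Correlation matrix of neuron activations across the image set.
    corr = np.corrcoef(activations, rowvar=False)  # (n_neurons, n_neurons)

    # Each neuron's loadings on the leading eigenvectors serve as its features.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    features = eigvecs[:, order[:16]]              # (n_neurons, 16)

    # Supervised assignment of neurons to locations.
    clf = RandomForestClassifier(random_state=0).fit(features, locations)
    print(clf.score(features, locations))

    In the setting the abstract reports, feature vectors computed from several image datasets would be concatenated along the feature axis before fitting.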

    Difference magnitude is not measured by discrimination steps for order of point patterns

    We have shown in previous work that the perception of order in point patterns is consistent with an interval scale structure (Protonotarios, Baum, Johnston, Hunter, & Griffin, 2014). The psychophysical scaling method used relies on the confusion between stimuli with similar levels of order, and the resulting discrimination scale is expressed in just-noticeable differences (jnds). As with other perceptual dimensions, an interesting question is whether suprathreshold (perceptual) differences are consistent with distances between stimuli on the discrimination scale. To test this, we collected discrimination data and data based on comparisons of perceptual differences. The stimuli were jittered square lattices of dots, covering the range from total disorder (Poisson) to perfect order (square lattice), roughly equally spaced on the discrimination scale. Observers picked the most ordered pattern from a pair, and the pair of patterns with the greatest difference in order from two pairs. Although the judgments of perceptual difference were found to be consistent with an interval scale, like the discrimination judgments, no common interval scale could predict both sets of data. In particular, the midpattern of the perceptual scale is 11 jnds away from the ordered end and 5 jnds from the disordered end of the discrimination scale.
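
    As a rough illustration of how a discrimination scale expressed in jnds can be built from pairwise order judgments, the sketch below applies standard Thurstonian scaling to hypothetical choice proportions; it is a sketch of the general technique, not the authors' method or data.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical proportions: p[i] is how often pattern i+1 was judged more
    # ordered than its neighbour pattern i in forced-choice trials.
    p = np.array([0.75, 0.80, 0.70, 0.85, 0.78])

    # Each proportion maps to a Thurstonian step size in d' (jnd) units.
    steps = norm.ppf(p)

    # Cumulative positions of the stimuli on the discrimination scale, in jnds,
    # anchored at the most disordered pattern.
    scale = np.concatenate([[0.0], np.cumsum(steps)])
    print(scale)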

    Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities

    Spurred by the recent rapid increase in the development and distribution of large language models (LLMs) across industry and academia, much recent work has drawn attention to safety- and security-related threats and vulnerabilities of LLMs, including in the context of potentially criminal activities. Specifically, it has been shown that LLMs can be misused for fraud, impersonation, and the generation of malware, while other authors have considered the more general problem of AI alignment. It is important that developers and practitioners alike are aware of security-related problems with such models. In this paper, we provide an overview of existing, predominantly scientific, efforts on identifying and mitigating threats and vulnerabilities arising from LLMs. We present a taxonomy describing the relationship between threats caused by the generative capabilities of LLMs, prevention measures intended to address such threats, and vulnerabilities arising from imperfect prevention measures. With our work, we hope to raise awareness of the limitations of LLMs in light of such security concerns, among both experienced developers and new users of such technologies. Comment: Pre-print.
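
    The taxonomy's central relationship (prevention measures address threats, and vulnerabilities arise from imperfect prevention measures) can be pictured with a small data structure; the sketch below uses illustrative labels only, not the paper's actual categories.

    from dataclasses import dataclass, field

    @dataclass
    class Threat:
        name: str

    @dataclass
    class PreventionMeasure:
        name: str
        addresses: list[Threat] = field(default_factory=list)

    @dataclass
    class Vulnerability:
        name: str
        arises_from: PreventionMeasure  # imperfect measure the vulnerability exploits

    # Illustrative instances (hypothetical labels, not the paper's taxonomy).
    fraud = Threat("fraud")
    malware = Threat("malware generation")
    tuning = PreventionMeasure("safety fine-tuning", addresses=[fraud, malware])
    jailbreak = Vulnerability("jailbreak prompting", arises_from=tuning)
    print(jailbreak)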

    Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples

    Recent efforts have shown that neural text processing models are vulnerable to adversarial examples, but the nature of these examples is poorly understood. In this work, we show that adversarial attacks against CNN, LSTM and Transformer-based classification models perform word substitutions that are identifiable through frequency differences between replaced words and their corresponding substitutions. Based on these findings, we propose frequency-guided word substitutions (FGWS), a simple algorithm exploiting the frequency properties of adversarial word substitutions for the detection of adversarial examples. FGWS achieves strong performance by accurately detecting adversarial examples on the SST-2 and IMDb sentiment datasets, with F1 detection scores of up to 91.4% against RoBERTa-based classification models. We compare our approach against a recently proposed perturbation discrimination framework and show that we outperform it by up to 13.0% F1. Comment: EACL 2021 camera-ready.
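
    A minimal sketch of the frequency-guided idea follows: infrequent words are replaced with their most frequent candidate substitutes, and a large drop in the model's confidence on the transformed input flags a likely adversarial example. The frequency table, substitute lists, model stub, and thresholds below are placeholder assumptions, not the published implementation.

    def fgws_flag(tokens, model, freq, synonyms, delta=10, gamma=0.3):
        """Flag a likely adversarial input via frequency-guided substitution."""
        transformed = []
        for w in tokens:
            if freq.get(w, 0) < delta:
                # Swap an infrequent word for its most frequent candidate
                # substitute, if that substitute is indeed more frequent.
                best = max(synonyms.get(w, []), key=lambda c: freq.get(c, 0), default=w)
                if freq.get(best, 0) > freq.get(w, 0):
                    w = best
            transformed.append(w)
        # A large drop in confidence on the transformed input suggests the
        # original contained adversarial low-frequency substitutions.
        return model(tokens) - model(transformed) > gamma

    # Toy usage with stub components (all placeholders).
    freq = {"great": 90, "movie": 80, "flick": 3}
    synonyms = {"flick": ["movie"]}
    model = lambda toks: 0.9 if "flick" in toks else 0.4  # stub confidence
    print(fgws_flag(["great", "flick"], model, freq, synonyms))  # True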