3 research outputs found

    VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

    We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for evaluating instruction-following vision-language models in real-world use. Our starting point is curating 70 'instruction families' that we envision instruction-tuned vision-language models should be able to address. Extending beyond evaluations like VQAv2 and COCO, tasks range from basic recognition to game playing and creative generation. Following curation, our dataset comprises 592 test queries, each with a human-authored instruction-conditioned caption. These descriptions surface instruction-specific factors; e.g., for an instruction asking about the accessibility of a storefront for wheelchair users, the instruction-conditioned caption describes ramps and potential obstacles. These descriptions enable 1) collecting human-verified reference outputs for each instance; and 2) automatic evaluation of candidate multimodal generations using a text-only LLM, aligning with human judgment. We quantify quality gaps between models and references using both human and automatic evaluations; e.g., the top-performing instruction-following model wins against the GPT-4 reference in just 27% of comparisons. VisIT-Bench is dynamic: to participate, practitioners simply submit their model's responses on the project website. Data, code, and the leaderboard are available at visit-bench.github.io.

    Are aligned neural networks adversarially aligned?

    Large language models are now tuned to align with the goals of their creators, namely to be "helpful and harmless." These models should respond helpfully to user questions but refuse to answer requests that could cause harm. However, adversarial users can construct inputs that circumvent attempts at alignment. In this work, we study to what extent these models remain aligned, even when interacting with an adversarial user who constructs worst-case inputs (adversarial examples). These inputs are designed to cause the model to emit harmful content that would otherwise be prohibited. We show that existing NLP-based optimization attacks are insufficiently powerful to reliably attack aligned text models: even when current NLP-based attacks fail, we can find adversarial inputs with brute force. As a result, the failure of current attacks should not be seen as proof that aligned text models remain aligned under adversarial inputs. However, the recent trend in large-scale ML models is multimodal models that allow users to provide images that influence the text that is generated. We show these models can be easily attacked, i.e., induced to perform arbitrary un-aligned behavior through adversarial perturbation of the input image. We conjecture that improved NLP attacks may demonstrate this same level of adversarial control over text-only models.

    Comparing the participation of men and women in academic medicine in medical colleges in Sudan: A cross-sectional survey

    INTRODUCTION: In countries around the world, women's involvement in academic medicine has been lower than men's. This study aimed to assess whether there were significant gender differences in research perception, practice, and publication in Sudan. METHODS: This analytical cross-sectional study was carried out using a questionnaire among 153 teaching staff of both genders from five Sudanese medical faculties, including teaching assistants, lecturers, assistant professors, associate professors, and full professors. RESULTS: There were no significant differences between genders regarding university, qualifications, research training received after graduation, current or past participation in research, or current position, but female participants tended to be younger, with a mean age of 38.8 (±9.2) years compared with 42.6 (±10.1) for males. Importantly, male researchers had not only published significantly more than female researchers but also had significantly more years of research experience. The mean research perception score was higher among male participants, indicating that they had a more favorable perception of research. CONCLUSION: The study showed that in Sudanese medical colleges a significantly higher percentage of men than women published scientific papers. In addition, males also had a significantly higher mean research perception score, indicating a more favorable perception of research.