VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for
evaluation of instruction-following vision-language models for real-world use.
Our starting point is curating 70 'instruction families' that we envision
instruction-tuned vision-language models should be able to address. Extending
beyond evaluations like VQAv2 and COCO, tasks range from basic recognition to
game playing and creative generation. Following curation, our dataset comprises
592 test queries, each with a human-authored instruction-conditioned caption.
These descriptions surface instruction-specific factors, e.g., for an
instruction asking about the accessibility of a storefront for wheelchair
users, the instruction-conditioned caption describes ramps/potential obstacles.
These descriptions enable 1) collecting human-verified reference outputs for
each instance; and 2) automatic evaluation of candidate multimodal generations
using a text-only LLM, aligning with human judgment. We quantify quality gaps
between models and references using both human and automatic evaluations; e.g.,
the top-performing instruction-following model wins against the GPT-4 reference
in just 27% of comparisons. VisIT-Bench is dynamic: to participate,
practitioners simply submit their model's responses on the project website.
Data, code, and the leaderboard are available at visit-bench.github.io.
Are aligned neural networks adversarially aligned?
Large language models are now tuned to align with the goals of their
creators, namely to be "helpful and harmless." These models should respond
helpfully to user questions, but refuse to answer requests that could cause
harm. However, adversarial users can construct inputs which circumvent attempts
at alignment. In this work, we study to what extent these models remain
aligned, even when interacting with an adversarial user who constructs
worst-case inputs (adversarial examples). These inputs are designed to cause
the model to emit harmful content that would otherwise be prohibited. We show
that existing NLP-based optimization attacks are insufficiently powerful to
reliably attack aligned text models: even when current NLP-based attacks fail,
we can find adversarial inputs with brute force. As a result, the failure of
current attacks should not be seen as proof that aligned text models remain
aligned under adversarial inputs.
However, the recent trend in large-scale ML is toward multimodal models that
allow users to provide images that influence the text that is generated. We
show these models can be easily attacked, i.e., induced to perform arbitrary
unaligned behavior through adversarial perturbation of the input image. We
conjecture that improved NLP attacks may demonstrate this same level of
adversarial control over text-only models.
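The image perturbations the abstract refers to follow the standard adversarial-example recipe: take the gradient of a loss with respect to the *input* and step in the direction that increases it. A minimal sketch of this idea (the fast gradient sign method, not the paper's specific attack) on a toy logistic classifier, with all weights and values invented for illustration:

```python
import numpy as np

# Toy linear classifier: p(y=1|x) = sigmoid(w.x + b).
# Weights are arbitrary stand-ins for a trained model.
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x, y):
    # Binary cross-entropy for a single example.
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(x, y, eps):
    # For this model, d(loss)/dx = (p - y) * w, so the attack
    # simply steps eps in the sign of that gradient.
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

x = np.array([0.5, -0.5, 1.0])  # "clean" input
y = 1.0                          # its true label
x_adv = fgsm(x, y, eps=0.3)
print(loss(x, y), loss(x_adv, y))  # loss is strictly larger on x_adv
```

Against a deep image model the gradient comes from backpropagation rather than a closed form, but the attack surface is the same: a continuous input that the attacker can nudge freely, which is what makes multimodal models easier to attack than text-only ones.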
Comparing the participation of men and women in academic medicine in medical colleges in Sudan: A cross-sectional survey
INTRODUCTION: In countries around the world, women are less involved in academic medicine than men. This study aimed to assess whether there were significant gender differences in research perception, practice, and publication in Sudan.
METHODS: This analytical cross-sectional study was carried out using a questionnaire administered to 153 teaching staff of both genders from five Sudanese medical faculties, including teaching assistants, lecturers, assistant professors, associate professors, and full professors.
RESULTS: There were no significant differences between male and female participants in their universities, qualifications, research training received after graduation, past or current participation in research, or current position, although female participants were younger on average, with a mean age of 38.8 (±9.2) years compared with 42.6 (±10.1) for males. Importantly, male researchers not only published significantly more than females but also had significantly more years of research experience. The mean research-perception score was higher among male participants, indicating that they held a more favorable perception of research.
CONCLUSION: The study showed that in Sudanese medical colleges a significantly higher percentage of men than women published scientific papers. Males also had a significantly higher mean research-perception score, indicating a more favorable perception of research.