Following the current trend in the AI community, we present the AI Journey
2021 Challenge called Fusion Brain, the first competition aimed at building a
universal architecture that can process different modalities (in this case,
images, text, and code) and solve multiple vision and language tasks. The
Fusion Brain Challenge combines the following specific tasks:
Code2code Translation, Handwritten Text Recognition, Zero-shot Object
Detection, and Visual Question Answering. We have created a dataset for each
task to evaluate the participants' submissions. Moreover, we have collected
and made publicly available a new dataset of handwritten text in both English
and Russian, which consists of 94,128 image-text pairs. We also propose a
multimodal and multitask architecture as a baseline solution; it is built
around a frozen foundation model and has been trained in Fusion mode as well
as in Single-task mode. The proposed Fusion approach proves to be competitive
with, and more energy-efficient than, the task-specific one.