Doodle to Search: Practical Zero-Shot Sketch-based Image Retrieval
In this paper, we investigate the problem of zero-shot sketch-based image
retrieval (ZS-SBIR), where human sketches are used as queries to conduct
retrieval of photos from unseen categories. We importantly advance prior arts
by proposing a novel ZS-SBIR scenario that represents a firm step forward in
its practical application. The new setting uniquely recognizes two important
yet often neglected challenges of practical ZS-SBIR, (i) the large domain gap
between amateur sketch and photo, and (ii) the necessity for moving towards
large-scale retrieval. We first contribute to the community a novel ZS-SBIR
dataset, QuickDraw-Extended, that consists of 330,000 sketches and 204,000
photos spanning across 110 categories. Highly abstract amateur human sketches
are purposefully sourced to maximize the domain gap, instead of ones included
in existing datasets that can often be semi-photorealistic. We then formulate a
ZS-SBIR framework to jointly model sketches and photos into a common embedding
space. A novel strategy to mine the mutual information among domains is
specifically engineered to alleviate the domain gap. External semantic
knowledge is further embedded to aid semantic transfer. We show that, rather
surprisingly, retrieval performance significantly better than the
state-of-the-art on existing datasets can already be achieved using a reduced
version of our model. We further demonstrate the superior performance of our
full model by comparing with a number of alternatives on the newly proposed
dataset. The new dataset, plus all training and testing code of our model,
will be publicly released to facilitate future research.
Comment: Oral paper in CVPR 2019
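The joint-embedding retrieval idea can be sketched minimally: once sketches and photos are mapped into a common space, retrieval reduces to ranking photos by cosine similarity to the sketch query. The embeddings below are made-up toy vectors for illustration, not the paper's learned features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(sketch_emb, photo_embs):
    """Return photo indices ranked by similarity to the sketch query."""
    scores = [cosine(sketch_emb, p) for p in photo_embs]
    return sorted(range(len(photo_embs)), key=lambda i: -scores[i])

# Toy embeddings in a shared 4-D space (illustrative numbers only).
photos = [
    [1.0, 0.0, 0.0, 0.0],  # photo 0
    [0.0, 1.0, 0.0, 0.0],  # photo 1
    [0.7, 0.7, 0.0, 0.0],  # photo 2
]
sketch = [0.9, 0.1, 0.0, 0.0]  # sketch query, nearest to photo 0
ranking = retrieve(sketch, photos)  # → [0, 2, 1]
```

In a real ZS-SBIR system the embeddings come from trained encoders over both domains; the ranking step itself stays this simple.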
Improving accuracy and speeding up Document Image Classification through parallel systems
This paper presents a study showing the benefits of EfficientNet models
compared with heavier Convolutional Neural Networks (CNNs) in the Document
Classification task, an essential problem in the digitalization process of
institutions. We show on the RVL-CDIP dataset that we can improve previous
results with a much lighter model and present its transfer learning
capabilities on a smaller in-domain dataset such as Tobacco3482. Moreover, we
present an ensemble pipeline that is able to boost image-only results by
combining image model predictions with those generated by a BERT model on
text extracted by OCR. We also show that the batch size can be effectively
increased without hindering accuracy, so that the training process can be
sped up by parallelizing across multiple GPUs, decreasing the computational
time needed. Lastly, we expose the training performance differences between
the PyTorch and TensorFlow deep learning frameworks.
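The image-plus-text ensemble step can be sketched as simple late fusion of per-class probabilities, a common baseline for combining modalities. The weights, class count, and probability values below are illustrative assumptions, not the paper's exact scheme.

```python
def ensemble(image_probs, text_probs, w_image=0.5):
    """Weighted average of per-class probabilities from two models."""
    return [w_image * p + (1.0 - w_image) * q
            for p, q in zip(image_probs, text_probs)]

# Made-up 3-class outputs: the image model is uncertain, the text model
# (run on OCR output) is confident, and the fused prediction follows it.
image_probs = [0.40, 0.35, 0.25]
text_probs = [0.10, 0.80, 0.10]
combined = ensemble(image_probs, text_probs)  # [0.25, 0.575, 0.175]
pred = max(range(len(combined)), key=combined.__getitem__)  # class 1
```

Averaging calibrated probabilities lets either modality dominate only when it is confident; the mixing weight can be tuned on a validation set.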
Multilingual Text Representation
Modern NLP breakthroughs include large multilingual models capable of
performing tasks across more than 100 languages. State-of-the-art language
models have come a long way, starting from the simple one-hot representation
of words and arriving at models capable of performing tasks like natural
language understanding, common-sense reasoning, or question answering, thus
capturing both the syntax and semantics of texts. At the same time, language
models are expanding beyond our known language boundary, even performing
competitively on very low-resource dialects of endangered languages. However,
there are still problems to solve to ensure an equitable representation of
texts through a unified modeling space across languages and speakers. In this
survey, we shed light on this iterative progression of multilingual text
representation and discuss the driving factors that ultimately led to the
current state-of-the-art. Subsequently, we discuss how the full potential of
language democratization could be obtained, reaching beyond the known limits,
and what the scope of improvement in that space is.
Comment: PhD Comprehensive exam report