RITnet: Real-time Semantic Segmentation of the Eye for Gaze Tracking
Accurate eye segmentation can improve eye-gaze estimation and support
interactive computing based on visual attention; however, existing eye
segmentation methods suffer from issues such as person-dependent accuracy, lack
of robustness, and an inability to be run in real-time. Here, we present the
RITnet model, which is a deep neural network that combines U-Net and DenseNet.
RITnet is under 1 MB and achieves 95.3% accuracy on the 2019 OpenEDS Semantic
Segmentation challenge. On a GeForce GTX 1080 Ti, RITnet runs at 300 Hz,
enabling real-time gaze-tracking applications. Pre-trained models and source
code are available at https://bitbucket.org/eye-ush/ritnet/.

Comment: This model is the winning submission for the OpenEDS Semantic
Segmentation Challenge for eye images
(https://research.fb.com/programs/openeds-challenge/). To appear in ICCVW 2019.
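As a hedged illustration of the segmentation metrics reported above (the challenge's exact evaluation script may differ), per-pixel accuracy and mean IoU over flat label arrays can be sketched in plain Python; the four eye-region class names below are illustrative assumptions:

```python
# Hypothetical sketch: per-pixel accuracy and mean IoU, the metrics commonly
# reported for semantic segmentation challenges such as OpenEDS.
# Labels are flat lists of integer class ids (assumed here to be
# 0=background, 1=sclera, 2=iris, 3=pupil).

def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted class matches ground truth."""
    correct = sum(p == g for p, g in zip(pred, gt))
    return correct / len(gt)

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and g == c for p, g in zip(pred, gt))
        union = sum(p == c or g == c for p, g in zip(pred, gt))
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

gt   = [0, 0, 1, 1, 2, 2, 3, 3]
pred = [0, 0, 1, 2, 2, 2, 3, 3]
print(pixel_accuracy(pred, gt))  # 0.875
```

A real evaluation would run these over full-resolution mask images rather than toy lists.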
RoCOCO: Robust Benchmark MS-COCO to Stress-test Robustness of Image-Text Matching Models
Recently, large-scale vision-language pre-training models and visual semantic
embedding methods have significantly improved image-text matching (ITM)
accuracy on the MS COCO 5K test set. However, it is unclear how robust these
state-of-the-art (SOTA) models are when used in the wild. In this paper,
we propose a novel evaluation benchmark to stress-test the robustness of ITM
models. To this end, we add various fooling images and captions to a retrieval
pool. Specifically, we perturb images by inserting unrelated images, and perturb
captions by substituting a noun, which can change the meaning of a sentence. We
discover that merely adding these newly created images and captions to the test
set can degrade the performance (i.e., Recall@1) of a wide range of SOTA models
(e.g., 81.9% → 64.5% for BLIP, 66.1% → 37.5% for VSE). We expect that our
findings can provide insights for improving the robustness of vision-language
models and for devising more diverse stress tests for cross-modal retrieval
tasks. Source code and dataset will be available at
https://github.com/pseulki/rococo
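The core stress-test idea — Recall@1 falling once fooling items join the retrieval pool — can be sketched with toy similarity scores (the numbers below are illustrative, not model outputs):

```python
# Hypothetical sketch: Recall@1 for text-to-image retrieval degrades when
# an unrelated "fooling" distractor is added to the candidate pool.

def recall_at_1(scores, gold):
    """scores: {query: {candidate: score}}; gold: {query: true candidate}."""
    hits = sum(1 for q, cands in scores.items()
               if max(cands, key=cands.get) == gold[q])
    return hits / len(scores)

# Clean pool: each caption's true image ranks first.
clean = {"cap1": {"img1": 0.9, "img2": 0.2},
         "cap2": {"img1": 0.1, "img2": 0.8}}
gold = {"cap1": "img1", "cap2": "img2"}
print(recall_at_1(clean, gold))  # 1.0

# Add a fooling distractor that outranks one of the true images.
fooled = {q: dict(cands, fool=0.85) for q, cands in clean.items()}
print(recall_at_1(fooled, gold))  # 0.5
```

The benchmark's actual perturbations (inserted images, noun-substituted captions) play the role of the `fool` candidate here.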
Elevating Code-mixed Text Handling through Auditory Information of Words
With the growing popularity of code-mixed data, there is an increasing need
for better handling of this type of data, which poses a number of challenges,
such as dealing with spelling variations, multiple languages, different
scripts, and a lack of resources. Current language models face difficulty in
effectively handling code-mixed data as they primarily focus on the semantic
representation of words and ignore the auditory phonetic features. This leads
to difficulties in handling spelling variations in code-mixed text. In this
paper, we propose an effective approach for creating language models for
handling code-mixed textual data using auditory information of words from
SOUNDEX. Our approach includes a pre-training step based on masked language
modelling that incorporates SOUNDEX representations (SAMLM), together with a
new method of providing input data to the pre-trained model. Through
experimentation on various code-mixed datasets (of different languages) for
sentiment, offensive, and aggression classification tasks, we establish that our
novel language modeling approach (SAMLM) yields improved robustness to
adversarial attacks on code-mixed classification tasks. Additionally, our
SAMLM-based approach achieves better classification results than popular
baselines for code-mixed tasks. We use the explainability technique SHAP
(SHapley Additive exPlanations) to show how the auditory features
incorporated through SAMLM help the model handle code-mixed text
effectively and increase robustness against adversarial attacks. Source code
has been made available at https://github.com/20118/DefenseWithPhonetics and
https://www.iitp.ac.in/~ai-nlp-ml/resources.html#Phonetics.

Comment: Accepted to EMNLP 202
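For reference, the classic American Soundex code that the approach builds on can be sketched as follows (SAMLM's exact phonetic preprocessing may differ): words that sound alike collapse to the same 4-character code, which helps absorb spelling variations.

```python
# A minimal sketch of the classic Soundex phonetic code: keep the first
# letter, map remaining consonants to digits, merge adjacent duplicates,
# ignore H/W, and let vowels act as separators; pad/truncate to 4 chars.

CODES = {c: d for d, letters in [
    ("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
    ("4", "L"), ("5", "MN"), ("6", "R")] for c in letters}

def soundex(word):
    word = word.upper()
    first = word[0]
    digits = []
    prev = CODES.get(first, "")
    for c in word[1:]:
        if c in "HW":
            continue                  # H and W are skipped entirely
        d = CODES.get(c, "")          # vowels map to "" and reset prev
        if d and d != prev:
            digits.append(d)
        prev = d
    return (first + "".join(digits) + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Under such a scheme, variant spellings of the same spoken word in code-mixed text receive identical phonetic representations.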
TreeCaps: Tree-Based Capsule Networks for Source Code Processing
Recently, program-learning techniques have been proposed to process source code based on syntactic structures (e.g., abstract syntax trees) and/or semantic information (e.g., dependency graphs). While graphs may capture more viewpoints of code semantics than trees, constructing graph inputs from code requires static semantic analysis, which may not be accurate and introduces noise during learning.
Although syntax trees are precisely defined by the language grammar and are easier to construct and process than graphs, previous tree-based learning techniques have not been able to learn enough semantic information from trees to outperform graph-based techniques. We propose a new learning technique, named TreeCaps, that fuses capsule networks with tree-based convolutional neural networks to achieve learning accuracy higher than existing graph-based techniques while relying only on trees. TreeCaps introduces novel variable-to-static routing algorithms into the capsule networks to compensate for the loss of previous routing algorithms. Beyond accuracy, we also find that TreeCaps is the most robust against semantic-preserving program transformations that change code syntax without modifying the semantics. Evaluated on a large number of Java and C/C++ programs, TreeCaps models outperform prior deep learning models of program source code, in terms of both accuracy and robustness, on program comprehension tasks such as code functionality classification and function name prediction. Our implementation is publicly available at: https://github.com/bdqnghi/treecaps
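A building block shared by capsule-network variants such as TreeCaps is the "squash" non-linearity from the original capsule-network formulation, which rescales a capsule vector's length into (0, 1) while preserving its direction; a minimal sketch (independent of TreeCaps' own routing code):

```python
# The capsule "squash" non-linearity: v * |v|^2 / ((1 + |v|^2) * |v|).
# Long vectors approach unit length; short vectors shrink toward zero,
# so a capsule's length can be read as the probability its entity exists.
import math

def squash(v):
    norm_sq = sum(x * x for x in v)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm or 1.0)
    return [scale * x for x in v]

long_cap = squash([3.0, 4.0])   # |v| = 5   -> squashed length 25/26
short_cap = squash([0.3, 0.4])  # |v| = 0.5 -> squashed length 0.2
print(long_cap, short_cap)
```

Routing algorithms (including TreeCaps' variable-to-static variants) iterate agreement-weighted sums of such squashed vectors.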