BiRA-Net: Bilinear Attention Net for Diabetic Retinopathy Grading

Authors: Chen Li
Chua Matthew Chin Heng
Hao Xuejie
Tian Jing
Xu Xin
Zhang Kerui
Zhao Ziyuan
Publication date: 01/07/2019

Diabetic retinopathy (DR) is a common retinal disease that can lead to blindness. For diagnosis purposes, DR image grading aims to provide automatic classification of DR severity grades, a task not addressed by conventional binary DR image classification methods. Small objects in the eye images, such as lesions and microaneurysms, are essential to DR grading in medical imaging, but they can easily be influenced by other objects. To address these challenges, we propose a new deep learning architecture, called BiRA-Net, which combines an attention model for feature extraction with a bilinear model for fine-grained classification. Furthermore, to take into account the distance between different DR grades, we propose a new loss function, called grading loss, which leads to improved training convergence of the proposed approach. Experimental results are provided to demonstrate the superior performance of the proposed approach. Comment: Accepted at ICIP 201

arXiv.org e-Print Archive

Crossref
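The BiRA-Net abstract names a bilinear model as its fine-grained classification component. The sketch below illustrates generic bilinear pooling in plain Python: per-location outer products of two feature streams are averaged, then passed through a signed square root and L2 normalization, a common post-processing step for bilinear descriptors. It is an illustration under assumptions, not BiRA-Net's implementation: the function name, the list-of-vectors feature layout and the toy inputs are hypothetical, and neither the attention module nor the grading loss is modelled.

```python
import math


def bilinear_pool(features_a, features_b):
    """Bilinear pooling of two feature maps sampled at the same L spatial locations.

    features_a: list of L vectors with C_a channels each (hypothetical layout).
    features_b: list of L vectors with C_b channels each.
    Returns a flattened C_a * C_b descriptor.
    """
    if not features_a or len(features_a) != len(features_b):
        raise ValueError("feature maps must be non-empty and cover the same locations")
    n_locations = len(features_a)
    c_a, c_b = len(features_a[0]), len(features_b[0])

    # Average the outer products x_l y_l^T over all spatial locations l.
    pooled = [0.0] * (c_a * c_b)
    for x, y in zip(features_a, features_b):
        for i in range(c_a):
            for j in range(c_b):
                pooled[i * c_b + j] += x[i] * y[j]
    pooled = [v / n_locations for v in pooled]

    # Signed square root, then L2 normalization of the whole descriptor.
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


# Two spatial locations, two channels per stream (toy numbers).
descriptor = bilinear_pool([[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print([round(v * v, 3) for v in descriptor])  # → [0.1, 0.2, 0.3, 0.4]
```

The outer product lets every channel of one stream interact with every channel of the other, which is why bilinear descriptors are often used for fine-grained distinctions such as small lesions; the cost is a descriptor whose size grows as C_a * C_b.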

Prompt-based Alignment of Headlines and Images Using OpenCLIP

Authors: Bernstein Abraham
Chan Yuin Kwan
Heitz Lucien
Li Hongji
Rossetto Luca
Zeng Kerui
Publication venue: CEUR-WS
Publication date: 02/02/2024

In this paper, we describe how we leverage OpenCLIP to generate automated image recommendations for online news articles for the MediaEval 2023 NewsImages task. By exploring different text prompting techniques, we devised a total of five retrieval approaches. Results show, however, that the best-performing approach is an unmodified CLIP version with the raw article headline as input. We reflect on this finding and its implications for future NewsImages tasks.

ZORA
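The retrieval approaches in this entry rank candidate images for a news headline. A minimal sketch of that ranking step, assuming the text and image embeddings have already been computed (for example with an OpenCLIP model's encode_text and encode_image methods, which this code does not call), sorts images by cosine similarity to the headline embedding. The function names and the 2-D toy vectors are hypothetical; prompt variants of the kind the abstract mentions would change the text given to the encoder, while a ranking step like this one could stay the same.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_images(headline_embedding, image_embeddings):
    """Return image ids ordered from most to least similar to the headline embedding."""
    scores = {
        image_id: cosine_similarity(headline_embedding, embedding)
        for image_id, embedding in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy 2-D "embeddings"; real CLIP embeddings have hundreds of dimensions.
headline = [1.0, 0.0]
candidates = {"img_a": [0.0, 1.0], "img_b": [1.0, 1.0], "img_c": [2.0, 0.1]}
print(rank_images(headline, candidates))  # → ['img_c', 'img_b', 'img_a']
```

Cosine similarity ignores vector length, so an image embedding that is simply larger in magnitude (img_c above) wins only because its direction is closest to the headline's.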

Learning Unorthogonalized Matrices for Rotation Estimation

Authors: Gu Kerui
Kawaguchi Kenji
Li Zhihao
Liu Jianzhuang
Liu Shiyong
Mi Michael Bi
Xu Songcen
Yan Youliang
Yao Angela
Publication date: 01/12/2023

Estimating 3D rotations is a common procedure in 3D computer vision. Its accuracy depends heavily on the rotation representation. One form of representation, the rotation matrix, is popular due to its continuity, especially for pose estimation tasks. The learning process usually incorporates orthogonalization to ensure orthonormal matrices. Through gradient analysis, our work reveals that common orthogonalization procedures based on the Gram-Schmidt process and singular value decomposition slow down training. To this end, we advocate removing orthogonalization from the learning process and learning unorthogonalized 'Pseudo' Rotation Matrices (PRoM). An optimization analysis shows that PRoM converges faster and to a better solution. By replacing the orthogonalization-incorporated representation with our proposed PRoM in various rotation-related tasks, we achieve state-of-the-art results on large-scale benchmarks for human pose estimation.

arXiv.org e-Print Archive