257 research outputs found

Topic driven multimodal similarity learning with multi-view voted convolutional features

Authors: Gao Xinjian; Goulermas John Y; Mu Tingting; Wang Meng
Publication venue: Elsevier BV
Publication date: 09/03/2017

University of Liverpool Repository

The University of Manchester - Institutional Repository

Modeling Visual Rhetoric and Semantics in Multimedia

Author: Thomas Christopher
Publication date: 16/09/2020

Recent advances in machine learning have enabled computer vision algorithms to model complicated visual phenomena with accuracies unthinkable a mere decade ago. Their high performance on a plethora of vision-related tasks has enabled computer vision researchers to begin to move beyond traditional visual recognition problems to tasks requiring higher-level image understanding. However, most computer vision research still focuses on describing what images, text, or other media literally portray. In contrast, in this dissertation we focus on learning how and why such content is portrayed. Rather than viewing media for its content, we recast the problem as understanding visual communication and visual rhetoric. For example, the same content may be portrayed in different ways in order to present the story the author wishes to convey. We thus seek to model not only the content of the media, but its authorial intent and latent messaging. Understanding how and why visual content is portrayed a certain way requires understanding higher-level abstract semantic concepts which are themselves latent within visual media. By latent, we mean the concept is not readily visually accessible within a single image (e.g., right vs. left political bias), in contrast to explicit visual semantic concepts such as objects. Specifically, we study the problems of modeling photographic style (how professional photographers portray their subjects), understanding visual persuasion in image advertisements, modeling political bias in multimedia (image and text) news articles, and learning cross-modal semantic representations. While most past research in vision and natural language processing studies the case where visual content and paired text are highly aligned (as in the case of image captions), we target the case where each modality conveys complementary information to tell a larger story.
We particularly focus on the problem of learning cross-modal representations from multimedia exhibiting weak alignment between the image and text modalities. A variety of techniques are presented which improve modeling of multimedia rhetoric in real-world data and enable more robust artificially intelligent systems.

D-Scholarship@Pitt

An Interpretable Deep Architecture for Similarity Learning Built Upon Hierarchical Concepts

Authors: Gao Xinjian; Goulermas John; Mu Tingting; Thiyagalingam Jeyarajan; Wang Meng
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 20/01/2020

In general, the development of sufficiently complex mathematical models, such as deep neural networks, can be an effective way to improve the accuracy of learning models. However, this is achieved at the cost of reduced post-hoc model interpretability, because what is learned by the model can become less intelligible and tractable to humans as the model complexity increases. In this paper, we target a similarity learning task in the context of image retrieval, with a focus on the model interpretability issue. An effective similarity neural network (SNN) is proposed not only to seek robust retrieval performance but also to achieve satisfactory post-hoc interpretability. The network is designed by linking the neuron architecture with the organization of a concept tree and by formulating neuron operations to pass similarity information between concepts. Various ways of understanding and visualizing what is learned by the SNN neurons are proposed. We also exhaustively evaluate the proposed approach using a number of relevant datasets against a number of state-of-the-art approaches to demonstrate the effectiveness of the proposed network. Our results show that the proposed approach can offer superior performance when compared against state-of-the-art approaches. Neuron visualization results are demonstrated to support the understanding of the trained neurons.

University of Liverpool Repository

Affective Image Content Analysis: Two Decades Review and New Perspectives

Authors: Chua Tat-Seng; Ding Guiguang; Jia Guoli; Keutzer Kurt; Schuller Björn W.; Yang Jufeng; Yao Xingxu; Zhao Sicheng
Publication date: 01/01/2021

Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we comprehensively review the development of AICA over the past two decades, focusing especially on the state-of-the-art methods with respect to three main challenges: the affective gap, perception subjectivity, and label noise and absence. We begin with an introduction to the key emotion representation models that have been widely employed in AICA and a description of available datasets for performing evaluation, with a quantitative comparison of label noise and dataset bias. We then summarize and compare the representative approaches on (1) emotion feature extraction, including both handcrafted and deep features, (2) learning methods on dominant emotion recognition, personalized emotion prediction, emotion distribution learning, and learning from noisy data or few labels, and (3) AICA-based applications. Finally, we discuss some challenges and promising research directions for the future, such as image content and context understanding, group emotion clustering, and viewer-image interaction. Comment: Accepted by IEEE TPAM

arXiv.org e-Print Archive

OPUS Augsburg

Affective image content analysis: two decades review and new perspectives

Authors: Chua Tat-Seng; Ding Guiguang; Jia Guoli; Keutzer Kurt; Schuller Björn W.; Yang Jufeng; Yao Xingxu; Zhao Sicheng
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2021

OPUS Augsburg

Automated scholarly paper review: Technologies and challenges

Authors: Chen Yidong; Lin Jialiang; Shi Xiaodong; Song Jiaxin; Zhou Zhangping
Publication date: 27/04/2022

Peer review is a widely accepted mechanism for research evaluation, playing a pivotal role in scholarly publishing. However, criticisms have long been leveled at this mechanism, mostly because of its inefficiency and subjectivity. Recent years have seen the application of artificial intelligence (AI) in assisting the peer review process. Nonetheless, with the involvement of humans, such limitations remain inevitable. In this review paper, we propose the concept and pipeline of automated scholarly paper review (ASPR) and review the relevant literature and technologies for achieving a full-scale computerized review process. On the basis of the review and discussion, we conclude that there is already corresponding research and implementation at each stage of ASPR. We further look into the challenges in ASPR with the existing technologies. The major difficulties lie in imperfect document parsing and representation, inadequate data, defective human-computer interaction and flawed deep logical reasoning. Moreover, we discuss the possible moral and ethical issues and point out the future directions of ASPR. In the foreseeable future, ASPR and peer review will coexist in a reinforcing manner before ASPR is able to fully undertake the reviewing workload from humans.

arXiv.org e-Print Archive

Semantics-Driven Large-Scale 3D Scene Retrieval

Author: Yuan Juefei
Publication venue: The Aquila Digital Community
Publication date: 01/08/2021

Aquila Digital Community

Analysis of community question‐answering issues via machine learning and deep learning: State‐of‐the‐art review

Authors: Banerjee Snehasish; Gutub Adnan; Roy Pradeep Kumar; Saumya Sunil; Singh Jyoti Prakash
Publication venue: Institution of Engineering and Technology (IET)
Publication date: 04/05/2022

Over the last couple of decades, community question-answering sites (CQAs) have been a topic of much academic interest. Scholars have often leveraged traditional machine learning (ML) and deep learning (DL) to explore the ever-growing volume of content that CQAs engender. To clarify the current state of the CQA literature that has used ML and DL, this paper reports a systematic literature review. The goal is to summarise and synthesise the major themes of CQA research related to (i) questions, (ii) answers and (iii) users. The final review included 133 articles. Dominant research themes include question quality, answer quality, and expert identification. In terms of datasets, some of the most widely studied platforms include Yahoo! Answers, Stack Exchange and Stack Overflow. The scope of most articles was confined to just one platform, with few cross-platform investigations. Articles with ML outnumber those with DL. Nonetheless, the use of DL in CQA research is on an upward trajectory. A number of research directions are proposed.

White Rose Research Online

Self-supervised multicontrast super-resolution for diffusion-weighted prostate MRI

Authors: Chatterjee Aritrick; Engelmann Roger; Gundogdu Batuhan; Karczmar Gregory S.; Lee Grace; Medved Milica; Oren Nisa C.; Oto Aytekin; Rosado Avery
Publication date: 04/02/2024

Purpose: This study addresses the challenge of low resolution and signal-to-noise ratio (SNR) in diffusion-weighted images (DWI), which are pivotal for cancer detection. Traditional methods increase SNR at high b-values through multiple acquisitions, but this results in diminished image resolution due to motion-induced variations. Our research aims to enhance spatial resolution by exploiting the global structure within multicontrast DWI scans and millimetric motion between acquisitions. Methods: We introduce a novel approach employing a "Perturbation Network" to learn subvoxel-size motions between scans, trained jointly with an implicit neural representation (INR) network. INR encodes the DWI as a continuous volumetric function, treating voxel intensities of low-resolution acquisitions as discrete samples. By evaluating this function with a finer grid, our model predicts higher-resolution signal intensities for intermediate voxel locations. The Perturbation Network's motion-correction efficacy was validated through experiments on biological phantoms and in vivo prostate scans. Results: Quantitative analyses revealed significantly higher structural similarity measures of super-resolution images to ground truth high-resolution images compared to high-order interpolation (p Conclusion: High-resolution details in DWI can be obtained without the need for high-resolution training data. One notable advantage of the proposed method is that it does not require a super-resolution training set. This is important in clinical practice because the proposed method can easily be adapted to images with different scanner settings or body parts, whereas the supervised methods do not offer such an option.

Knowledge UChicago