10 research outputs found

Visual question answering using external knowledge

Author: Gulganjalli Narasimhan Medhini
Publication date: 01/05/2019

Accurately answering a question about a given image requires combining observations with general knowledge. While this is effortless for humans, reasoning with general knowledge remains an algorithmic challenge. To advance research in this direction, a novel "fact-based" visual question answering (FVQA) task has been introduced recently, along with a large set of curated facts which link two entities, i.e., two possible answers, via a relation. Given a question-image pair, keyword matching techniques have been employed to successively reduce the large set of facts and were shown to yield compelling results despite being vulnerable to misconceptions due to synonyms and homographs. To overcome these shortcomings, we introduce two new approaches in this work. First, we develop a learning-based approach which goes straight to the facts via a learned embedding space. We demonstrate state-of-the-art results on the challenging, recently introduced fact-based visual question answering dataset, outperforming competing methods by more than 5%. Upon further analysis, we observe that a successive process which considers one fact at a time to form a local decision is sub-optimal. To counter this, in our second approach we develop an entity graph and use a graph convolutional network to "reason" about the correct answer by jointly considering all entities. We show on the FVQA dataset that this leads to an improvement in accuracy of around 7% compared to the state of the art.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Predicting symptom severity and contagiousness of respiratory viral infections

Author: Aguiar-Pulido Vanessa
Mathee Kalai
Mehta Arpit
Narasimhan Giri
Narasimhan Medhini
Rajabli Farid
Vietri Giuseppe
Publication venue: FIU Digital Commons
Publication date: 11/07/2016

This work aims at predicting the symptom severity and contagiousness of a person infected with a respiratory virus, using time series gene expression data. Four different respiratory viruses were studied: RSV, H1N1, H3N2 and Rhinovirus. Predictive models were built for each virus at each time point. Partial least squares discriminant analysis was used for feature selection and random forest was used for classification. Certain genes were identified as biomarkers in distinguishing the subjects. Gene enrichment analysis was performed on the differentially expressed genes. Prediction accuracy values were high even when expression data from early time points were analyzed. Significant genes were detected as early as 5 and 10 hours post infection, as compared to prior work that did so at 29 hours post infection. The potential biomarkers obtained with the proposed approach need to be investigated further.

DigitalCommons@Florida International University

Multimodal Long-Term Video Understanding

Author: Narasimhan Medhini Gulganjalli
Publication date: 01/08/2023

Ezid

Multimodal Long-Term Video Understanding

Author: Narasimhan Medhini Gulganjalli
Publication venue: eScholarship, University of California
Publication date: 01/01/2023

The internet hosts an immense reservoir of videos, witnessing a constant influx of thousands of uploads to platforms like YouTube every second. These videos represent a valuable repository of multimodal information, providing an invaluable resource for understanding audio-visual-text relationships. Moreover, understanding the content in long videos (think 2 hours) is an open problem. This thesis investigates the intricate interplay between diverse modalities (audio, visual, and textual) in videos and harnesses their potential for comprehending semantic nuances within long videos. My research explores diverse strategies for combining information from these modalities, leading to significant advancements in video summarization and instructional video analysis. The first part introduces an approach to synthesizing long video textures from short clips by rearranging segments coherently, while also considering audio conditioning. The second part discusses a novel technique for generating concise visual summaries of lengthy videos guided by natural language cues. Additionally, we focus specifically on summarizing instructional videos, capitalizing on audio-visual alignments and task structures to produce informative summaries. To further enrich the comprehension of instructional videos, the thesis introduces a cutting-edge approach that facilitates the learning and verification of procedural steps within instructional content, empowering the model to grasp long and complex video sequences and ensure procedural accuracy. Lastly, the potential of large language models is explored for answering questions about images through code generation. Through comprehensive experiments, the research demonstrates the efficacy of the proposed methodologies, envisioning promising future prospects in the field of semantics in long videos by integrating audio, visual, and textual relationships.

eScholarship - University of California

Predicting symptom severity and contagiousness of respiratory viral infections

Author: Aguiar-Pulido Vanessa
Mathee Kalai
Mehta Arpit
Narasimhan Giri
Narasimhan Medhini
Rajabli Farid
Vietri Giuseppe
Publication venue: SelectedWorks
Publication date: 02/05/2020

DigitalCommons@Florida International University