
    Visual Question Answering: Exploring Trade-offs Between Task Accuracy and Explainability

    Given visual input and a natural language question about it, the visual question answering (VQA) task is to answer the question correctly. To improve a system's reliability and trustworthiness, it is imperative that it links the text (question and answer) to specific visual regions. This dissertation first explores the VQA task in a multi-modal setting where questions are based on video as well as subtitles. An algorithm is introduced to process each modality, and their features are fused to solve the task. Additionally, to understand the model's emphasis on visual data, this study collects a diagnostic set of questions that strictly require knowledge of the visual input, based on a human annotator's judgment. The next phase of this research deals with grounding in VQA systems without any detectors or object annotations. To this end, weak supervision is employed for grounding by training on the VQA task alone. In the initial part of this study, a rubric is provided to measure grounding performance. This reveals that high accuracy is no guarantee of good grounding, i.e., the system can produce the correct answer despite not attending to the visual evidence. Techniques are introduced to improve VQA grounding by combining attention and capsule networks; this approach benefits the grounding ability of both CNNs and transformers. Lastly, we focus on question answering in videos. By depicting activities and objects, as well as their relationships, as a graph, a video can be represented compactly, capturing the information necessary to produce an answer. An algorithm is devised that learns to construct such graphs and uses question-to-graph attention; this solution obtains significant improvements on complex reasoning-based questions in the STAR and AGQA benchmarks. Hence, by obtaining higher accuracy and better grounding, this dissertation bridges the gap between task accuracy and explainability of reasoning in VQA systems.
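    As a rough illustration of the kind of multi-modal fusion described above, the sketch below projects video and subtitle features into a shared space and lets the question attend to each modality before predicting an answer. This is not the dissertation's actual architecture; the feature dimensions, module names, and answer-set size are illustrative assumptions.

```python
# Minimal sketch of question-guided fusion of video and subtitle features for
# multiple-choice video QA. All dimensions and names are illustrative, not the
# dissertation's actual model.
import torch
import torch.nn as nn

class FusionVQA(nn.Module):
    def __init__(self, vid_dim=2048, txt_dim=768, hid_dim=512, num_answers=5):
        super().__init__()
        self.vid_proj = nn.Linear(vid_dim, hid_dim)   # project per-frame video features
        self.txt_proj = nn.Linear(txt_dim, hid_dim)   # project subtitle/question features
        self.attn = nn.MultiheadAttention(hid_dim, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(2 * hid_dim, num_answers)

    def forward(self, video_feats, subtitle_feats, question_feat):
        # video_feats: (B, T, vid_dim); subtitle_feats: (B, S, txt_dim); question_feat: (B, txt_dim)
        v = self.vid_proj(video_feats)
        s = self.txt_proj(subtitle_feats)
        q = self.txt_proj(question_feat).unsqueeze(1)          # (B, 1, hid)
        v_ctx, _ = self.attn(q, v, v)                          # question attends over video frames
        s_ctx, _ = self.attn(q, s, s)                          # question attends over subtitle tokens
        fused = torch.cat([v_ctx, s_ctx], dim=-1).squeeze(1)   # late fusion of both modalities
        return self.classifier(fused)                          # (B, num_answers) answer logits

model = FusionVQA()
logits = model(torch.randn(2, 32, 2048), torch.randn(2, 20, 768), torch.randn(2, 768))
print(logits.shape)  # torch.Size([2, 5])
```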

    Classifying News Media Coverage for Corruption Risks Management with Deep Learning and Web Intelligence

    A substantial number of international corporations have been affected by corruption. The research presented in this paper introduces the Integrity Risks Monitor, an analytics dashboard that applies Web Intelligence and Deep Learning to English- and German-language documents for the tasks of (i) tracking and visualizing past corruption management gaps and their respective impacts, (ii) understanding present and past integrity issues, and (iii) supporting companies in analyzing news media to identify and mitigate integrity risks. We then discuss the design, implementation, training, and evaluation of classification components capable of identifying English documents covering the integrity topic of corruption. Domain experts created a gold standard dataset compiled from Anglo-American media coverage of corruption cases, which was used for training and evaluating the classifiers. The evaluation experiments draw upon popular text classification algorithms such as Naïve Bayes, Support Vector Machines (SVM), and Deep Learning architectures (LSTM, BiLSTM, CNN) built on different word embeddings and document representations. They demonstrate that although classical machine learning approaches such as Naïve Bayes struggle with the diversity of the media coverage on corruption, state-of-the-art Deep Learning models perform sufficiently well in the project's context.
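    For context, the sketch below shows what the classical baselines mentioned above (Naïve Bayes and a linear SVM over TF-IDF features) could look like in scikit-learn. The toy documents and labels are placeholders, not the project's gold-standard corpus, and the pipeline settings are assumptions rather than the paper's exact setup.

```python
# Minimal sketch of the classical corruption-vs-other text classification baselines.
# The documents and labels below are toy placeholders, not the gold-standard dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "Prosecutors charged the executives with bribery and money laundering.",
    "The company reported strong quarterly earnings and revenue growth.",
]
labels = [1, 0]  # 1 = corruption coverage, 0 = other news

for clf in (MultinomialNB(), LinearSVC()):
    # TF-IDF unigrams/bigrams feeding either Naive Bayes or a linear SVM
    pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    pipeline.fit(docs, labels)
    print(type(clf).__name__, pipeline.predict(["Officials accepted kickbacks for contracts."]))
```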