City Research Online

Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering

Whitehouse, C., Weyde, T. ORCID: 0000-0001-8028-9905 & Madhyastha, P. ORCID: 0000-0002-4438-8161 (2023). Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering. In: Findings of the Association for Computational Linguistics: EACL 2023. The 17th Conference of the European Chapter of the Association for Computational Linguistics, 2-6 May 2023, Dubrovnik, Croatia. doi: 10.18653/v1/2023.findings-eacl.126

Abstract

The field of visual question answering (VQA) has recently seen a surge in research focused on providing explanations for predicted answers. However, current systems mostly rely on separate models to predict answers and generate explanations, leading to less grounded and frequently inconsistent results. To address this, we propose a multitask learning approach towards a Unified Model for Answer and Explanation generation (UMAE). Our approach involves the addition of artificial prompt tokens to training data and fine-tuning a multimodal encoder-decoder model on a variety of VQA-related tasks. In our experiments, UMAE models surpass the prior state-of-the-art answer accuracy on A-OKVQA by 10 to 15%, show competitive results on OK-VQA, achieve new state-of-the-art explanation scores on A-OKVQA and VCR, and demonstrate promising out-of-domain performance on VQA-X.
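
The idea of adding artificial prompt tokens to training data can be illustrated with a minimal sketch. It is not the authors' implementation: the token strings, the `VQAExample` class, the `build_pairs` helper, and the "answer because explanation" target format are all assumptions made for illustration, and the image input (which a multimodal encoder would receive separately) is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical prompt tokens; the abstract does not state the actual strings.
TASK_TOKENS = {
    "answer": "<ANS>",
    "explain": "<EXP>",
    "answer_explain": "<ANS_EXP>",
}


@dataclass
class VQAExample:
    image_id: str          # the image itself would be fed to the encoder separately
    question: str
    answer: str
    explanation: Optional[str] = None


def build_pairs(ex: VQAExample) -> List[Tuple[str, str]]:
    """Return (source_text, target_text) pairs, one per task the example supports."""
    # Every example supports plain answer prediction.
    pairs = [(f"{TASK_TOKENS['answer']} {ex.question}", ex.answer)]
    if ex.explanation is not None:
        # Explanation generation, conditioned on the question and a given answer.
        pairs.append(
            (f"{TASK_TOKENS['explain']} {ex.question} answer: {ex.answer}", ex.explanation)
        )
        # Joint generation of the answer followed by its explanation (assumed format).
        pairs.append(
            (f"{TASK_TOKENS['answer_explain']} {ex.question}", f"{ex.answer} because {ex.explanation}")
        )
    return pairs


example = VQAExample("img1", "What is the man holding?", "umbrella", "it is raining")
for source, target in build_pairs(example):
    print(source, "->", target)
```

Mixing such pairs from several datasets into one training set lets a single encoder-decoder model be fine-tuned on all tasks at once, with the leading token selecting the desired output at inference time.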

Publication Type: Conference or Workshop Item (Paper)
Subjects: H Social Sciences > HN Social history and conditions. Social problems. Social reform
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: School of Science & Technology
School of Science & Technology > Computer Science
Text - Published Version
Available under License Creative Commons Attribution.
