City Research Online

Nature-based Bengali Picture Captioning using Global Attention with GRU

Zohora, F. T., Biswas, S. ORCID: 0000-0002-6770-9845, Bairagi, A. K. & Sharif, K. (2024). Nature-based Bengali Picture Captioning using Global Attention with GRU. In: 2024 IEEE 34th International Workshop on Machine Learning for Signal Processing (MLSP). 2024 IEEE 34th International Workshop on Machine Learning for Signal Processing (MLSP), 22-25 Sep 2024, London, UK. doi: 10.1109/mlsp58920.2024.10734813

Abstract

Automatic picture captioning is a prominent research area of artificial intelligence technology (AI). Its ability to enhance AI models by translating observed data into human language opens up a wide range of real-time applications. In this study, we explore picture captioning in the Bengali language using the global attention mechanism. Given the limited prior research in this area, we provide a comprehensive assessment of two distinct global attention approaches: the general approach and the concatenation approach. Additionally, we evaluate the performance of two CNN encoders, VGG19 and InceptionV3, within these models. The models are trained on a secondary dataset consisting of 4,849 nature-based images, each annotated with a single caption, enabling the models to gain a broad understanding of related categorical information. To achieve our research objectives, we developed and trained four separate models using this new dataset. Our analysis, both qualitative and quantitative, demonstrates that these algorithms are capable of generating human-like captions for similar images. The results indicate that models using the concatenation approach, particularly with the InceptionV3 encoder, performed best, achieving a BLEU-1 score of 84.85. In contrast, the model using the general approach with VGG19 underperformed in generating satisfactory captions.
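The abstract names two global attention variants, "general" and "concatenation", without giving their formulas. The sketch below assumes they correspond to the commonly used Luong-style score functions, score = h_t^T W_a h_s (general) and score = v_a^T tanh(W_a [h_t; h_s]) (concatenation); this is an assumption, not a detail confirmed by this record. All function names, dimensions, and weight values are hypothetical toy choices for illustration and do not reflect the authors' trained models.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [dot(row, v) for row in W]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score_general(h_t, h_s, W_a):
    # Assumed "general" score: h_t^T W_a h_s
    return dot(h_t, matvec(W_a, h_s))

def score_concat(h_t, h_s, W_a, v_a):
    # Assumed "concatenation" score: v_a^T tanh(W_a [h_t; h_s])
    hidden = [math.tanh(x) for x in matvec(W_a, h_t + h_s)]
    return dot(v_a, hidden)

def global_attention(h_t, encoder_states, score_fn):
    # Global attention scores every encoder state against the decoder state,
    # normalizes the scores, and returns the weighted sum as the context.
    scores = [score_fn(h_t, h_s) for h_s in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h_s[i] for w, h_s in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Hypothetical toy inputs: one decoder (GRU) state and three image-region features.
h_t = [1.0, 0.0]
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

W_general = [[1.0, 0.0], [0.0, 1.0]]       # 2x2, identity for readability
W_concat = [[0.5, -0.5, 0.5, 0.0],         # 3x4, acts on [h_t; h_s]
            [0.0, 0.5, -0.5, 0.5],
            [0.5, 0.0, 0.0, -0.5]]
v_concat = [1.0, -1.0, 0.5]

w_gen, c_gen = global_attention(
    h_t, encoder_states, lambda t, s: score_general(t, s, W_general))
w_cat, c_cat = global_attention(
    h_t, encoder_states, lambda t, s: score_concat(t, s, W_concat, v_concat))

print("general weights:", w_gen)
print("concat weights: ", w_cat)
```

In both variants the context vector is the same weighted sum over encoder states; only the score function differs, which is the design choice the study compares across the two CNN encoders.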

Publication Type: Conference or Workshop Item (Paper)
Additional Information: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Publisher Keywords: Bengali image captioning, Global attention, CNN, GRU, Attention Mechanism, Bengali dataset
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: School of Science & Technology
School of Science & Technology > Computer Science