Nature-based Bengali Picture Captioning using Global Attention with GRU
Zohora, F. T., Biswas, S. ORCID: 0000-0002-6770-9845, Bairagi, A. K. & Sharif, K. (2024). Nature-based Bengali Picture Captioning using Global Attention with GRU. In: 2024 IEEE 34th International Workshop on Machine Learning for Signal Processing (MLSP). 2024 IEEE 34th International Workshop on Machine Learning for Signal Processing (MLSP), 22-25 Sep 2024, London, UK. doi: 10.1109/mlsp58920.2024.10734813
Abstract
Automatic picture captioning is a prominent research area in artificial intelligence (AI). Its ability to enhance AI models by translating observed data into human language opens up a wide range of real-time applications. In this study, we explore picture captioning in the Bengali language using the global attention mechanism. Given the limited prior research in this area, we provide a comprehensive assessment of two distinct global attention approaches: the general approach and the concatenation approach. Additionally, we evaluate the performance of two CNN encoders, VGG19 and InceptionV3, within these models. The models are trained on a secondary dataset of 4,849 nature-based images, each annotated with a single caption, which enables the models to gain a broad understanding of the related categories. To achieve our research objectives, we developed and trained four separate models on this dataset. Our qualitative and quantitative analyses demonstrate that these models can generate human-like captions for similar images. The results indicate that the models using the concatenation approach, particularly with the InceptionV3 encoder, performed best, achieving a BLEU-1 score of 84.85. In contrast, the model using the general approach with VGG19 underperformed and did not generate satisfactory captions.
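The two global attention variants compared above differ only in how they score a decoder state against each encoder feature. The sketch below assumes the standard formulations from Luong et al. (2015), which the abstract does not spell out: general, score(h_t, h_s) = h_t^T W_a h_s, and concatenation, score(h_t, h_s) = v_a^T tanh(W_a [h_t; h_s]). Here h_t stands for the GRU decoder's hidden state and each h_s for one spatial feature vector from the CNN encoder (VGG19 or InceptionV3). All function names, weight matrices, and the toy dimensions are hypothetical; in a trained model W_a and v_a would be learned, and the paper's exact parameterization may differ.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score_general(h_t: Vector, h_s: Vector, W_a: Matrix) -> float:
    """General score: h_t^T W_a h_s (W_a has len(h_t) rows, len(h_s) columns)."""
    return dot(h_t, matvec(W_a, h_s))


def score_concat(h_t: Vector, h_s: Vector, W_a: Matrix, v_a: Vector) -> float:
    """Concat score: v_a^T tanh(W_a [h_t; h_s]) (W_a is k x (len(h_t) + len(h_s)))."""
    hidden = [math.tanh(x) for x in matvec(W_a, h_t + h_s)]  # h_t + h_s concatenates lists
    return dot(v_a, hidden)


def global_attention(
    h_t: Vector, sources: List[Vector], score_fn: Callable[[Vector, Vector], float]
) -> Tuple[Vector, Vector]:
    """Score every source vector, normalize with softmax, return (weights, context)."""
    weights = softmax([score_fn(h_t, h_s) for h_s in sources])
    dim = len(sources[0])
    context = [sum(w * h_s[i] for w, h_s in zip(weights, sources)) for i in range(dim)]
    return weights, context


# Toy example: a 2-d decoder state attending over three 2-d image-region features.
h_t = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_general = [[1.0, 0.0], [0.0, 1.0]]                      # identity, for illustration only
W_concat = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]   # k = 2 rows, 4 columns
v_concat = [1.0, 1.0]

general_weights, general_context = global_attention(
    h_t, regions, lambda q, s: score_general(q, s, W_general))
concat_weights, concat_context = global_attention(
    h_t, regions, lambda q, s: score_concat(q, s, W_concat, v_concat))
```

With identical inputs, the two scoring functions can favour different image regions, which is one reason the choice of approach can affect caption quality. In a full captioner, the context vector would typically be combined with the GRU input or output at each decoding step.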
Publication Type: | Conference or Workshop Item (Paper) |
---|---|
Additional Information: | © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Publisher Keywords: | Bengali image captioning, Global attention, CNN, GRU, Attention Mechanism, Bengali dataset |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Departments: | School of Science & Technology > Computer Science |