Read, spot and translate

Specia, L.; Wang, J.; Lee, S. J.; Ostapenko, A.; Madhyastha, P.

Specia, L., Wang, J., Lee, S. J., Ostapenko, A. & Madhyastha, P. ORCID: 0000-0002-4438-8161 (2021). Read, spot and translate. Machine Translation, 35(2), pp. 145-165. doi: 10.1007/s10590-021-09259-z

Abstract

We propose multimodal machine translation (MMT) approaches that exploit the correspondences between words and image regions. In contrast to existing work, our referential grounding method uses objects as the visual unit for grounding, rather than whole images or abstract image regions, and performs visual grounding in the source language, rather than at the decoding stage via attention. We explore two referential grounding approaches: (i) implicit grounding, where the model jointly learns to ground the source language in the visual representation and to translate; and (ii) explicit grounding, where grounding is performed independently of the translation model and is subsequently used to guide machine translation. We performed experiments on the Multi30K dataset for three language pairs: English–German, English–French and English–Czech. Our referential grounding models outperform existing MMT models according to both automatic metrics and human evaluation.

Publication Type:	Article
Additional Information:	This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher Keywords:	Multimodal machine learning, Multimodal machine translation
Departments:	School of Science & Technology > Department of Computer Science
SWORD Depositor:	Symplectic Administrator

Text - Published Version
Available under License Creative Commons: Attribution International Public License 4.0.
Official URL: https://doi.org/10.1007/s10590-021-09259-z

Metadata

Creators:	Specia, L.; Wang, J.; Lee, S. J.; Ostapenko, A.; Madhyastha, P. ORCID: 0000-0002-4438-8161
Status:	Published
Refereed:	Yes
Journal or Publication Title:	Machine Translation
Publisher:	Springer Science and Business Media LLC
ISSN:	0922-6567
e-ISSN:	1573-0573
URI:	https://openaccess.city.ac.uk/id/eprint/29129
Date available in CRO:	11 Nov 2022 11:52
Date deposited:	1 November 2022
Dates:	11 February 2021 (Accepted); 4 April 2021 (Published Online); 1 June 2021 (Published)