ORSA-T: Multi-View Object-Centric Scene Representation Learning with Slot Attention and Transformer
Placek, H., Child, C. H. T. ORCID: 0000-0001-5425-2308 & Weyde, T. ORCID: 0000-0001-8028-9905 (2025).
ORSA-T: Multi-View Object-Centric Scene Representation Learning with Slot Attention and Transformer.
Paper presented at the International Joint Conference on Neural Networks 2025, 30 Jun - 05 Jul 2025, Rome, Italy.
Abstract
Understanding a scene from multiple, potentially partial views and decomposing it into objects is foundational for human perception and intelligence. Current multi-view object-centric scene representation learning models that use partial views analyze all views at once. This differs from the way humans process visual information and is not compatible with reinforcement learning, where an agent learns about its environment through actions, such as moving to change the viewpoint. In this paper, we propose ORSA-T (Object-centric scene Representation learning with Slot Attention and Transformer), which combines Implicit Slot Attention with a Transformer that aggregates previous views, and which improves the scene representation iteratively from a sequence of images annotated with viewpoints. The Transformer uses all previous representations and the current update to aggregate scene information, which makes ORSA-T remember objects better and learn more effectively when applied to partial views. In our experiments, ORSA-T predicts and segments images from a new viewpoint better than both MulMON, the current state of the art, and ORSA without the aggregation connections and Transformer. Because ORSA-T learns to improve its scene representation iteratively, it is suitable for use in reinforcement learning.
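The iterative scheme described in the abstract can be illustrated with a rough, pure-Python sketch. It is not the authors' implementation: the function names (`slot_update`, `aggregate`, `encode_views`), the dimensions, and the random initialisation are all hypothetical. A single simplified slot-attention step stands in for Implicit Slot Attention, a plain per-slot dot-product attention over the history stands in for the Transformer, and each view is assumed to be already encoded as a list of feature vectors that include the viewpoint information.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slot_update(slots, features):
    """One simplified slot-attention step (assumption: stand-in for
    Implicit Slot Attention). Slots compete for each feature because the
    softmax is taken over slots, then each slot averages its features."""
    dim = len(slots[0])
    attn = [softmax([dot(s, f) / math.sqrt(dim) for s in slots]) for f in features]
    updates = []
    for k in range(len(slots)):
        weights = [row[k] for row in attn]
        total = sum(weights) + 1e-8
        updates.append([
            sum(w * f[j] for w, f in zip(weights, features)) / total
            for j in range(dim)
        ])
    return updates


def aggregate(history, update):
    """Per-slot attention over all previous representations plus the
    current update (assumption: stand-in for the Transformer)."""
    dim = len(update[0])
    new_slots = []
    for k, query in enumerate(update):
        candidates = [rep[k] for rep in history] + [query]
        weights = softmax([dot(query, c) / math.sqrt(dim) for c in candidates])
        new_slots.append([
            sum(w * c[j] for w, c in zip(weights, candidates))
            for j in range(dim)
        ])
    return new_slots


def encode_views(views, num_slots=3, dim=4, seed=0):
    """Process the views one at a time, refining the scene representation."""
    rng = random.Random(seed)
    slots = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_slots)]
    history = []
    for features in views:
        update = slot_update(slots, features)   # update from the current view
        slots = aggregate(history, update)      # fuse with earlier views
        history.append(slots)
    return slots, history
```

Calling `encode_views(views)` with `views` as a list of per-view feature lists returns the final slots and the per-view history. The point of the sketch is only the control flow: each view is consumed as it arrives rather than all views at once, which is the property the abstract ties to reinforcement learning.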
| Publication Type: | Conference or Workshop Item (Paper) |
|---|---|
| Additional Information: | For the purpose of open access, the author(s) has applied a Creative Commons Attribution (CC BY) license to any Accepted Manuscript version arising |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; T Technology > T Technology (General) |
| Departments: | School of Science & Technology; School of Science & Technology > Department of Computer Science |
Available under License Creative Commons: Attribution International Public License 4.0.