City Research Online - Deep Learning Solutions for Perception and Motion Forecasting in Autonomous Vehicles

Deep Learning Solutions for Perception and Motion Forecasting in Autonomous Vehicles

Inan, B. A. (2024). Deep Learning Solutions for Perception and Motion Forecasting in Autonomous Vehicles. (Unpublished Doctoral thesis, City, University of London)

Abstract

The major challenges for autonomous driving systems are accurate, real-time perception, tracking, and motion forecasting, all of which are necessary to navigate safely through complex and dynamic environments. These systems must make decisions within split seconds in urban environments where multiple agents interact. This thesis proposes a comprehensive framework that addresses LiDAR-based segmentation, multi-object tracking, and trajectory forecasting, with a focus on enhancing accuracy, robustness, and efficiency.

One contribution of this work is a hybrid approach to LiDAR segmentation that integrates synthetic data with real-world datasets. Synthetic data, created in simulated environments, exposes the model to scenarios that real-world data alone cannot fully capture. This combination enhances the generalization capability of the segmentation models, which can then handle difficult situations such as occlusions, variations in object density, and a wide range of lighting conditions. In addition, multi-scale feature extraction helps in processing fine-grained details at various spatial resolutions. This significantly improves segmentation accuracy without compromising efficiency. The hybrid approach ensures that the segmentation model performs well across diverse urban driving environments.
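As a rough illustration of combining synthetic and real samples during training, the sketch below draws each batch with a fixed share of synthetic scans. The function name `mixed_batch`, the `synthetic_fraction` parameter, and the scan identifiers are hypothetical; the abstract does not state how the thesis actually mixes the two sources.

```python
import random


def mixed_batch(real, synthetic, batch_size, synthetic_fraction, rng):
    """Draw one training batch with a fixed share of synthetic samples.

    Assumption: sampling with replacement at a fixed ratio. This is an
    illustrative scheme, not the procedure described in the thesis.
    """
    n_syn = round(batch_size * synthetic_fraction)
    n_real = batch_size - n_syn
    batch = [("real", s) for s in rng.choices(real, k=n_real)]
    batch += [("synthetic", s) for s in rng.choices(synthetic, k=n_syn)]
    rng.shuffle(batch)  # avoid a fixed real/synthetic ordering inside the batch
    return batch


# Hypothetical scan identifiers standing in for LiDAR point clouds.
real_scans = [f"real_{i}" for i in range(100)]
synthetic_scans = [f"sim_{i}" for i in range(400)]

batch = mixed_batch(real_scans, synthetic_scans, batch_size=8,
                    synthetic_fraction=0.25, rng=random.Random(0))
n_synthetic = sum(1 for source, _ in batch if source == "synthetic")
```

With these settings every batch holds two synthetic and six real scans, so the ratio can be tuned independently of how large each dataset is.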

Besides these improvements, this thesis further exploits the efficacy of Vision Transformers (ViT) for segmenting LiDAR point clouds. The attention mechanism within the ViT captures both local and global geometric features better than traditional convolutional networks. Employing transformers allows the segmentation model to better understand the relationships between points, which in turn improves accuracy in object detection and classification, especially on complex datasets. This significantly increases the likelihood that the segmentation model will find and classify various objects in real time. The approach is particularly effective in demanding situations that require an accurate understanding of the environment, such as busy street intersections or highways.
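To make the idea of attention over points concrete, here is a minimal sketch of single-head scaled dot-product self-attention applied directly to point coordinates. It assumes identity query, key, and value projections and no positional encoding, so it only shows how each point becomes a weighted mix of all points. It is not the ViT architecture used in the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(points):
    """Single-head self-attention with identity projections (illustrative only)."""
    d = len(points[0])
    scale = math.sqrt(d)
    attended = []
    for query in points:
        # Similarity of this point to every point, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in points]
        weights = softmax(scores)
        # Output feature is the attention-weighted average of all points.
        attended.append([sum(w * value[i] for w, value in zip(weights, points))
                         for i in range(d)])
    return attended


# Three toy 3D points: two near the origin and one far away.
points = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 0.0]]
attended = self_attention(points)
```

In this toy case the point at the origin has zero similarity to every point, so it receives the plain average of all three, while the distant point attends almost entirely to itself.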

The thesis also addresses the need for reliable object tracking in dynamic surroundings by proposing a transformer-based multi-object tracking framework. It leverages joint 2D-3D sensor fusion, combining LiDAR and camera data to further enhance the accuracy of tracking dynamic agents. Exploiting these sensing modalities together allows the tracker to use the depth and geometric information given by LiDAR alongside the rich visual details provided by camera images. The system is designed to facilitate accurate tracking of multiple agents in real time, which is crucial for ensuring safety and effective decision-making in autonomous driving systems.
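A common first step in fusing LiDAR with camera data is projecting 3D points onto the image plane so that each point can be paired with a pixel. The sketch below uses a standard pinhole camera model and assumes the points are already expressed in the camera frame (the LiDAR-to-camera extrinsic transform has been applied). The function name and the intrinsic values are hypothetical and do not come from the thesis.

```python
def project_to_image(points_cam, fx, fy, cx, cy, width, height):
    """Project camera-frame 3D points to pixel coordinates (pinhole model).

    Returns (u, v, depth) for every point that lies in front of the camera
    and falls inside the image bounds.
    """
    pixels = []
    for x, y, z in points_cam:
        if z <= 0:
            continue  # behind the camera, cannot be seen
        u = fx * x / z + cx
        v = fy * y / z + cy
        if 0 <= u < width and 0 <= v < height:
            pixels.append((u, v, z))
    return pixels


# Toy points in metres (x right, y down, z forward); intrinsics are made up.
points_cam = [
    (0.0, 0.0, 10.0),   # straight ahead, lands on the principal point
    (1.0, 0.5, 5.0),    # slightly right and below centre
    (0.0, 0.0, -2.0),   # behind the camera, dropped
    (100.0, 0.0, 1.0),  # far outside the field of view, dropped
]
pixels = project_to_image(points_cam, fx=500.0, fy=500.0, cx=320.0, cy=240.0,
                          width=640, height=480)
```

Each surviving tuple carries the LiDAR depth alongside the pixel location, which is the pairing a 2D-3D fusion stage needs.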

Building on the improvements in segmentation and tracking, this thesis proposes IRMTR, a novel approach to multi-agent motion forecasting. The IRMTR framework employs anchored goal queries and a Gaussian Mixture Model to generate intention points representing the most likely positions an agent will move to in the near future. These intention points are then refined using a hybrid local-global query mechanism to improve the model's prediction of the future trajectories of dynamic agents in real time. Because the IRMTR model updates its predictions by incorporating both the local interactions among neighboring agents and the global context of the driving environment, it significantly boosts trajectory prediction accuracy, especially in complicated traffic conditions. The model is particularly effective at forecasting behaviors that involve lane changes, merging, and interactions at intersections, where accurate trajectory forecasts are needed to enable collision avoidance and proactive navigation.
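The role a Gaussian Mixture Model can play in representing candidate future positions can be sketched as follows. Each mixture component stands for one possible goal, with a weight, a mean position, and a spread. The isotropic 2D Gaussians, the component values, and the function names are assumptions made for illustration; the abstract does not give IRMTR's actual parameterization.

```python
import math


def gmm_density(point, components):
    """Density of a 2D mixture of isotropic Gaussians at `point`.

    Each component is (weight, (mean_x, mean_y), sigma).
    """
    x, y = point
    total = 0.0
    for weight, (mx, my), sigma in components:
        sq_dist = (x - mx) ** 2 + (y - my) ** 2
        norm = 2.0 * math.pi * sigma ** 2
        total += weight * math.exp(-sq_dist / (2.0 * sigma ** 2)) / norm
    return total


def top_intention_points(components, k):
    """Return the means of the k highest-weight components."""
    ranked = sorted(components, key=lambda c: c[0], reverse=True)
    return [mean for _, mean, _ in ranked[:k]]


# Hypothetical goals a few seconds ahead: keep lane, change left, slow and drift right.
components = [
    (0.6, (20.0, 0.0), 2.0),
    (0.3, (15.0, 3.5), 2.0),
    (0.1, (10.0, -3.5), 3.0),
]
top2 = top_intention_points(components, k=2)
keep_lane_density = gmm_density((20.0, 0.0), components)
drift_density = gmm_density((10.0, -3.5), components)
```

Ranking by weight gives a short list of intention points that a later refinement stage could adjust, and the density function scores how plausible any given endpoint is under the mixture.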

Overall, this thesis contributes to the advancement of autonomous vehicle technologies by proposing novel methods for LiDAR segmentation, multi-object tracking, and motion forecasting. By combining state-of-the-art deep learning techniques, this research lays the groundwork for future developments in real-time mapping and decision-making systems in autonomous vehicles.

Publication Type:	Thesis (Doctoral)
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science; T Technology > TJ Mechanical engineering and machinery; T Technology > TL Motor vehicles. Aeronautics. Astronautics
Departments:	School of Science & Technology > Engineering; School of Science & Technology > School of Science & Technology Doctoral Theses; Doctoral Theses

Text - Accepted Version

Creators:	Inan, B. A.
Status:	Unpublished
URI:	https://openaccess.city.ac.uk/id/eprint/34498
Date available in CRO:	23 Jan 2025 11:24
Date deposited:	23 January 2025
Dates:	2024 (Completed)