Robust AI based perception and guidance for autonomous vehicles

Wang, C.

Wang, C. (2024). Robust AI based perception and guidance for autonomous vehicles. (Unpublished Doctoral thesis, City, University of London)

Abstract

In the domain of autonomous vehicles, artificial intelligence (AI) now plays a crucial role in achieving fully autonomous systems that can navigate complex and dynamic environments without human intervention. To reach full automation, an autonomous driving system must not only make intelligent decisions but also exhibit high levels of robustness and real-time performance. The core of autonomous driving lies in processing and analysing the enormous streams of data generated by a range of sensors, such as cameras, LiDAR, radar, and GPS. AI algorithms take on multiple critical tasks, including perceiving the surrounding environment, detecting and classifying objects, and understanding road conditions. The decision-making components then determine the optimal course of action, from changing lanes to responding to unexpected obstacles. Despite the progress made, AI-based autonomous driving systems still face numerous challenges, particularly in ensuring reliable scene understanding and decision-making in highly unpredictable environments. Developing reliable and robust solutions for these components therefore remains a critical area of research.

This thesis explores novel deep-learning-based methods for the perception and decision-making modules in learning-based autonomous driving, aiming to improve the efficiency and robustness of these modules. The proposed solutions tackle current challenges and contribute to the overarching goal of achieving full autonomy in autonomous driving systems.

To analyse the elements in the environment, a learning-based approach is proposed for monocular semantic segmentation in urban driving scenarios. This method consists of two components, pyramid fusion spatial attention and fusion channel attention, which are designed to capture contextual dependencies while maintaining a lightweight architecture and achieving state-of-the-art performance. To further locate these elements in the 3D world, a 3D object detection method is proposed that uses only monocular camera input. To compensate for the lack of depth information in monocular images, additional adaptive depth supervision signals are introduced without imposing an excessive computational burden. Following this, a depth acquisition method is proposed to understand the 3D geometry of the entire scene. As part of this research, a synthetic depth completion dataset is collected by combining LiDAR and stereo camera data, addressing the shortcomings of existing datasets that lack dense ground truth.
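
As an illustration only, the idea of combining LiDAR and stereo data into a denser depth map can be sketched as follows. This is not the procedure used in the thesis: the function names, the rule "prefer LiDAR, fall back to stereo", and all sample values are hypothetical. The only fixed ingredient is the standard pinhole stereo relation, depth = focal length × baseline / disparity.

```python
from typing import List, Optional


def disparity_to_depth(disparity_px: float, focal_px: float,
                       baseline_m: float) -> Optional[float]:
    """Pinhole stereo relation: depth = focal * baseline / disparity."""
    if disparity_px <= 0:
        return None  # no valid stereo match at this pixel
    return focal_px * baseline_m / disparity_px


def fuse_depth(lidar_m: List[Optional[float]], disparity_px: List[float],
               focal_px: float, baseline_m: float) -> List[Optional[float]]:
    """Hypothetical fusion rule: keep the LiDAR return where one exists,
    otherwise fall back to the stereo estimate (None if neither is valid)."""
    fused = []
    for lidar, disparity in zip(lidar_m, disparity_px):
        if lidar is not None:
            fused.append(lidar)
        else:
            fused.append(disparity_to_depth(disparity, focal_px, baseline_m))
    return fused


# One image row: sparse LiDAR depths (metres) and stereo disparities (pixels).
lidar_row = [None, 12.5, None, None]
disparity_row = [40.0, 30.0, 0.0, 20.0]
fused_row = fuse_depth(lidar_row, disparity_row, focal_px=700.0, baseline_m=0.5)
print(fused_row)
```

Pixels with neither a LiDAR return nor a valid disparity stay empty in this sketch; a real pipeline would also need calibration, projection of LiDAR points into the image, and outlier filtering.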

To test the effectiveness of the depth module, a guidance system is proposed that exploits the strengths of both Imitation Learning (IL) and Deep Reinforcement Learning (DRL). Results demonstrate that incorporating depth images improves the performance of the guidance network. Subsequently, the robustness of a single-agent driving system against adversarial attacks is investigated. This study presents a defence algorithm that mitigates state perturbations, ensuring the robustness of the driving system in worst-case scenarios. Additionally, an explainable attack detector is introduced to accurately predict adversarial attacks and visualise the decision-making process, thereby enhancing the reliability of the proposed robust algorithm. The robustness of the complete approach is demonstrated through several synthetic test cases involving various strong perturbations and domain transfer. Lastly, robustness is explored in multi-agent systems. A connected, cooperative multi-agent system is introduced to enhance the efficiency of cooperative tasks in ideal environments. However, the challenges posed by adversarial attacks escalate significantly in multi-agent reinforcement learning (MARL) systems compared to single-agent systems, owing to the increased complexity of the dynamics and of information sharing. To address this, the method follows the idea of the constrained objective function introduced in the single-agent case and adapts it to the multi-agent context with a proposed safety-criteria guarantee.
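
To make the notion of a worst-case state perturbation concrete, the sketch below computes the bounded perturbation that most reduces a linear score of the observed state. It is a toy example under stated assumptions, not the attack or defence developed in the thesis: the linear score stands in for a learned policy's value, the L-infinity budget `epsilon` is a hypothetical choice, and all names and numbers are illustrative.

```python
from typing import List


def sign(x: float) -> int:
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def linear_score(weights: List[float], state: List[float]) -> float:
    """Toy stand-in for a policy's value of a state: a weighted sum."""
    return sum(w * s for w, s in zip(weights, state))


def worst_case_state(weights: List[float], state: List[float],
                     epsilon: float) -> List[float]:
    """Perturb each state entry by at most epsilon (L-infinity ball) in the
    direction that lowers the linear score the most."""
    return [s - epsilon * sign(w) for w, s in zip(weights, state)]


weights = [2.0, -1.0, 0.0]   # hypothetical score weights
state = [1.0, 1.0, 1.0]      # hypothetical clean observation
epsilon = 0.5                # hypothetical perturbation budget

adversarial_state = worst_case_state(weights, state, epsilon)
clean_score = linear_score(weights, state)
adversarial_score = linear_score(weights, adversarial_state)
print(adversarial_state, clean_score, adversarial_score)
```

For a linear score the drop under this perturbation equals `epsilon` times the sum of the absolute weights, which gives a simple bound on worst-case degradation; a deep policy has no such closed form, which is one reason robustness there is harder to guarantee.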

Keywords: Deep Learning, Deep Reinforcement Learning, Computer Vision, Optimisation, Adversarial Attacks, Explainability, Decision Making, 3D Object Detection, Semantic Segmentation, Depth Completion, Attention Module, Imitation Learning.

Publication Type:	Thesis (Doctoral)
Subjects:	Q Science; T Technology > T Technology (General); T Technology > TA Engineering (General). Civil engineering (General)
Departments:	School of Science & Technology > Department of Engineering; School of Science & Technology > School of Science & Technology Doctoral Theses; Doctoral Theses

Text - Accepted Version
Creators:	Wang, C.
Status:	Unpublished
URI:	https://openaccess.city.ac.uk/id/eprint/34241
Date available in CRO:	12 Dec 2024 13:41
Date deposited:	12 December 2024
Dates:	2024 (Completed)