Deep Reinforcement Learning for Autonomous Satellite Guidance and Control
Tammam, A. (2025). Deep Reinforcement Learning for Autonomous Satellite Guidance and Control. (Unpublished Doctoral thesis, City St George's, University of London)
Abstract
As autonomous satellite operations such as docking, inspection, and debris removal become more common, there is a growing demand for onboard control systems that can operate reliably under uncertainty, intermittent communication, and limited opportunities for ground intervention. Traditional model-based control methods have demonstrated strong performance in well-characterised environments, but can face challenges when extended to close-proximity operations involving modelling errors, unmodelled disturbances, degraded sensing or actuation, and nonlinear operational constraints. This thesis aims to develop and evaluate deep reinforcement learning (DRL) control architectures for autonomous six-degree-of-freedom (6-DoF) spacecraft close-proximity operations that are robust, scalable, and suitable for real-time deployment.
The thesis develops a suite of DRL-based controllers for 6-DoF guidance and control tasks, progressing from architectural baselines to safety, sparse-reward learning, and uncertainty-aware decision making. A modular control design is adopted throughout, decoupling translational and rotational control to improve training stability and reduce interference between objectives. This design choice is first evaluated through a comparative study of centralised and decentralised TD3-based controllers for a 1U CubeSat approaching and aligning with a passive, non-cooperative target. Across 100 randomised trials, the decentralised controller achieved more stable and precise behaviour, producing lower position and orientation errors, reduced control effort, and more consistent convergence than the centralised formulation.
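To make the modular decomposition concrete, the sketch below shows how a decentralised controller of this kind can split the 6-DoF state between a translational and a rotational policy and merge their commands. The placeholder policy class, state layout, and action limits are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of the decentralised (modular) control decomposition described
# above. Each TD3 actor is reduced to a placeholder that maps its own
# observation slice to a bounded 3-dimensional command; the state layout and
# limits are assumptions for illustration only.
import numpy as np

class PlaceholderTD3Policy:
    """Stand-in for a trained TD3 actor network (hypothetical)."""
    def __init__(self, obs_dim: int, act_dim: int, act_limit: float):
        self.w = np.zeros((act_dim, obs_dim))  # a real actor would be a neural network
        self.act_limit = act_limit

    def act(self, obs: np.ndarray) -> np.ndarray:
        return np.clip(self.w @ obs, -self.act_limit, self.act_limit)

# Assumed state layout: [relative position (3), relative velocity (3),
#                        attitude error (3), angular velocity (3)]
translational_policy = PlaceholderTD3Policy(obs_dim=6, act_dim=3, act_limit=1.0)  # thrust command
rotational_policy    = PlaceholderTD3Policy(obs_dim=6, act_dim=3, act_limit=0.1)  # torque command

def decentralised_controller(state: np.ndarray) -> np.ndarray:
    """Each policy sees only the states relevant to its own objective."""
    thrust = translational_policy.act(state[0:6])
    torque = rotational_policy.act(state[6:12])
    return np.concatenate([thrust, torque])  # combined 6-DoF command

print(decentralised_controller(np.zeros(12)))
```

Keeping each policy's observation and action space small in this way is what limits interference between the translational and rotational objectives during training.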
Building on this baseline, the thesis introduces a hybrid safe-learning framework that combines adaptive domain randomisation (ADR) with relaxed control barrier functions (CBFs) to improve robustness under uncertainty while maintaining operational constraints during both training and deployment. In a 6-DoF rendezvous task with a progressive actuator and sensor degradation cascade, the Baseline controller diverged rapidly, with position error exceeding 45 m and reaching up to 1000 m across trials. In contrast, the Safe controller consistently bounded position error within 5 m and maintained a controlled attitude response, while also remaining within the 0.5 m/s linear velocity constraint under nominal conditions, at the cost of slower positional convergence.
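A common way to realise a relaxed CBF safety filter of this kind is to project the learned action onto the constraint set through a small quadratic programme with a slack variable. The formulation below is a hedged sketch in assumed notation (u_RL for the DRL action, h for a barrier encoding a constraint such as the velocity limit, alpha a class-K gain, lambda the slack weight); it is not reproduced from the thesis.

```latex
% Illustrative relaxed-CBF safety filter (assumed notation):
% the slack delta keeps the programme feasible when the hard
% constraint cannot be met exactly, at a cost weighted by lambda.
\begin{aligned}
u^{*} = \arg\min_{u,\,\delta}\;& \|u - u_{\mathrm{RL}}\|^{2} + \lambda\,\delta^{2} \\
\text{s.t.}\;& \dot{h}(x,u) \ge -\alpha\, h(x) - \delta, \qquad \delta \ge 0 .
\end{aligned}
```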
To address sparse rewards and long-horizon planning without dense, hand-crafted reward shaping, a goal-conditioned hierarchical deep reinforcement learning (HDRL) framework is developed. The controller decomposes control into high-level subgoal generation and low-level actuation, enabling learning from binary terminal rewards using hindsight experience replay and subgoal relabelling. In simulation, across 100 Monte Carlo trials, the hierarchical controller achieved mission-compliant terminal accuracy with a final position error of approximately 2.75 m and a final orientation error below 1°, comparable to PID baseline performance with similar peak translational speeds. Real-time feasibility is evaluated using a custom hardware-in-the-loop (HIL) testbed combining a reaction-wheel-actuated CubeSat mock-up on a spherical air bearing with a 6-DoF robotic arm for translational emulation. Both the HDRL and TD3-based controllers maintained closed-loop stability under real-time sensing, actuation, and communication conditions, reducing position error from approximately 110 m to below 5 m and regulating attitude within 5°, while the PID controller was unable to stabilise the vehicle in the same HIL setting.
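The sparse-reward mechanism can be illustrated with a minimal hindsight-relabelling step: transitions from a failed episode are replayed as if a later achieved state had been the commanded subgoal, so the binary reward becomes informative. The transition fields, the 'final' relabelling strategy, and the success tolerance below are assumptions for the sketch, not thesis values.

```python
# Illustrative hindsight-relabelling step for a sparse, binary reward setting.
# Field names and the 0.5 m success tolerance are hypothetical.
import numpy as np

def sparse_reward(achieved_goal, goal, tol=0.5):
    """Binary terminal-style reward: 0 on success, -1 otherwise."""
    return 0.0 if np.linalg.norm(achieved_goal - goal) < tol else -1.0

def relabel_with_hindsight(episode):
    """Replay each transition as if the episode's last achieved state had been the subgoal."""
    new_goal = episode[-1]["achieved_goal"]
    return [{**tr,
             "goal": new_goal,
             "reward": sparse_reward(tr["achieved_goal"], new_goal)}
            for tr in episode]

# Toy episode: the relabelled goal is the state actually reached at the end,
# so the final transition now counts as a success under the sparse reward.
episode = [{"achieved_goal": np.array([10.0 - i, 0.0, 0.0]),
            "goal": np.zeros(3), "reward": -1.0} for i in range(5)]
print([tr["reward"] for tr in relabel_with_hindsight(episode)])
```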
Finally, the thesis addresses decision making under uncertainty through an uncertainty-aware distributional RL architecture based on a novel Uncertainty-Aware Implicit Quantile Network (UA-IQN). By modelling the full return distribution and using variance-penalised action selection, UA-IQN supports risk-sensitive control by favouring actions with more predictable outcomes. Applied to a 6-DoF final-approach docking task, UA-IQN achieved task success comparable to a baseline decentralised TD3 controller, with sub-centimetre lateral accuracy at the docking threshold (mean off-track 0.004 m, mean cross-track 0.006 m) and sub-degree attitude errors at docking (mean roll/pitch/yaw 0.263°/0.175°/0.277°), while exhibiting more consistent and conservative translational approach behaviour across diverse initial conditions in simulation and HIL evaluation.
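The variance-penalised selection rule can be sketched as follows: for each candidate action, the quantile network's return samples are summarised by their mean and spread, and the action with the best mean-minus-beta-sigma score is chosen. The placeholder quantile values, the risk weight beta, and the array shapes are illustrative assumptions rather than thesis parameters.

```python
# Minimal sketch of variance-penalised (risk-sensitive) action selection from a
# quantile return distribution, in the spirit of the UA-IQN described above.
# The quantile estimates are random placeholders, not network outputs.
import numpy as np

rng = np.random.default_rng(0)
n_quantiles, n_actions = 32, 4
# Stand-in for IQN output: sampled return quantiles per candidate action
quantiles = rng.normal(loc=[1.0, 1.2, 1.2, 0.9], scale=[0.1, 0.6, 0.2, 0.05],
                       size=(n_quantiles, n_actions))

beta = 0.5                                   # assumed risk-aversion weight
mean_return   = quantiles.mean(axis=0)       # expected return per action
spread        = quantiles.std(axis=0)        # dispersion of the return distribution
risk_adjusted = mean_return - beta * spread  # penalise unpredictable outcomes

print("greedy action:", int(mean_return.argmax()))
print("risk-sensitive action:", int(risk_adjusted.argmax()))
```

Under this rule two actions with similar expected return are distinguished by how dispersed their predicted outcomes are, which is what drives the more conservative approach behaviour reported above.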
Together, these results demonstrate that modular DRL architectures can provide accurate spacecraft guidance and control while improving robustness to uncertainty, degraded sensing and actuation, and operational constraints, with validation in simulation and hardware-in-the-loop. By integrating architectural modularity, safety mechanisms, sparse-reward learning, and uncertainty-aware decision making, this thesis provides a pathway towards deployable DRL-based control for autonomous spacecraft close-proximity operations.