Temporal Difference Learning with Adaptive Horizon Control for Stable Multi-Step Collaborative Decisions

Authors

  • Wei Chen, School of Computing, National University of Singapore, Singapore 117417, Singapore
  • Jun Jie Tan, School of Computing, National University of Singapore, Singapore 117417, Singapore
  • Li Wang, School of Computing, National University of Singapore, Singapore 117417, Singapore

DOI:

https://doi.org/10.71465/fair778

Keywords:

Temporal difference learning, Adaptive horizon, Reinforcement learning, Multi-step decision-making, Stability

Abstract

Instability in long-horizon decision-making often arises from improper credit assignment across extended time steps. This study explores an adaptive horizon control mechanism integrated with temporal difference (TD) learning to improve stability in multi-step collaborative tasks. Instead of using a fixed discount factor, the method dynamically adjusts the effective planning horizon based on reward sparsity and task progression signals. The approach is validated on 10,300 multi-step decision sequences with horizon lengths ranging from 10 to 50 steps. Compared with standard TD learning, the proposed method reduces cumulative reward variance by 25.9% and improves final task success rate by 14.6%. Furthermore, convergence is achieved with fewer training iterations, indicating improved learning efficiency. The results suggest that adaptive horizon control is a practical solution for stabilizing long-range coordination.
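The abstract does not specify the exact update rule, but the core idea of replacing a fixed discount factor with a horizon adapted to reward sparsity can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mapping gamma = 1 - 1/H, the sparsity-based `adapt_horizon` rule, and the toy chain environment are all assumptions chosen to match the 10-to-50-step horizon range mentioned in the abstract.

```python
def effective_gamma(horizon):
    # gamma = 1 - 1/H yields an effective planning horizon of roughly H steps.
    return 1.0 - 1.0 / horizon

def adapt_horizon(recent_rewards, h_min=10, h_max=50):
    # Hypothetical adaptation rule: sparse rewards -> longer horizon,
    # dense rewards -> shorter horizon (tighter credit assignment).
    if not recent_rewards:
        return h_max
    density = sum(1 for r in recent_rewards if r != 0) / len(recent_rewards)
    return h_max - (h_max - h_min) * density

def td0_adaptive(env_step, n_states, episodes=200, alpha=0.1, window=20):
    """Tabular TD(0) value estimation with an adaptively discounted target."""
    V = [0.0] * n_states
    recent = []  # sliding window of observed rewards
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            s2, r, done = env_step(s)
            recent = (recent + [r])[-window:]
            gamma = effective_gamma(adapt_horizon(recent))
            target = r + (0.0 if done else gamma * V[s2])
            V[s] += alpha * (target - V[s])
            s = s2
    return V

# Toy 5-state chain: deterministic right moves, reward 1 on reaching the end.
def chain_step(s):
    s2 = s + 1
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

values = td0_adaptive(chain_step, n_states=5)
```

With a fixed gamma, the same loop would use a constant effective horizon regardless of how often rewards arrive; here the horizon contracts as reward signals become dense, which is one plausible reading of "adjusts the effective planning horizon based on reward sparsity."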

Downloads

Download data is not yet available.

Published

2026-04-01