Temporal Difference Learning with Adaptive Horizon Control for Stable Multi-Step Collaborative Decisions
DOI: https://doi.org/10.71465/fair778

Keywords: Temporal difference learning, Adaptive horizon, Reinforcement learning, Multi-step decision-making, Stability

Abstract
Instability in long-horizon decision-making often arises from improper credit assignment across extended time steps. This study explores an adaptive horizon control mechanism integrated with temporal difference (TD) learning to improve stability in multi-step collaborative tasks. Instead of using a fixed discount factor, the method dynamically adjusts the effective planning horizon based on reward sparsity and task progression signals. The approach is validated on 10,300 multi-step decision sequences with horizon lengths ranging from 10 to 50 steps. Compared with standard TD learning, the proposed method reduces cumulative reward variance by 25.9% and improves final task success rate by 14.6%. Furthermore, convergence is achieved with fewer training iterations, indicating improved learning efficiency. The results suggest that adaptive horizon control is a practical solution for stabilizing long-range coordination.
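The core idea of replacing a fixed discount factor with one driven by reward sparsity can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `adaptive_gamma` mapping, the gamma bounds, and the reward-density signal are all assumptions introduced here for clarity.

```python
import numpy as np

def adaptive_gamma(recent_rewards, gamma_min=0.90, gamma_max=0.99):
    """Map reward sparsity to an effective discount factor (hypothetical rule).

    Sparse rewards -> gamma near gamma_max (longer effective horizon);
    dense rewards -> gamma near gamma_min (shorter effective horizon).
    """
    rewards = np.asarray(recent_rewards, dtype=float)
    density = np.mean(np.abs(rewards) > 0)  # fraction of nonzero rewards
    return gamma_max - (gamma_max - gamma_min) * density

def td0_update(V, s, r, s_next, recent_rewards, alpha=0.1):
    """One tabular TD(0) update using the adaptive discount factor."""
    gamma = adaptive_gamma(recent_rewards)
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return V, td_error
```

In this sketch, a sparse reward window (mostly zeros) lengthens the effective planning horizon, while dense feedback shortens it, which is one plausible way to realize the horizon control the abstract describes.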
License
Copyright (c) 2026 Wei Chen, Jun Jie Tan, Li Wang (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.