Belief-Aware Agentic Reinforcement Learning for Web Decision Models under Multi-Cost and Failure Risk Constraints

Authors

  • Arjun K. Singh, Department of Computer Science, University of Oxford, Oxford OX1 3QD, United Kingdom
  • Priya Menon, Department of Computer Science, University of Oxford, Oxford OX1 3QD, United Kingdom
  • Karthik Raman, Department of Computer Science, University of Oxford, Oxford OX1 3QD, United Kingdom

DOI:

https://doi.org/10.71465/fbf755

Keywords:

Belief-space planning, partially observable MDP, agentic reinforcement learning, web agents, failure risk modeling, multi-cost constraints

Abstract

Web interaction is inherently partially observable: critical task-relevant information is distributed across multiple pages, dynamic UI elements, and delayed system feedback. This study formulates web agent decision-making as a belief-space constrained Markov decision process (MDP) in which the agent maintains a probabilistic belief over hidden task states and latent failure conditions. A belief-aware agentic reinforcement learning model is proposed that jointly updates a task belief and a failure-risk belief while maximizing task success under multiple cumulative cost budgets covering interaction steps, latency, and external tool usage. Failure risk is modeled as a belief-dependent hazard that evolves with both observed UI transitions and unobserved system states. The policy is trained using belief-conditioned value estimation and cost-regularized returns. The experimental setup uses a benchmark of approximately 1,200 web tasks across 50 website templates, with partial observability induced by delayed confirmations and hidden irreversible actions. Performance is evaluated in terms of success rate, belief calibration error, average cost per success, and failure incidence under fixed budgets. The proposed framework demonstrates improved robustness on long-horizon tasks, where incorrect belief updates frequently lead to catastrophic decisions.
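The abstract names a belief update, a belief-dependent failure hazard, and cost-regularized returns under several budgets, but gives no formulas. The following is a minimal sketch of one standard way these pieces can fit together, not the authors' implementation. It assumes a discrete hidden state space that already includes the latent failure conditions, a Bayes-filter belief update, a hazard computed as the belief-weighted average of hypothetical per-state failure probabilities, and a Lagrangian relaxation of the cost budgets (one cost per budget, e.g. interaction steps, latency, tool calls) with projected dual-ascent updates of the multipliers. All function and parameter names (`update_belief`, `failure_hazard`, `risk_penalty`, `update_multipliers`, and so on) are hypothetical.

```python
import numpy as np


def update_belief(belief, transition, obs_likelihood):
    """Bayes filter over discrete hidden states (assumed discrete for this sketch).

    belief:         b(s), shape (S,)
    transition:     T[s, s'] = P(s' | s, a) for the chosen action, shape (S, S)
    obs_likelihood: O[s'] = P(o | s', a) for the received observation, shape (S,)
    Returns b'(s') proportional to O[s'] * sum_s T[s, s'] * b(s).
    """
    predicted = transition.T @ belief          # prediction step
    posterior = obs_likelihood * predicted     # correction step
    norm = posterior.sum()
    if norm <= 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return posterior / norm


def failure_hazard(belief, state_hazard):
    """Belief-dependent hazard: expected per-step failure probability under b.

    state_hazard[s] is a hypothetical per-state failure probability.
    """
    return float(belief @ state_hazard)


def cost_regularized_reward(task_reward, step_costs, multipliers, hazard, risk_penalty):
    """Lagrangian-style shaped reward: r - sum_k lambda_k * c_k - eta * h(b)."""
    return task_reward - float(np.dot(multipliers, step_costs)) - risk_penalty * hazard


def update_multipliers(multipliers, episode_costs, budgets, lr):
    """Projected dual ascent: raise lambda_k when cumulative cost k exceeds budget d_k."""
    return np.maximum(0.0, multipliers + lr * (episode_costs - budgets))
```

For example, starting from a uniform belief over two states ("safe" and "latent failure"), an identity transition, and an observation with likelihoods `[0.9, 0.1]`, `update_belief` returns approximately `[0.9, 0.1]`. The Lagrangian form is only one way to realize "cost-regularized returns": it keeps a single scalar reward for the learner while the multipliers grow for any budget that is exceeded on average.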
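The abstract lists belief calibration error, average cost per success, and failure incidence as evaluation metrics without defining them. The sketch below shows one plausible set of definitions, under stated assumptions: belief calibration error is taken to be an expected calibration error (ECE) with equal-width bins, comparing the agent's predicted confidence with the observed binary outcome; average cost per success divides the total cost of all episodes, including failed ones, by the number of successful episodes; and failure incidence is the fraction of episodes that end in a failure. These definitions and the function names are hypothetical and may differ from the paper's.

```python
import numpy as np


def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (bin weight) * |mean confidence - empirical accuracy|."""
    conf = np.asarray(confidences, dtype=float)
    hit = np.asarray(outcomes, dtype=float)
    # Map each confidence in [0, 1] to a bin index; 1.0 goes into the last bin.
    idx = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            gap = abs(conf[in_bin].mean() - hit[in_bin].mean())
            ece += in_bin.mean() * gap  # weight = fraction of all predictions in this bin
    return float(ece)


def average_cost_per_success(episode_costs, successes):
    """Total cost over all episodes divided by the number of successful episodes."""
    costs = np.asarray(episode_costs, dtype=float)
    n_success = int(np.count_nonzero(successes))
    return float(costs.sum() / n_success) if n_success > 0 else float("inf")


def failure_incidence(failures):
    """Fraction of episodes that ended in a failure."""
    return float(np.mean(np.asarray(failures, dtype=bool)))
```

Charging failed episodes to the successful ones makes average cost per success penalize policies that spend budget on attempts that do not finish, which matches the abstract's emphasis on cost under fixed budgets; a per-successful-episode average would hide that waste.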

Published

2026-03-25