Belief-Aware Agentic Reinforcement Learning for Web Decision Models under Multi-Cost and Failure Risk Constraints
DOI: https://doi.org/10.71465/fbf755

Keywords: Belief-space planning, partially observable MDP, agentic reinforcement learning, web agents, failure risk modeling, multi-cost constraints

Abstract
Web interaction is inherently partially observable: critical task-relevant information is distributed across multiple pages, dynamic UI elements, and delayed system feedback. This study formulates web agent decision-making as a belief-space constrained MDP in which the agent maintains a probabilistic belief over hidden task states and latent failure conditions. A belief-aware agentic reinforcement learning model is proposed that jointly updates the task belief and the failure-risk belief while optimizing task success under multiple cumulative cost budgets, including interaction steps, latency, and external tool usage. Failure risk is modeled as a belief-dependent hazard that evolves with both observed UI transitions and unobserved system states. The policy is trained using belief-conditioned value estimation and cost-regularized returns. Experiments are conducted on a benchmark of approximately 1,200 web tasks across 50 website templates, with partial observability induced by delayed confirmations and hidden irreversible actions. Results are evaluated in terms of success rate, belief calibration error, average cost per success, and failure incidence under fixed budgets. The proposed framework demonstrates improved robustness in long-horizon tasks where incorrect belief updates frequently lead to catastrophic decisions.
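To make the two central ingredients of the abstract concrete, the following is a minimal illustrative sketch (not the authors' implementation) of a discrete Bayes-filter belief update over hidden task states and a cost-regularized return with one Lagrangian penalty weight per cost channel (steps, latency, tool calls). All function and variable names here are hypothetical.

```python
import numpy as np

def belief_update(belief, trans, obs_lik):
    """One Bayes-filter step over hidden task states.

    belief:  (S,) prior probability over hidden states
    trans:   (S, S) matrix, trans[i, j] = P(s'=j | s=i) under the chosen action
    obs_lik: (S,) likelihood of the received observation in each next state
    """
    predicted = belief @ trans           # predict through the transition model
    posterior = predicted * obs_lik      # correct with the observation likelihood
    return posterior / posterior.sum()   # renormalize to a probability vector

def cost_regularized_return(rewards, costs, lambdas, gamma=0.99):
    """Discounted return with per-channel cost penalties.

    rewards: scalar reward per step
    costs:   per-step cost vectors, e.g. [steps, latency, tool calls]
    lambdas: penalty weight per cost channel (e.g. tuned dual variables)
    """
    g = 0.0
    for t, (r, c) in enumerate(zip(rewards, costs)):
        g += gamma ** t * (r - float(np.dot(lambdas, c)))
    return g

# Example: a self-transition model with an observation favoring state 0
b = belief_update(np.array([0.5, 0.5]), np.eye(2), np.array([0.8, 0.2]))
# b is now [0.8, 0.2]

g = cost_regularized_return(
    rewards=[1.0],
    costs=[np.array([1.0, 0.0, 0.0])],   # one interaction step, no latency/tools
    lambdas=np.array([0.1, 0.0, 0.0]),
)
# g is 1.0 - 0.1 = 0.9
```

In a full agent, the failure-risk belief would be maintained with an analogous filter over latent failure conditions, and the penalized return would feed a belief-conditioned value estimator; both extensions are outside the scope of this sketch.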
License
Copyright (c) 2026 Arjun K. Singh, Priya Menon, Karthik Raman (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.