Belief-Aware Agentic Reinforcement Learning for Web Decision Models under Multi-Cost and Failure Risk Constraints
DOI: https://doi.org/10.71465/fbf755

Keywords: Belief-space planning, partially observable MDP, agentic reinforcement learning, web agents, failure risk modeling, multi-cost constraints

Abstract
Web interaction is inherently partially observable: critical task-relevant information is distributed across multiple pages, dynamic UI elements, and delayed system feedback. This study formulates web agent decision-making as a belief-space constrained MDP in which the agent maintains a probabilistic belief over hidden task states and latent failure conditions. A belief-aware agentic reinforcement learning model is proposed that jointly updates the task belief and the failure-risk belief while optimizing task success under multiple cumulative cost budgets, including interaction steps, latency, and external tool usage. Failure risk is modeled as a belief-dependent hazard that evolves with both observed UI transitions and unobserved system states. The policy is trained using belief-conditioned value estimation and cost-regularized returns. Experiments are conducted on a benchmark of approximately 1,200 web tasks across 50 website templates, with partial observability induced by delayed confirmations and hidden irreversible actions. Results are evaluated in terms of success rate, belief calibration error, average cost per success, and failure incidence under fixed budgets. The proposed framework demonstrates improved robustness in long-horizon tasks where incorrect belief updates frequently lead to catastrophic decisions.
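To make the two central ingredients of the abstract concrete, the following is a minimal illustrative sketch (not the authors' implementation) of a discrete Bayes-filter belief update over hidden task states and a cost-regularized return with one Lagrangian penalty weight per cost channel (steps, latency, tool calls). All function and variable names here are hypothetical.

```python
import numpy as np

def belief_update(belief, trans, obs_lik):
    """One Bayes-filter step over hidden task states.

    belief:  (S,) prior probability over hidden states
    trans:   (S, S) matrix, trans[i, j] = P(s'=j | s=i) under the chosen action
    obs_lik: (S,) likelihood of the received observation in each next state
    """
    predicted = belief @ trans           # predict through the transition model
    posterior = predicted * obs_lik      # correct with the observation likelihood
    return posterior / posterior.sum()   # renormalize to a probability vector

def cost_regularized_return(rewards, costs, lambdas, gamma=0.99):
    """Discounted return with per-channel cost penalties.

    rewards: scalar reward per step
    costs:   per-step cost vectors, e.g. [steps, latency, tool calls]
    lambdas: penalty weight per cost channel (e.g. tuned dual variables)
    """
    g = 0.0
    for t, (r, c) in enumerate(zip(rewards, costs)):
        g += gamma ** t * (r - float(np.dot(lambdas, c)))
    return g

# Example: a self-transition model with an observation favoring state 0
b = belief_update(np.array([0.5, 0.5]), np.eye(2), np.array([0.8, 0.2]))
# b is now [0.8, 0.2]

g = cost_regularized_return(
    rewards=[1.0],
    costs=[np.array([1.0, 0.0, 0.0])],   # one interaction step, no latency/tools
    lambdas=np.array([0.1, 0.0, 0.0]),
)
# g is 1.0 - 0.1 = 0.9
```

In a full agent, the failure-risk belief would be maintained with an analogous filter over latent failure conditions, and the penalized return would feed a belief-conditioned value estimator; both extensions are outside the scope of this sketch.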
License
Copyright (c) 2026 Arjun K. Singh, Priya Menon, Karthik Raman (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.