Multi-Agent Post-Co-Training of Large Language Models via Reinforcement Learning
DOI:
https://doi.org/10.71465/fapm716
Keywords:
Post-training, multi-agent learning, LLM collaboration, verifier-based reward, discussion optimization
Abstract
This study introduces MAPoRL2, a post-training framework that enhances collaborative LLM performance through multi-agent reinforcement learning and structured discussion. Multiple LLM agents independently generate candidate solutions, engage in iterative discussion rounds, and are jointly optimized using verifier-based rewards that assess both correctness and corrective reasoning. Experiments across 5 reasoning and generation benchmarks with 4,500 training samples demonstrate improvements of 18.9% in answer accuracy and 22.4% in correction efficiency over single-agent post-training, highlighting the effectiveness of discussion-aware RL signals.
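The discussion-and-reward loop summarized in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`verifier_reward`, `discussion_round`, `run_episode`), the majority-vote stand-in for LLM discussion, and the correction bonus are all assumptions; the actual RL policy update is omitted.

```python
# Hypothetical sketch of a MAPoRL2-style episode: agents draft answers,
# revise them over discussion rounds, and receive verifier-based rewards
# that credit both correctness and corrective reasoning. All names and
# reward values here are illustrative assumptions.

def verifier_reward(answer: str, reference: str, corrected: bool) -> float:
    """Reward correctness, with a bonus when a correct answer was reached
    by revising an initially different draft (a 'correction' signal)."""
    correct = 1.0 if answer == reference else 0.0
    correction_bonus = 0.5 if (correct and corrected) else 0.0
    return correct + correction_bonus

def discussion_round(drafts: list[str]) -> list[str]:
    """One discussion round: each agent revises toward the majority
    answer among peers (a toy stand-in for LLM discussion)."""
    majority = max(set(drafts), key=drafts.count)
    return [majority for _ in drafts]

def run_episode(drafts: list[str], reference: str, rounds: int = 2):
    """Run discussion rounds, then score each agent's final answer."""
    initial = list(drafts)
    for _ in range(rounds):
        drafts = discussion_round(drafts)
    rewards = [
        verifier_reward(answer, reference, corrected=(answer != first))
        for answer, first in zip(drafts, initial)
    ]
    return drafts, rewards

# Example: two agents start correct, one starts wrong; discussion
# converges on the majority, and the agent that self-corrected earns
# the correction bonus.
finals, rewards = run_episode(["42", "42", "41"], reference="42")
# finals  -> ["42", "42", "42"]
# rewards -> [1.0, 1.0, 1.5]
```

In a full system the per-agent rewards would feed a joint RL objective over all agents' policies; here only the reward shaping is shown.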
Downloads
License
Copyright (c) 2026 James L. Carter, Yuxuan Liu, Thomas K. Lee (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.