High-Confidence Policy Improvement from Human Feedback
Hon Tik Tse, Philip S. Thomas, Scott Niekum
Reinforcement Learning Journal, 2025
We propose an algorithm for high-confidence policy improvement in the reinforcement learning from human feedback (RLHF) setting.
