Repairing Reward Functions with Human Feedback to Mitigate Reward Hacking
Published in arXiv, 2025
By Stephane Hatgis-Kessell, Logan Mondal Bhamidipaty, and Emma Brunskill
Download here
Published in arXiv preprint, 2023
By Stephane Hatgis-Kessell, W. Bradley Knox, Serena Booth, Scott Niekum, and Peter Stone
Download here
Published in UT Austin Computer Science Honors Thesis, 2023
By Stephane Hatgis-Kessell
Download here
Published in Proceedings of the AAAI Conference on Artificial Intelligence, 2023
By W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, and Scott Niekum
Download here
Published in Transactions on Machine Learning Research (TMLR), 2023
By W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, and Alessandro Allievi
Download here