Posts

When are LLMs sufficient black-box policy optimizers?

Inspired by AutoResearch, we ask: when are LLMs sufficient black-box policy optimizers? That is, when can we replace classic RL algorithms like PPO or SAC with an LLM?

Stephane Hatgis-Kessell
