PromptPO: When are LLMs sufficient policy optimizers for sequential RL tasks?

Published: April 12, 2026