long-horizon reinforcement learning
Training a coding agent by running many rollouts on real problems and reinforcing the ones that succeed. The horizon is long because a single rollout can reach 200K tokens and hundreds of tool calls.
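
The loop described in this entry (sample many rollouts, check which succeed, reinforce those) can be sketched on a toy problem. This is only an illustration under stated assumptions, not how any particular system is trained: the policy here is a three-way softmax rather than a language model, a rollout is a short list of action choices rather than tokens and tool calls, and the names `run_rollout`, `succeeded`, and `train` are hypothetical. The update is a plain REINFORCE step with a mean-reward baseline, chosen for brevity.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_rollout(probs, horizon, rng):
    """Sample one rollout: `horizon` action choices from the policy."""
    return rng.choices(range(len(probs)), weights=probs, k=horizon)


def succeeded(actions, good_action=0):
    """Hypothetical success check, a toy stand-in for 'the tests pass'."""
    return all(a == good_action for a in actions)


def train(num_iters=200, rollouts_per_iter=32, horizon=3, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]  # toy policy parameters (assumption)
    for _ in range(num_iters):
        probs = softmax(logits)
        rollouts = [run_rollout(probs, horizon, rng)
                    for _ in range(rollouts_per_iter)]
        # Reward is 1 for a successful rollout, 0 otherwise.
        rewards = [1.0 if succeeded(r) else 0.0 for r in rollouts]
        baseline = sum(rewards) / len(rewards)
        grad = [0.0] * len(logits)
        for actions, reward in zip(rollouts, rewards):
            advantage = reward - baseline
            for a in actions:
                for i in range(len(logits)):
                    # d log pi(a) / d logit_i = 1[i == a] - probs[i]
                    indicator = 1.0 if i == a else 0.0
                    grad[i] += advantage * (indicator - probs[i])
        # Gradient ascent: raise the probability of actions from
        # rollouts that did better than the batch average.
        logits = [l + lr * g / rollouts_per_iter
                  for l, g in zip(logits, grad)]
    return softmax(logits)


if __name__ == "__main__":
    print(train())
```

The reward arrives only once per rollout, so every action in a successful rollout receives the same credit. With three steps that is manageable; with hundreds of tool calls the same signal is spread far more thinly.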