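Sampling a model N times and keeping the best answer, as ranked by some scorer such as a verifier or reward model. A rising best-of-16 score suggests that training isn't collapsing the model onto a single solution: a collapsed model would produce 16 near-identical samples and gain little over a single attempt.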
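A minimal sketch of the idea follows, under loudly stated assumptions. The samplers `diverse` and `collapsed` are hypothetical stand-ins for a model, and the exact-match `score` function is a hypothetical stand-in for a verifier. The helper names `best_of_n` and `best_of_n_accuracy` are illustrative, not taken from any particular library. The sketch contrasts best-of-1 with best-of-16 for a model that spreads its samples over several answers and for one that always emits the same answer.

```python
import random
from typing import Callable, List


def best_of_n(candidates: List[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring candidate (the first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def best_of_n_accuracy(
    sample: Callable[[random.Random], str],
    reference: str,
    n: int,
    trials: int,
    seed: int = 0,
) -> float:
    """Fraction of trials where the best of n samples matches the reference."""
    rng = random.Random(seed)

    def score(answer: str) -> float:
        # Hypothetical exact-match verifier: 1.0 if correct, else 0.0.
        return 1.0 if answer == reference else 0.0

    hits = 0
    for _ in range(trials):
        candidates = [sample(rng) for _ in range(n)]
        if score(best_of_n(candidates, score)) == 1.0:
            hits += 1
    return hits / trials


# Hypothetical samplers standing in for a model.
def diverse(rng: random.Random) -> str:
    return rng.choice(["A", "B", "C", "D"])  # spreads mass over 4 answers


def collapsed(rng: random.Random) -> str:
    return "A"  # always emits the same (wrong) answer


for name, sampler in [("diverse", diverse), ("collapsed", collapsed)]:
    one = best_of_n_accuracy(sampler, "B", n=1, trials=1000)
    sixteen = best_of_n_accuracy(sampler, "B", n=16, trials=1000)
    print(f"{name}: best-of-1={one:.2f}, best-of-16={sixteen:.2f}")
```

For the diverse sampler, the expected best-of-1 accuracy is 0.25 and the expected best-of-16 accuracy is 1 - 0.75^16, about 0.99. For the collapsed sampler, both are 0.0, because extra samples repeat the same answer. The gap between best-of-1 and best-of-16 is therefore the signal of interest: it shrinks toward zero as the model collapses onto a single solution.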