A training reward that discourages a model from being too verbose or too terse, so easy tasks finish fast and hard tasks get more room.