Minor rant about rating scales

posted 2021.12.30

In the NBA Slam Dunk Contest, five judges score each dunk on a scale up to 10 points. This means the highest score possible is a 50. ~~The lowest you could get, of course, is 0.~~

The judges can’t give scores lower than 6. Even the least impressive dunk possible has to get 30 points. The judges are pretty liberal with their 50s as well: in the 2020 contest, 10 of the 18 dunks got the highest possible score.
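For the record, here is that arithmetic as a tiny sketch. The constant names are made up for illustration, and it assumes each judge only gives whole-number scores.

```python
# Dunk contest scoring range, using the numbers from the paragraphs above.
# Assumption: judges give whole-number scores only.
JUDGES = 5
MIN_PER_JUDGE = 6    # the lowest score a judge can give
MAX_PER_JUDGE = 10   # the highest score a judge can give

lowest_total = JUDGES * MIN_PER_JUDGE               # worst possible dunk
highest_total = JUDGES * MAX_PER_JUDGE              # perfect dunk
levels_per_judge = MAX_PER_JUDGE - MIN_PER_JUDGE + 1  # scores actually in use

print(lowest_total, highest_total, levels_per_judge)  # 30 50 5
```

So each judge's scale "up to 10" is really a five-point scale.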

I don’t really mind this in the dunk contest. For one thing, if a dunk contest has a lot of 50s it is probably quite entertaining to watch. And dunking a basketball is hard enough on its own that it feels a little rude to give a low score for not doing a cool enough trick. But why bother using a scale up to 10 if you never use 5 and below?

I think most people’s scales up to ten go like this:

0 or 1 = terrible!

2 = really bad

3 or 4 = bad

5 = mediocre

6 = fine i guess

7 = normal

8 = good

9 = really good

10 = really really good

This scale is cool and all, but in practice some of the low numbers may as well not exist. What is the difference between a 0 and a 1? Between a 2 and a 3? Usually nothing. Meanwhile, there is no way to distinguish between “really really good” and “the best”.¹

I think scores skew high like this because people don’t want to hurt each other’s feelings. I wouldn’t want to give someone’s cooking a 6 and have to explain that, sure, there may be four tiers above that, but I assure you it was a good meal.

But I don’t have to live this way internally, so I don’t. Here is the scale I use when I think about music:

0 = assault on the ears. simply horrid

1 = unpleasant

2 = not my cup of tea

3 = neutral

4 = slightly good

5 = pretty good, but i’m not getting excited about it or anything

6 = good.

7 = damn, this is hot!

8 = best in class. (roughly equal to the 41-100 tier in my faves list)

9 = basically flawless. (11-40 tier)

10 = transcendent. (1-10 tier)
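The tiers in parentheses can be written down as a lookup. This is only a rough sketch: `score_from_rank` is a hypothetical name, the cutoffs are the ones listed above, and since the 8 tier is only "roughly equal" to ranks 41-100, the boundaries are fuzzier in real life than in code.

```python
from typing import Optional

def score_from_rank(rank: int) -> Optional[int]:
    """Map a position in the faves list to a score on the music scale.

    Returns None for anything outside the top 100, because scores of 7
    and below are not tied to a rank.
    """
    if rank < 1:
        raise ValueError("rank starts at 1")
    if rank <= 10:
        return 10  # transcendent
    if rank <= 40:
        return 9   # basically flawless
    if rank <= 100:
        return 8   # best in class
    return None

print(score_from_rank(3), score_from_rank(25), score_from_rank(77))  # 10 9 8
```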

Most songs I hear are not among my favorite ever, so the top end of the scale doesn’t get used that much. It still gets more levels, because the differences at the top end are more salient than the differences at the bottom end. The difference between #1 and #20 is bigger than the difference between #1001 and #1020. And I find no need to have 5 flavors of bad at the bottom of the list: if it’s bad, it doesn’t matter all that much how bad it is, because I’ll probably never listen to it again.

The scale would be different in a different domain. For example, I would have more levels below neutral if I were rating food, because the difference between bad and disgusting food really is salient. So “neutral” might be a 5.

Olympic judges seem to have figured this out. In judged sports like gymnastics and diving, they don’t give out perfect scores, because they have to leave room for something more impressive. And the difference between a 5 and a 6 is probably the same as between a 3 and a 4.²

Users of app stores, meanwhile, are fools: something like 80% of their reviews are either five-star or one-star. What does a two-star app look like? We may never know.

In conclusion, I should be allowed to judge dunks at the dunk contest and give some of them 3/10s.

  1. Some people give out 11s, because they just don’t know when to stop. See also: tiers above S on tier lists. 

  2. I just assumed this. But if it’s not true then that seems like a flaw.