
Minor rant about rating scales

posted 2021.12.30

In the NBA Slam Dunk Contest, five judges score each dunk on a scale up to 10 points. This means the highest score possible is a 50. ~~The lowest you could get, of course, is 0.~~

The judges can’t give scores lower than 6. Even the least impressive dunk possible has to get 30 points. The judges are pretty liberal with their 50s as well: in the 2020 contest, 10 of the 18 dunks got the highest possible score.
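The arithmetic is quick to check. Here is a minimal Python sketch; the constant and function names are made up for illustration, and the floor of 6 is the one described above.

```python
# Hypothetical names; the numbers come from the post:
# five judges, a floor of 6, a cap of 10.
NUM_JUDGES = 5
MIN_SCORE = 6
MAX_SCORE = 10

def total_range(num_judges=NUM_JUDGES, lo=MIN_SCORE, hi=MAX_SCORE):
    """Return the (lowest, highest) possible total for one dunk."""
    return num_judges * lo, num_judges * hi

lowest, highest = total_range()
usable_totals = highest - lowest + 1  # whole-number totals actually reachable

print(lowest, highest, usable_totals)  # 30 50 21
```

So a scale that nominally has 51 values (0 through 50) really only has 21.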

I don’t really mind this in the dunk contest. For one thing, dunk contests with a lot of 50s are generally quite entertaining to watch. But why bother using a scale up to 10 if you never use 5 and below?

I think most people’s scales up to ten go like this:

0 or 1 = terrible!

2 = really bad

3 or 4 = bad

5 = mediocre

6 = fine i guess

7 = normal

8 = good

9 = really good

10 = really really good

This scale is cool and all, but in practice some of the low numbers may as well not exist. What is the difference between a 0 and a 1? Between a 2 and a 3? Usually nothing. Meanwhile, there is no way to distinguish between “really really good” and “the best”.[1]

I think scores skew high like this because people don’t want to hurt each other’s feelings. I wouldn’t want to give someone’s cooking a 6 and have to explain that, sure, there may be four tiers above that, but I assure you it was a good meal.

But I don’t have to live this way, so I don’t. Here is the scale I use when I think about music:

0 = assault on the ears. simply horrid

1 = unpleasant

2 = not my cup of tea

3 = neutral

4 = slightly good

5 = pretty good, but i’m not getting excited about it or anything

6 = good.

7 = damn, this is hot!

8 = best in class. (roughly equates to the 41-100 tier in my faves list)

9 = basically flawless. (11-40 tier)

10 = transcendent. (1-10 tier)
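The top three tiers are the only ones tied to positions in a ranked list, so they can be sketched as a lookup. This is a rough illustration in Python, not a real tool: the function name `score_from_rank` is hypothetical, and the cutoffs are the approximate ones given in parentheses above.

```python
def score_from_rank(rank):
    """Map a position in a ranked favorites list to a score on the scale above.

    Only scores 8 to 10 are tied to list positions. Anything outside the
    top 100 returns None, since scores 0 to 7 are judged by feel, not rank.
    """
    if rank < 1:
        raise ValueError("rank starts at 1")
    if rank <= 10:
        return 10  # transcendent
    if rank <= 40:
        return 9   # basically flawless
    if rank <= 100:
        return 8   # best in class
    return None

print(score_from_rank(3), score_from_rank(25), score_from_rank(77))  # 10 9 8
```

Note how lopsided the tiers are: the 10 covers ten songs, the 9 covers thirty, and the 8 covers sixty.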

Most songs I hear are not among my favorites ever, so the top end of the scale doesn’t get used that much, but that’s because the differences at the top end are more salient than the differences at the bottom end. The difference between #1 and #20 is bigger than the difference between #1001 and #1020. And I find no need to have 5 flavors of bad at the bottom of the list: if it’s bad, it doesn’t matter all that much how bad it is, because I’ll probably never listen to it again.

The scale would likely be different in a different domain. For example, I would make “neutral” be about a 5 if I were rating food, because the difference between bad and disgusting food really is salient.

Olympic judges seem to have figured this out. In judged sports like gymnastics and diving, they don’t give out maximum scores, because they have to leave room for something more impressive. And the difference between a 5 and a 6 is probably the same as between a 3 and a 4.[2]

Users of app stores, meanwhile, are fools: something like 80% of their reviews are five-star or one-star. What does a two-star app look like? We may never know.

In conclusion, I should be allowed to judge dunks at the dunk contest and give some of them 3s.

  1. Some people give out 11s, because they just don’t know when to stop. Even then, an 11 often doesn’t mean “the best”. 

  2. Okay, really I just assumed this. But if it’s not true it should be.