Do Balance Patches Actually Change Winrates?

Online Ultimate

Today, I'll examine online tournament data to see if Balance Patch 8.0.0 produced any statistically significant differences in winrates. Specifically, I'll be looking at data from two date ranges: January 28th to June 29th ("pre-patch") and June 29th to October 13th ("post-patch"). I picked 8.0.0 because it was somewhat recent and had a patch before it and a patch after it. That's where January 28th and October 13th come from: they were the release dates of patches 7.0.0 and 9.0.0 respectively.

Also, I'll limit the scope to a handful of predetermined characters, mostly because of time. But, if you're curious about other characters, you'll have a way to look for yourself below.

The characters will be: Marth, Falco, Ike, King K. Rool, and Mario. Mario is here as a sanity check - he only received a final smash update - ideally, we should see no significant change.

Marth and Falco were both buffed, and both are fairly "mainstream" competitive characters. K. Rool was also buffed, but he is, well, not considered a very good character. Ike's changes were kind of a mixed bag.

Methodology

To do this, I'm modeling each character's wins and losses as a binomial distribution. This has one major assumption that is not true: independence. These characters are piloted by players of varying skill and, beyond skill, varying styles. So the games are not truly independent, but hopefully the relatively large sample sizes mitigate the bias that this assumption introduces.

Additionally, of course, some characters have better average players, but for each character we're comparing pre-patch and post-patch results from what should be roughly the same population of players.

With that out of the way: given that basically every character has n > 2000, I can approximate the binomials with normal distributions and then just do a standard Z-test. Additionally, I realized halfway through that computers are fast now and you can run Fisher's exact test at these scales without a problem. The two should produce roughly similar numbers at these sample sizes, but I'll provide both.
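As a rough illustration of the two tests just described, here is a minimal standard-library Python sketch. It assumes a pooled two-proportion Z-test and a one-sided Fisher test computed from the hypergeometric tail, which may not be exactly how the numbers in this post were produced. The function names are made up for this example, and the counts are the King K. Rool numbers reported further down.

```python
from math import erfc, exp, lgamma, sqrt


def z_test_greater(wins_pre, losses_pre, wins_post, losses_post):
    """One-sided pooled two-proportion Z-test (H1: post-patch winrate is higher)."""
    n_pre = wins_pre + losses_pre
    n_post = wins_post + losses_post
    pooled = (wins_pre + wins_post) / (n_pre + n_post)
    se = sqrt(pooled * (1 - pooled) * (1 / n_pre + 1 / n_post))
    z = (wins_post / n_post - wins_pre / n_pre) / se
    return z, 0.5 * erfc(z / sqrt(2))  # upper-tail normal probability


def log_comb(n, k):
    """Log of the binomial coefficient, via lgamma so large counts don't overflow."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_greater(wins_pre, losses_pre, wins_post, losses_post):
    """One-sided Fisher exact test: P(post-patch wins >= observed | fixed margins)."""
    total_wins = wins_pre + wins_post
    total_losses = losses_pre + losses_post
    n_post = wins_post + losses_post
    log_denom = log_comb(total_wins + total_losses, n_post)
    return sum(
        exp(log_comb(total_wins, k) + log_comb(total_losses, n_post - k) - log_denom)
        for k in range(wins_post, min(total_wins, n_post) + 1)
    )


# King K. Rool counts from this post: (wins, losses) pre-patch, then post-patch.
z, p_normal = z_test_greater(4897, 5088, 3081, 2999)
p_fisher = fisher_greater(4897, 5088, 3081, 2999)
print(f"z = {z:.3f}, normal p = {p_normal:.5f}, Fisher p = {p_fisher:.5f}")
```

At sample sizes like these, the two p-values should land close to each other, which is why both are reported side by side.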

Marth, Falco, and K. Rool will get one-sided tests (their winrates should be higher post-patch); Mario and Ike will get two-sided tests.
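To make the one-sided versus two-sided distinction concrete, here is a small sketch of how one Z statistic turns into different p-values depending on the alternative hypothesis. It assumes the normal approximation with a pooled standard error, the helper names are invented for illustration, and the counts are Ike's numbers from later in this post.

```python
from math import erfc, sqrt


def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))


def p_values(z):
    """Return the p-value for each alternative hypothesis given a Z statistic."""
    return {
        "higher": upper_tail(z),              # one-sided: winrate went up
        "lower": upper_tail(-z),              # one-sided: winrate went down
        "two-sided": 2 * upper_tail(abs(z)),  # a change in either direction
    }


# Ike counts from this post: pre-patch 6183 wins / 5067 losses,
# post-patch 3051 wins / 2617 losses.
n_pre, n_post = 6183 + 5067, 3051 + 2617
pooled = (6183 + 3051) / (n_pre + n_post)
se = sqrt(pooled * (1 - pooled) * (1 / n_pre + 1 / n_post))
z = (3051 / n_post - 6183 / n_pre) / se

for alternative, p in p_values(z).items():
    print(f"{alternative:>9}: p = {p:.4f}")
```

Under the normal approximation, the two-sided p-value is exactly twice the smaller one-sided p-value, so picking the direction in advance makes it easier to clear the threshold.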

I'll stick with the usual arbitrary threshold of p < 0.05.


Marth

RxC Table:

          Pre Patch   Post Patch
Wins      1325        942
Losses    1938        1386

Ouch. If you do the division, you may notice that Marth's winrate in fact went down slightly. So that axes my original idea. Well, sometimes it happens. Now, instead of doing a one-sided test, I'll do a two-sided test: is this just noise, or did Marth somehow get worse?

Normal: 0.91457
Fisher: 0.93394
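For reference, the division behind the Marth table can be sketched in a few lines of Python. The `winrate` helper is just an illustrative name; the counts are the ones in the table above.

```python
def winrate(wins, losses):
    """Fraction of games won."""
    return wins / (wins + losses)


# Marth counts from the table above.
pre = winrate(1325, 1938)
post = winrate(942, 1386)
print(f"pre-patch: {pre:.2%}, post-patch: {post:.2%}, change: {post - pre:+.2%}")
```

Both winrates come out a little above 40%, with the post-patch one slightly lower, which is the small drop being tested here.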

So, no, thankfully I am not going crazy, and it seems that it was just noise. I will note that Marth has a very large, and very, well, unskilled playerbase, so if you want to explain this in another way than "the buffs were useless", it could be simply that Marth players are bad, on average, at making use of the buffs.

Well, it could be noise. A p-value above the threshold doesn't mean there is no real difference, just that with this method I can't be confident that anything significant happened.

Falco

RxC Table:

          Pre Patch   Post Patch
Wins      1995        1472
Losses    2926        1789

Normal: 0.000018
Fisher: 0.000021

Wow! That's not only below our target p-value, it's SO far below that it seems clear Falco did benefit from the patch (his winrate went from roughly 40.5% to 45.1%).

King K. Rool

          Pre Patch   Post Patch
Wins      4897        3081
Losses    5088        2999

Normal: 0.02248
Fisher: 0.02336

Still comfortably below the 0.05 threshold. I would also note that K. Rool has by far the largest sample size so far.

Ike (two-sided)

          Pre Patch   Post Patch
Wins      6183        3051
Losses    5067        2617

Normal: 0.16296
Fisher: 0.16443

First, I would note that Ike's winrate also regressed slightly, but seemingly not significantly. It's closer than the two-sided number suggests, though: a one-sided test for a nerf would roughly halve the p-value to about 0.08, which is near 0.05 but still above it.

Mario

          Pre Patch   Post Patch
Wins      8360        2674
Losses    9364        3000

Normal: 0.95765
Fisher: 0.96344

This is my sanity check, and it seems pretty sane. Indeed, there's no evidence that anything changed.

Heck, I'll do a couple more

Mewtwo

          Pre Patch   Post Patch
Wins      1429        803
Losses    1867        938

Normal: 0.03003
Fisher: 0.03218

Pit

          Pre Patch   Post Patch
Wins      825         504
Losses    1076        641

Normal: 0.36927
Fisher: 0.38345

So it seems that Falco, King K. Rool, and Mewtwo pass the frequentist test.
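The tally above is just the 0.05 threshold applied to each reported p-value. A minimal sketch, using the normal-approximation values from this post (rounded) and an illustrative `ALPHA` constant:

```python
ALPHA = 0.05  # the threshold chosen in the methodology

# Normal-approximation p-values as reported in this post (rounded).
reported_p = {
    "Marth": 0.91457,
    "Falco": 0.000018,
    "King K. Rool": 0.02248,
    "Ike": 0.16296,
    "Mario": 0.95765,
    "Mewtwo": 0.03003,
    "Pit": 0.36927,
}

# Keep only the characters whose p-value clears the threshold.
significant = [name for name, p in reported_p.items() if p < ALPHA]
print(significant)
```

Swapping in the Fisher values gives the same three characters, since each pair of p-values falls on the same side of 0.05.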

Curious about other characters? Want to know what a Bayesian approach would show? Want to use a more complicated model?

Now you can. Happy hunting.

As usual, you can contact me at stu2b@statsmash.io or @stu2b50 on Twitter.