Fairly unfair dice
Given that I mentioned practrand in an earlier post, what are the odds that I have an interest in probability and random number generators?
If you've played XCOM or any other Tactics game you know how bullshit it is when an 80% hit-chance misses. And then misses again. And again. Or you're a tabletop player and know that the dice are listening and that's why you threw a 1 on a very important check.
Unfortunately most dice are fair to within reasonable tolerances. What does it mean for a die to be fair?
It means that on every roll, each face has an equal probability of landing rightside up. In statistics terms, the values are uniform, independent, and identically distributed. Independent meaning that the outcome of one roll doesn't affect the outcome of another, and identically distributed meaning that every roll comes from the same distribution. What distribution is that? The uniform distribution, where all values are equally likely.
Why is this annoying? It means that if you play table top games you have a 1-in-400 chance of rolling two 1's in a row. In a 3 hour session where you're rolling once a minute, you have a 36% chance for that to happen at least once.
How does that math?
Well, for 20 sides we have p = 1/20 = 0.05 for any given roll. The probability of rolling a certain number twice in a row is then 0.05 * 0.05 = 0.0025. Conversely, the odds of rolling any number twice in a row is still 0.05. Why? Well, in the first case you pick your certain number up front and you need both rolls to match it. But if you don't put any constraints on the first roll, just the second one, then you only have one number to match.
Three hours, once per minute, is 180 attempts. This is a bit of a fudge, because a strict interpretation would say we have 179 attempts if we are looking for two 1's in a row. And the attempts aren't independent: if a roll isn't a 1, the odds that it starts a pair of 1's aren't 0.0025, they're 0.
There are two approaches we can take here.
We could reach for the binomial distribution, which tells you the odds of an event occurring X number of times for a given probability p and number of trials n. Working the math on this would be tedious and I don't have the background to argue why it's correct.
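For the "at least once" case there is a tidy shortcut, though: the chance of never succeeding in n independent tries is (1-p)^n, so the chance of at least one success is the complement. A quick check in Python:

```python
# Chance of at least one pair of 1's, treating the session as
# 180 independent attempts, each with probability 0.0025.
p, n = 0.0025, 180
at_least_once = 1 - (1 - p) ** n
print(round(at_least_once, 3))  # about 0.363
```
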
Or we could simulate it.
[interactive simulator: n, p, and X inputs]
Try it. Put in 180, 0.0025, and 1. You'll see a lil' result that says 0.36 or so. All this does is generate n random numbers and check whether each one is lower than p. If it is, that counts as a success. If there are at least X successes, that trial meets the criteria. It repeats this process 10000 times and tells you what fraction of those trials passed.
This is called the Monte Carlo method. Broadly speaking it breaks down to the idea that if you don't know an analytical solution to a problem you can always brute force an approximation by evaluating the problem many times. An analytical solution is known to the above problem, but compute power is so cheap that it's easier to do this. That's why the numbers in the rest of this will be somewhat inconsistent and approximate. Of course, everything gets more accurate the more iterations you do, but 10k is enough for acceptable results with this level of randomness.
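If you'd rather run it outside the widget, here's a sketch of the same Monte Carlo loop in Python (the function name and 10000-trial default are mine):

```python
import random

def monte_carlo(n, p, X, trials=10_000):
    # Fraction of trials in which at least X of n attempts succeed,
    # where each attempt succeeds with probability p.
    hits = 0
    for _ in range(trials):
        successes = sum(random.random() < p for _ in range(n))
        if successes >= X:
            hits += 1
    return hits / trials

print(monte_carlo(180, 0.0025, 1))  # about 0.36, wobbling run to run
```
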
Try other things. In Terraria, the Rod of Discord has a 0.2% (p = 0.002) drop rate. What are your odds of getting it after you kill 100 Chaos Elementals?
What are the odds of rolling 1's twice in a row three times in a session?
So we can see that unlikely things, when you roll often enough, are actually pretty likely. How do we make these outliers less likely?
Let's talk about expected value.
Expected value is a way of expressing what you'll get, on average, out of an event. It's very easy to compute! You multiply the results of each outcome by the probability of that outcome and sum them up.
Let's work a few examples.
Double or Nothing. You give a shady guy on the street a dollar, he flips a coin. Heads you get 2 dollars, tails he keeps it all. Well, that's a 50/50. So you're doing 0*0.5 + 2*0.5, which is... 1 dollar. On a long enough scale, nobody makes any progress with this, so the coin is probably weighted or he's using sleight of hand to not actually flip it.

You're opening trading cards. Common cards make up 90% of the pack and are worth 1 dollar, rares make up 9% at 2 dollars, ultra rares 0.9% at 10 dollars, and legendaries 0.1% at 100. If there's 10 cards in a pack, 10 * (0.9*1 + 0.09*2 + 0.009*10 + 0.001*100) is... 12.7 dollars. If you can buy a pack for less than this then, over time, you can expect to come out ahead. Of course, there's no guarantee how far in the hole you'll go before you actually come out ahead. You can totally get a run of bad cards!

Another shady guy is doing Double or Nothing. But it's a weird one. You pay him and he's gonna keep flipping a coin. If it's tails, the game stops and he pays out. Every time it's heads, he doubles the money in the pot, which starts at 2 dollars with the first heads. This one will take a little bit of extra math. The odds of getting N heads in a row is 0.5 ^ N. So 1/2, 1/4, 1/8, 1/16, etcetera. But simultaneously the money is doubling, 2 ^ N. So 2, 4, 8, 16. And because the game might never end, we'll need to take a limit to figure out the expected value. So we start multiplying. 0.5*2 + 0.25*4 + 0.125*8 + ...
and... uh. uh oh.
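Every term in that sum is 0.5^N * 2^N, which is exactly 1 dollar, forever, so the expected value never settles down. This is the famous St. Petersburg paradox. You can watch the partial sums march off:

```python
# Every flip contributes 0.5**n * 2**n == 1 dollar of expected value.
partial = sum(0.5 ** n * 2 ** n for n in range(1, 51))
print(partial)  # 50 terms in, we're at 50 dollars and still climbing
```
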
So the expected value of a typical die is its maximum roll plus one, divided by 2. Or half its maximum roll plus 0.5. To work out a D4 the long way, 0.25 * (1+2+3+4) = 0.25*10 = 2.5.
A D20 has an expected value of 10.5, and other common combinations land nearby: 3d6 also has an expected value of 10.5, and 2d10 comes in at 11. But clearly the minimum that you can roll on two D10 is 2, the odds of rolling a 20 are p = 0.1*0.1 = 0.01, and on three D6 the minimum is 3 and the maximum is 18!
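These numbers are small enough to check by brute-force enumeration. A sketch (the stats helper is mine; it reports min, max, expected value, and the odds of rolling exactly 20):

```python
from itertools import product

def stats(dice):
    # Enumerate every equally likely outcome of rolling these dice together.
    totals = [sum(combo) for combo in product(*(range(1, s + 1) for s in dice))]
    ev = sum(totals) / len(totals)
    return min(totals), max(totals), ev, totals.count(20) / len(totals)

print(stats([20]))       # (1, 20, 10.5, 0.05)
print(stats([10, 10]))   # (2, 20, 11.0, 0.01)
print(stats([6, 6, 6]))  # (3, 18, 10.5, 0.0)
```
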
So not only do we need to come up with a new method for smoothing out our dice rolls but we need to maintain a lot of probabilities that the game designers counted on. Remember, we're trying to make a system that feels nicer but is just as balanced as before!
Let's start with just eliminating repeats. This works really well in some ways and falls flat in others.
The easiest way to do this is to just fill a bag with the values on the die and draw from the bag with removal. When the bag is empty, fill it up again! Within a single bagful, it's impossible to get a run of 1's this way.
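A minimal sketch of the bag, leaning on the language's built-in shuffle for now (the class name is mine):

```python
import random

class DiceBag:
    # A die that draws values without replacement, refilling when empty.
    def __init__(self, sides):
        self.sides = sides
        self.bag = []

    def roll(self):
        if not self.bag:
            self.bag = list(range(1, self.sides + 1))
            random.shuffle(self.bag)
        return self.bag.pop()
```

Every four rolls of a DiceBag(4) are guaranteed to contain each face exactly once.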
As with all programs, implementation is everything.
Let's naively shuffle. In pseudocode:
for every index in bag:
    randomIndex = random(sizeOfBag)
    swap(index, randomIndex)
Or, in plain English, go to every position in the bag and swap it with a random other position in the bag.
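Here's that naive shuffle in Python. Rather than sampling 10000 times, we can enumerate all 4^4 = 256 equally likely index sequences and tally which value ends up in front:

```python
from itertools import product
from collections import Counter

def naive_shuffle(items, swaps):
    # Swap every position with the given "random" position.
    a = list(items)
    for i, j in enumerate(swaps):
        a[i], a[j] = a[j], a[i]
    return a

# Tally which value lands first under every possible index sequence.
firsts = Counter()
for swaps in product(range(4), repeat=4):
    firsts[naive_shuffle([1, 2, 3, 4], swaps)[0]] += 1
print(firsts)  # the four counts aren't all 64: the shuffle is biased
```
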
Huh. Why is 2341 showing up 5 times as often as 1423? We're doing this 10000 times and there's 4! = 24 combinations, so surely they'd all appear about 416 times, right?
Well, turns out that shuffle algorithm sucks. It's biased. Like 33% of the time 2 is the first number you'll draw from it. It should be 25%! No, we need to use a better algorithm like Fisher-Yates.
for every index in bag:
    randomIndex = randomly select the current index or any index after it
    swap(index, randomIndex)
The problem, intuitively, was that numbers could be swapped later in the list where they would be traversed again. That's why 1's and 2's were more likely to be first in the list, they'd be swapped and have more chances to swap back. 3's and 4's would get moved somewhere and then stick.
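The same enumeration trick shows why Fisher-Yates works. Under its constraint there are only 4*3*2 = 24 equally likely index sequences, and each one produces a different permutation:

```python
from itertools import product

def fisher_yates(items, swaps):
    # swaps[i] must be in [i, n): only swap with slots not yet finalized.
    a = list(items)
    for i, j in enumerate(swaps):
        a[i], a[j] = a[j], a[i]
    return a

perms = [tuple(fisher_yates([1, 2, 3, 4], s))
         for s in product(range(0, 4), range(1, 4), range(2, 4))]
print(len(set(perms)))  # 24: every ordering shows up exactly once
```
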
Much better.
So under this scheme we'd have 4 numbers in a bag and draw one when we want to throw a D4. When the bag is empty, we put them all back in and shake it up.
This has the exact same long-term statistical properties as a regular D4. Same min, max, expected value, and even standard deviation (which we may or may not get to later).
Two problems. First of all, a player can observe the gameplay and always know what numbers can't be drawn next. If three numbers have been drawn, the player might decide to take a different action knowing exactly what the next number will be.
Second: On a regular D4, if I tell you that I just rolled a 4, what are the odds that the next number will be a 4? p = 1/4. The events are independent, after all!
In this bag, if I tell you that I just drew a 4, what are the odds that the next number will be a 4? Not p = 1/4, not p = 0, but p = 1/16. Because if all you know is that I just drew a 4, you don't know whether the bag is now empty. For a previously unobserved bag there's a 1-in-4 chance that it is. If that's the case, then it's a further 1-in-4 to actually get the 4.
It completely throws off your reasoning about doubles.
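You can check the 1-in-16 figure by simulating a long run of draws and conditioning on having just drawn a 4:

```python
import random

# Build a long sequence of draws from a refilling bag of 1-4.
random.seed(0)
draws = []
for _ in range(100_000):
    draws.extend(random.sample([1, 2, 3, 4], 4))  # one shuffled bagful

fours = doubles = 0
for current, nxt in zip(draws, draws[1:]):
    if current == 4:
        fours += 1
        doubles += (nxt == 4)
print(doubles / fours)  # hovers around 1/16 = 0.0625
```
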
Let me put it another way: let's put Heads and Tails into the bag. You'll be able to draw any combination of HH, TT, HT, and TH. But TH and HT will be three times as common as HH and TT, and you'll never draw HHH. Why three times? Well, if you know that you just drew H and there's a 1-in-4 chance of drawing H again, the other 3 scenarios must be T.
It also doesn't work for continuous distributions: You have to have discrete objects you can put into the bag.
What we really want to do is just nudge the dice based on its history. We want all things to be possible, but some things less likely. If it threw low this time, maybe make it throw higher next time.
So let's make a counter. Start it at 0. Every time it rolls below the expected value, decrease the counter. Every time it rolls above the expected value, increase it. Based on this counter we want to skew the results of the roll, but without making any numbers impossible. Let's just make some numbers more probable than others.
We can do this by generating a floating point number and bending it with an exponent. A number in the range 0-1, put to any positive power, remains in the range 0-1. But an exponent greater than 1 pulls it down: 0.5 ^ 2 = 0.25. And conversely, 0.5 ^ 0.5 ~ 0.707. So we'll simply generate a number in the 0-1 range. If the counter is positive, we raise it to the counterth power. If it's negative, we raise it to the -1/counterth power. Then we remap it to the 1-4 range.
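Putting the counter from before together with the exponent trick, a sketch in Python (the function names are mine, and it's sized for any die rather than just a D4):

```python
import random

def skewed_roll(sides, counter):
    # Bend a uniform 0-1 number with an exponent, then map it to 1..sides.
    x = random.random()
    if counter > 0:
        x = x ** counter         # exponent > 1 drags the roll down
    elif counter < 0:
        x = x ** (-1 / counter)  # fractional exponent pushes it up
    return int(x * sides) + 1

def roll_and_update(sides, counter):
    # Rolls above the expected value raise the counter, rolls below lower it.
    roll = skewed_roll(sides, counter)
    expected = (sides + 1) / 2
    if roll > expected:
        counter += 1
    elif roll < expected:
        counter -= 1
    return roll, counter
```
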
[interactive demo: diceSize input]
It's... somewhat harder to see for a D4. It gets pushed too far to one extreme too fast and sticks there for a bit, resulting in more runs of the same number. For a D20 it's a lot better, but clearly shows that some fine-tuning is necessary for the counter. After all, 0.5 ^ 2 = 0.25 means that you have a 50/50 chance of rolling the bottom quarter of results. On a D4 that's a 1! And 0.5 ^ 5 = 0.03125 means you have a 50/50 chance of rolling the lowest 3% of results, which means you've gotta roll really well to not get a 1 on a D20.
Mostly this scheme keeps things away from the middle values. It could be modified to only adjust the counter when the dice rolls a critical failure or success! That way a natural 1 starts a good luck streak only ended by a natural 20 and vice-versa. The rate that the counter scales could also be adjusted, so that it doesn't immediately skew to the top or bottom quarter of values.
I set out writing this to demonstrate a simple technique to make dice feel more fair. I had a few ideas and, in the end, they didn't pan out. I would consider this a negative result. But that's not bad, we still learned something! And these ideas aren't useless: it should be clear that there's a lot of levers you can pull to tweak the bias-via-exponent version. As for the grab bag, you might decide to refill and reshuffle the bag after only a few draws. Or maybe put two entire sets of dice in and reshuffle after pulling half of the bag. But the goal was a simple algorithm that had a neat explanation for why it made numbers that felt right. These ideas have too many levers and dials to achieve that goal.
For all of these schemes there's the consideration that different players might need different tracking for their die. It'd really suck to see an enemy get a critical success and know that this means you won't be as lucky. On the other hand, a player getting lucky knows that their enemy is about to flub. And on the gripping hand, one player getting lucky for a lackluster attack means other players will sense that they wasted the group's luck.
I think the important takeaway from all this is to just use fair dice. Not only are modern pseudorandom number generators fast and free of statistical flaws, but even cryptographically secure ones are fast enough for most general-purpose situations.