Solving Bruno’s Dice Probability Challenge

This nerd-sniping problem deals 5d6 psychic damage

math

programming

python

Author

Vincent “VM” Mercator

Published

April 17, 2024

Modified

May 13, 2024

Five six-sided dice in different colors reading 'one, six, four, one, two' from left-to-right on a wooden background. — If one hundred dice is a “pound”, then I guess this is “0.8 ounces”.

Bruno Rubim posted a probability challenge on his website related to a card game he’s making that piqued my interest. Since nobody had posted a solution yet at the time, I wanted to try my hand at it.

“Did you expect this to show up here? Me neither, but well, my challenge for you is finding out, what are the odds that when throwing 5 6-sided die, at least 3 will land the same number? … If you’re able to give me a function with the number if die using that’s great!”

– Bruno Rubim

While finding the solution to the first part was easy enough to do with pure brute force, finding a general solution was a lot harder than I had originally thought! Here’s how I did it.

import datetime
import math
import sys

import numpy as np
import pandas as pd
import scipy as sp

pd.options.display.show_dimensions = False

Naive Solution by Brute-Force

The first part of Bruno’s question requires finding all the possible ways to roll five six-sided dice such that at least three share the same face. The simplest way to do this is to directly count the outcomes by brute-force.

def multi_similar_brute_force(n_similar, n_dice, die_sides, exact=False):
    """Calculate how many multi-similar rolls there are by brute force."""
    # 1. Tabulate all possible ways to roll the dice
    outcomes = (
        pd.DataFrame(
            np.ndindex((die_sides,) * n_dice),
            columns=[f"die_{idx}" for idx in range(1, n_dice + 1)],
        )
        + 1
    )
    outcomes.index.name = "roll_id"
    # 2. Count how many times each number shows up on each roll, find maximum
    result_counts = outcomes.apply(pd.Series.value_counts, axis=1).fillna(0).astype(int)
    result_counts.columns = [f"cnt_{idx}" for idx in result_counts.columns]
    result_counts["max"] = result_counts.max(axis=1)

    # 3. Count the ways k (or more) of n d-sided dice can have the same face
    if exact:
        answer = np.count_nonzero(result_counts["max"] == n_similar)
    else:
        answer = np.count_nonzero(result_counts["max"] >= n_similar)
    return (outcomes, result_counts, answer)


bruno_outcomes, bruno_count_values, bruno_counts = multi_similar_brute_force(3, 5, 6)

First, we tabulate all the possible ways to roll 5 dice, from all ones (⚀⚀⚀⚀⚀) to all sixes (⚅⚅⚅⚅⚅). Since the dice are assumed fair, all these dice rolls can occur with equal probability.

Truncated table of all outcomes for rolling five six-sided dice.
	die_1	die_2	die_3	die_4	die_5
roll_id
0	1	1	1	1	1
1	1	1	1	1	2
2	1	1	1	1	3
3	1	1	1	1	4
4	1	1	1	1	5
...	...	...	...	...	...
7771	6	6	6	6	2
7772	6	6	6	6	3
7773	6	6	6	6	4
7774	6	6	6	6	5
7775	6	6	6	6	6

Next, we count how many times each number appears in each roll by applying the value_counts function on each row in the roll table. We can find the maximum number of similar dice in the roll by getting the row-wise maximum of the value counts.

Truncated table of all count values for rolling five six-sided dice.
	cnt_1	cnt_2	cnt_3	cnt_4	cnt_5	cnt_6	max
roll_id
0	5	0	0	0	0	0	5
1	4	1	0	0	0	0	4
2	4	0	1	0	0	0	4
3	4	0	0	1	0	0	4
4	4	0	0	0	1	0	4
...	...	...	...	...	...	...	...
7771	0	1	0	0	0	4	4
7772	0	0	1	0	0	4	4
7773	0	0	0	1	0	4	4
7774	0	0	0	0	1	4	4
7775	0	0	0	0	0	5	5

Finally, we count how many rolls have maximum value counts greater than or equal to 3. We can just divide the count of rolls that satisfy the condition by the number of total possible rolls since each roll can occur with equal probability. So, our brute-force solution tells us that the probability of getting 3 or more dice to land on the same face when rolling five six-sided dice is $\frac{23}{108} \approx 21.30\%$, making the odds about $3.69 : 1$.

print("Ways to roll 3+ same from 5d6:", bruno_counts)
bruno_proba = bruno_counts / 6**5
print(f"P(3+ of 5d6 same) = {100 * bruno_proba:.2f}%")

Ways to roll 3+ same from 5d6: 1656
P(3+ of 5d6 same) = 21.30%

While the brute-force solution is quick to calculate for small values of $k$, $n$, and $d$, its execution time and memory requirements will skyrocket with larger values. There’s got to be a better way to find the solution.

Partial Solution for k > n/2 with Combinatorics

The easier of the two non-trivial cases I found for the general solution is where the number of similar dice $k$ is greater than half of the total number of dice $n$. In this scenario, there will not be enough remaining non-similar dice to make another subset of similar dice with size greater than or equal to $k$. Because we can discretely separate each die’s outcome as “part of the similar dice” or “not part of the similar dice”, we can use the definition of the binomial distribution to figure out the probability of obtaining $k$ or more dice with the same face.

The binomial distribution describes the probability of getting $k$ successes out of $n$ independent success-or-failure Bernoulli trials, where each Bernoulli trial has a success probability $p$.¹

\[ f_{K}(k; p, n) = \binom{n}{k} p^k (1 - p)^{n-k} \]

The symbol $\binom{n}{k}$ is the binomial coefficient, read as “n choose k”; it denotes the number of ways to choose an unordered subset of $k$ objects out of $n$ objects.²

\[ \binom{n}{k} = \frac{n!}{k!(n-k)!} \]

The binomial distribution comes into play when we split the event of “rolling $k$+ of $n$ $d$-sided dice with the same face” into mutually exclusive events with probabilities we can add up together.

Let’s first look at how the probability of rolling exactly $k$ ones out of $n$ $d$-sided dice is calculated. We can represent the spread of ways to roll any number of ones out of $n$ $d$-sided dice as a binomial distribution with $n = n$ and success probability $p = \frac{1}{d}$. The outcome of rolling all ones on $k$ out of $n$ dice corresponds to a probability value in that binomial distribution’s probability mass function.³

\[ \begin{aligned} \mathbb{P}(\text{$k$ of $n$ $d$-sided dice roll 1}) &= \binom{n}{k} \left( \frac{1}{d} \right)^k \left( 1 - \frac{1}{d} \right)^{n-k} \\ \mathbb{P}(\text{3 of 5d6 roll 1}) &= \binom{5}{3} \left( \frac{1}{6} \right)^3 \left( \frac{5}{6} \right)^{2} &\approx 3.22\% \end{aligned} \]

If we want to calculate the probability of rolling exactly $k$ dice with the same face of $n$ $d$-sided dice, then we can multiply the above probability by the number of sides per die. We can do this because each event of rolling $k$-of-a-kind is mutually exclusive for values of $k > n/2$. For instance, it’s impossible to roll both three ones and three fours on only five dice.

\[ \begin{aligned} \mathbb{P}(\text{$k$ of $n$ $d$-sided dice same}) &= d \binom{n}{k} \left( \frac{1}{d} \right)^k \left( 1 - \frac{1}{d} \right)^{n-k} \\ \mathbb{P}(\text{3 of 5d6 roll same}) &= 6 \binom{5}{3} \left( \frac{1}{6} \right)^3 \left( \frac{5}{6} \right)^{2} &\approx 19.29\% \end{aligned} \]

If we want to calculate the probability of rolling $k$ or more of $n$ $d$-sided dice with the same face, then we can add the probabilities of rolling exactly $k$ dice the same, $k+1$ dice the same, and so on until rolling $n$ dice the same. Once again, we can do this because each of those events are mutually exclusive for $k > n/2$.

\[ \begin{aligned} \mathbb{P}(\text{$k$+ of $n$ $d$-sided dice same}|k > n/2) &= \sum_{i = k}^n \left( d \binom{n}{i} \left( \frac{1}{d}\right)^i \left( 1 - \frac{1}{d} \right)^{n-i} \right) \\ \mathbb{P}(\text{3+ of 5d6 same}) &= \sum_{i = 3}^n \left( 6 \binom{5}{i} \left( \frac{1}{6} \right)^i \left( \frac{5}{6} \right)^{5-i} \right) &\approx 21.30\% \end{aligned} \]

I’ve also typed the equation into Wolfram|Alpha here for you to play with.

Why This Equation Won’t Work for k ≤ n/2

When the number of similar dice $k$ is less than or equal to half the number of similar dice $n$, then modeling the probabilities as a binomial distribution won’t work. The remaining dice may form at least one other subset of dice with a similar face different from the original group in the probability calculation.

For example, the roll ⚀⚀⚅⚃⚃ has two pairs. When counting the ways to roll two out of five six-sided dice with the same face, using the binomial distribution will count this roll (and others like it) more than once, skewing the results and returning probabilities greater than 100%. The combinatorics required needs to take into account both the ways to roll one pair and the ways to roll two pairs.

\[ \begin{aligned} \mathbb{P}(\text{2 of 3d6 same, 1 pair}) &= 6 \binom{5}{2} \times \left( \frac{1}{6} \right)^2 \times \frac{5}{6} \times \frac{4}{6} \times \frac{3}{6} \\ \mathbb{P}(\text{2 of 3d6 same, 2 pairs}) &= 6 \binom{5}{2} \binom{3}{2} \left( \frac{1}{6} \right)^3 \times \frac{5}{6} \times \frac{4}{6} \times \frac{3}{6} \div 2 \end{aligned} \]

print(
    "Count(2 of 5d6 same) with combinatorics:\t",
    sum(
        [
            # one pair
            (math.comb(5, 2) * 6 * 1 * 5 * 4 * 3),
            # two pairs
            (math.comb(5, 2) * math.comb(3, 2) * 6 * 1 * 5 * 1 * 4) // 2,
        ]
    ),
)
print(
    "Count(2 of 5d6 same) with brute force:\t",
    multi_similar_brute_force(2, 5, 6, exact=True)[-1],
)

Count(2 of 5d6 same) with combinatorics:     5400
Count(2 of 5d6 same) with brute force:   5400

I had too much trouble trying to wrap my head around trying to create a recursive function that would calculate the probabilities, so I tried a different approach.

General Solution with Number Theory

Enter the multinomial distribution, a generalization of the binomial distribution that determines the probability of a specific number of counts per outcome of a series of independent discrete probability trials. In our case, we’re using it to determine the probabilities of counts for $n$ $d$-sided dice. The random vector $\vec{X} = [x_1, \ldots, x_d]$ denotes the number of counts for each of the $d$ die faces from 1 to $d$.⁴

\[ f_{\vec{X}}(x_1, \ldots, x_d; n, p_1, \ldots, p_d) = \begin{cases} \frac{n!}{\prod_{i=1}^{d} x_i!} \prod_{i=1}^{d} p_{i}^{x_i} & \text{if $\sum_{i=1}^d x_i = n$}\\ 0 & \text{otherwise} \end{cases} \]

Since only $n$ dice are present, the sum of the random vector $\vec{X}$’s outcomes must sum to $n$. For example, you can’t roll six threes with only five dice, so the probability that $\vec{X} = [0, 0, 6, 0, 0, 0]$ is 0.

five_d_six = sp.stats.multinomial(n=5, p=(1 / 6,) * 6)
print(
    "P(two ones, two fours, one six on 5d6):",
    five_d_six.pmf([2, 0, 0, 2, 0, 1]),
)
print("P(six threes on 5d6):", five_d_six.pmf([0, 0, 6, 0, 0, 0]))

P(two ones, two fours, one six on 5d6): 0.0038580246913580258
P(six threes on 5d6): 0.0

To determine the probability of rolling exactly $k$ of $n$ $d$-sided dice the same, we need to find all possible ways that $d$ whole numbers in the range $\{0, \ldots, k\}$ can sum to $n$ and use the number $k$ at least once.

def integer_partitions(target, n_numbers, min_=0, max_=None):
    """Return the list of integer partitions within the desired bounds."""
    if max_ is None:
        max_ = target

    if n_numbers == 1:
        if min_ <= target <= max_:
            return [(target,)]
        return []
    if n_numbers == 2:
        return [(i, target - i) for i in range(min_, max_ + 1) if target - i <= max_]
    if n_numbers > 2:
        total_partitions = []
        for minuend in range(min_, max_ + 1):
            sub_partitions = integer_partitions(
                target - minuend,
                n_numbers - 1,
                min_=min_,
                max_=min(max_, target - minuend),
            )
            if sub_partitions:
                total_partitions += [(minuend, *part) for part in sub_partitions]
        return total_partitions

    msg = "Incorrect inputs"
    raise ValueError(msg)


partition_example = pd.DataFrame(
    data=integer_partitions(5, 6, min_=0, max_=2),
    columns=[f"cnt_{i}" for i in range(1, 7)],
)
partition_example.index.name = "partition_id"
partition_example

Table of partitions arranging 6 numbers between 0 and 2 that sum to 5.
	cnt_1	cnt_2	cnt_3	cnt_4	cnt_5	cnt_6
partition_id
0	0	0	0	1	2	2
1	0	0	0	2	1	2
2	0	0	0	2	2	1
3	0	0	1	0	2	2
4	0	0	1	1	1	2
...	...	...	...	...	...	...
121	2	1	2	0	0	0
122	2	2	0	0	0	1
123	2	2	0	0	1	0
124	2	2	0	1	0	0
125	2	2	1	0	0	0

The rows in this table represent all the possible ways to arrange six numbers between 0 and 2 inclusive that sum to five; the columns represent the counts of each die face (from 1 to 6) that are present on an individual roll. These integer partitions will be input to the dice roll’s probability mass function, and the probabilities will be summed together to calculate the total probability.

def k_of_n_d_dice_proba(n_similar, n_dice, die_sides):
    """Get the probability of rolling k of n d-sided dice with the same face."""
    if n_similar > n_dice / 2:
        return die_sides * sp.stats.binom(n=n_dice, p=1 / die_sides).pmf(n_similar)

    dice_rvs = sp.stats.multinomial(n=n_dice, p=(1 / die_sides,) * die_sides)
    partitions = integer_partitions(n_dice, die_sides, max_=n_similar)
    return sum(dice_rvs.pmf(part) for part in partitions if (n_similar in part))


def k_plus_of_n_d_dice_proba(n_similar, n_dice, die_sides):
    """Get the probability of rolling k+ of n d-sided dice with the same face."""
    return sum(
        k_of_n_d_dice_proba(i, n_dice, die_sides) for i in range(n_similar, n_dice + 1)
    )

Here’s some code that generates a table of probabilities for rolling $k$ or more out of $n$ six-sided dice with the same face for different values of $k$ and $n$.

k_range = np.arange(2, 11)
n_range = np.arange(3, 11)
dice_table = [[k_plus_of_n_d_dice_proba(k, n, 6) for n in n_range] for k in k_range]

dice_df = pd.DataFrame(
    data=dice_table,
    columns=n_range,
    index=k_range,
)
dice_df.index.name = "n_similar, k"
dice_df.columns.name = "n_dice, n"
# Represent probabilities as percentages
(dice_df * 100).map(lambda f: "{:.5f}%".format(f) if f != 0 else "--")

Table of probabilities for rolling $k$ or more out of $n$ 6-sided dice with the same face.
n_dice, n	3	4	5	6	7	8	9	10
n_similar, k
2	44.44444%	72.22222%	90.74074%	98.45679%	100.00000%	100.00000%	100.00000%	100.00000%
3	2.77778%	9.72222%	21.29630%	36.72840%	54.08951%	70.74331%	84.24640%	93.24846%
4	--	0.46296%	2.00617%	5.22119%	10.57956%	18.33133%	28.40030%	40.31969%
5	--	--	0.07716%	0.39866%	1.20242%	2.76527%	5.37004%	9.27093%
6	--	--	--	0.01286%	0.07716%	0.26470%	0.68147%	1.46289%
7	--	--	--	--	0.00214%	0.01465%	0.05632%	0.16051%
8	--	--	--	--	--	0.00036%	0.00274%	0.01167%
9	--	--	--	--	--	--	0.00006%	0.00051%
10	--	--	--	--	--	--	--	0.00001%

If you scroll to the right of the table, you might notice something peculiar: when rolling seven or more six-sided dice, there is a one hundred percent chance that two or more dice will share the same face. This is an example of the pigeonhole principle in action. When rolling seven or more six-sided dice, there is no way for each die to land on a unique face because there are less die faces than there are dice to roll.

Because of how the number theory solution works, it would be incredibly difficult (maybe impossible?) to give a full general solution for finding the probability that $k$ or more of $n$ $d$-sided dice have the same face that would be concise enough to plug into Wolfram|Alpha. I’d say that this is close enough.

Conclusion

Because of floating-point rounding issues, the brute-force probability calculation and the number theory probability calculation will not be exactly the same.

print("Brute-force probability:\t", multi_similar_brute_force(2, 5, 6)[-1] / 6**5)
print("Number theory probability:\t", k_plus_of_n_d_dice_proba(2, 5, 6))

Brute-force probability:     0.9074074074074074
Number theory probability:   0.9074074074074066

For example, compare the probabilities of rolling two or more out of five six-sided dice with the same face, calculated with two different methods. The error is a couple magnitudes smaller than the probability calculation, so it’s basically negligible.

Package Versions & Last Run Date

Python 3.13.5 (main, Jun 25 2025, 18:55:22) [GCC 14.2.0]
NumPy 2.3.3
Pandas 2.3.3
SciPy 1.16.2
Last Run 2025-10-04 12:51:30.602185

References

Leon-Garcia, Alberto. Probability, Statistics, and Random Processes for Electrical Engineering. 3. ed. Upper Saddle River, NJ: Pearson/Prentice Hall, 2008.

Footnotes

Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, chap. 3.1.5.↩︎
Leon-Garcia, chap. 2.6.3.↩︎
This is why I chose n and k as variable names. The fact that p and d are each other flipped is a coincidence (since I chose d for dice).↩︎
Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, chap. 8.7.↩︎

Reuse

CC BY SA 4.0 International License(View License)