Solving MathDoku with Mixed-Integer Programming

KenKen™ is Hebrew for YesYes


October 28, 2023


December 3, 2023

MathDoku (also known as KenKen™, KenDoku™, CalcuDoku, and SquareLogic) is a mathematical logic puzzle that I really liked in middle school. Like Sudoku, MathDoku is based on the Latin square: the goal of the puzzle is to fill in an N-by-N grid with the digits 1 to N in such a way that each row and column only contains one of each digit. Additionally, the MathDoku puzzle grid is further subdivided into cages, collections of adjacent cells whose numbers must perform an operation to reach a target value. With a little bit of mathematical rearranging, we can convert MathDoku puzzles into a mixed-integer programming problem and leverage the power of linear programming to solve them.

In general, optimization problems require making the right decisions subject to some constraints to complete some objective as much or as little as possible. To convert this into mathematics, we need to find the chosen extremum of some objective function by choosing the right decision variables that are subject to constraints. Linear programming is a technique used for solving optimization problems where both the objective function and constraints are defined by linear equations, equalities, or inequalities relating to the decision variables. Mixed-integer linear programming is a further restriction on linear programming, where some to all decision variables are integers.1 I’m not going to go into too much detail about linear programming in this article, but it’s a fascinating tool to use for solving problems. I highly recommend watching “What in the world is a linear program?” by OptWhiz, as I think this was the best introduction to linear programming that I’ve seen so far.

Furthermore, I’ve chosen Julia as my language of choice for this article, since it is more suited to scientific computing problems than Python and its one-based indexing makes formulating the problem a little easier. For optimizing, I’m using the JuMP optimization library to connect to HiGHS, a powerful linear optimization tool.

using Dates
using Printf

using JuMP
using HiGHS

Setting up the MathDoku in Julia

Here is a 6-by-6 MathDoku puzzle that I generated with KSudoku on my computer that we’re going to solve using mixed-integer programming. If you want to try this puzzle yourself, you can download the XML file.

A 6-by-6 MathDoku Puzzle. Cage 1 has the instruction 30-times, and has cells at indices (1, 1), (1, 2), (1, 3). Cage 2 has the instruction 7-plus, and has cells at indices (1, 4), (1, 5). Cage 3 has the instruction 2-equals, and has cells at indices (1, 6). Cage 4 has the instruction 30-times, and has cells at indices (2, 1), (3, 1), (4, 1). Cage 5 has the instruction 2-times, and has cells at indices (2, 2), (2, 3). Cage 6 has the instruction 21-plus, and has cells at indices (2, 4), (2, 5), (2, 6), (3, 5). Cage 7 has the instruction 2-divide, and has cells at indices (3, 2), (4, 2). Cage 8 has the instruction 4-times, and has cells at indices (3, 3), (3, 4). Cage 9 has the instruction 90-times, and has cells at indices (3, 6), (4, 5), (4, 6), (5 5). Cage 10 has the instruction 90-times, and has cells at indices (4, 3), (5, 2), (5, 3). Cage 11 has the instruction 3-equals, and has cells at indices (4, 4). Cage 12 has the instruction 1-equals, and has cells at indices (5, 1). Cage 13 has the instruction 120-times, and has cells at indices (5, 4), (6, 3), (6, 4), (6, 5). Cage 14 has the instruction 3-minus, and has cells at indices (5, 6), (6, 6). Cage 15 has the instruction 1-minus, and has cells at indices (6, 1), (6, 2).

The 6x6 MathDoku puzzle used in this article.

Operations as an Enum

Most variations of MathDoku that I’ve seen usually have five different types of operations, which I’m going to represent using an enum. The operations for each cage are detailed in the table below:

Cage Type Symbol Cage Rule Description
Equality = (or None) The one cell value equals the target number.
Addition + The cell values sum together to the target number.
Subtraction - The two cell values have a difference equal to the target number.
Multiplication × The cell values have the target number as their product.
Division ÷ The maximum cell value divided by the minimum cell value equals the target number.
Different operations that a MathDoku cage can have.
@enum Operation begin
Side Note

Each operation has special requirements about cage sizes. Equality cages can only contain 1 cell, subtraction cages and division cages only contain 2 cells, and addition and multiplication cages can have any number of cells. Checking for this requirement is outside the scope of this article.

Cages as a Struct

Each cage can be represented by its target number, its operation, and the coordinates on the MathDoku puzzle grid it occupies. We can represent their locations as a vector of CartesianIndex structs with length 2. This built-in type can extract values from matrices, which makes the most sense for keeping track of cage locations on the MathDoku puzzle’s grid. The CartesianIndex’s first element corresponds to a cell’s row, and its second element corresponds to a cell’s column.

A single cage (collection of cells) in a MathDoku puzzle.
struct Cage
    """Correct value when applying the operation on the cell values."""
    """Mathematical operation used to produce the result."""
    """Coordinate pairs belonging to this cage."""
    """Instantiate Cage with a tuple vector to avoid repeatedly typing `CartesianIndex`."""
    function Cage(target::Int, operation::Operation, coords::Vector{Tuple{Int,Int}})
        return new(target, operation, [CartesianIndex(i) for i in coords])

Repeatedly typing out CartesianIndex multiple times when defining a cage makes reading code difficult, so I’m going to instead create a constructor method that converts an inputted vector of tuples into a vector of CartesianIndices. Here are all fifteen cages from my MathDoku puzzle instantiated as Cage structs.

c1 = Cage(30, times, [(1, 1), (1, 2), (1, 3)])
c2 = Cage(7, plus, [(1, 4), (1, 5)])
c3 = Cage(2, equals, [(1, 6)])
c4 = Cage(30, times, [(2, 1), (3, 1), (4, 1)])
c5 = Cage(2, times, [(2, 2), (2, 3)])
c6 = Cage(21, plus, [(2, 4), (2, 5), (2, 6), (3, 5)])
c7 = Cage(2, divide, [(3, 2), (4, 2)])
c8 = Cage(4, times, [(3, 3), (3, 4)])
c9 = Cage(90, times, [(3, 6), (4, 5), (4, 6), (5, 5)])
c10 = Cage(90, times, [(4, 3), (5, 2), (5, 3)])
c11 = Cage(3, equals, [(4, 4)])
c12 = Cage(1, equals, [(5, 1)])
c13 = Cage(120, times, [(5, 4), (6, 3), (6, 4), (6, 5)])
c14 = Cage(3, minus, [(5, 6), (6, 6)])
c15 = Cage(1, minus, [(6, 1), (6, 2)])

my_cages = (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15);
Side Note

Cages that belong to the same MathDoku puzzle are not supposed to overlap (i.e., share cell coordinates). Checking for this requirement is also outside the scope of this article.

Assembling the MathDoku Puzzle

Armed with all the necessary components, we can now define our MathDoku puzzle as the set of cages that define it. To calculate the MathDoku puzzle’s size, we sum up the lengths of each cage’s coordinates vector and calculate the square root. This is an important part of the MathDoku puzzle’s definition, as it is a key requirement for how the linear program is instantiated; I’m going to include it as a part of the struct as well by creating a constructor method that calculates it.

A MathDoku puzzle.
struct MathDoku
    """Cages belonging to the MathDoku puzzle."""
    """Size length of the MathDoku puzzle."""
    """Instantiate a new MathDoku puzzle by inputting the cages that define it."""
    function MathDoku(cages::Tuple{Vararg{Cage}})
        return new(cages, Int(√sum([length(cage.coords) for cage in cages])))

Let’s define the MathDoku puzzle we want to solve. We’ve already defined its cages, so we can reuse them in the definition.

md = MathDoku(my_cages);

Solving the MathDoku

Now that we have our MathDoku, let’s solve it using mixed-integer programming.

md_model = Model(HiGHS.Optimizer)
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: HiGHS

Each cell in an N-by-N MathDoku puzzle has a possibility of being any one of N numbers from 1 to N, but can only be one number at a time. So, we can represent these values for a single cell as a one-hot encoded vector of binary values with size N where the position of the true value in the vector corresponds to the value of the cell.

For example:

[false, true, false, false, false, false] => 2

We have one of these vectors for each cell in the MathDoku puzzle, so we can represent this as an N-by-N-by-N multidimensional binary array C. The first dimension corresponds to the values stored in each row of the puzzle, the second dimension corresponds to the values stored in each column of the puzzle, and the third dimension corresponds to which digit is stored in which cell. The candidate vectors for row i and column j are stored in the 1-by-1-by-N “tube” that contains all elements located in row i and column j of the multidimensional array.

\[ c_{i,j,k} = \begin{cases} 1 & \text{if cell }(i, j) = k\\ 0 & \text{otherwise} \end{cases} \quad \forall \left(i, j, k\right) \in \{1, \ldots, N\}^3 \]

@variable(md_model, candidates[row=1:(md.size), col=1:(md.size), digit=1:(md.size)], Bin);

Row, Column, and Candidate Constraints

Since we know that the MathDoku puzzle is based on a Latin square, we know that there can only be one number of each type per row and column. To do this, we begin adding constraints to our candidate variables such that every row of candidate variables and column of candidate variables must sum to one. As previously mentioned, we also need to constrain the candidates so that only one value in each 1-by-1-by-N tube is true at a time.2

This leads to the following constraints:

\[ \begin{aligned} \text{rows: } & \sum_{i = 1}^{N} c_{i,j,k} = 1 \quad \forall \left(j, k\right) \in \{1, \ldots, N\}^2 \\ \text{columns: } & \sum_{j = 1}^{N} c_{i,j,k} = 1 \quad \forall \left(i, k\right) \in \{1, \ldots, N\}^2 \\ \text{candidates: } & \sum_{k = 1}^{N} c_{i,j,k} = 1 \quad \forall \left(i, j\right) \in \{1, \ldots, N\}^2 \\ \end{aligned} \]

This is how we implement those constraints in Julia.

# Latin Square Constraints
for a in 1:(md.size)
    for b in 1:(md.size)
        # Only one number of each type per row
        @constraint(md_model, sum(candidates[:, a, b]) == 1)
        # Only one number of each type per column
        @constraint(md_model, sum(candidates[a, :, b]) == 1)
        # Only one number can occupy each cell
        @constraint(md_model, sum(candidates[a, b, :]) == 1)

Cage Constraints for Cell Values

Let’s define some expressions to make constraining the cells in each cage easier. These are shorthand for specific linear combinations of variables that we can reuse so that we don’t have to clutter our constraints with overly complicated equations.

Let’s define the N-by-N matrix of expressions V that contains the values of each cell in the MathDoku grid. Each element in V is equal to the corresponding tube of candidates multiplied elementwise by their position and summed together, as shown here.3

\[ v_{i,j} = \sum_{k = 1}^{N} \left(k \times c_{i,j,k}\right) \quad \forall \left(i, j\right) \in \{1, \ldots, N\}^2 \]

    vals[row=1:(md.size), col=1:(md.size)],
    sum(candidates[row, col, :] .* collect(1:(md.size)))

Equality Cage Constraints

Equality cages are the easiest cages to form constraints around. The value of the only cell in the cage vi,j must be equal to the target number t. If we wanted to, we could expand this by substituting vi,j with its definition.4

\[ \begin{aligned} &v_{i,j} &= t \\ \Leftrightarrow & \sum_{k = 1}^{N} \left(k \times c_{i,j,k}\right) & = t \end{aligned} \]

Addition Cage Constraints

Addition cages are a little harder, but still doable. The sum of the values in cage A must be equal to the target number t.5

\[ \sum_{(i, j) \in \mathbf{A}} v_{i,j} = t \]

Subtraction Cage Constraints

Subtraction cages are the first type of cage that we have to introduce more auxiliary variables to deal with; this is because the raw constraint required for subtraction cages shown below is nonlinear.

\[ \begin{aligned} \left| v_{i_1, j_1} - v_{i_2, j_2} \right| & = t \\ & \Downarrow & \\ \{ v_{i_1, j_1} - v_{i_2, j_2} = t \} & \text{ or } \{ v_{i_1, j_1} - v_{i_2, j_2} = -t \} \end{aligned} \]

To solve this, we’ll have to add a binary variable δ to linearize the solution. Using this binary variable δ, We can set the right-hand side of the equation equal to be equal to either t or -t, but not both.6

\[ \delta \in \{0, 1\} \]

\[ \left.\begin{matrix} \{ v_{i_1, j_1} - v_{i_2, j_2} &=& -t \} & \text{ or} \\ \{ v_{i_1, j_1} - v_{i_2, j_2} &=& t \} \end{matrix}\right\} \Rightarrow v_{i_1, j_1} - v_{i_2, j_2} = t \left(2 \delta - 1 \right) \]

Division Cage Constraints

While we could get by with subtraction cages by adding auxiliary variables, division itself is nonlinear! How are we supposed to rewrite a division cage’s constraints below as a system of linear equations?

\[ \{ v_{i_1, j_1} \div v_{i_2, j_2} = t \} \text{ or } \Big\{ v_{i_1, j_1} \div v_{i_2, j_2} = \frac{1}{t} \Big\} \]

Fortunately, we can use properties of logarithms to help us out. By definition, the product of two numbers is equal to the sum of their logarithms, and the difference of two numbers is equal to the difference of their logarithms. Additionally, the logarithm of the inverse of a number is equal to the negative logarithm of that number.

\[ \begin{aligned} \log(a \times b) &= \log(a) + \log(b) \\ \log(a \div b) &= \log(a) - \log(b) \\ \log\left(\frac{1}{a}\right) &= -\log(a) \end{aligned} \]

By using logarithm properties, we can perform substitution to convert our division constraints into a subtraction constraint and condense it using the same tactic for subtraction cages. However, a problem arises; the resulting constraint is a linear combination of the logarithms of our values.

\[ \begin{aligned} \{ v_{i_1, j_1} \div v_{i_2, j_2} = t \} & \text{ or } \Big\{ v_{i_1, j_1} \div v_{i_2, j_2} = \frac{1}{t} \Big\} \\ & \Downarrow & \\ \{ \log(v_{i_1, j_1}) - \log(v_{i_2, j_2}) = \log(t) \} & \text{ or } \{ \log(v_{i_1, j_1}) - \log(v_{i_2, j_2}) = -\log(t) \} \\ & \Downarrow & \\ \log\left(v_{i_1, j_1}\right) - \log\left(v_{i_2, j_2}\right) & = \log(t) \left(2 \delta - 1 \right) \end{aligned} \]

We can create a new expression matrix L that will be equal to the log of the value expressions V when the puzzle is fully solved. Since each 1x1xN “tube” of candidate values should only have one true that corresponds to V, we can simply define the corresponding log-value in L equal to summing the elementwise product with the logarithms of 1 to N. We can use these expressions to skirt around the fact that logarithms themselves are also nonlinear.7

\[ \begin{aligned} \ell_{i,j} &= \sum_{k = 1}^{N} \left(\log(k) \times c_{i,j,k}\right) \quad \forall \left(i, j\right) \in \{1, \ldots, N\}^2 \\ \ell_{i,j} &= \log\left(v_{i,j}\right) \text{ when constraints are solved}\ \end{aligned} \]

With the elements of the expression matrix L, we can finalize the constraints for the division cage below.

\[ \begin{aligned} \log\left(v_{i_1, j_1}\right) - \log\left(v_{i_2, j_2}\right) & = \log(t) \left(2 \delta - 1 \right) \\ & \Downarrow & \\ \ell_{i_1, j_1} - \ell_{i_2, j_2} & = \log(t) \left(2 \delta - 1 \right) \end{aligned} \]

This is how we would define the log-values in Julia.

    log_vals[row=1:(md.size), col=1:(md.size)],
    sum(candidates[row, col, :] .* log2.(1:(md.size)))

Multiplication Cage Constraints

It’s no surprise that multiplication itself is also nonlinear; but with the logarithm-values provided from figuring out division cages, we now know that we can convert some nonlinear constraints into linear constraints. With the logarithm-value expressions, we can convert a product of values into a sum of log-values.8

\[ \begin{aligned} &\prod_{(i, j) \in \mathbf{A}} v_{i,j} &=& t \\ \Rightarrow & \sum_{(i, j) \in \mathbf{A}} \log\left(v_{i,j}\right) &=& \log(t) \\ \Rightarrow & \sum_{(i, j) \in \mathbf{A}} \ell_{i,j} &=& \log(t) \\ \end{aligned} \]

Completing the Cage Constraints in Julia

Since we don’t need the auxiliary binary variable δ for each cage, we can immediately set it to 0 for cages where it is not needed. Therefore, this is how we write our cage constraints for each cage in the MathDoku puzzle.

# Auxiliary binary variables for subtraction and division cages
@variable(md_model, δ[1:length(md.cages)], Bin)
# Cage Constraints
for (idx, cage) in enumerate(md.cages)
    if cage.operation == equals
        @constraint(md_model, vals[cage.coords[1]] ==
        fix(δ[idx], 0)
    elseif cage.operation == plus
        @constraint(md_model, sum(vals[cage.coords]) ==
        fix(δ[idx], 0)
    elseif cage.operation == minus
        @constraint(md_model, -(vals[cage.coords]...) == * (2 * δ[idx] - 1))
    elseif cage.operation == times
        @constraint(md_model, sum(log_vals[cage.coords]) == log2(
        fix(δ[idx], 0)
    elseif cage.operation == divide
            -(log_vals[cage.coords]...) == log2( * (2 * δ[idx] - 1)
        throw("Invalid cage operation")

Solving the MathDoku

Finally, now that we have completed the constraints, initiated the variables, and used the expressions, we can finally solve the MathDoku puzzle. Since we assume that the MathDoku puzzle only has one solution, we don’t need to define an objective function to solve it.

lp_solution = round.(Int, value.(vals))
Running HiGHS 1.6.0: Copyright (c) 2023 HiGHS under MIT licence terms
Presolving model
123 rows, 192 cols, 736 nonzeros
105 rows, 149 cols, 582 nonzeros
12 rows, 0 cols, 0 nonzeros
0 rows, 0 cols, 0 nonzeros
Presolve: Optimal

Solving report
  Status            Optimal
  Primal bound      0
  Dual bound        0
  Gap               0% (tolerance: 0.01%)
  Solution status   feasible
                    0 (objective)
                    0 (bound viol.)
                    0 (int. viol.)
                    0 (row viol.)
  Timing            0.00 (total)
                    0.00 (presolve)
                    0.00 (postsolve)
  Nodes             0
  LP iterations     0 (total)
                    0 (strong br.)
                    0 (separation)
                    0 (heuristics)
6×6 Matrix{Int64}:
 6  5  1  4  3  2
 3  1  2  6  4  5
 5  2  4  1  6  3
 2  4  5  3  1  6
 1  6  3  2  5  4
 4  3  6  5  2  1

Look at that! There’s the solution to the MathDoku puzzle.

Issues with This Approach

The main problem with using linear programming to solve logic puzzles like MathDoku and Sudoku is that the tools for these types of problems usually find only one solution. This is perfectly fine for most optimization problems, as they need to find the best values for decision variables; however, logic puzzles like MathDoku aren’t supposed to have more than one solution. Alternative methods of formulating these logic puzzles as constraint satisfaction problems or iterating through possible solutions using search algorithms would definitely work for finding multiple solutions, but they would take much more time.

Package Versions & Last Run Date
@printf "Julia %s\n" VERSION
@printf "JuMP %s\n" pkgversion(JuMP)
@printf "HiGHS %s\n" pkgversion(HiGHS)
@printf "Last Run %s\n" now(UTC)
Julia 1.9.2
JuMP 1.16.0
HiGHS 1.7.5
Last Run 2023-11-28T12:58:11.371


Ercsey-Ravasz, Maria, and Zoltan Toroczkai. “The Chaos Within Sudoku.” arXiv, August 2012.
JuMP core developers and contributors. JuMP,” October 2023.
Kalvelagen, Erwin. “Yet Another Math Programming Consultant: MIP Modeling: From Sudoku to KenKen via Logarithms.” Yet Another Math Programming Consultant, October 2016.
Melkonian, Vardges. “An Integer Programming Model for the KenKen Problem.” American Journal of Operations Research 6, no. 3 (May 2016): 213–25.


  1. JuMP core developers and contributors, JuMP.↩︎

  2. Ercsey-Ravasz and Toroczkai, “The Chaos Within Sudoku.↩︎

  3. Ercsey-Ravasz and Toroczkai.↩︎

  4. Kalvelagen, MIP Modeling.↩︎

  5. Melkonian, “An Integer Programming Model for the KenKen Problem.↩︎

  6. Melkonian.↩︎

  7. Kalvelagen, MIP Modeling.↩︎

  8. Kalvelagen.↩︎