\newtheoremstyle{basic}% name of the style to be used
{}% measure of space to leave above the theorem. E.g.: 3pt
{}% measure of space to leave below the theorem. E.g.: 3pt
{\normalfont}% name of font to use in the body of the theorem
{}% measure of space to indent
{\bfseries}% name of head font
{:}% punctuation between head and body
{.5em}% space after theorem head; " " = normal interword space
{}% Manually specify head

\theoremstyle{basic}

\newtheorem{defi}{Definition}[section]

\newtheorem{thm}{Theorem}[section]

\DeclarePairedDelimiter\ceil{\lceil}{\rceil}

\DeclarePairedDelimiter\floor{\lfloor}{\rfloor}

\title{An attempt to solve 3D Connect Four \\ {\small NI-MVI: Semestral project Milestone}}

\author{Jan Pokorný }

\date{December 2021}

\begin{document}

\maketitle

\section{Introduction}


Tic-Tac-Toe is a well-known game. On a $3\times3$ board, both players try to occupy a line with 3 of their symbols. If both players play optimally, the game ends in a draw. This simple game can be generalized and modified into many different positional games.

A simple generalization is to increase the grid size and add more dimensions. This gives us, for example, the game Qubic ($4\times4\times4$ board). Another modification is to add gravity, which gives the well-known game Connect Four. Combining these two games gives Connect Four 3D, which is played on the same board as Qubic, but the players can only choose a column, and the stone is placed in the lowest empty slot of that column.
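The gravity rule can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the board representation (a dictionary of 16 stacks) and the names \texttt{empty\_board} and \texttt{drop} are hypothetical and are not taken from the demo.

```python
# Hypothetical minimal board model: 16 columns, each a list of stones
# ordered from the bottom up. Names and layout are illustrative assumptions.
SIZE = 4


def empty_board():
    """Return a dict mapping (x, y) column coordinates to an empty stack."""
    return {(x, y): [] for x in range(SIZE) for y in range(SIZE)}


def drop(board, column, player):
    """Place `player`'s stone in the lowest empty slot of `column`.

    Returns the height z at which the stone landed.
    Raises ValueError if the column is already full.
    """
    stack = board[column]
    if len(stack) >= SIZE:
        raise ValueError(f"column {column} is full")
    stack.append(player)
    return len(stack) - 1


board = empty_board()
z0 = drop(board, (1, 2), "X")  # lands on the bottom level
z1 = drop(board, (1, 2), "O")  # lands on top of the first stone
```

A move is therefore fully described by the chosen column; the height follows from the stones already in it.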

The state space of these games is large. Determining the outcome under optimal play, together with an optimal strategy, is possible largely thanks to computers. One of the first computer-aided results in combinatorial game theory was the solution of Qubic~\cite{patashnik1980qubic}. To the best of our knowledge, nobody has yet solved Connect Four 3D.

\section{Research}

\subsection{Positional games}

Positional games are two-player games of perfect information that are played on hypergraphs (\textit{board}). The players alternate in choosing a vertex (\textit{point}) to occupy. A player wins if he occupies a hyperedge (\textit{winning set}) before the opponent does. There are multiple variants of the game, which differ mostly in the winning condition. The type described here is called a \textit{strong positional game}.

\begin{thm} Let $H$ be a hypergraph. The first player of a strong game played on $H$ can force a draw or a win. \cite{beck}

\textit{Idea of a proof: } For contradiction, assume that the second player has a winning strategy. A winning strategy is just a set of instructions telling the player where to play after each move of the opponent. The first player can steal this strategy by first playing at an arbitrary point and then pretending to be the second player and following the second player's strategy. Whenever the strategy tells him to play at a point he already occupies, he chooses another arbitrary point instead. The extra occupied point can only benefit its owner, never the opponent.

Hence the first player would have a winning strategy as well. Since both players cannot have a winning strategy, we have come to a contradiction.

\end{thm}

The theorem tells us that in a strong positional game between perfect players, either the first player can force a win or the second player can force a draw.

These two possible outcomes can be further divided into the following classes according to Golomb and Hales~\cite{golomb2000hypercube}:

\begin{enumerate}

\item The first player wins no matter how he plays.

\item No draws are possible, therefore the first player can force a win.

\item Even though a draw position exists, the first player can force a win.

\item The second player can force a draw that is not a trivial pairing strategy. \label{nontrivialDraw}

\item The second player can force a draw using a pairing strategy.

\end{enumerate}

Beck further split class \ref{nontrivialDraw} into two classes, depending on whether the first player can force a weak win, that is, occupy a winning set even if the second player has already occupied one \cite{beck}.

\subsubsection{Connect Four 3D}

\begin{figure}

\includegraphics[scale=0.1]{c4example.jpg}

\caption{An example of a Connect Four 3D position from \cite{romen}}

\end{figure}

The game is played on a $4\times4\times4$ board, and the players can only choose among the columns. There are 16 columns in total, so there are at most $16^{64}$ possible games (this ignores the fact that some columns become full before 64 plies are made). A more careful analysis shows that each column has only 30 possible states, which gives $30^{16}$ possible board states. This is still too many to evaluate by brute force, so more clever ways to search the space of game states are needed; they are examined in more detail in the following Section~\ref{search-algos}.
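To put the two bounds side by side (a back-of-the-envelope calculation that takes the figure of 30 states per column as given):
\[
16^{64} = 2^{256} \approx 1.2\times10^{77}, \qquad 30^{16} = 3^{16}\cdot10^{16} \approx 4.3\times10^{23}.
\]
Counting board states instead of move sequences thus lowers the bound by more than 50 orders of magnitude, but the remaining number is still far beyond exhaustive enumeration.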

This game is very similar to Qubic, since they share the board and the winning lines. Unfortunately, because of gravity, it is much harder to create threats than in Qubic.
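The shared winning lines can be enumerated mechanically. The following sketch is my own illustration (the function name \texttt{winning\_lines} is hypothetical): it walks from every cell in every direction and keeps the runs of four cells that stay inside the cube.

```python
from itertools import product

SIZE = 4


def winning_lines(size=SIZE):
    """Enumerate all straight lines of `size` cells in a size^3 cube."""
    lines = set()
    # All 26 non-zero direction vectors with components in {-1, 0, 1}.
    directions = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    for start in product(range(size), repeat=3):
        for d in directions:
            cells = [
                tuple(s + k * step for s, step in zip(start, d))
                for k in range(size)
            ]
            if all(0 <= c < size for cell in cells for c in cell):
                # A line and its reverse contain the same cells,
                # so storing frozensets removes the duplicates.
                lines.add(frozenset(cells))
    return lines


LINES = winning_lines()
print(len(LINES))  # 76 lines on the 4x4x4 board
```

The 76 lines consist of 48 axis-parallel lines, 24 diagonals within a plane, and 4 space diagonals.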

The game has ``only'' 8 automorphisms: 4 rotations about the vertical axis and 4 mirrorings. It is therefore much harder to reduce the search space than in Qubic.
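A minimal sketch of how these 8 automorphisms could be used to identify equivalent positions follows. The representation of a position (a dictionary from column coordinates to the stones in that column, bottom-up) and all function names are my assumptions for illustration. Since gravity fixes the vertical axis, only the column coordinates $(x, y)$ are transformed.

```python
SIZE = 4


def rot90(x, y):
    """Rotate column coordinates by 90 degrees about the vertical axis."""
    return y, SIZE - 1 - x


def mirror(x, y):
    """Mirror column coordinates: x -> SIZE - 1 - x."""
    return SIZE - 1 - x, y


def symmetries():
    """Return 8 coordinate maps: 4 rotations, each with and without a mirroring."""
    maps = []
    for flip in (False, True):
        for turns in range(4):
            # Default arguments freeze the loop variables for each closure.
            def f(x, y, flip=flip, turns=turns):
                if flip:
                    x, y = mirror(x, y)
                for _ in range(turns):
                    x, y = rot90(x, y)
                return x, y
            maps.append(f)
    return maps


def transform(position, f):
    """Apply coordinate map f to a position {(x, y): "stones bottom-up"}."""
    return {f(x, y): stack for (x, y), stack in position.items()}


def canonical(position):
    """Pick one representative of the symmetry class of `position`."""
    return min(tuple(sorted(transform(position, f).items())) for f in symmetries())


# The four corner openings are all equivalent.
print(canonical({(0, 0): "X"}) == canonical({(3, 3): "X"}))  # True
```

Storing only the canonical representative reduces the number of stored positions by a factor of at most 8.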

There have been a few attempts to solve this game in recent years. The most recent attempt \cite{romen} managed to solve a position with 7 stones in play in 43 days on a computer with 28 cores.

Determining the outcome of a game is a PSPACE-complete problem for many games such as Hex, Go, Checkers, or Gobang (also known as Gomoku or 5-in-a-row) \cite{reisch1981hex}. In addition, the number of possible plays of combinatorial games is large. For example, the number of different board states of Qubic (including positions with more than one winning set occupied) is $4.1\times10^{29}$~\cite{Dvorak}. An algorithm that searches all game positions is therefore impractical. Fortunately, not every position has to be evaluated. If we want to show that the game is a win for the first player, we do not have to examine all possible moves of the first player. It is sufficient to show that, for a given position, there exists one move of the first player that leads to a position that is winning for the first player after any reply of the second player. The same holds, with the roles of the players swapped, if we want to show that the second player can force a draw. The number of positions can be decreased further by using automorphisms.

The idea of searching only one move per first-player position was used in solving Qubic \cite{patashnik1980qubic}: Patashnik picked the first player's strategic moves by hand (there were 2929 such moves), and the computer made more than a million forcing moves, which created sequences of threats (positions with a point that the opponent must play or lose) leading up to a double threat.

\subsubsection{Proof number search}

A more general algorithm, proof number search (PNS), was developed by Allis \cite{allis1994searching}. The search is performed on an AND/OR tree. Every node is either an AND node or an OR node and stores whether it has been solved and, if so, what the outcome is. It also stores the minimum number of nodes in its subtree that have to be proved or disproved in order to prove or disprove the node itself; these numbers are called the proof and disproof numbers. An OR node corresponds to a position in which we do not have to verify every move, because one is sufficient (a first-player position if we want to prove a win for the first player). An AND node corresponds to a position of the opposing player.
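In the usual formulation of PNS, which I assume here, the proof number $\mathit{pn}$ and the disproof number $\mathit{dn}$ of an internal node $n$ are computed from its children $c$ as follows:
\[
\mbox{OR node:}\quad \mathit{pn}(n) = \min_{c} \mathit{pn}(c), \qquad \mathit{dn}(n) = \sum_{c} \mathit{dn}(c),
\]
\[
\mbox{AND node:}\quad \mathit{pn}(n) = \sum_{c} \mathit{pn}(c), \qquad \mathit{dn}(n) = \min_{c} \mathit{dn}(c).
\]
An unsolved leaf is initialized with $\mathit{pn} = \mathit{dn} = 1$, a proved node has $\mathit{pn} = 0$ and $\mathit{dn} = \infty$, and a disproved node has $\mathit{pn} = \infty$ and $\mathit{dn} = 0$.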

The algorithm repeatedly expands the most-proving node (MPN) until the game is solved. An MPN is a node that, once solved, is guaranteed to decrease the proof or disproof number of its parent. During the expansion the new nodes are evaluated; this step can be delegated to other algorithms such as threat-space search. After the expansion, the proof and disproof numbers of the ancestor nodes are updated, even when the MPN remains in their subtree. The PNS variant DF-PN tries to remove this inefficiency by staying in the current subtree until the MPN moves to another subtree. Another variant, WPNS, addresses the fact that the search space is not a tree but a DAG~\cite{PNS20Years}.
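The basic loop (select the MPN, expand it, update the ancestors) can be sketched on a toy game. The sketch below is not the demo code; it is a minimal illustration under my own assumptions. The toy game is a subtraction game: players alternately take 1 to 3 stones from a heap, and whoever takes the last stone wins. The names \texttt{Node}, \texttt{most\_proving} and \texttt{pns} are hypothetical.

```python
import math

INF = math.inf


class Node:
    def __init__(self, heap, is_or, parent=None):
        self.heap = heap        # stones left in the heap
        self.is_or = is_or      # True: first player to move (OR node)
        self.parent = parent
        self.children = []
        self.expanded = False
        if heap == 0:
            # The player to move cannot move: the previous player took
            # the last stone. Proved iff that was the first player.
            proved = not is_or
            self.pn, self.dn = (0, INF) if proved else (INF, 0)
        else:
            self.pn, self.dn = 1, 1  # unsolved leaf


def update(node):
    """Recompute the proof and disproof numbers from the children."""
    pns_ = [c.pn for c in node.children]
    dns_ = [c.dn for c in node.children]
    if node.is_or:
        node.pn, node.dn = min(pns_), sum(dns_)
    else:
        node.pn, node.dn = sum(pns_), min(dns_)


def most_proving(node):
    """Descend from `node` to the most-proving leaf."""
    while node.expanded:
        if node.is_or:
            node = min(node.children, key=lambda c: c.pn)
        else:
            node = min(node.children, key=lambda c: c.dn)
    return node


def expand(node):
    """Create one child per legal move (take 1, 2 or 3 stones)."""
    node.children = [
        Node(node.heap - k, not node.is_or, node)
        for k in (1, 2, 3) if k <= node.heap
    ]
    node.expanded = True


def pns(heap):
    """Return True if the first player can force a win from `heap` stones."""
    root = Node(heap, is_or=True)
    while root.pn != 0 and root.dn != 0:
        leaf = most_proving(root)
        expand(leaf)
        node = leaf
        while node is not None:  # basic PNS: update all ancestors up to the root
            update(node)
            node = node.parent
    return root.pn == 0


print([pns(n) for n in range(1, 9)])
# [True, True, True, False, True, True, True, False]
```

The update loop always climbs to the root, which is exactly the inefficiency that DF-PN avoids. This sketch also builds a plain tree without a transposition table, so repeated positions are solved repeatedly.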

To avoid solving the same position multiple times, transposition tables are used, which store the positions and their states. A transposition table is usually very large, hence the PNS algorithm uses a vast amount of memory.

\subsubsection{PNS with neural networks}

(Un)fortunately the game Connect Four 3D is not well studied, so I have not found any paper on using neural networks for this particular game, or even for Hypercube Tic-Tac-Toe. The game Hex, however, is played competitively, and many papers have been written about it, including papers on solving it with neural networks.

Both papers I have read on this topic \cite{Move-prediction-CNN,Focused-CNN} used convolutional neural networks. Both padded the game board with extra cells, so that the board did not shrink further when the convolution filters were applied.

They introduced two ways to use neural networks in PNS. The first, a policy network, predicts for each possible move the probability that it is played, which dictates the order in which the child nodes are examined. The second, a value network, was used only in \cite{Focused-CNN}; it predicts the proof and disproof numbers and thereby adjusts which node becomes the MPN.

Both papers also gave the neural networks more information than just whether each cell holds a black stone, a white stone, or is empty: they also encoded the type of position the stone was in. Neither used pooling, since it removes information about position.

They trained the networks on games played by state-of-the-art solvers (MoHex 2.0 and Wolve), since these are stronger than most human players.

There was also another approach to solving Hex, using Q-learning \cite{Neurohex}, but it scored poorly (20\,\% wins as the first player and 2\,\% wins as the second player) against the MoHex solver.

\section{My work so far}

My work so far has been to read about game search, particularly PNS and ways to improve it using machine learning. Currently, the most promising approach seems to be neural networks.

I also made a simple demo of proof number search (I have not been able to add machine learning yet).

\subsection{Current demo}

In the demo I have implemented the algorithms PNS and DF-PN, which I plan to improve using neural networks. The demo is written for Qubic, because Qubic is a little simpler than Connect Four 3D and, being a solved game, makes the results easier to verify. Qubic is very similar to Connect Four 3D, so the demo could be adapted to it quite easily.

There is also a script \texttt{GameTest.py} for measuring the performance of the algorithms. It is based on two metrics: the time it took to solve the current position and how many nodes were created in the process. Even a simple change to the order of the moves shows up quite significantly in these metrics.
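The measurement itself can be sketched as follows. This is not the content of \texttt{GameTest.py}; the function \texttt{measure} and the assumed solver interface (a callable returning the result together with the number of created nodes) are hypothetical.

```python
import time


def measure(solve, position):
    """Run solve(position) and report wall-clock time and nodes created.

    `solve` is assumed to return a pair (result, nodes_created);
    this interface is a hypothetical one chosen for the sketch.
    """
    start = time.perf_counter()
    result, nodes = solve(position)
    elapsed = time.perf_counter() - start
    return {"result": result, "seconds": elapsed, "nodes": nodes}


def dummy_solve(position):
    """Toy stand-in solver so that the sketch runs on its own."""
    return "win", len(position)


stats = measure(dummy_solve, "XOX")
print(stats["result"], stats["nodes"])  # win 3
```

Running the same positions with two different move orderings and comparing the two dictionaries gives the kind of comparison described above.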

\section{Data and Use of neural networks}

Since no strong solver is available, I have to generate my own data. The first data will be evaluated by PNS. Since PNS alone probably cannot solve the game with the computing power currently available, only positions close to the end of the game can be evaluated this way. After the first version of the neural network is trained, it could hopefully be used in PNS to evaluate harder positions.

The quantity to predict will most likely be the number of plies needed to win from the current position; if the position is lost, the value is infinity. This is easy to generate, and the child nodes of a position can easily be sorted by the fewest plies needed.
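A minimal sketch of the intended ordering, assuming the predictions are already available as a dictionary (the name \texttt{order\_children} and the data layout are hypothetical):

```python
import math


def order_children(children, predicted_plies):
    """Sort child positions by the predicted number of plies to a win.

    `predicted_plies` maps a child to a non-negative number, or to
    math.inf if the position is judged lost. Children without a
    prediction are treated as lost and therefore examined last.
    """
    return sorted(children, key=lambda child: predicted_plies.get(child, math.inf))


children = ["a", "b", "c"]
predicted = {"a": 5, "b": math.inf, "c": 3}
print(order_children(children, predicted))  # ['c', 'a', 'b']
```

Because \texttt{sorted} is stable, children with equal predictions keep their original move order.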

This heuristic can also be used for other searches, such as the minimax search used in \cite{romen}.

Unfortunately, CNNs may not be ideal for this game, since the board is so small, but the small board might make it feasible to rely more on fully connected layers.

To get more data, it is possible to use the symmetries of the game: randomly rotating and mirroring the positions yields up to 8 times more data.