This milestone report is divided into two main parts: research and my work/ideas. In the research part, I include an explanation of the topic of positional games, since it may not be familiar to the reader.

Tic-Tac-Toe is a well-known game. On a $3\times3$ board, both players try to occupy a line with 3 of their symbols. If both players play optimally, the game ends in a draw. This simple game can be generalized and modified into many different positional games.

A simple generalization is to increase the grid size and add more dimensions. This gives us, for example, the game Qubic (a $4\times4\times4$ board). A possible modification is to add gravity, which gives us the well-known game Connect Four. By combining these two ideas we get Connect Four 3D, which is played on the same board as Qubic, but the players can only choose columns, and the stone is placed in the first empty slot of the chosen column.
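As a concrete illustration of the gravity rule, a single move can be sketched as follows (the list-of-columns representation and the `drop` helper are illustrative assumptions, not the demo's actual code):

```python
HEIGHT = 4  # each column of the 4x4x4 board holds at most 4 stones

def drop(columns, col, player):
    """Place `player`'s stone in column `col`; gravity puts it in the first empty slot."""
    if len(columns[col]) >= HEIGHT:
        raise ValueError("column is full")
    columns[col].append(player)

board = [[] for _ in range(16)]  # 16 columns, each a stack from bottom to top
drop(board, 5, "X")
drop(board, 5, "O")
assert board[5] == ["X", "O"]    # the second stone lands on top of the first
```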

The state space of these games is large. Determining the outcome under optimal play is possible largely thanks to computers. One of the first computer-aided results in combinatorial game theory was solving Qubic \cite{patashnik1980qubic}. To the best of our knowledge, nobody has yet solved Connect Four 3D.

...

...


\caption{An example of a Connect Four 3D position from \cite{romen}}

\end{figure}

The game is played on a $4\times4\times4$ board, and players can only choose from the columns. There are 16 columns in total, meaning there are at most $16^{64}$ games (this ignores the fact that some columns will be full before 64 plies are made). With more careful analysis, there are only 30 possible states for each column, so at most $30^{16}$ possible board states. This is still too much to evaluate by brute force, so more clever ways to search the space of game states are needed, which will be examined in more detail in Section \ref{search-algos}.
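The bounds above are easy to check numerically:

```python
# Naive bound: at most 64 plies, with 16 column choices each
# (this ignores columns filling up before ply 64).
naive = 16 ** 64

# Refined bound: 30 reachable states per column, 16 independent columns.
refined = 30 ** 16

assert refined < naive   # the per-column analysis shrinks the bound drastically
```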

This game is very similar to Qubic, since they share the board and the winning lines. Unfortunately, because of the gravity effect, it is much harder to create threats than in Qubic.

It has ``only'' 8 automorphisms: the 4 rotations and their mirror images. Therefore, it is much harder to reduce the search space in comparison to Qubic.

There have been a few attempts in recent years to solve this game. The most recent attempt \cite{romen} managed to solve a position with 7 stones in play in 43 days on a computer with 28 cores.


\subsubsection{PNS with neural networks}

(Un)fortunately, the game Connect Four 3D is not well studied, so I have not found any paper on using neural networks for this particular game, or even for Hypercube Tic-Tac-Toe. However, the game Hex is played competitively, and many papers have been written on it, including on solving it with neural networks.

Both papers I have read on neural networks in PNS, \cite{Move-prediction-CNN} and \cite{Focused-CNN}, used convolutional neural networks. In both papers, the authors padded the game board with extra cells, so that the board, which was already small, did not shrink further when the convolution filters were applied.

They introduced two ways to use neural networks in PNS. The policy network tries to predict, for each possible move, the probability of it being played; this dictates the order in which child nodes are examined. The value network, used only in \cite{Focused-CNN}, predicts the proof and disproof numbers, adjusting which node will become the most proving node (MPN).
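A minimal sketch of where these two networks plug into PNS: the policy network would determine the order of `children`, while a value network could replace the default initialization of the proof and disproof numbers at unknown leaves. The `Node`, `update`, and `select_mpn` names are illustrative, not taken from the cited papers.

```python
import math

class Node:
    def __init__(self, children=None, proven=None):
        # `children` would be ordered by the (hypothetical) policy network
        self.children = children or []
        self.pn, self.dn = 1, 1          # proof / disproof numbers of an unknown leaf;
                                         # a value network could predict these instead
        if proven is True:               # proven win for the OR player
            self.pn, self.dn = 0, math.inf
        elif proven is False:            # disproven
            self.pn, self.dn = math.inf, 0

def update(node, is_or_node):
    """Back up proof/disproof numbers from the children."""
    pns = [c.pn for c in node.children]
    dns = [c.dn for c in node.children]
    if is_or_node:
        node.pn, node.dn = min(pns), sum(dns)
    else:
        node.pn, node.dn = sum(pns), min(dns)

def select_mpn(node, is_or_node):
    """Descend to the most proving node (MPN)."""
    while node.children:
        key = (lambda c: c.pn) if is_or_node else (lambda c: c.dn)
        node = min(node.children, key=key)
        is_or_node = not is_or_node
    return node

root = Node(children=[Node(), Node(proven=True)])
update(root, is_or_node=True)
assert root.pn == 0                      # one winning child proves an OR node
```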

In addition, both papers gave the neural networks more information than just whether each cell holds a black stone, a white stone, or is empty: they also encoded the type of position each stone occupied. Moreover, they did not use pooling, since it discards information about position.

They trained the networks on games played by state-of-the-art solvers (MoHex 2.0 and Wolve), since these are stronger than most human players.

...

...


\section{My work so far}

My work so far has been to read about game search, particularly PNS, and ways to improve it using machine learning. Currently, the best approach seems to be using neural networks.

I have also made a simple demo of proof number search (I have not been able to add machine learning yet).

\subsection{Current demo}

In the demo, I have implemented the algorithms PNS and DFPN, which I am planning to improve using neural networks. The demo is built for Qubic, because it is a little simpler than Connect Four 3D and it is easier to verify, since Qubic is a solved game. However, the two games are very similar, so the demo could be adapted to Connect Four 3D quite easily.

There is also a script \texttt{GameTest.py} for measuring the performance of the algorithms, based on two metrics: the time it takes to solve the current position and how many nodes were created in the process. Even a simple modification to the order of the moves shows up quite significantly in these metrics.
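The measurement in \texttt{GameTest.py} can be sketched roughly as follows (the `solve` stand-in and the global node counter are assumptions for illustration, not the script's actual code):

```python
import time

node_count = 0  # the solver would increment this whenever it creates a node

def solve(position):
    """Stand-in for the actual PNS/DFPN solver in the demo."""
    global node_count
    node_count += 1          # pretend we expanded a single node
    return "draw"

def measure(position):
    """Return the two metrics: result, wall-clock time, and nodes created."""
    global node_count
    node_count = 0
    start = time.perf_counter()
    result = solve(position)
    elapsed = time.perf_counter() - start
    return result, elapsed, node_count
```

Comparing these two numbers before and after a change to the move ordering shows directly how much the ordering helps.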

\section{Data and Use of neural networks}

Since no strong solver is available, I have to generate my own data. The first data will be evaluated by PNS. Since PNS alone probably cannot solve the game with the current power of computers, only games that are close to the end can be evaluated this way. After training the first version of the neural network, it could hopefully be used in the PNS to evaluate harder positions.

The heuristic to predict is most likely the number of plies needed to win the current game; if the game is lost, the value is infinity. This can be easily calculated, and the child nodes of a position can then be sorted by the fewest moves needed.

This heuristic can also be used in other searches, such as the MinMax search used in \cite{romen}.
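A tiny sketch of the resulting move ordering, with hypothetical predicted values (infinity marks a lost position, so it naturally sorts last):

```python
import math

# (move, predicted plies to win) pairs; the values here are made up
children = [("c", 7), ("a", math.inf), ("b", 3)]

# examine the child with the fewest predicted plies first
ordered = sorted(children, key=lambda mv: mv[1])
assert [m for m, _ in ordered] == ["b", "c", "a"]
```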

Unfortunately, CNNs may not be ideal for this game, since the board is so small; on the other hand, the small board makes it more feasible to rely on fully connected layers.

To get more data, it is possible to use the symmetries of the game: by randomly rotating and mirroring positions, we can obtain up to 8 times more data.
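A sketch of this augmentation, viewing a position as a $4\times4$ grid of columns (gravity fixes the vertical axis, so only the 8 symmetries of the square apply):

```python
def rotate(grid):
    """Rotate a 4x4 grid 90 degrees clockwise."""
    return [[grid[3 - c][r] for c in range(4)] for r in range(4)]

def mirror(grid):
    """Mirror a 4x4 grid horizontally."""
    return [row[::-1] for row in grid]

def symmetries(grid):
    """All 8 images of a position under rotation and mirroring."""
    out = []
    g = grid
    for _ in range(4):
        out.append(g)
        out.append(mirror(g))
        g = rotate(g)
    return out

# label each column by its index to check that all 8 images are distinct
base = [[4 * r + c for c in range(4)] for r in range(4)]
images = symmetries(base)
assert len({tuple(map(tuple, g)) for g in images}) == 8
```

Each training position would be stored once and expanded into its 8 symmetric images before training.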