Tic-Tac-AI: A Strong Tic-Tac-Toe AI Opponent using Forward Sampling

You want to build your own Tic-Tac-Toe opponent? Then you need to read further! In the Tic-Tac-AI series, I will present a couple of Artificial Intelligence algorithms implemented as Tic-Tac-Toe opponent. In this first article, I will introduce a method called Forward Sampling which is capable of not losing any game of Tic-Tac-Toe!

The repository for this article is found on GitHub.

Introduction

Tic-Tac-Toe is a very basic game and therefore extremely helpful to display the most elegant Artificial Intelligence algorithms. In this article, I will explain Forward Sampling implemented in a Tic-Tac-Toe player for you. Maybe you are interested in implementing Tic-Tac-Toe in the form of a web app? Then definitely read this article!

Forward Sampling!

Here, we will model the winning probability for our smart player by $latex p_{win}$. Let $latex S$ denote the current state of the game (that is, the tiles we have picked and the tiles the opponent has picked).  So, what is our probability to win? In order to simulate this, we pick a valid random action $latex a_1$ (a tile that is not occupied yet) and we evaluate our winning probability given the action we just picked ($latex p_{win}(a_1)$). Now, our opponent can pick any tile. The opponent also picks a random valid next action and denote this by $latex a_2$. Now, it is our turn again. So, eventually, we end up calculating $latex p_{win}(a_1, a_2, …, a_n)$.

At any point in the game, $latex S$ consists of some consecutive actions, say $latex S=a_1, …, a_s$. What is the next move we need to make in order to win the game? That is, what valid action $latex k$ should we take such that $latex p_{win}(S, k)$ is maximal? Let $latex T$ be the remainder of the game. That is, the set of actions which were picked to finish the game ($latex T=b_1, …, b_t$). Notice that there are many possibilities for $latex T$ given $latex S$ and $latex k$. With our Forward Sampling procedure, we pick many different options for $latex T$ and check for every valid $latex k$ $latex p_{win}(S, k, T)$. For example, suppose that we are in the following state (and we are O and our opponent is X):

If we do not pick the bottom-right cell, we will lose! Thus, if $latex k$ does not equal the bottom-right cell, there are only a few options for $latex T$ in which we will not lose. Our procedure will therefore pick the bottom-right cell, since this has the highest winning probability.

Implementation

In this section, I will give the code for the Forward Sampling player:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
class SamplingPlayer(Player):
 A Player based on forward sampling.
 """
 GAME_TIE = 0
 GAME_WIN = 1
 GAME_LOSE = 2
 num_samples = 100

 A player based on forward sampling.
 """
 def make_action(self, my_tiles, opponent_tiles, free_tiles):
 # Keep track of the number of wins
 num_wins = {}

 # Loop through all free tiles
 for free_tile in free_tiles:
 wins = 0

 # Now simulate a game with two random players from the current state on where you choose the free tile and
 # see who will win
 for _ in range(self.num_samples):
 my_tiles_copy = [tile for tile in my_tiles]
 opponent_tiles_tiles_copy = [tile for tile in opponent_tiles]
 sample = self.sample_game(True, free_tile, my_tiles_copy, opponent_tiles_tiles_copy)

 # A tie or a win is good enough
 if sample == self.GAME_TIE or sample == self.GAME_WIN:
 wins += 1
 num_wins[free_tile] = wins

 # Check which move has the highest expected number of wins
 best_score, best_item = None, None
 for item, score in num_wins.items():
 if best_score is None or score > best_score:
 best_score = score
 best_item = item

 # Give back this action
 return best_item

 def sample_game(self, is_my_turn, action, my_tiles, opponent_tiles):
 """
 Run a game with two random players.

 :param is_my_turn: Whether or not it is my turn (True or False).
 :param action: The next action to take.
 :param my_tiles: The tiles belonging to me (list of integers 0-8).
 :param opponent_tiles: The tiles belonging to the opponent (list of integers 0-8).
 :return: GAME_TIE when a tie happened, GAME_WIN when a win happened and GAME_LOSE when you loose the game.
 """

 # Calculate which tiles are still left
 free_tiles = [tile for tile in range(9) if
 tile not in my_tiles and tile not in opponent_tiles and tile != action]

 # Add the action to either my tiles or the opponent tiles depending on who's turn it is
 if is_my_turn:
 my_tiles += [action]
 else:
 opponent_tiles += [action]

 # When there are no tiles left, it is a tie
 if len(free_tiles) == 0:
 return self.GAME_TIE
 else:
 # Otherwise, I win when it is my turn and my tiles contain a winning combination
 # If they don't contain a winning combination, the game will continue
 # The same reasoning applies when it is not my turn
 if is_my_turn:
 if Game.check_win(my_tiles):
 return self.GAME_WIN
 else:
 random.shuffle(free_tiles)
 return self.sample_game(False, free_tiles[0], my_tiles, opponent_tiles)
 else:
 if Game.check_win(opponent_tiles):
 return self.GAME_LOSE
 else:
 random.shuffle(free_tiles)
 return self.sample_game(True, free_tiles[0], my_tiles, opponent_tiles)

SamplingPlayer

The tiles (my_tiles, opponent_tiles, free_tiles) are lists of integers where 0 is the left-top cell and 8 is the bottom-right cell. The make_action method, returns an action (any free tile) based on the described sampling procedure. Every time, 100 samples are generated (that is, 100 different options for $latex T$ are computed) and then the best tile is picked.

Conclusion

Forward sampling is very effective (and does not require any learning). Can you beat the algorithm? If you have any questions or suggestions, feel free to comment below this post!