An idea I have been playing with for a while is an AI that can counter you at TicTacToe. The goal was to avoid hard-coding any responses or move sequences; the only moves recorded are the ones that led to a victory for either the player or the computer. Early on the AI is far from perfect, but it can force ties and even win if you are not paying attention.

While I will not provide the source code for this project, I will explain how the program is designed. The model uses what is known as the Upper Confidence Bound (UCB) algorithm. Basically, it computes a confidence range for each potential move. Since that is how the analysis is performed, it is important to record only "successful moves" in the data set. As a result, the data set is only updated when a win occurs, and only the winning moves are recorded. If a game is won in two or more ways at once, the simpler victory is recorded. This sort of machine learning is common in advertising, where A/B testing can help you maximize your earnings. That is why you see a lot of 1s and 0s in the data set: each "hit" is a 1, and each "miss" is a 0. Oddly enough, even though I'm not much of a mathematician, that turned out to be the most straightforward part of the application.
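To give a sense of the math without sharing the real code, here is a rough UCB1 sketch. The names and structure here are illustrative, not from the actual program: each square keeps a list of 1/0 rewards (1 if it was part of a recorded win), and the square with the highest mean-plus-confidence-bonus score wins the pick.

```python
import math

def ucb_choice(rewards_per_square, total_rounds):
    """Pick the square with the highest UCB1 score.

    rewards_per_square: one reward list per square, holding 1s (the square
    was part of a recorded win) and 0s (it was not).
    total_rounds: total number of recorded games so far.
    """
    best, best_score = 0, float("-inf")
    for i, rewards in enumerate(rewards_per_square):
        if not rewards:
            # Untried squares get priority, as in standard UCB1.
            return i
        mean = sum(rewards) / len(rewards)
        bonus = math.sqrt(2 * math.log(total_rounds) / len(rewards))
        if mean + bonus > best_score:
            best, best_score = i, mean + bonus
    return best
```

The bonus term is what keeps the AI exploring rarely tried squares instead of locking onto its first win.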

Now, if you take that Machine Learning A-Z course and try to use their reinforcement learning algorithm as-is, you will probably be confused and run into errors like I did. You need to either wrap the entire algorithm in a function and return the "sums of rewards" list to iterate through for the move choice, or build your own list that counts how many times each element appears in the equivalent of the "ad selection" list. Neither solution is as pretty as the graph you get from the "ad selection" list itself, but both work for this purpose. The "ad selection" equivalent list on its own will not drive the move choice, but it is the list you want if you're trying to graph the relationships.
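The second option above, counting how many times each square shows up in the selection history, is simple enough to sketch. This is a hypothetical stand-in for the course's "ad selection" list, adapted to nine squares:

```python
def tally_selections(selections, n_options=9):
    """Count how often each square appears in the selection history.

    selections: per-round record of which square index was chosen.
    Returns a list of counts, one per square.
    """
    counts = [0] * n_options
    for chosen in selections:
        counts[chosen] += 1
    return counts
```

Sorting the squares by these counts gives you the "most commonly used" ordering the move picker needs, while the raw selections list is what you would feed to a histogram if you wanted the course-style graph.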

For me, one of the harder parts was encoding the legal rule set and ensuring the computer made a legal move every turn. It was especially difficult while moves were chosen randomly, because an infinite loop was technically possible, and definitely not desired. However, one trick I'm happy about is that I kept the board as a 1-D array and used index arithmetic to display it as a grid instead of using a 2-D array. That made input validation a lot easier once I replaced the random number generator with the real algorithm. After that, all I had to do was tell the algorithm to find the most commonly used, but currently unoccupied, square and play it. If its top choice is already in play, it simply skips that option. This takes a little longer, but it also requires less code than removing previously played options from the list. Of course, without maintaining the data set, none of this work matters.
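The two tricks in that paragraph, a 1-D array rendered as a grid and a picker that skips occupied squares, might look something like this. Again, this is a sketch under my own naming, not the program's actual code:

```python
def draw(board):
    """Render a 9-element 1-D board as a 3x3 grid via index arithmetic."""
    for row in range(3):
        print("|".join(board[row * 3:(row + 1) * 3]))
        if row < 2:
            print("-" * 5)

def pick_legal(ranked_squares, board):
    """Walk squares from most to least preferred; return the first open one.

    ranked_squares: square indices ordered by how often they appear in wins.
    Returns None only if every candidate is already in play.
    """
    for square in ranked_squares:
        if board[square] == " ":
            return square
    return None
```

Because `pick_legal` walks a finite ranked list rather than rerolling random numbers, the infinite-loop risk disappears.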

I violated one of the important principles of "Clean Code" by writing a function that serves two purposes. The same function that checks whether the game has been won by either side also records the winning sequence to file. Instead of keeping track of each individual move, I check every iteration of the game loop, and if one of the win conditions is satisfied, the binary representation of that combination is written to file. While you can watch the operation occur in the video, in reality it would be an invisible operation.
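A rough sketch of what such a dual-purpose function could look like. The 0/1 string format and the file name are my own assumptions here, chosen to match the "hit is a 1, miss is a 0" encoding described earlier:

```python
# The eight possible winning lines on a 3x3 board (1-D indices).
WINS = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
        (0, 3, 6), (1, 4, 7), (2, 5, 8),
        (0, 4, 8), (2, 4, 6)]

def check_and_record(board, mark, path="wins.txt"):
    """Return True if `mark` has won; if so, also append the win to file.

    Each win is stored as a 9-character 0/1 string, where 1 marks a
    square in the winning line. (Format is assumed, not the author's.)
    """
    for a, b, c in WINS:
        if board[a] == board[b] == board[c] == mark:
            bits = "".join("1" if i in (a, b, c) else "0" for i in range(9))
            with open(path, "a") as f:
                f.write(bits + "\n")
            return True
    return False
```

Splitting this into a pure `check_win` plus a separate `record_win` would be the cleaner design; fusing them just saves a loop over the win conditions.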

Code to force legal moves from the computer

All in all, the file is only about 150 lines of code. There are some bugs I have not ironed out; for example, it does not validate user input. Since this was designed as a proof of concept rather than something intended for release, that's fine with me. Some people I have spoken with said it would be cool to host it as an app, but that was never the goal. Since the concept works, there is a lot I can do with this application. As long as I can express a game's rule set with math and logic, easily record the successful moves that led to victory, and prevent illegal moves from occurring, replicating this success will not be a problem. I would rather build on this idea than perfect it, because there is still so much that can be done.

Anyway, here is an updated video showing different behavior from the algorithm. Keep in mind I have only recorded about 260 wins at this point, and I did not validate user input like I should have before recording. Nonetheless, it starts on a different square and aims for a different type of victory. I'm definitely going to build on this and use it for more useful things in the future; I just needed a quick proof of concept.