Adapted for the Simple Pair Game
Each node's win count (w) represents wins for the player who made the move to reach that node.
Player 1's turn. 4 possible moves: positions 0, 1, 2, or 3
As iterations continue, MCTS will:
Win rate. Favors moves that have performed well in past simulations.
Exploration bonus. Grows as the parent's visit count increases while the child's visit count n_i stays low, encouraging exploration of rarely visited nodes.
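The two terms can be combined in a short sketch. It assumes the standard UCB1 form, w_i / n_i + c * sqrt(ln(N) / n_i), where N is the parent's visit count. The function name ucb1, its parameter names, and the default constant c = sqrt(2) are illustrative assumptions, not taken from the text.

```python
import math

def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score for one child (standard form; c is an assumed constant)."""
    if visits == 0:
        # Common convention: an unvisited child is always tried first.
        return math.inf
    win_rate = wins / visits  # exploitation term
    exploration = c * math.sqrt(math.log(parent_visits) / visits)  # exploration term
    return win_rate + exploration

# Two children with the same 50% win rate under a parent with 100 visits:
# the rarely visited one gets the larger exploration bonus.
print(ucb1(5, 10, 100))
print(ucb1(1, 2, 100))
```

With equal win rates, the child visited only twice scores higher than the one visited ten times, which is what pulls the search toward under-explored moves.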
Select the child with the highest visit count.
Why it works: Visit counts are stable, and UCB1 naturally directs visits to promising moves. A highly-visited node has been thoroughly explored.
Select the child with the highest win rate.
Caution: Can be unreliable for low-visit nodes. A node with v=1, w=1 has a 100% win rate, but that figure rests on a single simulation.
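The two final-move criteria can be compared in a small sketch. The Child class, the sample statistics, and the min_visits filter are hypothetical and exist only for illustration; the filter is one possible safeguard, not something the text prescribes.

```python
from dataclasses import dataclass

@dataclass
class Child:
    move: int   # board position played (hypothetical field name)
    v: int      # visit count
    w: float    # wins for the player who made this move

def most_visited(children):
    """Pick the child with the highest visit count."""
    return max(children, key=lambda c: c.v)

def best_win_rate(children, min_visits=1):
    """Pick the child with the highest w / v among those with enough visits."""
    eligible = [c for c in children if c.v >= min_visits]
    return max(eligible, key=lambda c: c.w / c.v)

# Made-up statistics for the four root moves.
children = [Child(0, 60, 39), Child(1, 1, 1), Child(2, 25, 12), Child(3, 14, 5)]

print(most_visited(children).move)       # 0
print(best_win_rate(children).move)      # 1  (v=1, w=1 looks like 100%)
print(best_win_rate(children, 10).move)  # 0  (low-visit nodes filtered out)
```

In this made-up data the raw win-rate rule picks a move that was simulated only once, while the visit-count rule and the filtered win-rate rule agree on move 0.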
Each node tracks wins from the perspective of the player who made the move to reach it. During backpropagation, we flip the result (1 - result) at each level. This implements minimax thinking: what's good for P1 is bad for P2!
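A minimal sketch of this flipping backpropagation follows. The Node class and the backpropagate function are hypothetical names, and the result is assumed to be 1.0 for a win and 0.0 for a loss, measured for the player who moved into the starting node.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.v = 0    # visit count
        self.w = 0.0  # wins for the player who moved into this node

def backpropagate(node, result):
    """Walk from `node` up to the root, flipping the result at each level.

    `result` is 1.0 if the player who moved into `node` won the
    simulation and 0.0 if that player lost.
    """
    while node is not None:
        node.v += 1
        node.w += result
        result = 1 - result  # the parent belongs to the opponent
        node = node.parent

root = Node()
p1_move = Node(parent=root)      # reached by a Player 1 move
p2_reply = Node(parent=p1_move)  # reached by a Player 2 move

# The simulation from p2_reply ended in a Player 1 win, so it is a loss
# (0.0) for Player 2, who moved into p2_reply.
backpropagate(p2_reply, 0.0)
print(p2_reply.w, p1_move.w, root.w)  # 0.0 1.0 0.0
```

Each node on the path gains one visit, but only the node reached by Player 1's move is credited with the win.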