{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 6) Using Gambit with OpenSpiel\n", "\n", "This tutorial demonstrates the interoperability of the Gambit and OpenSpiel Python packages for game-theoretic analysis.\n", "\n", "Whereas Gambit is used to compute exact equilibria of games, OpenSpiel provides a variety of iterative learning algorithms that can be used to learn approximate strategies. Another key distinction is that the PyGambit API gives the user a simple way to define custom games (see tutorials 1-3). This is also possible in OpenSpiel for normal form games, and you can load `.efg` files created with Gambit for extensive form games. However, some of the key functionality for iterative learning of strategies is only available for games from the built-in library (see the [OpenSpiel documentation](https://openspiel.readthedocs.io/en/latest/games.html)).\n", "\n", "This tutorial demonstrates:\n", "\n", "1. Transferring examples of normal (strategic) form and extensive form games between OpenSpiel and Gambit\n", "2. Simulating evolutionary dynamics of populations of strategies in OpenSpiel for normal form games\n", "3. Training agents using self-play of extensive form games in OpenSpiel to create strategies\n", "4. Comparing the strategies from OpenSpiel against equilibrium strategies computed with Gambit\n", "\n", "Note:\n", "- The version of OpenSpiel used in this tutorial is `1.6.1`. If you are running this tutorial locally, this will be the version installed via the included `requirements.txt` file.\n", "- The OpenSpiel code was adapted from the introductory tutorial for the OpenSpiel API on Colab [here](https://colab.research.google.com/github/deepmind/open_spiel/blob/master/open_spiel/colabs/OpenSpielTutorial.ipynb)." 
] }, { "cell_type": "code", "execution_count": 1, "id": "ebb78322", "metadata": {}, "outputs": [], "source": [ "from io import StringIO\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "from open_spiel.python import rl_environment\n", "from open_spiel.python.algorithms import tabular_qlearner\n", "from open_spiel.python.algorithms.gambit import export_gambit\n", "from open_spiel.python.egt import dynamics\n", "from open_spiel.python.egt.utils import game_payoffs_array\n", "\n", "import pyspiel\n", "\n", "import pygambit as gbt" ] }, { "cell_type": "markdown", "id": "fd324814", "metadata": {}, "source": [ "## OpenSpiel game library\n", "\n", "The [library of games](https://openspiel.readthedocs.io/en/latest/games.html) included in OpenSpiel is extensive. Many of these games will not be amenable to equilibrium computation with Gambit, due to their size. For the purposes of this tutorial, we'll pick small games from the list below." ] }, { "cell_type": "code", "execution_count": 2, "id": "b3eb3671", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['2048', 'add_noise', 'amazons', 'backgammon', 'bargaining', 'battleship', 'blackjack', 'blotto', 'breakthrough', 'bridge', 'bridge_uncontested_bidding', 'cached_tree', 'catch', 'checkers', 'chess', 'cliff_walking', 'clobber', 'coin_game', 'colored_trails', 'connect_four', 'coop_box_pushing', 'coop_to_1p', 'coordinated_mp', 'crazy_eights', 'cribbage', 'cursor_go', 'dark_chess', 'dark_hex', 'dark_hex_ir', 'deep_sea', 'dots_and_boxes', 'dou_dizhu', 'efg_game', 'einstein_wurfelt_nicht', 'euchre', 'first_sealed_auction', 'gin_rummy', 'go', 'goofspiel', 'hanabi', 'havannah', 'hearts', 'hex', 'hive', 'kriegspiel', 'kuhn_poker', 'laser_tag', 'leduc_poker', 'lewis_signaling', 'liars_dice', 'liars_dice_ir', 'lines_of_action', 'maedn', 'mancala', 'markov_soccer', 'matching_pennies_3p', 'matrix_bos', 'matrix_brps', 'matrix_cd', 'matrix_coordination', 'matrix_mp', 'matrix_pd', 
'matrix_rps', 'matrix_rpsw', 'matrix_sh', 'matrix_shapleys_game', 'mfg_crowd_modelling', 'mfg_crowd_modelling_2d', 'mfg_dynamic_routing', 'mfg_garnet', 'misere', 'mnk', 'morpion_solitaire', 'negotiation', 'nfg_game', 'nim', 'nine_mens_morris', 'normal_form_extensive_game', 'oh_hell', 'oshi_zumo', 'othello', 'oware', 'pathfinding', 'pentago', 'phantom_go', 'phantom_ttt', 'phantom_ttt_ir', 'pig', 'quoridor', 'rbc', 'repeated_game', 'restricted_nash_response', 'sheriff', 'skat', 'solitaire', 'spades', 'start_at', 'stones_and_gems', 'tarok', 'tic_tac_toe', 'tiny_bridge_2p', 'tiny_bridge_4p', 'tiny_hanabi', 'trade_comm', 'turn_based_simultaneous_game', 'twixt', 'ultimate_tic_tac_toe', 'universal_poker', 'y', 'zerosum']\n" ] } ], "source": [ "print(pyspiel.registered_names())" ] }, { "cell_type": "markdown", "id": "e628a86d", "metadata": {}, "source": [ "## Normal form games from the OpenSpiel library\n", "\n", "Let's start with a simple normal form game of rock-paper-scissors, in which the payoffs can be represented by a 3x3 matrix.\n", "\n", "Load matrix rock-paper-scissors from OpenSpiel:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "ops_matrix_rps_game = pyspiel.load_game(\"matrix_rps\")" ] }, { "cell_type": "markdown", "id": "fda1204e", "metadata": {}, "source": [ "In order to simulate a playthrough of the game, you can first initialise a game state:" ] }, { "cell_type": "code", "execution_count": 4, "id": "1bcdb97b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Terminal? 
false\n", "Row actions: Rock Paper Scissors \n", "Col actions: Rock Paper Scissors \n", "Utility matrix:\n", "0,0 -1,1 1,-1 \n", "1,-1 0,0 -1,1 \n", "-1,1 1,-1 0,0 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "state = ops_matrix_rps_game.new_initial_state()\n", "state" ] }, { "cell_type": "markdown", "id": "eeee015a", "metadata": {}, "source": [ "The possible actions for both players (player 0 and player 1) are Rock, Paper and Scissors, but these are not labelled and must be accessed via integer indices:" ] }, { "cell_type": "code", "execution_count": 5, "id": "70575dc7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 1, 2]\n", "[0, 1, 2]\n" ] } ], "source": [ "print(state.legal_actions(0)) # Player 0 (row) actions\n", "print(state.legal_actions(1)) # Player 1 (column) actions" ] }, { "cell_type": "markdown", "id": "fdea7e5b", "metadata": {}, "source": [ "Since Rock-paper-scissors is a 1-step simultaneous-move normal form game, we'll apply a list of player actions in one step to reach the terminal state.\n", "\n", "Let's simulate player 0 playing Rock (0) and player 1 playing Paper (1):" ] }, { "cell_type": "code", "execution_count": 6, "id": "a532321e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Terminal? 
true\n", "History: 0, 1\n", "Returns: -1,1\n", "Row actions: \n", "Col actions: \n", "Utility matrix:\n", "0,0 -1,1 1,-1 \n", "1,-1 0,0 -1,1 \n", "-1,1 1,-1 0,0 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "state.apply_actions([0, 1])\n", "state" ] }, { "cell_type": "markdown", "id": "045cf8dd", "metadata": {}, "source": [ "OpenSpiel can generate an NFG representation of the game that can be loaded in Gambit:" ] }, { "cell_type": "code", "execution_count": 7, "id": "f5fa4e42", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'NFG 1 R \"OpenSpiel export of matrix_rps()\"\\n{ \"Player 0\" \"Player 1\" } { 3 3 }\\n\\n0 0\\n1 -1\\n-1 1\\n-1 1\\n0 0\\n1 -1\\n1 -1\\n-1 1\\n0 0\\n'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nfg_matrix_rps_game = pyspiel.game_to_nfg_string(ops_matrix_rps_game)\n", "nfg_matrix_rps_game" ] }, { "cell_type": "markdown", "id": "70d1df64", "metadata": {}, "source": [ "Now let's load the NFG in Gambit. Since Gambit's `read_nfg` function expects a file-like object, we'll wrap the string with `io.StringIO`.\n", "We can also add labels for the actions to make the output more interpretable." ] }, { "cell_type": "code", "execution_count": 8, "id": "b684325e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Rock-Paper-Scissors

\n", "
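<table><tr><td></td><th>Rock</th><th>Paper</th><th>Scissors</th></tr>
<tr><th>Rock</th><td>0,0</td><td>-1,1</td><td>1,-1</td></tr>
<tr><th>Paper</th><td>1,-1</td><td>0,0</td><td>-1,1</td></tr>
<tr><th>Scissors</th><td>-1,1</td><td>1,-1</td><td>0,0</td></tr></table>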
\n" ], "text/plain": [ "Game(title='Rock-Paper-Scissors')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gbt_matrix_rps_game = gbt.read_nfg(StringIO(nfg_matrix_rps_game))\n", "\n", "gbt_matrix_rps_game.title = \"Rock-Paper-Scissors\"\n", "\n", "for player in gbt_matrix_rps_game.players:\n", " player.strategies[0].label = \"Rock\"\n", " player.strategies[1].label = \"Paper\"\n", " player.strategies[2].label = \"Scissors\"\n", "\n", "gbt_matrix_rps_game" ] }, { "cell_type": "markdown", "id": "6d7da6f3", "metadata": {}, "source": [ "The equilibrium mixed strategy profile for both players is to choose rock, paper, and scissors with equal probability:" ] }, { "cell_type": "code", "execution_count": 9, "id": "707c6c30", "metadata": {}, "outputs": [ { "data": { "text/latex": [ "$\\left[\\left[\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right],\\left[\\frac{1}{3},\\frac{1}{3},\\frac{1}{3}\\right]\\right]$" ], "text/plain": [ "[[Rational(1, 3), Rational(1, 3), Rational(1, 3)], [Rational(1, 3), Rational(1, 3), Rational(1, 3)]]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gbt.nash.lcp_solve(gbt_matrix_rps_game).equilibria[0]" ] }, { "cell_type": "markdown", "id": "966e7e3f", "metadata": {}, "source": [ "We can use OpenSpiel's dynamics module to demonstrate evolutionary game theory dynamics, or \"replicator dynamics\", which models how a mixed strategy profile evolves over time based on how the strategies (e.g., choice of actions A, B, C with probabilities X, Y, Z) perform against one another.\n", "\n", "Let's start with an initial profile that is not at equilibrium, but weighted towards scissors with proportions: 30% Rock, 30% Paper, 40% Scissors:" ] }, { "cell_type": "code", "execution_count": 10, "id": "cf1acdeb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0.03, -0.03, 0. 
])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "matrix_rps_payoffs = game_payoffs_array(ops_matrix_rps_game)\n", "dyn = dynamics.SinglePopulationDynamics(matrix_rps_payoffs, dynamics.replicator)\n", "x = np.array([0.3, 0.3, 0.4])\n", "dyn(x)" ] }, { "cell_type": "markdown", "id": "fa382753", "metadata": {}, "source": [ "`dyn(x)` returns the rate of change (time derivative) of the current profile, i.e. how fast the frequency of each strategy is changing.\n", "\n", "In replicator dynamics, a strategy that performs better than the population average will increase in frequency, while strategies performing worse will decrease. For `x = [0.3, 0.3, 0.4]` the expected payoffs of rock, paper and scissors are 0.1, -0.1 and 0, with a population average of 0, so each frequency changes at rate `x_i * (payoff_i - average)`, which gives `[0.03, -0.03, 0.0]`.\n", "In our rock-paper-scissors example, how well each strategy performs depends on the current frequencies of the strategies it meets. At the start, scissors is the most common action, so rock (which beats scissors) performs well and increases in frequency (it is more likely to be played in subsequent rounds), while paper (which loses to scissors) performs poorly and decreases in frequency. We can plot how the frequency of each strategy changes over time:" ] }, { "cell_type": "code", "execution_count": 11, "id": "b9a352c5", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def plot_rps_dynamics(proportions, steps=100, alpha=0.1, plot_average_strategy=False):\n", " x = np.array(proportions)\n", " rock_proportions = [x[0]]\n", " paper_proportions = [x[1]]\n", " scissors_proportions = [x[2]]\n", " y = []\n", " for _ in range(steps):\n", " x += alpha * dyn(x)\n", " rock_proportions.append(x[0])\n", " paper_proportions.append(x[1])\n", " scissors_proportions.append(x[2])\n", " if plot_average_strategy:\n", " y.append([np.mean(rock_proportions), np.mean(paper_proportions), np.mean(scissors_proportions)])\n", " else:\n", " y.append(x.copy())\n", " y = np.array(y)\n", "\n", " plt.plot(y[:, 0], label=\"Rock\")\n", " plt.plot(y[:, 1], label=\"Paper\")\n", " plt.plot(y[:, 2], label=\"Scissors\")\n", " plt.xlabel(\"Time step\")\n", " if plot_average_strategy:\n", " plt.ylabel(\"Strategy frequency average up to time step\")\n", " else:\n", " plt.ylabel(\"Strategy frequency\")\n", " plt.legend()\n", " plt.show()\n", "\n", "plot_rps_dynamics([0.3, 0.3, 0.4])" ] }, { "cell_type": "markdown", "id": "8569aef4", "metadata": {}, "source": [ "Through the dynamics, we can see that the population proportions oscillate around the equilibrium point (1/3, 1/3, 1/3) without converging to it, because the best strategy depends on the likelihood of the opponents' actions, as defined by the current action probabilities.\n", "\n", "However, if we start with the initial population already at the equilibrium mixed strategy profile computed by Gambit (each action is chosen exactly 1/3 of the time), the strategy frequencies will remain constant over time (at the equilibrium point):" ] }, { "cell_type": "code", "execution_count": 12, "id": "86c6aa52", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_rps_dynamics([1/3, 1/3, 1/3])" ] }, { "cell_type": "markdown", "id": "a1f6662e", "metadata": {}, "source": [ "When starting from an unbalanced initial mixed strategy profile, the strategy frequencies will oscillate around the equilibrium point without converging to it. However, if we plot the average strategy frequencies over time, we can see that this begins to converge to the equilibrium point:" ] }, { "cell_type": "code", "execution_count": 13, "id": "189f898f", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_rps_dynamics([0.3, 0.3, 0.4], plot_average_strategy=True)" ] }, { "cell_type": "markdown", "id": "078a21e0", "metadata": {}, "source": [ "## Normal form games created with Gambit\n", "\n", "You can also set up a normal form game in Gambit and export it to OpenSpiel. Here we demonstrate this with the Prisoner's Dilemma game from tutorial 1." ] }, { "cell_type": "code", "execution_count": 14, "id": "cdd0bfe0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "

Prisoner's Dilemma

\n", "
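<table><tr><td></td><th>Cooperate</th><th>Defect</th></tr>
<tr><th>Cooperate</th><td>-1,-1</td><td>-3,0</td></tr>
<tr><th>Defect</th><td>0,-3</td><td>-2,-2</td></tr></table>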
\n" ], "text/plain": [ "Game(title='Prisoner's Dilemma')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gbt_prisoners_dilemma_game = gbt.read_nfg(\"games/prisoners_dilemma.nfg\")\n", "gbt_prisoners_dilemma_game" ] }, { "cell_type": "code", "execution_count": 15, "id": "d42e6545", "metadata": {}, "outputs": [ { "data": { "text/latex": [ "$\\left[\\left[0,1\\right],\\left[0,1\\right]\\right]$" ], "text/plain": [ "[[Rational(0, 1), Rational(1, 1)], [Rational(0, 1), Rational(1, 1)]]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gbt.nash.lcp_solve(gbt_prisoners_dilemma_game).equilibria[0]" ] }, { "cell_type": "markdown", "id": "15dd432d", "metadata": {}, "source": [ "As expected, Gambit computes the equilibrium strategy for both players as choosing cooperate with probability 0 and defect with probability 1.\n", "\n", "To re-create the game in OpenSpiel we extract the player payoffs to NumPy arrays, which are then used to create a matrix game in OpenSpiel:" ] }, { "cell_type": "code", "execution_count": 16, "id": "fcd42af0", "metadata": {}, "outputs": [], "source": [ "p1_payoffs, p2_payoffs = gbt_prisoners_dilemma_game.to_arrays(dtype=float)\n", "ops_prisoners_dilemma_game = pyspiel.create_matrix_game(\n", " gbt_prisoners_dilemma_game.title,\n", " \"Classic Prisoner's Dilemma\", # description\n", " [strategy.label for strategy in gbt_prisoners_dilemma_game.players[0].strategies],\n", " [strategy.label for strategy in gbt_prisoners_dilemma_game.players[1].strategies],\n", " p1_payoffs,\n", " p2_payoffs\n", ")" ] }, { "cell_type": "markdown", "id": "625a35a4", "metadata": {}, "source": [ "Like rock-paper-scissors, the Prisoner's Dilemma is a 1-step simultaneous-move normal form game; we'll apply a list of player actions in one step to reach the terminal state. 
Let's have both players choose to defect (1):" ] }, { "cell_type": "code", "execution_count": 17, "id": "7ce6f2e2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Terminal? true\n", "History: 1, 1\n", "Returns: -2,-2\n", "Row actions: \n", "Col actions: \n", "Utility matrix:\n", "-1,-1 -3,0 \n", "0,-3 -2,-2 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "state = ops_prisoners_dilemma_game.new_initial_state()\n", "state.apply_actions([1, 1])\n", "state" ] }, { "cell_type": "markdown", "id": "1fea0224", "metadata": {}, "source": [ "Unlike rock-paper-scissors, the Prisoner's Dilemma has a dominant strategy equilibrium, in which both players defect.\n", "Using evolutionary dynamics, we can see that a population starting with a mix of cooperators and defectors will evolve towards all defectors over time:" ] }, { "cell_type": "code", "execution_count": 18, "id": "d1495c7c", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "matrix_pd_payoffs = game_payoffs_array(ops_prisoners_dilemma_game)\n", "pd_dyn = dynamics.SinglePopulationDynamics(matrix_pd_payoffs, dynamics.replicator)\n", "\n", "def plot_pd_dynamics(proportions, steps=100, alpha=0.1):\n", " x = np.array(proportions)\n", " y = []\n", " for _ in range(steps):\n", " x += alpha * pd_dyn(x)\n", " y.append(x.copy())\n", " y = np.array(y)\n", " plt.plot(y[:, 0], label=\"Cooperate\")\n", " plt.plot(y[:, 1], label=\"Defect\")\n", " plt.xlabel(\"Time step\")\n", " plt.ylabel(\"Frequency\")\n", " plt.legend()\n", " plt.show()\n", "\n", "plot_pd_dynamics([0.8, 0.2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "" ] }, { "cell_type": "markdown", "id": "b12f6330", "metadata": {}, "source": [ "## Extensive form games from the OpenSpiel library\n", "\n", "For extensive form games, OpenSpiel can export to the EFG format used by Gambit. Here we demonstrate this with **Tiny Hanabi**, loaded from the OpenSpiel [game library](https://openspiel.readthedocs.io/en/latest/games.html)." 
] }, { "cell_type": "code", "execution_count": 19, "id": "02a42600", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'EFG 2 R \"tiny_hanabi()\" { \"Pl0\" \"Pl1\" } \\nc \"\" 1 \"\" { \"d0\" 0.5000000000000000 \"d1\" 0.5000000000000000 } 0\\n c \"p0:d0\" 2 \"\" { \"d0\" 0.5000000000000000 \"d1\" 0.5000000000000000 } 0\\n p \"\" 1 1 \"\" { \"p0a0\" \"p0a1\" \"p0a2\" } 0\\n p \"\" 2 1 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 1 \"\" { 10.0 10.0 }\\n t \"\" 2 \"\" { 0.0 0.0 }\\n t \"\" 3 \"\" { 0.0 0.0 }\\n p \"\" 2 2 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 4 \"\" { 4.0 4.0 }\\n t \"\" 5 \"\" { 8.0 8.0 }\\n t \"\" 6 \"\" { 4.0 4.0 }\\n p \"\" 2 3 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 7 \"\" { 10.0 10.0 }\\n t \"\" 8 \"\" { 0.0 0.0 }\\n t \"\" 9 \"\" { 0.0 0.0 }\\n p \"\" 1 1 \"\" { \"p0a0\" \"p0a1\" \"p0a2\" } 0\\n p \"\" 2 4 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 10 \"\" { 0.0 0.0 }\\n t \"\" 11 \"\" { 0.0 0.0 }\\n t \"\" 12 \"\" { 10.0 10.0 }\\n p \"\" 2 5 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 13 \"\" { 4.0 4.0 }\\n t \"\" 14 \"\" { 8.0 8.0 }\\n t \"\" 15 \"\" { 4.0 4.0 }\\n p \"\" 2 6 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 16 \"\" { 0.0 0.0 }\\n t \"\" 17 \"\" { 0.0 0.0 }\\n t \"\" 18 \"\" { 10.0 10.0 }\\n c \"p0:d1\" 3 \"\" { \"d0\" 0.5000000000000000 \"d1\" 0.5000000000000000 } 0\\n p \"\" 1 2 \"\" { \"p0a0\" \"p0a1\" \"p0a2\" } 0\\n p \"\" 2 1 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 19 \"\" { 0.0 0.0 }\\n t \"\" 20 \"\" { 0.0 0.0 }\\n t \"\" 21 \"\" { 10.0 10.0 }\\n p \"\" 2 2 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 22 \"\" { 4.0 4.0 }\\n t \"\" 23 \"\" { 8.0 8.0 }\\n t \"\" 24 \"\" { 4.0 4.0 }\\n p \"\" 2 3 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 25 \"\" { 0.0 0.0 }\\n t \"\" 26 \"\" { 0.0 0.0 }\\n t \"\" 27 \"\" { 0.0 0.0 }\\n p \"\" 1 2 \"\" { \"p0a0\" \"p0a1\" \"p0a2\" } 0\\n p \"\" 2 4 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 28 \"\" { 10.0 10.0 }\\n t \"\" 29 \"\" { 
0.0 0.0 }\\n t \"\" 30 \"\" { 0.0 0.0 }\\n p \"\" 2 5 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 31 \"\" { 4.0 4.0 }\\n t \"\" 32 \"\" { 8.0 8.0 }\\n t \"\" 33 \"\" { 4.0 4.0 }\\n p \"\" 2 6 \"\" { \"p1a0\" \"p1a1\" \"p1a2\" } 0\\n t \"\" 34 \"\" { 10.0 10.0 }\\n t \"\" 35 \"\" { 0.0 0.0 }\\n t \"\" 36 \"\" { 0.0 0.0 }\\n'" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ops_hanabi_game = pyspiel.load_game(\"tiny_hanabi\")\n", "efg_hanabi_game = export_gambit(ops_hanabi_game)\n", "efg_hanabi_game" ] }, { "cell_type": "markdown", "id": "fa354c9f", "metadata": {}, "source": [ "Now let's load the EFG in Gambit.\n", "We can then compute equilibria strategies for the players as usual." ] }, { "cell_type": "code", "execution_count": 20, "id": "1a534e25", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Pl0\n", "Pl1\n" ] } ], "source": [ "gbt_hanabi_game = gbt.read_efg(StringIO(efg_hanabi_game))\n", "eqm = gbt.nash.lcp_solve(gbt_hanabi_game).equilibria[0]\n", "for player in gbt_hanabi_game.players:\n", " print(player.label)" ] }, { "cell_type": "markdown", "id": "cdfe924e", "metadata": {}, "source": [ "We can look at player 0's equilibrium strategy:" ] }, { "cell_type": "code", "execution_count": 21, "id": "1ec19b1c", "metadata": {}, "outputs": [ { "data": { "text/latex": [ "$\\left[\\left[0,0,1\\right],\\left[0,1,0\\right]\\right]$" ], "text/plain": [ "[[Rational(0, 1), Rational(0, 1), Rational(1, 1)], [Rational(0, 1), Rational(1, 1), Rational(0, 1)]]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "eqm['Pl0']" ] }, { "cell_type": "markdown", "id": "b54411c0", "metadata": {}, "source": [ "...and use Gambit to explore what those numbers actually mean for player 0:" ] }, { "cell_type": "code", "execution_count": 22, "id": "ae9fc7a7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "At information set 0, 
Player 0 plays action 0 with probability: 0 and action 1 with probability: 0 and action 2 with probability: 1\n", "At information set 1, Player 0 plays action 0 with probability: 0 and action 1 with probability: 1 and action 2 with probability: 0\n" ] } ], "source": [ "for infoset, mixed_action in eqm[\"Pl0\"].mixed_actions():\n", " print(\n", " f\"At information set {infoset.number}, \"\n", " f\"Player 0 plays action 0 with probability: {mixed_action['p0a0']}\"\n", " f\" and action 1 with probability: {mixed_action['p0a1']}\"\n", " f\" and action 2 with probability: {mixed_action['p0a2']}\"\n", " )" ] }, { "cell_type": "markdown", "id": "eac73a24", "metadata": {}, "source": [ "For player 1, we can do the same:" ] }, { "cell_type": "code", "execution_count": 23, "id": "8528e1bd", "metadata": {}, "outputs": [ { "data": { "text/latex": [ "$\\left[\\left[0,0,1\\right],\\left[0,1,0\\right],\\left[1,0,0\\right],\\left[0,0,1\\right],\\left[0,1,0\\right],\\left[0,0,1\\right]\\right]$" ], "text/plain": [ "[[Rational(0, 1), Rational(0, 1), Rational(1, 1)], [Rational(0, 1), Rational(1, 1), Rational(0, 1)], [Rational(1, 1), Rational(0, 1), Rational(0, 1)], [Rational(0, 1), Rational(0, 1), Rational(1, 1)], [Rational(0, 1), Rational(1, 1), Rational(0, 1)], [Rational(0, 1), Rational(0, 1), Rational(1, 1)]]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "eqm['Pl1']" ] }, { "cell_type": "code", "execution_count": 24, "id": "2965aed0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "At information set 0, Player 1 plays action 0 with probability: 0 and action 1 with probability: 0 and action 2 with probability: 1\n", "At information set 1, Player 1 plays action 0 with probability: 0 and action 1 with probability: 1 and action 2 with probability: 0\n", "At information set 2, Player 1 plays action 0 with probability: 1 and action 1 with probability: 0 and action 2 with probability: 0\n", "At 
information set 3, Player 1 plays action 0 with probability: 0 and action 1 with probability: 0 and action 2 with probability: 1\n", "At information set 4, Player 1 plays action 0 with probability: 0 and action 1 with probability: 1 and action 2 with probability: 0\n", "At information set 5, Player 1 plays action 0 with probability: 0 and action 1 with probability: 0 and action 2 with probability: 1\n" ] } ], "source": [ "for infoset, mixed_action in eqm[\"Pl1\"].mixed_actions():\n", "    print(\n", "        f\"At information set {infoset.number}, \"\n", "        f\"Player 1 plays action 0 with probability: {mixed_action['p1a0']}\"\n", "        f\" and action 1 with probability: {mixed_action['p1a1']}\"\n", "        f\" and action 2 with probability: {mixed_action['p1a2']}\"\n", "    )" ] }, { "cell_type": "markdown", "id": "d628c0d5", "metadata": {}, "source": [ "Let's now train 2 agents using independent Q-learning on Tiny Hanabi, and play them against each other.\n", "\n", "We can then compare the strategies they learn to the equilibrium strategies computed by Gambit.\n", "\n", "First let's open the RL environment for Tiny Hanabi and create the agents, one for each player (2 players in this case):" ] }, { "cell_type": "code", "execution_count": 25, "id": "4e72c924", "metadata": {}, "outputs": [], "source": [ "# Create the environment\n", "env = rl_environment.Environment(\"tiny_hanabi\")\n", "num_players = env.num_players\n", "num_actions = env.action_spec()[\"num_actions\"]\n", "\n", "# Create the agents\n", "agents = [\n", "    tabular_qlearner.QLearner(player_id=idx, num_actions=num_actions)\n", "    for idx in range(num_players)\n", "]" ] }, { "cell_type": "markdown", "id": "4bf9eea4", "metadata": {}, "source": [ "Now we can train the Q-learning agents in self-play." 
] }, { "cell_type": "code", "execution_count": 26, "id": "53547263", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Episodes: 0\n", "Episodes: 10000\n", "Episodes: 20000\n", "Episodes: 30000\n" ] } ], "source": [ "for cur_episode in range(30000):\n", "    if cur_episode % 10000 == 0:\n", "        print(f\"Episodes: {cur_episode}\")\n", "\n", "    time_step = env.reset()\n", "    while not time_step.last():\n", "        player_id = time_step.observations[\"current_player\"]\n", "        agent_output = agents[player_id].step(time_step)\n", "        time_step = env.step([agent_output.action])\n", "\n", "    # Episode is over, step all agents with final info state.\n", "    for agent in agents:\n", "        agent.step(time_step)\n", "\n", "print(f\"Episodes: {cur_episode+1}\")" ] }, { "cell_type": "markdown", "id": "75cddd36", "metadata": {}, "source": [ "Let's check out the strategies our agents have learned by playing them against each other again, this time in evaluation mode (setting `is_evaluation=True`):" ] }, { "cell_type": "code", "execution_count": 27, "id": "d71bc733", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "p0:d0 p1:d1\n", "Agent 0 chooses p0a2\n", "\n", "p0:d0 p1:d1 p0:a2\n", "Agent 1 chooses p1a2\n", "\n", "p0:d0 p1:d1 p0:a2 p1:a2\n", "Rewards: [10.0, 10.0]\n" ] } ], "source": [ "time_step = env.reset()\n", "\n", "while not time_step.last():\n", "    print(\"\")\n", "    print(env.get_state)\n", "\n", "    player_id = time_step.observations[\"current_player\"]\n", "    agent_output = agents[player_id].step(time_step, is_evaluation=True)\n", "    print(f\"Agent {player_id} chooses {env.get_state.action_to_string(agent_output.action)}\")\n", "    time_step = env.step([agent_output.action])\n", "\n", "print(\"\")\n", "print(env.get_state)\n", "print(f\"Rewards: {time_step.rewards}\")" ] }, { "cell_type": "markdown", "id": "f1e9b174", "metadata": {}, "source": [ "Are the learned strategies chosen by p0 and p1 consistent with an equilibrium 
computed by Gambit?\n", "\n", "The run shown above ended in the final game state `p0:d0 p1:d1 p0:a2 p1:a2` with payoffs `[10.0, 10.0]` (the cards are dealt by chance, so your own run may end in a different state). This is consistent with the equilibrium computed by Gambit:\n", "- The node `p0:d0 p1:d1` is part of player 0's information set 0.\n", "- p0 picks a2, which matches the first equilibrium strategy in `eqm['Pl0']`, where action `p0a2` is played with probability 1.\n", "- This puts player 1 in their information set 5, where player 1 picks action 2, which is consistent with `eqm['Pl1']`, where action `p1a2` is played with probability 1 at that information set." ] }, { "cell_type": "markdown", "id": "6f356383", "metadata": {}, "source": [ "## Extensive form games created with Gambit\n", "\n", "It's also possible to create an extensive form game in Gambit and export it to OpenSpiel. Here we demonstrate this with the one-card poker game introduced in tutorial 3." ] }, { "cell_type": "code", "execution_count": 28, "id": "07340e32", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "efg_game()" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with open(\"../poker.efg\", \"r\") as f:\n", "    poker_efg_string = f.read()\n", "    ops_one_card_poker = pyspiel.load_efg_game(poker_efg_string)\n", "ops_one_card_poker" ] }, { "cell_type": "markdown", "id": "ef6939f6", "metadata": {}, "source": [ "Games loaded from EFG in OpenSpiel do not take advantage of the full functionality of the package. For example, it is not possible to carry out training with RL algorithms on these games, as in the example above with Tiny Hanabi. 
The OpenSpiel documentation explains [how to submit new games to the library](https://openspiel.readthedocs.io/en/latest/developer_guide.html#adding-a-game) if you wish to add your own games.\n", "\n", "We can, however, use the state representation and play through the game step by step:" ] }, { "cell_type": "code", "execution_count": 29, "id": "c01c4d6f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ops_one_card_poker.num_distinct_actions()" ] }, { "cell_type": "markdown", "id": "9986860c", "metadata": {}, "source": [ "The one-card poker game has 4 distinct actions: 2 for the first player (Alice in the example game), \"Raise\" and \"Fold\", and 2 for the second player (Bob), \"Meet\" and \"Pass\".\n", "\n", "Initialising the game state, we can see that the current player at the start is the chance player, who deals the cards:" ] }, { "cell_type": "code", "execution_count": 30, "id": "3b9cc43b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0: Chance: 1 King 0.5 Queen 0.5" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "state = ops_one_card_poker.new_initial_state()\n", "state" ] }, { "cell_type": "markdown", "id": "7b0959f9", "metadata": {}, "source": [ "Let's have the chance player deal a King (action 0):" ] }, { "cell_type": "code", "execution_count": 31, "id": "4dd5d504", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1: Player: 1 1 Raise Fold" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "state.apply_action(0)\n", "state" ] }, { "cell_type": "markdown", "id": "b4291f07", "metadata": {}, "source": [ "As expected, it's now the first player's (Alice's) turn.\n", "Let's have Alice choose to \"Raise\" (action 0):" ] }, { "cell_type": "code", "execution_count": 32, "id": "bd15369f", "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "3: Player: 2 1 Meet Pass" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "state.apply_action(0)\n", "state" ] }, { "cell_type": "markdown", "id": "cd63f7d7", "metadata": {}, "source": [ "As expected, the current player is now player 2 (Bob), let's check the legal actions available to Bob:" ] }, { "cell_type": "code", "execution_count": 33, "id": "8d81ff6b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2, 3]" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "state.legal_actions()" ] }, { "cell_type": "markdown", "id": "fdb5194f", "metadata": {}, "source": [ "Whereas player 1 (Alice) had the option to \"Raise\" (action 0) and \"Fold\" (action 1), player 2 (Bob) now has the option to \"Meet\" (action 2) or \"Pass\" (action 3).\n", "Let's have Bob choose to \"Pass\":" ] }, { "cell_type": "code", "execution_count": 34, "id": "97913fe5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6: Terminal: Alice wins 1 -1" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "state.apply_action(3)\n", "state" ] }, { "cell_type": "markdown", "id": "1bf09576", "metadata": {}, "source": [ "Since Bob passed, Alice takes the small win and we reach a terminal state." ] } ], "metadata": { "kernelspec": { "display_name": "gbt_pygraphviz", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.13" } }, "nbformat": 4, "nbformat_minor": 5 }