
Modeling Two-Player Games in the Sigma Graphical Cognitive Architecture

David V. Pynadath1, Paul S. Rosenbloom1,2, Stacy C. Marsella1,2, and Lingshan Li1,2

1Institute for Creative Technologies, 2Department of Computer Science, University of Southern California, Los Angeles, CA, USA

Abstract. Effective social interaction and, in particular, a Theory of Mind are critical components of human intelligence, allowing us to form beliefs about other people, generate expectations about their behavior, and use those expectations to inform our own decision-making. This article presents an investigation into methods for realizing Theory of Mind within Sigma, a graphical cognitive architecture. By extending the architecture to capture independent decisions and problem-solving for multiple agents, we implemented Sigma models of several canonical examples from game theory. We show that the resulting Sigma agents can capture the same behaviors prescribed by equilibrium solutions.

1 Introduction

Social interaction is a critical function of human intelligence, allowing us to make effective decisions when interacting with other people. We form beliefs about others, use those beliefs to generate expectations about their behavior, and use those expectations to inform our own behavior. In doing so, we attribute an intelligence like our own to other people as well. This cognitive capacity for Theory of Mind distinguishes social interaction from the decision-making we do in isolation [18]. We therefore expect that a system capable of artificial general intelligence (AGI) would provide natural support for Theory of Mind. Previous computational approaches have successfully captured aspects of this cognitive capacity, including modeling the beliefs of others [4, 6], predicting behavior [1], and factoring behavior predictions into decision-making [3, 13].

We are interested here in how all of these Theory of Mind capabilities may be realized within Sigma (Σ), a nascent cognitive system--an integrated computational model of intelligent behavior--that is grounded in a cognitive architecture, a model of the fixed structure underlying a cognitive system [11]. In prior work, we have demonstrated this architecture's ability to support multiple cognitive capabilities, such as problem solving [14], mental imagery [16], and learning [15]. Here, our goal is to extend this existing integration to the critical cognitive capability of Theory of Mind, while shedding additional light on both Sigma and the architectural basis of Theory of Mind.

To that end, we explore the degree to which existing capabilities within Sigma's cognitive architecture already provide the necessary support for Theory of Mind. We draw on canonical examples from the game theory literature as test cases that drive the need for reasoning about other agents [2]. These games provide a natural setting for Theory of Mind, because agents seeking to perform well must take each other's action selection into account. Across these games, we examine the conditions under which the Sigma-generated outcomes conform to those prescribed by game-theoretic analyses.

We also identify the need for new capabilities within Sigma to address these games and present the extensions we implemented to provide them. Sigma turns out to support two distinct approaches to Theory of Mind, based respectively on automatic versus controlled processing [17] or, alternatively, System 1 versus System 2 thinking [7]. Both modes of reasoning are possible without any change to the underlying architecture, providing a novel unification of the two forms of Theory of Mind within a single cognitive system. By successfully realizing Theory of Mind reasoning and combining it with Sigma's other cognitive capabilities (e.g., learning), we now open up novel lines of investigation into more general mechanisms for social reasoning.

2 Sigma

Sigma's cognitive architecture is built upon graphical models [8]. Graphical models provide a general computational technique for efficient computation with complex multivariate functions by leveraging forms of independence to: decompose them into products of simpler functions; map these products onto graphs; and solve the graphs via, for example, message passing or sampling methods. They are particularly attractive as a basis for broadly functional, yet simple and theoretically elegant cognitive architectures, because they provide a single general representation and inference algorithm for processing symbols, probabilities and signals. The cognitive architecture leverages this generality through a core knowledge structure--the conditional--that provides a deep blending of the forms of conditionality found in both rules and probabilistic networks.
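To make the computational benefit of such decompositions concrete, consider a three-variable function that factors into a product of two smaller functions; a variable can then be summed out without ever materializing the full joint table. The following sketch is our own illustration (assuming numpy), not Sigma code:

    # A function f(x, y, z) that factors as f1(x, y) * f2(y, z): summing out z
    # can exploit the factorization instead of building the full joint table.
    import numpy as np

    f1 = np.random.rand(4, 4)                   # f1(x, y)
    f2 = np.random.rand(4, 4)                   # f2(y, z)
    joint = np.einsum('xy,yz->xyz', f1, f2)     # explicit joint: 4^3 entries
    marg_direct = joint.sum(axis=2)             # sum z out of the full joint
    marg_factored = f1 * f2.sum(axis=1)         # 4^2 work on the factors alone
    assert np.allclose(marg_direct, marg_factored)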

Sigma's long-term memory comprises a set of conditionals, which are jointly compiled into a single factor graph [9] at the level below. Memory access occurs by passing messages in this graph, via the summary product algorithm [9], until quiescence; that is, until there are no more messages to send. Each message is an n-dimensional piecewise linear function that is defined over an array of rectilinear regions. These piecewise linear functions can approximate arbitrary continuous functions as closely as desired, as well as be restricted to represent both discrete probability distributions and relational symbol structures. Working memory consists of a set of peripheral nodes in the graph that provide fixed evidence during solution of the long-term-memory graph.
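For concreteness, the sketch below shows sum-product message passing on a minimal discrete factor graph. It is our own illustration rather than Sigma's implementation: Sigma's summary product algorithm passes n-dimensional piecewise linear messages, whereas this toy version uses plain discrete tables.

    # Sum-product message passing on a chain factor graph X -- f(X,Y) -- Y,
    # with unary evidence factors on X and Y. Messages here are discrete
    # tables; Sigma's are piecewise linear functions over rectilinear regions.
    import numpy as np

    g_x = np.array([0.7, 0.3])       # unary evidence factor on X
    g_y = np.array([0.4, 0.6])       # unary evidence factor on Y
    f_xy = np.array([[0.9, 0.1],     # pairwise factor f(X, Y)
                     [0.2, 0.8]])

    # Factor-to-variable messages: multiply in the incoming message from the
    # other variable (here, its evidence) and sum that variable out.
    msg_f_to_y = f_xy.T @ g_x
    msg_f_to_x = f_xy @ g_y

    # A variable's marginal is the normalized product of its incoming messages.
    p_x = g_x * msg_f_to_x
    p_y = g_y * msg_f_to_y
    print("P(X) =", p_x / p_x.sum())
    print("P(Y) =", p_y / p_y.sum())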

This way of viewing the Sigma cognitive system divides it into three layers: one for the graphical models, one for the (Sigma) cognitive architecture, and one for the knowledge and skills encoded via conditionals on top of the cognitive architecture. However, the Sigma cognitive architecture also embodies its own hierarchy of three layers that is central to realizing Theory of Mind. Modeled after the Soar architecture [10], the Sigma architecture comprises a reactive layer that acts as the inner loop for a deliberative layer that acts as the inner loop for a reflective layer. The reactive layer simply consists of quiescent memory access. In Soar, this form of knowledge search provides a parallel associative retrieval of information stored in rules. In Sigma it searches over the more general representation provided by conditionals and factor graphs.

Sigma's deliberative layer provides a model of sequential action [14], during which operators are selected and applied in service of achieving goals (or of maximizing utilities). In Sigma, as in Soar, operator selection and application are both mediated by the reactive layer, with the core cognitive cycle becoming quiescent memory access followed by a decision. In general, the distinction between reactive and deliberative layers maps onto two well-known distinctions in human cognition: automatic versus controlled [17] and System 1 versus System 2 [7].

The reflective layer determines how metalevel processing--including problem space search--occurs when resolving impasses in deliberative processing. In Sigma, three kinds of impasses can occur when attempting to make a decision. A none impasse occurs when there are no candidate operators for selection. A tie impasse occurs when there are multiple candidate operators, but insufficient knowledge to choose among them. A no-change impasse occurs when an operator is selected but no state change results. When processing begins, there is only a base-level state (state 0) active. When an impasse happens in a state, a structure representing it is added to working memory and a new state with a number one higher is added. Whenever an impasse occurs for a state, all existing higher numbered states are deleted, with just the one new state then added. Thus, if impasses occur simultaneously for multiple states, only the lowest numbered impasse goes into effect, as all higher numbered ones are for states that are deleted.

An impasse, along with its resulting state--and all higher numbered states (which ultimately only exist in service of this lower numbered impasse)--will also go away when it is no longer valid. This can happen because the impasse is resolved: the appearance of a candidate operator resolves a none impasse, the selection of an operator resolves a tie impasse, and a change in the state resolves a no-change impasse. It can also happen because the impasse changes, such as when the appearance of multiple undifferentiated operators shifts the impasse from none to tie. Communication across states in the metalevel hierarchy occurs via the affine transforms that were earlier implemented in Sigma in service of modeling mental imagery [16]. Translation along the state dimension moves information to higher numbered states (to initialize metalevel states) and to lower numbered states (to return results from reflective processing).
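The impasse bookkeeping described in the two paragraphs above can be summarized in a short sketch. The class and method names below are our own invention for illustration, not Sigma's code, and the sketch omits the affine transforms used for cross-state communication:

    # Toy rendition of the reflective-layer bookkeeping described above (not
    # Sigma's code): impasses push states, and an impasse at state s deletes
    # every state numbered above s before adding the single new state.
    class ReflectiveStack:
        def __init__(self):
            self.impasses = {}      # created state number -> impasse kind

        def active_states(self):
            # Base-level state 0 plus one state per unresolved impasse.
            return list(range(len(self.impasses) + 1))

        def impasse(self, state, kind):
            # kind is 'none', 'tie', or 'no-change'. All states above the
            # impassed one are deleted, then one new state is added.
            assert state in self.active_states()
            self.impasses = {s: k for s, k in self.impasses.items() if s <= state}
            self.impasses[state + 1] = kind

        def resolve(self, state):
            # Resolving (or invalidating) the impasse that created `state`
            # removes that state and everything numbered above it.
            self.impasses = {s: k for s, k in self.impasses.items() if s < state}

    stack = ReflectiveStack()
    stack.impasse(0, 'tie')         # tie at base level -> state 1
    stack.impasse(1, 'no-change')   # no-change at state 1 -> state 2
    stack.impasse(0, 'tie')         # new impasse at state 0 deletes states 1, 2
    print(stack.active_states())    # [0, 1]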

3 Single-Stage Simultaneous-Move Games

We begin our investigation by modeling single-stage simultaneous-move games within Sigma. Two agents, A and B, choose from two operators: C and D (typically, cooperate and defect, respectively). After the players simultaneously reveal their operator selections, the corresponding entry in the payoff matrix yields the outcome for A and B. For example, if A cooperates and B defects, then Figure 1 specifies that A gets r_{A,CD} and B gets r_{B,CD}.

            B: C                  B: D
    A: C    r_{A,CC}, r_{B,CC}    r_{A,CD}, r_{B,CD}
    A: D    r_{A,DC}, r_{B,DC}    r_{A,DD}, r_{B,DD}

Fig. 1: Payoff matrix for a two-player, simultaneous-move game

The typical analysis of such games revolves around Nash equilibria. In such an equilibrium, each agent is unable to increase its reward by unilaterally changing its chosen strategy. For example, if r_{B,CC} > r_{B,CD} and r_{B,DC} > r_{B,DD}, then agent B will choose to cooperate, as it yields a higher payoff regardless of what agent A does. Knowing that B will cooperate, agent A will also choose to cooperate if r_{A,CC} > r_{A,DC}. If both conditions hold, the situation where both agents cooperate constitutes a Nash equilibrium, because neither has an incentive to defect. In this investigation, we consider only such pure-strategy equilibria, where the agents' choices have no stochastic component.

There are at least three ways the independent decision-making processes of A and B could conceivably be represented within Sigma: as multiple instances of the whole system, as multiple graphs (and associated decision making) within a single instance of Sigma, or by adding an agent dimension to functions within a single graph. For this work, we chose the third approach, so as to maximize the possibility of sharing functional structure among multiple agents, and thus to minimize the cost of doing complex Theory of Mind reasoning. In addition to adding an agent dimension to functions, this involved simply extending the decision procedure to make independent decisions for each agent.

To model such games in Sigma, we require four conditionals to completely capture the generation of the two agents' expectations and decisions; Figure 2 shows the factor graph they generate. Two conditionals (AB → AA and BA → BB in Figure 2) specify the value of each agent's operator selection as a function of its expectations about the other agent's selection. For example, the function at P_AB would specify that if B chooses o_B ∈ {C, D}, then the value to A of choosing o_A ∈ {C, D} would be r_{A,o_A,o_B}. Agent B has an analogous conditional (BA → BB) regarding the impact of its expectation of A's choice. The remaining two conditionals (AA → AB and BB → BA) capture A's and B's expectations about each other's action values in response to their own choices.

Fig. 2: Factor graph generated for one-shot games

When executing this model, Sigma begins by computing values over each agent's options (AA and BB in Figure 2), first assuming a uniform distribution over the other agent's possible moves (AB and BA). The resulting values are normalized into a probability distribution over the deciding agent's selection, which then feeds, in turn, into the other agent's conditional (AA → AB and BB → BA). The values go around this loop several times before converging. By using a linear transformation of action values into probabilities, each agent assigns a nonzero probability that the other does not choose a best response, possibly deviating from a Nash equilibrium. However, if agents select operators stochastically according to this distribution (instead of deterministically maximizing), then the agents would be playing a quantal response equilibrium, as their actions would be a best response to the probability of each other's choices [12].
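Outside of Sigma, the loop just described can be approximated as a simple fixed-point iteration. The sketch below is illustrative only: the payoffs are those of the Prisoner's Dilemma introduced next (Table 1a), and the update schedule and convergence test are our own, so the exact values will differ from Sigma's graph solution. It does, however, show the uniform initialization, the linear normalization of values into probabilities, and the feedback between the two agents' computations:

    # Illustrative fixed-point sketch of the loop described above (not Sigma's
    # actual message schedule). Payoffs carry an explicit agent dimension, as
    # in the extension described earlier: reward[agent, A's op, B's op], with
    # operator 0 = C and 1 = D.
    import numpy as np

    reward = np.array([[[0.3, 0.1],   # agent A: r(C,C), r(C,D)
                        [0.4, 0.2]],  #          r(D,C), r(D,D)
                       [[0.3, 0.4],   # agent B: r(C,C), r(C,D)
                        [0.1, 0.2]]]) #          r(D,C), r(D,D)

    p_a = np.full(2, 0.5)             # start from uniform expectations
    p_b = np.full(2, 0.5)
    for _ in range(100):
        v_a = reward[0] @ p_b         # A's value per operator, given p_b
        v_b = reward[1].T @ p_a       # B's value per operator, given p_a
        new_a, new_b = v_a / v_a.sum(), v_b / v_b.sum()  # linear normalization
        if np.allclose(new_a, p_a) and np.allclose(new_b, p_b):
            break
        p_a, p_b = new_a, new_b
    print(p_a, p_b)                   # both end up favoring D for these payoffs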

Perhaps the most famous example of a single-stage, simultaneous-move game is the Prisoner's Dilemma (Table 1a). From its payoff matrix, we can see that each agent is better off choosing to defect, regardless of what the other agent does. Thus, there is a single pure-strategy Nash equilibrium, where both agents choose D, even though both would be better off if they both chose C.

(a) Prisoner's Dilemma

            B: C        B: D
    A: C    0.3, 0.3    0.1, 0.4
    A: D    0.4, 0.1    0.2, 0.2

(b) Stag Hunt

            B: C        B: D
    A: C    0.3, 0.3    0.0, 0.1
    A: D    0.1, 0.0    0.1, 0.1

(c) Dominant B strategy

            B: C            B: D
    A: C    0.1, 0.1/0.4    0.2, 0.4/0.1
    A: D    0.3, 0.1/0.4    0.1, 0.4/0.1

(d) Matching pennies

            B: C        B: D
    A: C    0.1, 0.2    0.2, 0.1
    A: D    0.2, 0.1    0.1, 0.2

Table 1: Payoff matrices for one-shot games (each cell lists A's payoff, then B's)

When we use Table 1a's payoff matrix, Sigma requires 401 messages to compute the agents' action selection values. The resulting values favor defecting over cooperating, 0.572 vs. 0.428, for both agents, matching the Nash equilibrium. We are thus able to use a single graph to model both agents' individual, self-interested perspectives, as opposed to finding the C-C outcome that is socially optimal from a global perspective.
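As a cross-check independent of Sigma, the pure-strategy equilibria of these 2x2 games can be enumerated directly from the payoff matrices. A minimal sketch (the function name and payoff representation are our own):

    # Brute-force pure-strategy Nash equilibrium check for a two-player game
    # (independent of Sigma). payoff[i][j] = (reward to A, reward to B) when
    # A plays action i and B plays action j.
    def pure_nash(payoff):
        n_a, n_b = len(payoff), len(payoff[0])
        equilibria = []
        for i in range(n_a):
            for j in range(n_b):
                a_best = all(payoff[i][j][0] >= payoff[k][j][0] for k in range(n_a))
                b_best = all(payoff[i][j][1] >= payoff[i][k][1] for k in range(n_b))
                if a_best and b_best:      # neither gains by deviating alone
                    equilibria.append((i, j))
        return equilibria

    # Prisoner's Dilemma from Table 1a: action 0 = C, action 1 = D.
    pd = [[(0.3, 0.3), (0.1, 0.4)],
          [(0.4, 0.1), (0.2, 0.2)]]
    print(pure_nash(pd))               # [(1, 1)]: the D-D equilibrium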

The Stag Hunt (Table 1b) is an example of a coordination game, because any outcome where the two agents choose the same action is a Nash equilibrium. Intuitively, if both agents cooperate to hunt a stag, they will succeed and receive the maximum payoff. However, any player who defects to hunt a hare will succeed, albeit for a smaller payoff. An agent who attempts to hunt the stag unassisted (i.e., chooses to cooperate when the other agent defects) will fail and receive no payoff. Thus, while this game shares the same D-D equilibrium as the Prisoner's Dilemma, it also supports a C-C equilibrium, which is also the socially optimal outcome.

In the Stag Hunt, Sigma requires 681 messages to compute a distribution where both agents favor cooperating over defecting with values of 0.573 vs. 0.427. Our Sigma model thus finds the socially optimal Nash equilibrium. The values are similar to those for the Prisoner's Dilemma, although the cycle requires significantly more messages.
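Running the same enumeration on Table 1b's payoffs confirms that both coordination outcomes are equilibria:

    # Stag Hunt (Table 1b), using pure_nash from the sketch above.
    sh = [[(0.3, 0.3), (0.0, 0.1)],
          [(0.1, 0.0), (0.1, 0.1)]]
    print(pure_nash(sh))               # [(0, 0), (1, 1)]: both C-C and D-D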

In the Prisoner's Dilemma, defecting is dominant for each agent regardless of the other agent's choice. In the Stag Hunt, agents greedily seeking the highest potential payoff might also find the C-C equilibrium accidentally. We implemented the game of Table 1c to make sure that our Sigma agents are properly considering their opponents' decisions. There are two variations for B's payoffs, separated by a "/". With the left payoffs, D is a dominant action for B, while with the right, C is dominant. In the former case, C is A's best response, while in the latter, D is the best response. Thus, we can generate two different optimal decisions for A by changing only B's payoffs.

For both versions of this game, Sigma computes a distribution over actions after a total of 226 messages. B's values are symmetric across both versions, with the dominant action being preferred 0.708 vs. 0.292. When B prefers C, A prefers D 0.595 vs. 0.405, while when B prefers D, A prefers C 0.511 vs. 0.489. Thus, agent A makes the equilibrium-specified decision under both cases, while the higher payoff for D in Table 1c shows up in the stronger bias in the former case.

Unlike the previous games, the Matching Pennies game (Table 1d) has no pure-strategy Nash equilibrium. If both agents choose the same action, B gets the higher payoff; otherwise, A gets the higher payoff. To see that no pure-strategy equilibrium exists, consider what happens if A chooses C: then B chooses C, so that A chooses D, so that B chooses D, so that A chooses C again, returning us to the beginning of this sequence.
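Applying the same enumeration to Table 1d's payoffs confirms the absence of any pure-strategy equilibrium:

    # Matching Pennies (Table 1d), using pure_nash from the sketch above.
    mp = [[(0.1, 0.2), (0.2, 0.1)],
          [(0.2, 0.1), (0.1, 0.2)]]
    print(pure_nash(mp))               # []: no cell survives unilateral deviation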

In this game, Sigma finishes computation after only 151 messages with each agent having a value of 0.5 for both operators. With this distribution over values, neither agent has a preference over its operator selection. Given the lack of any equilibrium in this game, it makes sense for the agents to be indifferent among their options.

Sigma is thus able to find the equilibrium strategies through graphical message-passing, without requiring the logical, case-by-case reasoning that is required in general. Of course, as already described, the graph is not guaranteed to find the equilibrium strategy in all games. There is a potential relationship between the quantitative values computed by Sigma and the frequency with which people play equilibrium strategies in the same games, but studying that relationship is beyond the scope of this investigation.

4 Sequential Games: The Ultimatum Game

In Section 3's games, both players reveal their moves simultaneously. In contrast, the agents in sequential games take turns choosing their actions. In this section, we examine the ultimatum game [5], where agent A starts with a fixed amount of money (3 in our example), and offers a portion of it to B, who then accepts or rejects the offer. If B accepts, it receives the offered amount, while A keeps the remainder. However, if B rejects, both agents get 0 (Figure 3 shows the game tree).

The game's sequential nature leads to a different strategy structure than in simultaneous-move games. While agent A's strategy is still a single choice (the amount to offer), agent B's strategy is now a function from A's possible offers to a binary acceptance/rejection. There are many Nash equilibria within this strategy space, including one in which A offers 0 and B accepts any offer: B has no incentive to reject an offer of 0, and if B accepts any offer, then A has no incentive to offer more. At the other extreme, there is an equilibrium where A offers everything to agent B and B rejects any offer of less than everything. Again, neither agent has reason to unilaterally deviate.

Fig. 3: Extensive form of the ultimatum game

This highly divergent space of Nash equilibria leads us instead to the stricter concept of subgame perfection, where the overall equilibrium strategy is also an equilibrium strategy over any subgame in the game tree. For example, the second Nash equilibrium mentioned above is not subgame perfect, because it is not an equilibrium strategy for B to (for example) reject in the subtree rooted after A offers only 1. Therefore, it is not credible for B to reject any offer less than everything. We can determine the subgame perfect equilibrium strategy by backward induction from the end payoffs. If A offers any amount more than 0, B's best response is to accept the offer, because any positive offer is better than the 0 it would get by rejecting. Out of these positive offers, A prefers to make the smallest, so the subgame perfect equilibrium has A offering 1 and B accepting.
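The backward induction argument can be made concrete with a short sketch over the 3-unit endowment from our example. This illustrates only the equilibrium computation, not Sigma's reflective processing, and it assumes B rejects an offer of 0 (where B is in fact indifferent):

    # Backward induction for the ultimatum game with a total of 3 (as in the
    # example above). Illustrative sketch only; Sigma solves this reflectively
    # rather than by explicit tree traversal.
    TOTAL = 3

    def b_response(offer):
        # B compares accepting (gets `offer`) with rejecting (gets 0).
        return 'accept' if offer > 0 else 'reject'  # indifferent at 0; assume reject

    def best_offer():
        # A keeps TOTAL - offer when B accepts, and gets 0 when B rejects.
        def a_payoff(offer):
            return TOTAL - offer if b_response(offer) == 'accept' else 0
        return max(range(TOTAL + 1), key=a_payoff)

    print(best_offer())   # 1: the smallest offer that B still accepts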
