Research paper
Nash Q-learning agents in Hotelling’s model: Reestablishing equilibrium

https://doi.org/10.1016/j.cnsns.2021.105805

Highlights

  • An agent-based simulation under Hotelling’s model setting is conducted.

  • Two agents use the Nash Q-learning mechanism for adaptation.

  • Under a quadratic consumer cost function, agents learn an aggressive market strategy.

  • The Principle of Minimum Differentiation can be justified based on repeated interactions.

Abstract

This paper examines adaptive agents’ behavior in a stochastic dynamic version of Hotelling’s location model. We conduct an agent-based numerical simulation under Hotelling’s setting with two agents who use the Nash Q-learning mechanism for adaptation. This allows us to explore what alterations this technique brings compared to the original analytic solution of the famous static game-theoretic model with strong assumptions imposed on players. We discover that under Nash Q-learning and a quadratic consumer cost function, agents with a high enough valuation of future profits learn behavior similar to an aggressive market strategy. Both agents make similar products and engage in a price war to eliminate their opponent from the market. This behavior closely resembles the Principle of Minimum Differentiation from Hotelling’s original paper with linear consumer costs, even though the quadratic consumer cost function would result in maximum differentiation of production in the original model. Thus, the Principle of Minimum Differentiation can be justified based on repeated interactions of the agents and long-run optimization.

Introduction

Some of the most influential economic models stand on game theory. Game-theoretic concepts have been famously used in the past to show that in a society of self-interested individuals, the tragedy of the commons arises [15], or that non-dictatorial voting methods are subject to strategic voting [7]. Game theory allows us to formulate and analyze problems that involve decision making in competitive or cooperative environments and offers solution concepts such as the Nash equilibrium. Based on game-theoretic models, we can draw conditional conclusions about the behavior of real economic actors. Such models often rely on strong assumptions such as agents’ perfect rationality and complete and perfect information. However, humans are neither perfectly rational nor do they have perfect and complete information available. Thus, in addition to finding the Nash equilibrium in games, we should in the first place ask how, and whether at all, boundedly rational agents or agents without perfect information come to play the Nash equilibrium.

Experimental economics provides a few studies that partly tackle these issues. In the experiments presented there, participants played certain games multiple times. It has been shown that over time most of the experiment participants got closer to the Nash equilibrium in the Beauty Contest game [17], and similar conclusions were obtained for bargaining games [22]. These results indicate that some underlying adaptive processes could, at least in some games, enable agents to converge towards the Nash equilibrium during repeated interactions. Recently, [2] built a dynamical model of experimental oligopoly games with the Cournot-Nash outcome as a stationary state of the model with two types of agents: adaptive agents that adjust their behavior to increase their profit and agents with imitative behavior. The authors suggest that their model is capable of qualitatively reproducing the outcomes of experimental oligopoly games.

With the rise in modern computers’ operational capacity, new techniques for analyzing economic systems have emerged. For instance, using numerical simulations, [21] study a Cournot duopoly model with heterogeneous competitors, applying bifurcation analysis and further analyzing the stability switching curves. The authors suggest stability conditions for the unique Nash equilibrium and draw conclusions about the stability of the economy. Another such technique is ‘agent-based simulation’, which consists of software agents placed in a virtual environment and of the environment itself. Agents interact with each other and/or with the environment, and from their micro behavior a global behavioral pattern can emerge. The rules that guide the agents’ behavior range from simple heuristics to more complex, possibly adaptive ones. For example, Waltman and Kaymak [24] use Q-learning to model firms in a repeated Cournot oligopoly game, and [12] study differentiated market dynamics for agents imitating the behavior of more successful agents. Nagel and Vriend [18] apply learning direction theory to analyze agents in an oligopolistic environment with restricted information, and Golman and Page [8] study basins of attraction and equilibrium selection under different learning rules. Nakov and Nuño [19] use mechanisms similar to reinforcement learning to simulate learning of economic agents on stock markets, and Lahkar and Seymour [13] apply reinforcement learning to show that agents in a population game revise mixed strategies. An overview of learning methods can be found in [3], [6]. This paper analyzes a learning method inspired by reinforcement learning called Nash Q-learning [11].

Utilizing an agent-based simulation and the reinforcement learning methodology, we explore how adaptive agents without perfect information behave in Hotelling’s location model [10] with quadratic consumer cost functions. A comparative analysis between the theoretical findings and the results of the agent-based simulation is provided. Additionally, we evaluate the suitability of reinforcement learning for use in economic agent-based simulations and compare it to other learning methods.

Hotelling’s location model is a microeconomic model presented by Harold Hotelling in 1929. The author found that two rational producers in the same market should make their products as similar as possible [10]. This phenomenon is called the Principle of Minimum Differentiation. Nevertheless, it has been shown that Hotelling’s conclusions regarding minimum differentiation are invalid and that, based on Hotelling’s argumentation, “nothing can be said about the tendency of both sellers to agglomerate at the center of the market” [4, p. 1145]. Slightly modified versions of the location model with different consumer cost functions have been proposed, under which the Principle of Maximum Differentiation [4] and the Principle of General Differentiation [5] can be justified. However, Hanaki et al. [9] analytically and numerically show that for $n$, $n \geq 2$, boundedly rational players following a myopic best-reply strategy, the players spend most of the time around the center of Hotelling’s street, which could re-establish Hotelling’s Principle of Minimum Differentiation. Similarly, Matsumura et al. [16] show that minimum differentiation could be realized with evolutionary dynamics. Also, according to Bester et al. [1, p. 165], there are infinitely many mixed strategies in Hotelling’s location game, in which “coordination failure invalidates the Principle of ‘Maximum Differentiation’ and firms may even locate at the same point”.

The simulation contains two self-interested agents competing in a location model framework. The agents have no previous knowledge of the game or of their opponent. The agents’ main challenge is to communicate their respective preferences through mutual interaction, learn about the game pay-offs, and try to find the Nash equilibrium strategy profile of the game. In every round of the simulation, agents have to choose what direction to move (location change) and what price to charge (price change). After taking their actions, they receive information about their opponent’s action and thus also about the current state of the game. They also receive positive or negative feedback based on how well they played in that particular round. The feedback is constructed in compliance with Hotelling’s profit function. Agents can see their own as well as their opponent’s profit. The Nash Q-learning algorithm by Hu and Wellman [11] is used to guide our agents’ adaptive behavior. Since Hotelling’s location model contains convenient symmetries, agents learn not only from their own experience but also from their opponent’s experience. That is, both agents model their opponent as if they were the opponent.
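For concreteness, the following minimal Python sketch illustrates the core of the Nash Q-learning update of Hu and Wellman [11] in a two-player setting. It is not the paper’s actual implementation: the state and action encoding, the hyperparameter values, and the restriction to pure-strategy stage-game equilibria are our illustrative assumptions.

    import numpy as np
    from collections import defaultdict

    class NashQAgent:
        """Minimal two-player Nash Q-learning sketch (after Hu and Wellman [11]).

        The agent keeps Q-tables for BOTH players over joint actions and updates
        them with the value of a stage-game Nash equilibrium in the next state.
        For simplicity, a pure-strategy equilibrium is selected (an assumption;
        the original algorithm allows mixed equilibria)."""

        def __init__(self, n_actions, alpha=0.1, gamma=0.9):
            self.n_actions = n_actions
            self.alpha, self.gamma = alpha, gamma
            # Q[i][state][a0, a1] is player i's estimated value when player 0
            # plays action a0 and player 1 plays action a1.
            self.Q = [defaultdict(lambda: np.zeros((n_actions, n_actions)))
                      for _ in range(2)]

        def _pure_nash_value(self, state):
            """Payoff pair of some pure-strategy Nash equilibrium of the stage
            game defined by the current Q-values in `state` (zeros if none)."""
            Q0, Q1 = self.Q[0][state], self.Q[1][state]
            for a0 in range(self.n_actions):
                for a1 in range(self.n_actions):
                    if Q0[a0, a1] >= Q0[:, a1].max() and Q1[a0, a1] >= Q1[a0, :].max():
                        return Q0[a0, a1], Q1[a0, a1]
            return 0.0, 0.0

        def update(self, state, a0, a1, rewards, next_state):
            """Nash Q step: Q_i <- (1 - alpha) Q_i + alpha (r_i + gamma NashQ_i)."""
            nash_values = self._pure_nash_value(next_state)
            for i, nash_val in enumerate(nash_values):
                q = self.Q[i][state]
                q[a0, a1] = (1 - self.alpha) * q[a0, a1] \
                    + self.alpha * (rewards[i] + self.gamma * nash_val)

Because both players’ Q-tables are maintained by each agent, the symmetry mentioned above (learning from the opponent’s experience) amounts to feeding the opponent’s observed reward into the opponent’s table in the same update.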

The paper proceeds as follows. Section 1 provides details of Hotelling’s location model. In Section 2, we theoretically discuss the learning methods. In Section 3, we describe technical details of the implementation, and Section 4 summarizes and interprets the important results of our simulation-based analysis. The penultimate Section 5 addresses several technical issues of our pioneering approach that might introduce open questions for future research. Finally, in Section 6, we conclude the paper with a summary of the crucial findings. The complete code is available on GitHub.

Section snippets

Hotelling’s location model

The location model by Hotelling [10] introduces a strategic game between two producers. The game can be divided into two stages. In the second stage (short-run), producers compete in prices given a fixed pair of locations $c_1, c_2 \in [a, b]$. In the first stage (long-run), producers compete in locations, given that prices instantly adjust to the Nash equilibrium. Then a combination of prices and locations $(p_l^*, p_r^*, c_l^*, c_r^*)$ is a pure strategy Nash equilibrium if $p_l^*, p_r^*$ are the Nash equilibrium
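For reference, the short-run (price) stage under quadratic consumer costs can be written down as follows; the normalization of the street to $[0,1]$, the uniform unit density of consumers, the zero production costs, and the transport cost parameter $t$ are standard textbook assumptions used here for illustration and need not match the paper’s exact notation. A consumer at location $x$ buys from the producer with the lower total cost of price plus quadratic transport cost, so with producers located at $c_1 < c_2$ and charging prices $p_1, p_2$, the indifferent consumer $\bar{x}$ solves

    p_1 + t(\bar{x} - c_1)^2 = p_2 + t(\bar{x} - c_2)^2
    \quad\Longrightarrow\quad
    \bar{x} = \frac{c_1 + c_2}{2} + \frac{p_2 - p_1}{2t(c_2 - c_1)},

and the short-run profits are $\pi_1 = p_1 \bar{x}$ and $\pi_2 = p_2 (1 - \bar{x})$. The per-round feedback described above is constructed in compliance with profit functions of this form.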

Reinforcement learning

Many times, humans and animals alike have no explicit teacher who would show them how to act in an unknown environment. Nevertheless, by experimenting with the available actions and recognizing how those actions influence the world they perceive, humans and animals are able to form conclusions about cause and effect, about the consequences of actions, and about what to do in order to achieve their goals [23]. According to the authors, agents in a reinforcement learning setting learn how to choose
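This general idea can be made concrete with the standard single-agent Q-learning update, the building block that the Nash Q-learning mechanism used in this paper generalizes to two players. The sketch below is illustrative only; the learning rate, discount factor, and exploration scheme are assumptions, not the simulation’s actual settings.

    import random
    from collections import defaultdict

    # Q maps a state to a dict of action -> estimated long-run value (default 0.0).
    Q = defaultdict(lambda: defaultdict(float))

    def q_learning_update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
        """Standard single-agent Q-learning update (cf. [23]):
        Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = max(Q[next_state].values(), default=0.0)
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

    def epsilon_greedy(state, actions, epsilon=0.1):
        """Explore with probability epsilon, otherwise pick the highest-valued action."""
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[state][a])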

Implementation

Hotelling’s original model is a differential game, where the seller’s location and price are continuous variables. For simulation purposes, we discretized the price and location space. Thus, the game played by our agents is no longer smooth, and there can be multiple equilibria. Moreover, the original game is played only once. Since learning happens iteratively, we added a time dimension to our simulation. The agents cannot freely select their location in a single step but have to travel
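As an illustration of such a discretization, the sketch below implements a per-round step on finite location and price grids with single-cell moves and profits given by the standard quadratic-cost demand split shown earlier. The grid sizes, price range, movement limits, and tie-breaking rule are hypothetical choices made for exposition and do not reproduce the simulation’s actual parameters.

    import numpy as np

    # Illustrative discretization: locations and prices live on finite grids, and
    # per round each agent may only move one grid cell and adjust its price by one tick.
    LOCATIONS = np.linspace(0.0, 1.0, 21)   # discretized street [0, 1]
    PRICES = np.linspace(0.0, 2.0, 21)      # discretized price grid
    T = 1.0                                  # quadratic transport cost parameter

    def profits(loc_idx, price_idx):
        """Per-round profits for both agents under quadratic consumer costs."""
        c = LOCATIONS[list(loc_idx)]
        p = PRICES[list(price_idx)]
        left, right = (0, 1) if c[0] <= c[1] else (1, 0)
        if c[left] == c[right]:
            # Identical products: the cheaper firm serves the whole market, ties are split.
            share_left = 0.5 if p[left] == p[right] else float(p[left] < p[right])
        else:
            # Indifferent consumer under quadratic costs, clipped to the street.
            x_bar = (c[left] + c[right]) / 2 \
                + (p[right] - p[left]) / (2 * T * (c[right] - c[left]))
            share_left = float(np.clip(x_bar, 0.0, 1.0))
        out = np.empty(2)
        out[left], out[right] = p[left] * share_left, p[right] * (1 - share_left)
        return tuple(out)

    def step(loc_idx, price_idx, moves, price_moves):
        """Apply per-agent moves in {-1, 0, +1} to location and price indices."""
        new_loc = tuple(int(np.clip(l + m, 0, len(LOCATIONS) - 1))
                        for l, m in zip(loc_idx, moves))
        new_price = tuple(int(np.clip(p + m, 0, len(PRICES) - 1))
                          for p, m in zip(price_idx, price_moves))
        return new_loc, new_price, profits(new_loc, new_price)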

Asymptotic behavior

First, we summarize the asymptotic behavior of key metrics for our agents. We observe the development of prices, locations, and profits in our simulation. From the locations, we also calculate the distance between the agents at every time step. The development of all parameters is captured for 30,000 time steps. To smooth the raw data, we use rolling averages of various lengths according to the given situation. Let us now concentrate only on profits. We take profits from all 100 simulation runs for
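One simple way to produce such smoothed profit curves is sketched below; averaging across runs before applying the rolling mean, and the window length of 500 steps, are our illustrative choices rather than the exact procedure used in the paper.

    import numpy as np

    def smoothed_mean_profit(profits, window=500):
        """Average a (runs x time_steps) profit array across runs, then apply a
        rolling mean with the given window (window length is illustrative)."""
        mean_over_runs = profits.mean(axis=0)                      # shape: (time_steps,)
        kernel = np.ones(window) / window
        return np.convolve(mean_over_runs, kernel, mode="valid")   # rolling average

    # Example: 100 runs of 30,000 time steps, as in the simulations reported here.
    profits = np.random.rand(100, 30_000)   # placeholder data for illustration only
    curve = smoothed_mean_profit(profits)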

Technical issues

We take Hotelling’s location model [10] and cut its continuous parameters into discrete pieces in order to be able to run a discrete simulation. This is technically straightforward to accomplish. However, it brings the following issues:

  • i.

    On the one hand, there is surely a Nash equilibrium in a finite game [20]; on the other hand, this equilibrium may be non-unique and depends on the way we cut the parameters of the differential game.

  • ii.

    Due to non-uniqueness of the Nash equilibrium, our

Conclusion

This paper aims to examine whether simulated adaptive behavior in Hotelling’s location model without perfect information brings different results from the theoretical model. Originally, Hotelling’s model with linear consumer costs supported the Principle of Minimum Differentiation, where the sellers meet in the middle of the street. However, the model was shown to be invalid [4], and valid alternatives with an adjusted consumer cost function have been proposed [5]. Nevertheless, the adjustments

CRediT authorship contribution statement

Jan Vainer: Methodology, Software, Validation, Formal analysis, Data curation, Writing - original draft, Visualization, Investigation. Jiri Kukacka: Conceptualization, Resources, Supervision, Writing - review & editing, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Jiri Kukacka gratefully acknowledges financial support from the Charles University PRIMUS program [project PRIMUS/19/HUM/17] and from the Charles University UNCE program [project UNCE/HUM/035].

References (25)

  • T. Chmura et al.

    Learning in experimental 2x2 games

    Games Econ Behav

    (2012)
  • N. Economides

    Minimal and maximal product differentiation in Hotelling’s duopoly

    Econ Lett

    (1986)