Renyuan Xu, ISE, USC

Abstract: The linear-quadratic (LQ) framework is widely studied in the literature on stochastic control, game theory, and mean-field analysis because of its simple structure, tractable solutions, and ability to locally approximate nonlinear control problems. In this talk, we discuss theoretical results on the policy gradient (PG) method, a popular reinforcement learning algorithm, for several LQ problems in which agents are assumed to have limited information about the stochastic system. In the single-agent setting, we explain how the PG method is guaranteed to learn the globally optimal policy. In the multi-agent setting, we show that a modified PG method can guide agents to the Nash equilibrium solution, provided there is a certain level of noise in the system. The noise can come either from the underlying dynamics or from carefully designed exploration by the agents. Finally, when the number of agents goes to infinity, we propose an exploration scheme with entropy regularization that helps each individual agent explore both the unknown system and the behavior of the other agents. The proposed scheme is shown to speed up and stabilize the learning procedure. The numerical performance of PG methods is demonstrated with two examples: the optimal execution problem in the single-agent setting and the institutional negotiation/bargaining problem in the multi-agent setting. This talk is based on several projects with Xin Guo (UC Berkeley), Ben Hambly (U of Oxford), Huining Yang (U of Oxford), and Thaleia Zariphopoulou (UT Austin).
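
To make the single-agent claim concrete, here is a minimal sketch of gradient descent on a linear feedback gain for a scalar LQ problem. It is not the speaker's algorithm: the parameters A, B, Q, R, the step size, and the function names are all hypothetical, and the closed-form cost stands in for a quantity that, under limited information, would have to be estimated from sampled trajectories. The sketch assumes dynamics x_{t+1} = A x_t + B u_t with policy u_t = -k x_t and E[x_0^2] = 1, and compares the learned gain with the gain obtained from the scalar Riccati equation.

```python
# Illustrative sketch only: scalar LQ problem with hypothetical parameters.
A, B, Q, R = 0.9, 0.5, 1.0, 1.0


def cost(k):
    """Infinite-horizon cost of the policy u_t = -k * x_t, assuming E[x_0^2] = 1."""
    c = A - B * k  # closed-loop multiplier: x_{t+1} = c * x_t
    if abs(c) >= 1.0:
        return float("inf")  # the gain does not stabilize the system
    # Sum over t of (Q + R k^2) * c^(2t) is a geometric series.
    return (Q + R * k * k) / (1.0 - c * c)


def policy_gradient(k0=0.0, step=0.01, iters=2000, eps=1e-5):
    """Gradient descent on the gain k, starting from a stabilizing k0."""
    k = k0
    for _ in range(iters):
        # Central finite difference of the cost; a stand-in for a
        # sample-based gradient estimate (an assumption of this sketch).
        grad = (cost(k + eps) - cost(k - eps)) / (2.0 * eps)
        k -= step * grad
    return k


def riccati_gain(iters=1000):
    """Optimal gain from fixed-point iteration on the scalar Riccati equation."""
    p = Q
    for _ in range(iters):
        p = Q + A * A * p - (A * B * p) ** 2 / (R + B * B * p)
    return A * B * p / (R + B * B * p)


k_pg = policy_gradient()
k_star = riccati_gain()
print("gain from gradient descent:", k_pg)
print("gain from Riccati equation:", k_star)
```

With these particular parameters the two gains should agree closely, which illustrates, for one toy instance only, the kind of global convergence the abstract refers to.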

This program is open to all eligible individuals. USC operates all of its programs and activities consistent with the university’s Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation or any other prohibited factor.

 

Event Details



Zoom Meeting: https://usc.zoom.us/j/96993120756?pwd=dU5NT2xEemM3RVlJb1ZuQ0lsTnR1Zz09

Meeting ID: 969 9312 0756
Passcode: 171859
