Title:
Solving Multi-Model MDPs by Coordinate Ascent and Dynamic Programming
Poster
Abstract
The multi-model Markov decision process (MMDP) is a promising framework for computing policies that are robust to parameter uncertainty in MDPs. MMDPs aim to find a policy that maximizes the expected return over a distribution of MDP models. Because MMDPs are NP-hard to solve, most methods resort to approximations. In this paper, we derive the policy gradient of MMDPs and propose CADP, which combines a coordinate ascent method with a dynamic programming algorithm to solve MMDPs. The main innovation of CADP over earlier algorithms is its coordinate ascent perspective, which adjusts model weights iteratively to guarantee monotone policy improvement to a local maximum. A theoretical analysis of CADP proves that it never performs worse than previous dynamic programming algorithms such as WSU. Our numerical results indicate that CADP substantially outperforms existing methods on several benchmark problems.
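The coordinate-ascent-plus-dynamic-programming idea the abstract describes can be sketched on a toy problem. The following is an illustrative reconstruction, not the authors' implementation: all model sizes, transition probabilities, rewards, and function names are assumptions. It builds a small finite-horizon MMDP from randomly generated models, and each iteration (1) computes time- and state-dependent model weights from the current policy's state occupancies and (2) runs a backward-induction DP sweep against those fixed weights.

```python
import numpy as np

# Toy sketch of a coordinate-ascent + dynamic-programming loop for an MMDP
# (illustrative only; sizes and model parameters below are made up).
# Finite-horizon, undiscounted setting for simplicity.

rng = np.random.default_rng(0)
n_models, n_states, n_actions, horizon = 2, 3, 2, 15
model_prob = np.array([0.6, 0.4])        # prior distribution over MDP models
p0 = np.full(n_states, 1.0 / n_states)   # initial state distribution

# Random transition kernels P[m, a, s, :] and rewards R[m, a, s].
P = rng.dirichlet(np.ones(n_states), size=(n_models, n_actions, n_states))
R = rng.uniform(0.0, 1.0, size=(n_models, n_actions, n_states))

def occupancy(pi):
    """d[m, t, s]: probability of state s at time t under policy pi in model m."""
    d = np.zeros((n_models, horizon, n_states))
    d[:, 0, :] = p0
    for m in range(n_models):
        for t in range(horizon - 1):
            for s in range(n_states):
                d[m, t + 1] += d[m, t, s] * P[m, pi[t, s], s]
    return d

def backward_induction(w):
    """Greedy policy w.r.t. time/state-dependent model weights w[t, s, m]."""
    V = np.zeros((n_models, n_states))            # per-model values at time t+1
    pi = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        Q = np.zeros((n_states, n_actions))
        for m in range(n_models):
            # weighted mix of each model's immediate reward + continuation value
            Q += w[t, :, m][:, None] * (R[m].T + (P[m] @ V[m]).T)
        pi[t] = Q.argmax(axis=1)
        # evaluate the chosen actions in every model to get values at time t
        new_V = np.empty_like(V)
        for m in range(n_models):
            for s in range(n_states):
                a = pi[t, s]
                new_V[m, s] = R[m, a, s] + P[m, a, s] @ V[m]
        V = new_V
    return pi

def objective(pi):
    """Expected total reward of pi, averaged over the model distribution."""
    d = occupancy(pi)
    return sum(model_prob[m] * d[m, t, s] * R[m, pi[t, s], s]
               for m in range(n_models)
               for t in range(horizon)
               for s in range(n_states))

# Coordinate-ascent loop: weights come from the current policy's occupancies,
# then one DP sweep runs against those fixed weights. Sweeping t backwards
# makes each time step an exactly maximized coordinate, so the objective
# cannot decrease between iterations.
pi = np.zeros((horizon, n_states), dtype=int)
history = [objective(pi)]
for _ in range(8):
    d = occupancy(pi)
    w = np.einsum('m,mts->tsm', model_prob, d)   # w[t, s, m]
    pi = backward_induction(w)
    history.append(objective(pi))
```

Under these assumptions the sequence in `history` is non-decreasing, mirroring the monotone-improvement guarantee the abstract claims for CADP; for the exact algorithm, weight derivation, and proofs, see the poster itself.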
Authors
First Name | Last Name
Marek | Petrik
Xihong | Su
Submission Details
Conference GRC
Event Graduate Research Conference
Department Computer Science (GRC)
Group Poster Presentation
Added March 29, 2024, 3:51 p.m.
Updated March 29, 2024, 3:59 p.m.