Title:
Inverse Reinforcement Learning of Interaction Dynamics from Demonstrations
Poster
Abstract
This poster presents a framework for learning the reward function underlying high-level sequential tasks from demonstrations. The purpose of reward learning, in the context of learning from demonstration (LfD), is to generate policies that mimic the demonstrator's policy, thereby enabling imitation learning. We focus on a human-robot interaction (HRI) domain where the goal is to learn and model structured interactions between a human and a robot. Such interactions can be modeled as a partially observable Markov decision process (POMDP), where the partial observability stems from uncertainty in how humans respond to different stimuli. The key challenge in finding a good policy in such a POMDP is determining the reward function that the demonstrator was optimizing. Existing inverse reinforcement learning (IRL) methods for POMDPs are computationally very expensive, and the problem is not well understood. In comparison, IRL algorithms for Markov decision processes (MDPs) are well defined and computationally efficient. We propose an approach to learning the reward function of high-level sequential tasks from human demonstrations, whose core idea is to reduce the underlying POMDP to an MDP and apply any efficient MDP-IRL algorithm. Our extensive experiments suggest that the reward function learned this way generates POMDP policies that mimic the demonstrator's policies well.
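The poster does not specify which MDP-IRL algorithm is applied after the POMDP is reduced to an MDP, nor how that reduction is performed; those are the contributions described in the work itself. As a rough illustration of the second stage only, the sketch below runs a standard maximum-entropy IRL update (in the style of Ziebart et al.) on an already-reduced model. The transition tensor P, feature matrix phi, expert trajectories, and initial distribution p0 are hypothetical placeholders, not artifacts of the authors' method.

```python
# Minimal sketch of the MDP-IRL stage, assuming a maximum-entropy IRL
# formulation on an MDP obtained from the POMDP reduction.  All inputs
# (P, phi, expert_trajs, p0) are hypothetical stand-ins.
import numpy as np
from scipy.special import logsumexp


def soft_value_iteration(P, reward, gamma=0.95, iters=200):
    """Soft (max-ent) value iteration; returns a stochastic policy pi[s, a]."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = reward[:, None] + gamma * (P @ V)   # Q[s, a]
        V = logsumexp(Q, axis=1)                # soft max over actions
    return np.exp(Q - V[:, None])               # Boltzmann policy


def state_visitation(P, pi, p0, horizon=50):
    """Expected state-visitation counts under policy pi over a finite horizon."""
    d, total = p0.copy(), np.zeros(P.shape[0])
    for _ in range(horizon):
        total += d
        # one-step propagation: sum over s, a of d[s] * pi[s, a] * P[s, a, s']
        d = np.einsum("s,sa,sap->p", d, pi, P)
    return total


def maxent_irl(P, phi, expert_trajs, p0, lr=0.05, epochs=100):
    """Learn weights w so that r(s) = phi[s] @ w matches expert feature counts."""
    # empirical expert feature expectation from (state, action) trajectories
    mu_expert = np.mean([phi[[s for s, _ in traj]].sum(axis=0)
                         for traj in expert_trajs], axis=0)
    w = np.zeros(phi.shape[1])
    horizon = len(expert_trajs[0])
    for _ in range(epochs):
        pi = soft_value_iteration(P, phi @ w)
        d = state_visitation(P, pi, p0, horizon)
        mu_learner = phi.T @ d
        w += lr * (mu_expert - mu_learner)       # gradient of the log-likelihood
    return w
```

In the poster's setting, the states of this reduced MDP would correspond to the discrete interaction states extracted from the demonstrations, and the recovered reward could then be used to compute a POMDP policy that imitates the demonstrator.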
Authors
First Name | Last Name
Marek | Petrik
Momotaz | Begum
Mostafa | Hussein
Submission Details
Conference GRC
Event Graduate Research Conference
Department Computer Science (GRC)
Group Leitzel - Poster
Added April 14, 2020, 12:53 p.m.
Updated April 20, 2020, 11:10 a.m.