Title

Inverse Reinforcement Learning of Interaction Dynamics from Demonstrations

Poster

Abstract

This poster presents a framework for learning the reward function underlying high-level sequential tasks from demonstrations. The purpose of reward learning, in the context of learning from demonstration (LfD), is to generate policies that mimic the demonstrator's policy, thereby enabling imitation learning. We focus on a human-robot interaction (HRI) domain where the goal is to learn and model structured interactions between a human and a robot. Such interactions can be modeled as a partially observable Markov decision process (POMDP), where the partial observability arises from uncertainty about how humans respond to different stimuli. The key challenge in finding a good policy in such a POMDP is determining the reward function that the demonstrator was optimizing. Existing inverse reinforcement learning (IRL) methods for POMDPs are computationally very expensive, and the problem is not yet well understood. In contrast, IRL algorithms for Markov decision processes (MDPs) are well defined and computationally efficient. We propose an approach to learning reward functions for high-level sequential tasks from human demonstrations; the core idea is to reduce the underlying POMDP to an MDP and apply any efficient MDP-IRL algorithm. Our extensive experiments suggest that reward functions learned this way generate POMDP policies that closely mimic the demonstrator's policies.
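
The abstract does not give implementation details, so the following is only a rough sketch of the MDP-side learning step. It assumes the POMDP has already been reduced to a small, fully observable MDP (the reduction itself is not shown) and uses a generic maximum-entropy-style feature-matching IRL update rather than the authors' specific algorithm; the function names, one-hot features, and toy transition model are all illustrative.

```python
import numpy as np

# Illustrative sketch only: assumes the POMDP has already been collapsed to a
# fully observable MDP with transition tensor P[a, s, s'] and a linear reward
# r(s) = phi(s) . w. This is a generic MaxEnt-style MDP-IRL update, not the
# authors' algorithm.

def soft_value_iteration(P, r, gamma=0.95, iters=200):
    """Soft value iteration; returns a stochastic softmax policy pi[a, s]."""
    A, S, _ = P.shape
    V = np.zeros(S)
    Q = np.zeros((A, S))
    for _ in range(iters):
        Q = r[None, :] + gamma * (P @ V)           # Q[a, s]
        m = Q.max(axis=0)
        V = m + np.log(np.exp(Q - m).sum(axis=0))  # stable log-sum-exp
    return np.exp(Q - V[None, :])

def expected_feature_counts(P, pi, phi, p0, horizon):
    """Expected feature counts of the softmax policy over a finite horizon."""
    d, mu = p0.copy(), np.zeros(phi.shape[1])
    for _ in range(horizon):
        mu += d @ phi
        d = sum((pi[a] * d) @ P[a] for a in range(P.shape[0]))
    return mu

def maxent_irl(P, phi, demos, p0, gamma=0.95, lr=0.05, epochs=200):
    """Match demonstrated feature counts by gradient ascent on reward weights."""
    w = np.zeros(phi.shape[1])
    mu_demo = np.mean([phi[traj].sum(axis=0) for traj in demos], axis=0)
    horizon = int(np.mean([len(traj) for traj in demos]))
    for _ in range(epochs):
        pi = soft_value_iteration(P, phi @ w, gamma)
        w += lr * (mu_demo - expected_feature_counts(P, pi, phi, p0, horizon))
    return phi @ w                                  # learned per-state reward

# Toy usage: random 2-action, 4-state MDP with one-hot state features.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(4), size=(2, 4))      # P[a, s, s']
    phi = np.eye(4)
    demos = [[0, 1, 3, 3, 3], [0, 2, 3, 3]]         # demonstrated state sequences
    print(maxent_irl(P, phi, demos, p0=np.full(4, 0.25)))
```

In the pipeline the abstract describes, a reward learned in this way would then be handed to a POMDP solver to recover a policy that imitates the demonstrations.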

Authors

Marek Petrik
Momotaz Begum
Mostafa Hussein


Submission Details

Conference: GRC
Event: Graduate Research Conference
Department: Computer Science (GRC)
Group: Leitzel - Poster
Added: April 14, 2020, 12:53 p.m.
Updated: April 20, 2020, 11:10 a.m.