08.07.2024

Presentation Series: Train Your Engineering Network. Niklas Dieckow

Image generated by DALL·E 3, using Microsoft Copilot Image Creator.

A Simple Model for Collaborative Policies in Reinforcement Learning

Abstract

In human-AI collaboration, the human and the AI agent often hold different responsibilities: one of the two guides the decision, while the other makes the actual choices. Following this idea, we consider the scenario of an asymmetric collaboration between two agents aiming to take actions together within an environment modeled by a Markov Decision Process (MDP).
The first agent acts as a supervisor that provides the other agent with suggested actions. The second agent acts as an executor, which chooses the action to execute in the environment. We model the executor's choice as dependent on a single parameter ϕ ∈ [0, 1], representing the probability that the executor carries out the supervisor's suggested action rather than following its own policy. Under the assumption that the supervisor has access to an optimal policy, we investigate theoretical implications of this model and apply it in practice to a toy problem.
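The executor's behavior described above can be sketched as a simple probabilistic mixture. The following is a minimal illustration, not the authors' implementation; the function and variable names (`executor_action`, `own_policy`) are hypothetical:

```python
import random

def executor_action(phi, supervisor_action, own_policy, state):
    """With probability phi, follow the supervisor's suggested action;
    otherwise, sample an action from the executor's own policy."""
    if random.random() < phi:
        return supervisor_action
    return own_policy(state)

# Example: a toy executor policy that always moves left.
own_policy = lambda state: "left"

# phi = 1.0 always defers to the supervisor; phi = 0.0 never does.
executor_action(1.0, "right", own_policy, state=None)  # → "right"
executor_action(0.0, "right", own_policy, state=None)  # → "left"
```

Equivalently, the executor's effective policy is the convex combination ϕ · π_supervisor + (1 − ϕ) · π_executor over action distributions, which is what makes the single parameter ϕ a convenient knob for analysis.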
This talk presents some preliminary results on this topic and discusses ideas for future research.
