Abstract
This presentation addresses the challenge of sample inefficiency in robotic reinforcement learning with sparse rewards and natural language goal representations. We introduce a mechanism for hindsight instruction replay that leverages expert feedback, along with a seq2seq model for generating linguistic hindsight instructions. Notably, we find that self-supervised language generation, in which the agent produces its own linguistic instructions, significantly improves learning performance. These results underscore the potential of hindsight instruction grounding in reinforcement learning for robotics.
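The abstract gives no implementation details, so the following is only a minimal, hypothetical sketch of the hindsight-instruction-replay idea it describes: a failed episode is relabeled with an instruction describing the outcome that was actually achieved, turning it into a successful training example under the sparse reward. The `generate_instruction` callable is a placeholder standing in for the seq2seq model (or the agent's self-supervised generator); all names here are illustrative, not the authors' API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Step:
    observation: object
    action: object

def hindsight_instruction_replay(
    trajectory: List[Step],
    final_state: object,
    generate_instruction: Callable[[object], str],
) -> List[Tuple[object, str, object, float]]:
    """Relabel an episode with an instruction matching its achieved outcome."""
    # Produce a language goal that describes what the agent actually did,
    # e.g. via a trained seq2seq model or self-generated instructions.
    achieved_instruction = generate_instruction(final_state)

    relabeled = []
    for step in trajectory:
        # Under the relabeled goal the episode counts as a success, so the
        # final transition receives the sparse reward of 1; all others get 0.
        reward = 1.0 if step is trajectory[-1] else 0.0
        relabeled.append((step.observation, achieved_instruction, step.action, reward))
    return relabeled
```

In this sketch the relabeled transitions would be appended to the replay buffer alongside the original (unrewarded) ones, which is the standard way hindsight-style relabeling densifies a sparse-reward signal.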
Talk in the series “Train Your Engineering Network”.