Using imitation and reinforcement learning to tackle long-horizon robotic tasks

Reinforcement learning (RL) is a widely used machine-learning technique that entails training AI agents or robots using a system of reward and punishment. So far, researchers in the field of robotics have primarily applied RL techniques in tasks that are completed over relatively short periods of time, such as moving forward or grasping objects.

A team of researchers at Google and Berkeley AI Research has recently developed a new approach, called relay policy learning, that combines RL with imitation learning. This approach, introduced in a paper prepublished on arXiv and presented at the Conference on Robot Learning (CoRL) 2019 in Osaka, can be used to train artificial agents to tackle multi-stage, long-horizon tasks, such as object manipulation tasks that span longer periods of time.

“Our research originated from many, mostly unsuccessful, experiments with very long tasks using reinforcement learning (RL),” Abhishek Gupta, one of the researchers who carried out the study, told TechXplore. “Today, RL in robotics is mostly applied in tasks that can be accomplished in a short span of time, such as grasping, pushing objects, walking forward, etc. While these applications have a lot of value, our goal was to apply reinforcement learning to tasks that require multiple sub-objectives and operate on much longer timescales, such as setting a table or cleaning a kitchen.”
