TR2019-128

Learning from Trajectories via Subgoal Discovery


Learning to solve complex goal-oriented tasks with sparse terminal-only rewards often requires an enormous number of samples. In such cases, a set of expert trajectories can help the agent learn faster. However, Imitation Learning (IL) via supervised pre-training on these trajectories often performs poorly on its own and generally requires additional fine-tuning with an expert in the loop. In this paper, we propose an approach that uses the expert trajectories and learns to decompose the complex main task into smaller sub-goals. We learn a function that partitions the state-space into sub-goals, which can then be used to design an extrinsic reward function. We follow a strategy in which the agent first learns from the trajectories using IL and then switches to Reinforcement Learning (RL) using the identified sub-goals, to alleviate the errors of the IL step. To deal with states that are underrepresented in the trajectory set, we also learn a function to modulate the sub-goal predictions. We show that our method is able to solve complex goal-oriented tasks that other RL and IL methods, or their combinations in the literature, are unable to solve.
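The reward-shaping idea in the abstract can be illustrated with a minimal sketch. The function names (`predict_subgoal`, `confidence`) and the simple "+1 for advancing one sub-goal" scheme below are illustrative assumptions, not the paper's actual formulation: a learned partition maps each state to a sub-goal index, progress to the next sub-goal yields extrinsic reward, and a learned confidence term damps rewards in states poorly covered by the expert trajectories.

```python
def make_subgoal_reward(predict_subgoal, confidence):
    """Build an extrinsic reward from a learned sub-goal partition.

    predict_subgoal : maps a state to a sub-goal index (0, 1, 2, ...)
    confidence      : maps a state to [0, 1]; low for states that are
                      underrepresented in the expert trajectory set
    """
    def reward(state, next_state):
        g, g_next = predict_subgoal(state), predict_subgoal(next_state)
        # Reward forward progress through the sub-goal sequence, scaled
        # by how well the trajectory set covers the new state.
        if g_next == g + 1:
            return confidence(next_state) * 1.0
        return 0.0
    return reward

# Toy example: states 0..9 on a chain, sub-goal index = state // 5,
# confidence held at 1.0 everywhere for simplicity.
r = make_subgoal_reward(lambda s: s // 5, lambda s: 1.0)
```

Crossing from state 4 to state 5 moves the agent into the next sub-goal and earns a reward; moves within a sub-goal, or backwards, earn nothing.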


  • Related News & Events

    •  NEWS   New robotics benchmark system
      Date: November 16, 2020
      MERL Contacts: Devesh Jha; Daniel Nikovski; Diego Romeres; Alan Sullivan; Jeroen van Baar
      Research Areas: Artificial Intelligence, Machine Learning, Robotics
      Brief
      • MERL researchers, in collaboration with researchers from MELCO and the Department of Brain and Cognitive Science at MIT, have released simulation software Circular Maze Environment (CME). This system could be used as a new benchmark for evaluating different control and robot learning algorithms. The control objective in this system is to tip and tilt the maze so as to drive one (or multiple) marble(s) to the innermost ring of the circular maze. Although the system is very intuitive for humans to control, it is very challenging for artificial intelligence agents to learn efficiently. It poses several challenges for both model-based and model-free methods, due to its non-smooth, non-linear dynamics and long planning horizon. The released Python package provides the simulation environment for the circular maze, in which the movement of multiple marbles can be simulated simultaneously. The package also provides a trajectory optimization algorithm to design a model-based controller in simulation.
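To make the control problem concrete, here is a minimal stand-in environment in the usual reset/step style. The class name, method signatures, and one-ring-per-step transition below are illustrative assumptions, not the actual CME package API; the sketch only shows the task structure: tip/tilt commands move a marble between rings, and the reward is sparse and terminal-only.

```python
class CircularMazeStub:
    """Minimal stand-in for a circular-maze task: the state is the ring
    index of a single marble; ring 0 is the innermost (goal) ring."""
    N_RINGS = 4

    def reset(self):
        self.ring = self.N_RINGS - 1  # marble starts at the outermost ring
        return self.ring

    def step(self, tip, tilt):
        # In the real system, tip/tilt angles move marbles through gates
        # between rings; here any non-zero command moves one ring inward.
        if tip != 0.0 or tilt != 0.0:
            self.ring = max(0, self.ring - 1)
        done = self.ring == 0
        reward = 1.0 if done else 0.0  # sparse terminal-only reward
        return self.ring, reward, done

env = CircularMazeStub()
state = env.reset()
steps, done = 0, False
while not done:
    state, reward, done = env.step(tip=0.1, tilt=0.0)
    steps += 1
```

In the real environment the marble's motion is non-smooth and non-linear, so reaching the innermost ring takes far more than the three idealized steps of this stub, which is exactly what makes the benchmark hard for learning agents.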