TR2022-154
Model-Based Policy Search Using Monte Carlo Gradient Estimation with Real Systems Application
- "Model-Based Policy Search Using Monte Carlo Gradient Estimation with Real Systems Application", IEEE Transactions on Robotics, DOI: 10.1109/TRO.2022.3184837, Vol. 38, No. 6, pp. 3879-3898, December 2022.
@article{Romeres2022dec,
  author = {Amadio, Fabio and Dalla Libera, Alberto and Antonello, Riccardo and Nikovski, Daniel N. and Carli, Ruggero and Romeres, Diego},
  title = {Model-Based Policy Search Using Monte Carlo Gradient Estimation with Real Systems Application},
  journal = {IEEE Transactions on Robotics},
  year = 2022,
  volume = 38,
  number = 6,
  pages = {3879--3898},
  month = dec,
  doi = {10.1109/TRO.2022.3184837},
  issn = {1941-0468},
  url = {https://www.merl.com/publications/TR2022-154}
}
Abstract:
In this paper, we present a Model-Based Reinforcement Learning (MBRL) algorithm named Monte Carlo Probabilistic Inference for Learning COntrol (MC-PILCO). The algorithm relies on Gaussian Processes (GPs) to model the system dynamics and on a Monte Carlo approach to estimate the policy gradient. This defines a framework in which we ablate the choice of the following components: (i) the selection of the cost function, (ii) the optimization of policies using dropout, and (iii) improved data efficiency through the use of structured kernels in the GP models. The combination of these components dramatically affects the performance of MC-PILCO. Numerical comparisons in a simulated cart-pole environment show that MC-PILCO achieves better data efficiency and control performance than state-of-the-art GP-based MBRL algorithms. Finally, we apply MC-PILCO to real systems, considering in particular systems with partially measurable states. We discuss the importance of modeling both the measurement system and the state estimators during policy optimization. The effectiveness of the proposed solutions has been tested in simulation and on two real systems, a Furuta pendulum and a ball-and-plate rig.
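The abstract states that MC-PILCO estimates the policy gradient with a Monte Carlo approach over a learned dynamics model. The sketch below illustrates one common way such an estimator can be built, a particle-based pathwise (reparameterization) gradient, under strong simplifying assumptions. It is an illustration, not the MC-PILCO code. A hypothetical scalar linear-Gaussian model (`a`, `b`, `sigma`) stands in for the GP one-step predictor, the policy is a hypothetical linear gain `k`, and the cost is a saturating cost of the form 1 - exp(-x^2/ell^2), one possible choice for item (i). Drawing the standard-normal samples once and reusing them makes the simulated cost a deterministic function of `k`, so its derivative can be carried along each particle with the chain rule.

```python
import math
import random


def saturating_cost(x, ell=1.0):
    """Saturating cost 1 - exp(-x^2 / ell^2); the target state is assumed to be 0."""
    return 1.0 - math.exp(-(x * x) / (ell * ell))


def saturating_cost_grad(x, ell=1.0):
    """Derivative of saturating_cost with respect to x."""
    return (2.0 * x / (ell * ell)) * math.exp(-(x * x) / (ell * ell))


def sample_noise(num_particles, horizon, seed=0):
    """Draw all standard-normal samples once. Reusing them for every evaluation
    makes the simulated cost a deterministic, differentiable function of k."""
    rng = random.Random(seed)
    z0 = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    eps = [[rng.gauss(0.0, 1.0) for _ in range(horizon)]
           for _ in range(num_particles)]
    return z0, eps


def mc_cost_and_grad(k, z0, eps, a=1.1, b=0.5, sigma=0.05,
                     mu0=1.0, s0=0.1, ell=1.0):
    """Particle estimate of the expected cumulative cost and of its pathwise
    (reparameterization) gradient with respect to the policy gain k.

    Hypothetical toy model standing in for a GP one-step predictor:
        x_{t+1} = a * x_t + b * u_t + sigma * eps_t,   u_t = -k * x_t
    """
    total_cost, total_grad = 0.0, 0.0
    for z, eps_i in zip(z0, eps):
        x = mu0 + s0 * z   # reparameterized initial state
        dx_dk = 0.0        # the initial state does not depend on k
        for e in eps_i:
            u = -k * x
            x_next = a * x + b * u + sigma * e
            # Chain rule on x_next = (a - b*k) * x + sigma * e, noise held fixed
            dx_dk = (a - b * k) * dx_dk - b * x
            x = x_next
            total_cost += saturating_cost(x, ell)
            total_grad += saturating_cost_grad(x, ell) * dx_dk
    n = len(z0)
    return total_cost / n, total_grad / n
```

In use, one would draw the samples once with `sample_noise(200, 30)` and repeatedly update `k -= step * grad`. Because every evaluation reuses the same samples, the returned gradient can be sanity-checked against a central finite difference of the returned cost.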
Related News & Events
NEWS Karl Berntorp gave Spotlight Talk at CDC Workshop on Gaussian Process Learning-Based Control
Date: December 5, 2022
Where: Cancun, Mexico
MERL Contact: Karl Berntorp
Research Areas: Control, Machine Learning
Brief:
- Karl Berntorp was an invited speaker at the workshop on Gaussian Process Learning-Based Control organized at the Conference on Decision and Control (CDC) 2022 in Cancun, Mexico.
The talk was part of a tutorial-style workshop aimed at providing insight into the fundamentals of Gaussian processes for modeling and control, and at outlining some of the open challenges and opportunities in this area. The talk, titled "Gaussian Processes for Learning and Control: Opportunities for Real-World Impact", described some of MERL's efforts in using Gaussian processes (GPs) for learning and control, presented several application examples, and discussed some of the key benefits and limitations of using GPs for learning-based control.
Related Publication
@article{Romeres2021feb,
  author = {Romeres, Diego and Amadio, Fabio and Dalla Libera, Alberto and Antonello, Riccardo and Carli, Ruggero and Nikovski, Daniel N.},
  title = {Model-Based Policy Search Using Monte Carlo Gradient Estimation with Real Systems Application},
  journal = {arXiv},
  year = 2021,
  month = feb,
  url = {https://arxiv.org/abs/2101.12115}
}