Learning Model Predictive Control
With Error Dynamics Regression
For Autonomous Racing

ICRA 2024

Carnegie Mellon University, UC Berkeley

Learning MPC formulates real-world safe robot learning in an MPC paradigm. Error dynamics regression bridges the Sim2Real gap.

Abstract

This work presents a novel Learning Model Predictive Control (LMPC) strategy for autonomous racing at the handling limit that can iteratively explore and learn unknown dynamics in high-speed operational domains. We start from existing LMPC formulations and modify the system dynamics learning method. In particular, our approach uses a nominal, global, nonlinear, physics-based model with a local, linear, data-driven learning of the error dynamics. We conduct experiments in simulation and on 1/10th-scale hardware, and deploy the proposed LMPC on a full-scale autonomous race car used in the Indy Autonomous Challenge (IAC), with closed-loop experiments at the Putnam Park Road Course in Indiana, USA.

The results show that the proposed control policy exhibits improved robustness to parameter tuning and data scarcity. Incremental, safety-aware exploration toward the limit of handling and iterative learning of the vehicle dynamics in high-speed domains are observed both in simulations and in experiments.

Talk

Why Use Learning MPC for Robot Learning

Learning MPC combines some of the best aspects of traditional and data-driven optimal control. On one side, we have a strong prior in the form of a physics-based model. On the other, we have the flexibility of data-driven methods to learn the modeling error. This is complemented by several other strengths:

  • Constraint reasoning: MPC natively handles optimization under state and input constraints.
  • Policy learning: LMPC learns more than dynamics; it also embeds a policy improvement mechanism, introduced below.
  • Reference-free learning: given a few demonstrations to get started, LMPC can converge toward the optimal policy without an explicit reference trajectory.

Naturally, some learning tasks benefit more from LMPC than others. These are usually safety-critical or agile tasks in the real world. For example, autonomous racing, where the vehicle operates at the limit of handling, is an interesting candidate for LMPC.

How Learning MPC Works

Learning MPC is a data-driven MPC method that uses previous experience to iteratively improve the control policy. It can also be augmented with regression to learn the dynamics of the system.

Learning MPC with Error Dynamics Regression

States and inputs from previous iterations are stored in the sampled Safe Set (SS). Assuming a time-invariant environment, returning to a state in the SS guarantees that the task can still be completed - this is the control-invariance property of the SS. We therefore constrain the last state in the MPC horizon to lie in the SS, which yields an iterative and safe exploration behavior around the SS. With an appropriate cost function, the control policy converges toward the optimal policy.
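To make the safe-set bookkeeping concrete, here is a minimal Python sketch (the class and method names are illustrative, not from the authors' codebase): the SS stores every visited state together with its cost-to-go, and can be queried for the neighbors of the current state.

```python
import numpy as np

class SafeSet:
    """Stores visited states and their cost-to-go across iterations."""

    def __init__(self):
        self.states = []       # one (T_j, n_x) state array per completed lap
        self.costs_to_go = []  # matching cost-to-go arrays

    def add_trajectory(self, states, stage_costs):
        # Cost-to-go at step k is the sum of stage costs from k to the end;
        # a reversed cumulative sum computes it for every point in the lap.
        ctg = np.cumsum(np.asarray(stage_costs, dtype=float)[::-1])[::-1]
        self.states.append(np.asarray(states, dtype=float))
        self.costs_to_go.append(ctg)

    def query(self, x, k=10):
        """Return the k stored states nearest to x with their costs-to-go."""
        all_states = np.vstack(self.states)
        all_ctg = np.concatenate(self.costs_to_go)
        nearest = np.argsort(np.linalg.norm(all_states - x, axis=1))[:k]
        return all_states[nearest], all_ctg[nearest]
```

In the LMPC literature, the terminal state is typically constrained to a convex combination of these stored states, with the interpolated cost-to-go serving as the terminal cost; that terminal cost is what drives the policy improvement.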

LMPC Can Also Learn Dynamics

We can query the SS for states and inputs neighboring the robot's current state, and use this data to learn the dynamics of the system. This is done by fitting a local linear model to the data, as has been attempted in previous work and sketched below.
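As a sketch, this full-regression variant amounts to a weighted least-squares fit of an affine model x⁺ ≈ Ax + Bu + C over the neighboring transitions. The function name and the Epanechnikov-style kernel below are illustrative assumptions; the exact weighting in prior work may differ.

```python
import numpy as np

def local_linear_fit(x_query, X, U, X_next, bandwidth=1.0):
    """Fit x+ ~ A x + B u + C from transitions stored near x_query."""
    n_x, n_u = X.shape[1], U.shape[1]
    # Kernel weights: transitions close to the query state count more.
    d = np.linalg.norm(X - x_query, axis=1) / bandwidth
    sqrt_w = np.sqrt(np.maximum(1.0 - d**2, 0.0))  # Epanechnikov-style
    # Regressors z = [x, u, 1]; weighted least squares via row scaling.
    Z = np.hstack([X, U, np.ones((X.shape[0], 1))]) * sqrt_w[:, None]
    Y = X_next * sqrt_w[:, None]
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)  # theta = [A B C]^T
    return theta[:n_x].T, theta[n_x:n_x + n_u].T, theta[-1]  # A, B, C
```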

Instead of doing a full regression to learn the entire dynamics, this work introduces a residual learning approach. The motivation is to leverage a prior model, either physics-based or learned in simulation. For the latter, this method helps bridge the Sim2Real gap. We hypothesize that this approach can be more data-efficient and robust to hyperparameter tuning.

To formulate the A, B, and C matrices for the DDP-based MPC, we first linearize the prior model to obtain a baseline prediction of these matrices. We then query the SS, fit the residual dynamics by solving a weighted least-squares problem, and add the result on top of the prior prediction.
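Here is a minimal sketch of this step, reusing the weighted least-squares machinery above (f_prior, the function names, and the finite-difference linearization are assumptions for illustration; analytic Jacobians work just as well):

```python
import numpy as np

def linearize(f, x, u, eps=1e-5):
    """Finite-difference Jacobians of x+ = f(x, u) around (x, u)."""
    n_x, n_u = x.size, u.size
    fx = f(x, u)
    A, B = np.zeros((n_x, n_x)), np.zeros((n_x, n_u))
    for i in range(n_x):
        dx = np.zeros(n_x); dx[i] = eps
        A[:, i] = (f(x + dx, u) - fx) / eps
    for j in range(n_u):
        du = np.zeros(n_u); du[j] = eps
        B[:, j] = (f(x, u + du) - fx) / eps
    return A, B, fx

def error_dynamics_fit(f_prior, x, u, X, U, X_next, bandwidth=1.0):
    """Prior linearization plus locally regressed error dynamics."""
    A0, B0, fx = linearize(f_prior, x, u)
    C0 = fx - A0 @ x - B0 @ u                  # affine term of the prior
    # Residual between stored successor states and the prior's predictions.
    E = X_next - np.array([f_prior(xi, ui) for xi, ui in zip(X, U)])
    # Weighted least squares, now fit on the error dynamics only.
    d = np.linalg.norm(X - x, axis=1) / bandwidth
    sqrt_w = np.sqrt(np.maximum(1.0 - d**2, 0.0))
    Z = np.hstack([X, U, np.ones((len(X), 1))]) * sqrt_w[:, None]
    theta, *_ = np.linalg.lstsq(Z, E * sqrt_w[:, None], rcond=None)
    n_x, n_u = x.size, u.size
    dA, dB, dC = theta[:n_x].T, theta[n_x:n_x + n_u].T, theta[-1]
    return A0 + dA, B0 + dB, C0 + dC           # A, B, C handed to the MPC
```

Because the residual is typically small and smooth, the hope is that a rough kernel bandwidth and relatively few neighbors still yield usable A, B, and C matrices, which is exactly the robustness hypothesis above.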

[Figure: Learning MPC with Error Dynamics Regression]

Results

Small-scale experiments on F1TENTH race cars show that the proposed error-regression LMPC can learn the dynamics of the vehicle and converge to the optimal policy. Compared with full-regression LMPC, error-regression LMPC is more robust to hyperparameter tuning and data scarcity. The plot below compares the failure rates of the two methods: full-regression LMPC fails 5 out of 10 trials with suboptimal hyperparameters, while error-regression LMPC fails only 1 out of 10.

[Plot: failure rates of full-regression vs. error-regression LMPC]

The plot below visualizes the learning progress at the 1st, 5th, and 20th iterations. The vehicle starts with a few non-expert driving demonstrations, then gradually learns to drive faster and more aggressively, eventually operating at the limit of handling and converging to a consistent lap time.

[Plot: learning progress at the 1st, 5th, and 20th iterations]

We also conducted full-scale experiments on professional autonomous race cars in the Indy Autonomous Challenge. These autonomous race cars are capable of driving at up to 340 km/h (about 210 mph).

Takeaways

The series of works on Learning MPC shows that

  • MPC is a useful paradigm for real-world robot learning.
  • A locally accurate dynamics model may be enough: we don't need to learn a globally expressive dynamics model.
  • Iterative learning and exploration help mitigate the "chicken-and-egg" problem between high-quality data and a good control policy in robot learning.

Related Links

Check out previous work on Learning MPC at UC Berkeley's MPC Lab.

For exciting autonomous racing clips, check out Haoru's YouTube channel.

Learn more about the collaborative autonomous racing research between UC Berkeley, CMU, and UC San Diego at AI Racing Tech.

BibTeX

@misc{xue2024learning,
      title={Learning Model Predictive Control with Error Dynamics Regression for Autonomous Racing}, 
      author={Haoru Xue and Edward L. Zhu and John M. Dolan and Francesco Borrelli},
      year={2024},
      eprint={2309.10716},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}

The End

Some cool autonomous race car videos!