Abstract

Due to high dimensionality and non-convexity, real-time optimal control using full-order dynamics models for legged robots is challenging. Therefore, Nonlinear Model Predictive Control (NMPC) approaches are often limited to reduced-order models. Sampling-based MPC has shown potential in nonconvex and even discontinuous problems, but often yields suboptimal solutions with high variance, which limits its applications in high-dimensional locomotion. This work introduces DIAL-MPC (Diffusion-Inspired Annealing for Legged MPC), a sampling-based MPC framework with a novel diffusion-style annealing process. This annealing process is supported by our theoretical landscape analysis of Model Predictive Path Integral Control (MPPI) and the connection between MPPI and single-step diffusion. Algorithmically, DIAL-MPC iteratively refines solutions online and achieves both global coverage and local convergence. In quadrupedal torque-level control tasks, DIAL-MPC reduces the tracking error of standard MPPI by a factor of 13.4 and outperforms reinforcement learning (RL) policies by 40% in challenging jumping tasks without any training. In particular, DIAL-MPC enables precise real-world quadrupedal jumping with payload. To the best of our knowledge, DIAL-MPC is the first training-free method that optimizes over full-order quadruped dynamics in real time.
DIAL-MPC is a sampling-based MPC framework with a novel diffusion-style annealing process.

What is DIAL-MPC?

DIAL-MPC is a training-free, full-order, torque-level legged robot controller: DIAL-MPC is an MPC framework that can optimize over full-order legged robot dynamics in real time without heavy assumptions on the dynamics or the cost function (i.e., it is plug-and-play with any model and cost function). To our knowledge, this is the first framework to achieve both real-time flexibility and RL-level agility in legged locomotion.
Diffusion-Inspired Annealing for Legged MPC (DIAL-MPC) = Sampling-based MPC + Diffusion-style Annealing: To achieve efficient real-time locomotion, DIAL-MPC extends MPPI with a diffusion-style annealing process at both the trajectory level and the action level, which improves global coverage and local convergence.
Trajectory-level annealing: at a given time step, DIAL-MPC optimizes the planned trajectory iteratively.
Action-level annealing: across time steps, the same action is re-optimized as the horizon recedes, using a scheduled sampling kernel.
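The two annealing levels can be pictured as one noise schedule indexed by the refinement iteration and by the position of an action in the horizon. The sketch below is only an illustration under assumed choices: the exponential form, the function name `sampling_sigma`, and the parameters `sigma0`, `beta_traj`, and `beta_act` are hypothetical and are not taken from the DIAL-MPC implementation.

```python
import math


def sampling_sigma(i, h, n_iters, horizon, sigma0=1.0, beta_traj=0.5, beta_act=0.5):
    """Standard deviation of the sampling kernel (illustrative form, an assumption).

    i: refinement iteration at the current time step, 0 .. n_iters - 1
    h: index of the action within the planning horizon, 0 .. horizon - 1
    """
    # Trajectory-level annealing: later refinement iterations use less noise.
    traj_term = i / (beta_traj * n_iters)
    # Action-level annealing: near-term actions were already refined at
    # earlier time steps (receding horizon), so they get less noise than
    # actions that only recently entered the horizon.
    act_term = (horizon - 1 - h) / (beta_act * horizon)
    return sigma0 * math.exp(-traj_term - act_term)


# Noise table: one row per refinement iteration, one column per horizon index.
schedule = [[sampling_sigma(i, h, n_iters=4, horizon=5) for h in range(5)]
            for i in range(4)]
```

Reading across a row of `schedule`, the noise grows toward the end of the horizon; reading down a column, it shrinks as the trajectory is refined. Any monotone schedule with these two properties would serve the same illustrative purpose.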

How Does DIAL-MPC Work?

MPPI is a single-stage diffusion process. Given a non-convex cost function, MPPI aims to sample from the distribution p0. Our theoretical analysis reveals that MPPI is a single-stage diffusion process in which the score function is approximated with Monte Carlo sampling.
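One way to see the connection is the following short derivation. The notation is an assumption made for illustration: the page only names p0, while the cost J, the temperature \lambda, and the kernel width \sigma are introduced here, and p0 is assumed to take the usual Boltzmann form.

```latex
\begin{aligned}
p_0(x) &\propto \exp\!\big(-J(x)/\lambda\big), \\
p_\sigma(y) &= \int p_0(x)\,\mathcal{N}\!\big(y;\,x,\,\sigma^2 I\big)\,dx, \\
\nabla_y \log p_\sigma(y) &= \frac{\mathbb{E}[x \mid y] - y}{\sigma^2},
\qquad p(x \mid y) \propto p_0(x)\,\mathcal{N}\!\big(x;\,y,\,\sigma^2 I\big), \\
y + \sigma^2\,\nabla_y \log p_\sigma(y) &= \mathbb{E}[x \mid y]
\;\approx\; \frac{\sum_k w_k\,x_k}{\sum_k w_k},
\qquad x_k \sim \mathcal{N}\!\big(y,\,\sigma^2 I\big),\quad
w_k = \exp\!\big(-J(x_k)/\lambda\big).
\end{aligned}
```

Under these assumptions, the familiar MPPI weighted average is a single score-ascent step on the Gaussian-smoothed density p_\sigma, with the score estimated by Monte Carlo samples drawn around the current solution y.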
MPPI suffers from suboptimal solutions or high variance. Because p0 is sparse and non-smooth, MPPI optimizes either an over-smoothed function or a highly non-smooth one, leading to suboptimal solutions or high variance. A diffusion process overcomes this problem by iteratively refining solutions over a series of smoothing levels.
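The effect of the smoothing level can be checked numerically on a toy problem. The sketch below is an illustration only: the 1-D cost, the temperature `lam`, the grid bounds, and the helper names are hypothetical. It convolves an unnormalized p0 = exp(-cost/lam) with Gaussians of two widths and counts how many modes survive.

```python
import math


def cost(x):
    # Hypothetical non-convex cost: global minimum near x = -1.04,
    # local minimum near x = 0.96.
    return (x * x - 1.0) ** 2 + 0.3 * x


def smoothed_density(y, sigma, lam=0.1, lo=-2.5, hi=2.5, n=501):
    """Riemann-sum estimate of the Gaussian-smoothed density at y,
    i.e. the integral over x of exp(-cost(x)/lam) * N(y; x, sigma^2),
    up to a constant factor."""
    dx = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        x = lo + k * dx
        total += math.exp(-cost(x) / lam) * math.exp(-0.5 * ((y - x) / sigma) ** 2)
    return total * dx / sigma


def count_modes(sigma, lo=-2.5, hi=2.5, n=501):
    """Count strict local maxima of the smoothed density on a grid."""
    dy = (hi - lo) / (n - 1)
    p = [smoothed_density(lo + j * dy, sigma) for j in range(n)]
    return sum(1 for j in range(1, n - 1) if p[j - 1] < p[j] > p[j + 1])


modes_small = count_modes(sigma=0.05)  # lightly smoothed: both basins remain
modes_large = count_modes(sigma=1.0)   # heavily smoothed: the basins merge
```

In this toy setting the lightly smoothed landscape keeps two separate sharp modes, so a narrow sampling kernel can stay trapped in the wrong one, while the heavily smoothed landscape has a single broad mode that is easy to find but locates the optimum only coarsely. Visiting several smoothing levels in sequence is what avoids having to pick one side of this trade-off.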
DIAL-MPC uses a multi-stage diffusion process for better coverage and convergence. Whereas MPPI uses only a single-stage diffusion process, DIAL-MPC iteratively refines solutions in a diffusion-like manner, leading to better coverage and convergence in contact-rich locomotion tasks.
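The difference between a single smoothing level and a sequence of levels can be sketched on a 1-D toy problem. This is not the DIAL-MPC implementation, which optimizes full control trajectories against a dynamics model. The cost, the geometric noise schedule, the sample count, and all names below are assumptions chosen for illustration.

```python
import math
import random


def cost(x):
    # Hypothetical non-convex cost: global minimum near x = -1.04,
    # local minimum near x = 0.96.
    return (x * x - 1.0) ** 2 + 0.3 * x


def mppi_step(mean, sigma, rng, n_samples=512, lam=0.1):
    """One MPPI update: sample around `mean`, weight by exp(-cost/lam), average."""
    samples = [rng.gauss(mean, sigma) for _ in range(n_samples)]
    costs = [cost(x) for x in samples]
    c_min = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    return sum(w * x for w, x in zip(weights, samples)) / sum(weights)


def refine(mean, sigmas, rng):
    """Apply one MPPI update per entry of `sigmas`, in order."""
    for sigma in sigmas:
        mean = mppi_step(mean, sigma, rng)
    return mean


rng = random.Random(0)
x_init = 1.5  # start inside the basin of the local minimum
n_stages = 8

# Single smoothing level, repeated: the narrow kernel only sees the local basin.
x_fixed = refine(x_init, [0.1] * n_stages, rng)

# Annealed smoothing levels: explore broadly first, then converge locally.
annealed = [1.5 * 0.6 ** i for i in range(n_stages)]
x_annealed = refine(x_init, annealed, rng)
```

In this toy setting the fixed small-noise run is expected to settle near the local minimum around 0.96, while the annealed run should move into the global basin during the wide early stages and then tighten around the minimum near -1.04 as the noise shrinks. A fixed large noise level would also find the global basin, but its estimate would keep fluctuating from step to step.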

Comparison With Other Methods

Compared with RL: DIAL-MPC uses a sampling-based method very similar to RL, but it is training-free and achieves higher-precision control thanks to its diffusion-style annealing process.
Compared with Standard Sampling-based MPC: DIAL-MPC achieves more agile motion and higher-precision control with a better sampling strategy inspired by diffusion models.
Compared with Nonlinear MPC: DIAL-MPC can handle full-order dynamics and arbitrary cost functions in a plug-and-play manner, while existing MPC methods require reduced-order models and specialized cost functions.
| Aspect | DIAL-MPC | Sampling-based MPC Baselines | Nonlinear MPC | Model-Free RL |
| --- | --- | --- | --- | --- |
| Agile Motion | Yes | No | Need careful system identification and solver design | Yes |
| High-precision Control | Yes | No | Yes | Need careful reward engineering and training design |
| Full-Order Dynamics | Yes | Yes | No, especially with contact | Yes |
| Arbitrary Cost | Yes | Yes | No | Yes |
| Training-Free | Yes | Yes | Yes | No |
| Test-time Generalization | Yes | Yes | Yes | Need extra components in training stage |
