Full-Order Sampling-Based MPC for Torque-Level Locomotion Control via Diffusion-Style Annealing

Haoru Xue*, Chaoyi Pan*, Zeji Yi, Guannan Qu, Guanya Shi

Carnegie Mellon University

Abstract

Due to high dimensionality and non-convexity, real-time optimal control using full-order dynamics models for legged robots is challenging. Therefore, Nonlinear Model Predictive Control (NMPC) approaches are often limited to reduced-order models. Sampling-based MPC has shown potential in non-convex and even discontinuous problems, but it often yields suboptimal solutions with high variance, which limits its application to high-dimensional locomotion. This work introduces DIAL-MPC (Diffusion-Inspired Annealing for Legged MPC), a sampling-based MPC framework with a novel diffusion-style annealing process. This annealing process is supported by our theoretical landscape analysis of Model Predictive Path Integral Control (MPPI) and by the connection between MPPI and single-step diffusion. Algorithmically, DIAL-MPC iteratively refines solutions online and achieves both global coverage and local convergence. In quadrupedal torque-level control tasks, DIAL-MPC reduces the tracking error of standard MPPI by a factor of 13.4 and outperforms reinforcement learning (RL) policies by 40% in challenging jumping tasks without any training. In particular, DIAL-MPC enables precise real-world quadrupedal jumping with payload. To the best of our knowledge, DIAL-MPC is the first training-free method that optimizes over full-order quadruped dynamics in real time.

DIAL-MPC is a sampling-based MPC framework with a novel diffusion-style annealing process.

What is DIAL-MPC?

DIAL-MPC is a training-free, full-order, torque-level legged robot controller: DIAL-MPC is an MPC framework that can optimize over full-order legged robot dynamics in real time without heavy assumptions on the dynamics or the cost function (i.e., it is plug-and-play with any model and cost function). To our knowledge, this is the first framework to achieve both real-time flexibility and RL-level agility in legged locomotion.

Diffusion-Inspired Annealing for Legged MPC (DIAL-MPC) = Sampling-based MPC + Diffusion-style Annealing: To achieve efficient real-time locomotion, DIAL-MPC extends MPPI with a diffusion-style annealing process at both the trajectory level and the action level, which yields better global coverage and local convergence.

Trajectory-level annealing: at a given time step, DIAL-MPC optimizes the planned trajectory iteratively.

Action-level annealing: across successive time steps, the same action is re-optimized under the receding horizon with a scheduled sampling kernel.
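The two annealing levels can be pictured as a two-dimensional noise schedule. The sketch below is only an illustration under stated assumptions: the exponential form, the function names `dual_annealing_schedule` and `shift_plan`, and the parameters `sigma_max`, `beta_traj`, and `beta_act` are hypothetical and are not taken from the paper. It shows sampling noise shrinking over annealing iterations (trajectory level) and growing with the horizon index (action level), since actions further in the future have been refined in fewer receding-horizon steps.

```python
import numpy as np

def dual_annealing_schedule(n_iters, horizon, sigma_max=1.0,
                            beta_traj=0.5, beta_act=0.5):
    """Return sigma[i, h]: sampling std at annealing iteration i, horizon index h.

    Hypothetical exponential schedule (an assumption, not the paper's formula):
      - trajectory level: noise decreases as the iteration index i grows;
      - action level: noise increases with h, so near-term actions, which were
        already refined in earlier receding-horizon steps, are perturbed less.
    """
    i = np.arange(n_iters)[:, None]   # shape (n_iters, 1)
    h = np.arange(horizon)[None, :]   # shape (1, horizon)
    traj_factor = np.exp(-i / (beta_traj * n_iters))
    act_factor = np.exp(-(horizon - 1 - h) / (beta_act * horizon))
    return sigma_max * traj_factor * act_factor

def shift_plan(plan):
    """Receding-horizon warm start: drop the executed action, repeat the last."""
    return np.concatenate([plan[1:], plan[-1:]], axis=0)

sigma = dual_annealing_schedule(n_iters=4, horizon=5)
```

Each row of `sigma` would serve as the per-action sampling scale for one refinement pass, and `shift_plan` would carry the refined plan over to the next control step.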

How Does DIAL-MPC Work?

MPPI is a single-stage diffusion process. Given a non-convex cost function, MPPI aims to sample from the target distribution p0 induced by that cost. Our theoretical analysis reveals that MPPI is a single-stage diffusion process in which the score function is approximated with Monte Carlo sampling.
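One way to make this connection concrete is the short derivation below. It is a sketch in our own notation, assuming the usual MPPI target p0(U) ∝ exp(-J(U)/λ) with cost J and temperature λ, and a Gaussian smoothing kernel with standard deviation σ; the symbols U, U', w_i, and N are introduced here for illustration.

```latex
\begin{aligned}
p_0(U) &\propto \exp\!\left(-\tfrac{J(U)}{\lambda}\right), \qquad
p_\sigma(U) = \int p_0(U')\,\mathcal{N}(U;\,U',\,\sigma^2 I)\,dU', \\
\nabla_U \log p_\sigma(U)
  &= \frac{1}{\sigma^2}\Big(\mathbb{E}\left[U' \mid U\right] - U\Big),
  \qquad p(U' \mid U) \propto p_0(U')\,\mathcal{N}(U';\,U,\,\sigma^2 I), \\
\mathbb{E}\left[U' \mid U\right] &\approx \sum_{i=1}^{N} w_i\,U_i,
  \qquad U_i \sim \mathcal{N}(U,\,\sigma^2 I), \qquad
  w_i = \frac{\exp\!\left(-J(U_i)/\lambda\right)}{\sum_{j=1}^{N}\exp\!\left(-J(U_j)/\lambda\right)}, \\
U^{+} &= \sum_{i=1}^{N} w_i\,U_i \;\approx\; U + \sigma^2\,\nabla_U \log p_\sigma(U).
\end{aligned}
```

Read this way, the familiar weighted-average MPPI update is one step along a Monte Carlo estimate of the score of the smoothed density at a single noise level σ.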

MPPI suffers from suboptimal solutions or high variance. Due to the sparsity and non-smooth nature of p0, MPPI optimizes either an over-smoothed function or a highly non-smooth one, leading to suboptimal solutions or high variance, respectively. A diffusion process overcomes this problem by iteratively refining solutions over a series of smoothing levels.
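A single MPPI update at one fixed smoothing level can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `toy_cost`, `mppi_update`, and the parameter names `sigma`, `lam`, and `n_samples` are assumptions made for this example, and the cost is a toy function rather than a robot rollout.

```python
import numpy as np

def toy_cost(u):
    """Toy non-convex cost: per dimension, two wells near u = -1 and u = +1,
    with the well near -1 being the lower one."""
    return np.sum((u**2 - 1.0) ** 2 + 0.3 * u, axis=-1)

def mppi_update(mean, sigma, cost_fn, lam=0.1, n_samples=1024, rng=None):
    """One MPPI step: sample around `mean`, weight by exp(-cost / lam), average."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal((n_samples,) + mean.shape)
    samples = mean + sigma * noise                  # candidate control sequences
    costs = cost_fn(samples)                        # shape (n_samples,)
    weights = np.exp(-(costs - costs.min()) / lam)  # shift by min for stability
    weights /= weights.sum()
    new_mean = np.tensordot(weights, samples, axes=1)  # weighted average
    return new_mean, weights

rng = np.random.default_rng(0)
mean = np.zeros(4)
# The same update at two fixed smoothing levels (values chosen arbitrarily).
wide_mean, wide_w = mppi_update(mean, sigma=2.0, cost_fn=toy_cost, rng=rng)
narrow_mean, narrow_w = mppi_update(mean, sigma=0.05, cost_fn=toy_cost, rng=rng)
# Effective sample size: how many samples actually carry weight.
ess_wide = 1.0 / np.sum(wide_w**2)
ess_narrow = 1.0 / np.sum(narrow_w**2)
```

Repeating the two calls with different seeds and comparing the resulting means and effective sample sizes gives a feel for the trade-off described above; no particular numbers are claimed here.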

DIAL-MPC uses a multi-stage diffusion process for better coverage and convergence. Compared with MPPI, which uses only a single-stage diffusion process, DIAL-MPC iteratively refines solutions in a diffusion-like manner, leading to better coverage and convergence for contact-rich locomotion tasks.
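The multi-stage idea can be sketched by repeating a weighted-average update while the sampling noise is annealed from coarse to fine. The code below is a hedged illustration only: the function names, the geometric noise schedule, and the toy cost are assumptions for this example and do not reproduce the paper's algorithm or its dynamics rollouts.

```python
import numpy as np

def toy_cost(u):
    """Toy non-convex cost with two wells per dimension (near u = -1 and u = +1)."""
    return np.sum((u**2 - 1.0) ** 2 + 0.3 * u, axis=-1)

def weighted_mean_update(mean, sigma, cost_fn, lam, n_samples, rng):
    """Single-stage update: exp(-cost / lam)-weighted average of Gaussian samples."""
    samples = mean + sigma * rng.standard_normal((n_samples,) + mean.shape)
    costs = cost_fn(samples)
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    return np.tensordot(weights, samples, axes=1)

def annealed_plan(mean, sigmas, cost_fn, lam=0.1, n_samples=512, rng=None):
    """Multi-stage refinement: apply the update once per noise level in `sigmas`.

    Early (large) noise levels favor coverage; later (small) ones favor
    local convergence around the current solution.
    """
    rng = np.random.default_rng() if rng is None else rng
    for sigma in sigmas:
        mean = weighted_mean_update(mean, sigma, cost_fn, lam, n_samples, rng)
    return mean

horizon = 4
# Hypothetical schedule: six noise levels from 1.0 down to 0.05, same for every action.
sigmas = np.geomspace(1.0, 0.05, num=6)[:, None] * np.ones(horizon)
plan = annealed_plan(np.zeros(horizon), sigmas, toy_cost,
                     rng=np.random.default_rng(0))
```

Passing a single-row `sigmas` recovers the single-stage update, which makes it easy to compare the two on the same cost.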

Compare With Other Methods

Compared with RL: DIAL-MPC uses a sampling-based method very similar to that of RL, but it is training-free and achieves higher-precision control thanks to its diffusion-style annealing process.

Compared with Standard Sampling-based MPC: DIAL-MPC achieves more agile motion and higher-precision control with a better sampling strategy inspired by diffusion models.

Compared with Nonlinear MPC: DIAL-MPC can handle full-order dynamics and arbitrary cost functions in a plug-and-play manner, while existing nonlinear MPC methods typically require reduced-order models and specialized cost functions.

Aspect	DIAL-MPC	Sampling-based MPC Baselines	Nonlinear MPC	Model-Free RL
Agile Motion	Yes	No	Needs careful system identification and solver design	Yes
High-Precision Control	Yes	No	Yes	Needs careful reward engineering and training design
Full-Order Dynamics	Yes	Yes	No, especially with contact	Yes
Arbitrary Cost	Yes	Yes	No	Yes
Training-Free	Yes	Yes	Yes	No
Test-Time Generalization	Yes	Yes	Yes	Needs extra components in the training stage

BibTeX



@misc{xue2024fullordersamplingbasedmpctorquelevel,
    title={Full-Order Sampling-Based MPC for Torque-Level Locomotion Control via Diffusion-Style Annealing},
    author={Haoru Xue and Chaoyi Pan and Zeji Yi and Guannan Qu and Guanya Shi},
    year={2024},
    eprint={2409.15610},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
    url={https://arxiv.org/abs/2409.15610},
}