A graphical abstract of MADR and the robotic experiments. In this work, we propose enriching the self-supervised learning of HJ PDEs, i.e., DeepReach, with supervision given by the best sampled game rollout (top left),
where the opponent's policy is defined by the current value approximation (top right). We demonstrate that this approach performs well across games with general dynamics (bottom), namely in TurtleBot, drone, and humanoid experiments.
Abstract
Hamilton-Jacobi (HJ) Reachability offers a framework for generating safe value functions and policies in the face of adversarial disturbance, but it is limited by the curse of dimensionality. Physics-informed deep learning can overcome this infeasibility,
but it suffers from slow and inaccurate convergence, primarily due to weak PDE gradients and the complexity of self-supervised learning. A few recent works have demonstrated that enriching the self-supervision process with regular supervision
(based on the nature of the optimal control problem) greatly accelerates convergence and improves solution quality; however, these efforts have been limited to single-player problems and simple games. In this work, we introduce MADR,
a general framework to robustly approximate the two-player, zero-sum differential game value function. In doing so, MADR yields the corresponding optimal strategies for both players in zero-sum games as well as safe policies
for worst-case robustness. We test MADR on a multitude of high-dimensional simulated and real robotic agents with varying dynamics and games, finding that our approach significantly outperforms state-of-the-art baselines in
simulation and produces impressive results on hardware.
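For reference, one common formulation of the HJ reachability value function (this is the standard avoid-set variational inequality from the literature; the exact conventions used in the paper may differ) characterizes the safe set via a function $l(x)$ and defines the robust value $V(x,t)$ as the viscosity solution of the Hamilton-Jacobi-Isaacs variational inequality

$$\min\Big\{ \partial_t V + \max_{u}\,\min_{d}\, \big\langle \nabla_x V,\, f(x,u,d) \big\rangle,\;\; l(x) - V(x,t) \Big\} = 0, \qquad V(x,T) = l(x),$$

where $u$ is the control and $d$ the adversarial disturbance. DeepReach-style methods train a neural network to satisfy this PDE in a self-supervised manner, which is the loss MADR augments with rollout-based supervision.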
Algorithm
Graphical depiction of the MPC formulation combining both players' rollouts in the loss function for the proposed MPC-guided adversarial PINN training.
We augment the training loss function for the neural network, combining the MPC rollouts for both players as an additional loss term alongside the traditional PDE loss.
Essentially, for each player, we generate that player's own actions using sampling-based MPC and the opponent's actions from the current approximation of the learned value function
at each timestep along the trajectory rollout, as sketched below.
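The following is a minimal PyTorch-style sketch of this combined loss, not the actual MADR implementation: all names here (`value_net`, `hamiltonian`, `rollout_values`, the loss weights) are hypothetical placeholders, and the precise rollout supervision target in the paper may differ.

```python
import torch
import torch.nn.functional as F


def hji_residual(value_net, x, t, hamiltonian):
    """Self-supervised residual of the HJI PDE at sampled states/times."""
    x = x.detach().requires_grad_(True)
    t = t.detach().requires_grad_(True)
    v = value_net(x, t)
    grad_x, grad_t = torch.autograd.grad(v.sum(), (x, t), create_graph=True)
    # The PDE requires dV/dt + H(x, dV/dx) = 0, so this residual should vanish.
    return grad_t + hamiltonian(x, grad_x)


def madr_loss(value_net, x, t, hamiltonian,
              rollout_states, rollout_times, rollout_values,
              pde_weight=1.0, rollout_weight=1.0):
    """Combined loss: PDE residual plus supervision from the best game rollout."""
    pde_loss = hji_residual(value_net, x, t, hamiltonian).abs().mean()

    # Supervised term: fit the value predicted along the best sampled rollout.
    # rollout_values holds that rollout's cost-to-go, generated with
    # sampling-based MPC for the ego player and the current value
    # approximation for the opponent.
    rollout_loss = F.mse_loss(value_net(rollout_states, rollout_times),
                              rollout_values)

    return pde_weight * pde_loss + rollout_weight * rollout_loss
```

In this sketch, the PDE term keeps the network consistent with the HJI equation on randomly sampled states, while the rollout term anchors it to concrete game outcomes, which is where the stronger gradients described above come from.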
Hardware Experiments
All videos shown at 2x speed
Drones - Light Up Trajectory
TurtleBots - Evader: DP vs Pursuer: MADR
Drones - Trajectory with MADR Follow Policy
Humanoid - Drone Pursuing
Humanoid - Drone Evading
Citation
If you use our method or code in your research, please consider citing the paper as follows:
@online{TeohTonkensEtAl2025,
  author = {Teoh, R. and Tonkens, S. and Sharpless, W. and Yang, A. and Feng, Z. and Bansal, S. and Herbert, S.},
  title  = {MADR: MPC-Guided Adversarial DeepReach},
  year   = {2025},
}