DMPO addresses three interconnected challenges in real-time robotic control:
Multi-step sampling in diffusion and flow-based policies incurs significant latency, while distillation-based one-step methods require complex training pipelines.
Our Solution: MeanFlow enables mathematically derived single-step inference without knowledge distillation, achieving a 694x speedup.
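To make the single-step claim concrete, here is a minimal sampling sketch in PyTorch. It illustrates the general MeanFlow idea, not DMPO's actual API: the function and argument names (`model`, `obs_emb`, etc.) are illustrative assumptions. MeanFlow predicts the *average* velocity over an interval, so the displacement identity $z_r = z_t - (t - r)\,u(z_t, r, t)$ turns sampling into a single network evaluation at $(r, t) = (0, 1)$.

```python
import torch

@torch.no_grad()
def one_step_action(model, obs_emb, horizon, action_dim, device="cuda"):
    # MeanFlow learns the *average* velocity u(z_t, r, t) over [r, t], so the
    # displacement identity  z_r = z_t - (t - r) * u(z_t, r, t)  lets us jump
    # from pure noise (t = 1) straight to the action sample (r = 0): NFE = 1.
    z1 = torch.randn(1, horizon, action_dim, device=device)   # noise at t = 1
    t = torch.ones(1, device=device)
    r = torch.zeros(1, device=device)
    u = model(z1, r, t, obs_emb)               # predicted average velocity on [0, 1]
    return z1 - (t - r).view(-1, 1, 1) * u     # z_0 = z_1 - (1 - 0) * u
```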
One-step generation methods risk mapping distinct observations to indistinguishable representations, degrading action quality.
Our Solution: Dispersive regularization encourages feature diversity across embeddings, preventing representation collapse without architectural modifications.
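As a rough sketch of one common form of dispersive regularization, the snippet below applies a repulsive-only, InfoNCE-style objective to a batch of intermediate embeddings; the exact variant, temperature, and weighting used in DMPO may differ.

```python
import torch

def dispersive_loss(features, tau=0.5):
    # Repulsive-only, InfoNCE-style regularizer on a batch of intermediate
    # embeddings: minimizing the log of the mean pairwise similarity pushes
    # representations apart, with no positive pairs and no extra parameters,
    # so the policy architecture is left unchanged.
    h = features.flatten(1)                      # (B, D)
    sq_dist = torch.cdist(h, h, p=2).pow(2)      # (B, B) pairwise squared distances
    return torch.log(torch.exp(-sq_dist / tau).mean())
```

In training, a term like this would typically be added to the flow objective with a small weight, e.g. `loss = flow_loss + lam * dispersive_loss(h)` (the weight `lam` is a hypothetical hyperparameter).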
Pure imitation learning cannot surpass expert demonstrations, yet RL fine-tuning is impractical with slow multi-step inference.
Our Solution: One-step inference enables efficient PPO fine-tuning, breaking through the imitation learning ceiling.
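To show why one-step inference makes PPO practical, here is a sketch of the clipped surrogate on top of a one-step policy, treating the single-call action prediction as the mean of a Gaussian so log-probabilities and importance ratios are cheap to compute. The Gaussian wrapper, fixed standard deviation, and function names are assumptions for illustration, not DMPO's exact fine-tuning recipe.

```python
import torch
from torch.distributions import Normal

def ppo_clip_loss(policy, obs, actions, old_log_probs, advantages,
                  action_std=0.1, clip_eps=0.2):
    # One call to the one-step policy yields a full action prediction; wrapping
    # it in a Gaussian makes log-probabilities (and hence PPO importance
    # ratios) tractable. A multi-step diffusion policy would need the entire
    # denoising chain for every log-prob evaluation.
    mean = policy(obs)                                     # (B, A) one-step prediction
    dist = Normal(mean, torch.full_like(mean, action_std))
    log_probs = dist.log_prob(actions).sum(-1)             # (B,)
    ratio = torch.exp(log_probs - old_log_probs)           # importance ratios
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```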
Answer: Yes. DMPO achieves dramatic inference efficiency gains with true one-step generation (see the inference speed table below).
Answer: Yes. Dispersive regularization significantly improves success rates by preventing representation collapse.
Answer: Yes. With only 1 denoising step (NFE = 1), DMPO matches or outperforms every baseline in task success rate:
| Method | NFE | Distillation | Lift | Can | Square | Transport |
|---|---|---|---|---|---|---|
| DP-C (Teacher) | 100 | - | 97% | 96% | 82% | 46% |
| CP | 1 | Yes | - | - | 65% | 38% |
| OneDP-S | 1 | Yes | - | - | 77% | 72% |
| MP1 | 1 | No | 95% | 80% | 35% | 38% |
| DMPO (Ours) | 1 | No | 100% | 100% | 83% | 88% |
| Model | Vision Encoder | Params | Steps | Latency (RTX 4090) | Frequency | Speedup |
|---|---|---|---|---|---|---|
| DP (DDPM) | ResNet-18x2 | 281M | 100 | 391.1ms | 2.6Hz | 1x |
| CP | ResNet-18x2 | 285M | 1 | 5.4ms | 187Hz | 73x |
| MP1 | PointNet | 256M | 1 | 4.1ms | 244Hz | 96x |
| DMPO (Ours) | Lightweight ViT | 1.78M | 1 | 0.6ms | 1770Hz | 694x |
Answer: Yes. We validated DMPO on a Franka Emika Panda robot with an Intel RealSense D435i camera, running inference on an NVIDIA RTX 2080 GPU, demonstrating robust sim-to-real transfer.
Radar charts comparing DMPO against baselines across eight evaluation dimensions: Inference Speed, Model Lightweight, Success Rate, Data Efficiency, Representation Quality, Distillation Free, Beyond Demos, and Training Stability. Each dimension is scored on a 1-5 scale.