DM1: MeanFlow with Dispersive Regularization for 1-Step Robotic Manipulation
Guowei Zou¹, Haitao Wang¹, Hejun Wu¹, Yukun Qian¹, Yuhang Wang¹, Weibing Li¹,*
¹Sun Yat-sen University     *Corresponding Author
TL;DR: We introduce DM1, a novel flow matching framework that integrates dispersive regularization into MeanFlow to enable one-step action generation while preventing representation collapse. DM1 achieves 20-40× faster inference and improves success rates by 10-20 percentage points on robotic manipulation tasks, demonstrating robust sim-to-real transfer.

Overview

Visualization of the effect of dispersive regularization in DM1: (a) Example rollouts showing how similar observations can lead to incorrect vs. correct grasps; (b–c) Feature distributions without and with dispersive loss; (d) Method landscape illustrating the speed–quality trade-off; (e) Quantitative comparison of success rate versus inference time.

Approach Overview

DM1 addresses two fundamental challenges in flow-based robotic control:

🚀 Challenge 1: One-Step Efficiency

Diffusion policies require 50-100 neural function evaluations (NFEs), preventing real-time deployment. Flow-based models reduce steps but still require iterative ODE integration.

Our Solution: MeanFlow enables true 1-NFE generation by directly predicting average velocity fields, achieving 20-40× speedup.
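The difference between one-step and iterative sampling can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the MeanFlow convention in which t = 1 is noise and t = 0 is the action, and the names `one_step_sample`, `euler_sample`, `CountingFn`, and the toy velocity fields are hypothetical stand-ins for a trained network.

```python
import numpy as np

class CountingFn:
    """Wraps a function and counts calls (a stand-in for counting NFEs)."""
    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.fn(*args)

def one_step_sample(avg_velocity_fn, noise, obs):
    """1-NFE sampling: jump from t=1 (noise) to r=0 (action) in one call."""
    r, t = 0.0, 1.0
    return noise - (t - r) * avg_velocity_fn(noise, r, t, obs)

def euler_sample(velocity_fn, noise, obs, n_steps):
    """Iterative baseline: Euler-integrate the instantaneous velocity."""
    z = noise.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = 1.0 - k * dt  # walk from t=1 down to t=dt
        z = z - dt * velocity_fn(z, t, obs)
    return z

# Toy fields (assumption): each observation has a single target action, so
# flow paths are straight lines and both velocities equal (z - target) / t.
def toy_target(obs):
    return 2.0 * obs

toy_avg_velocity = CountingFn(lambda z, r, t, obs: (z - toy_target(obs)) / t)
toy_velocity = CountingFn(lambda z, t, obs: (z - toy_target(obs)) / t)

rng = np.random.default_rng(0)
obs = rng.normal(size=(4, 7))
noise = rng.normal(size=(4, 7))

action_1nfe = one_step_sample(toy_avg_velocity, noise, obs)
action_euler = euler_sample(toy_velocity, noise, obs, n_steps=32)
print("NFEs:", toy_avg_velocity.calls, "vs", toy_velocity.calls)
```

In this toy setting both samplers reach the same action; the point is only that the average-velocity sampler spends one network call where the Euler sampler spends one per step.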

⚠️ Challenge 2: Representation Collapse

One-step generation methods suffer from representation collapse, in which distinct observations map to nearly identical embeddings; this degrades performance on complex tasks.

Our Solution: Dispersive regularization encourages feature diversity across multiple embedding layers without architectural modifications.
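As a rough sketch of what such a regularizer can look like, the snippet below implements an InfoNCE-style dispersive term with squared L2 distance, one of the variants named under RQ3. The exact form used in DM1 is not given on this page, so the temperature `tau`, the loss weight `weight`, the inclusion of i = j pairs, and the function names are all assumptions made for illustration.

```python
import numpy as np

def dispersive_loss_infonce_l2(features, tau=0.5):
    """log-mean-exp of negative pairwise squared L2 distances in a batch.

    The value is 0 when all embeddings coincide and decreases as they
    spread apart, so minimizing it pushes embeddings away from each other.
    """
    z = features.reshape(features.shape[0], -1)
    sq_norms = np.sum(z * z, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (z @ z.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clamp tiny negative round-off
    logits = -sq_dists / tau
    peak = logits.max()  # subtract the max for a stable log-mean-exp
    return float(peak + np.log(np.mean(np.exp(logits - peak))))

def regularized_loss(meanflow_loss, features_per_layer, weight=0.5):
    """Hypothetical combined objective: task loss plus weighted dispersion,
    summed over the embedding layers that are regularized."""
    disp = sum(dispersive_loss_infonce_l2(f) for f in features_per_layer)
    return meanflow_loss + weight * disp

rng = np.random.default_rng(0)
collapsed = np.ones((8, 16))                # every sample maps to one point
spread = 3.0 * rng.normal(size=(8, 16))     # well-separated embeddings

loss_collapsed = dispersive_loss_infonce_l2(collapsed)
loss_spread = dispersive_loss_infonce_l2(spread)
print(loss_collapsed, loss_spread)
```

Because the term only reads intermediate features, it can be added to the training objective without changing the network itself, which matches the "no architectural modifications" claim above.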

DM1 Framework Architecture: Integrating one-step action generation with dispersive regularization across multimodal inputs (vision + state).

Key Technical Contributions

Performance Evaluation

Our evaluation of the DM1 framework addresses four research questions:

RQ1: Does DM1 achieve one-step generation efficiency?

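Answer: Yes. DM1 achieves 20-40× faster inference than baseline methods while maintaining competitive performance. With only 5 denoising steps, DM1 attains 99% success on Lift, versus roughly 85% for baselines that use 32-128 steps; the per-task comparison appears in the table under RQ2.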

RQ2: Does dispersive regularization prevent representation collapse?

Answer: Yes. Dispersive regularization significantly improves success rates across all tasks by preventing representation collapse. As shown in Figure 3 below, MeanFlow with dispersive regularization (MF+Disp) consistently outperforms vanilla MeanFlow (MF), especially on complex tasks like Transport.

Figure 3: Comprehensive evaluation across varying denoising steps and weight configurations on four robotic manipulation tasks (Lift, Can, Square, Transport). The results validate RQ2 by showing that dispersive regularization prevents collapse and improves performance.
| Task | Baseline (32-128 steps) | DM1 (5 steps) | Improvement | Speedup |
|---|---|---|---|---|
| Lift | ~85% | 99% | +14% | 6.4-25.6× |
| Can | Variable | High success | +10-20% | 20-40× |
| Square | Moderate | Improved | +15-25% | 20-40× |
| Transport | Low | Significantly improved | +20-30% | 20-40× |

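The Lift row can be checked with simple arithmetic. The sketch below assumes that inference time scales linearly with the number of denoising steps, which is one plausible reading of the 6.4-25.6× range rather than something this page states; `step_speedup` is a hypothetical helper.

```python
def step_speedup(baseline_steps, dm1_steps):
    """Speedup if latency is proportional to denoising steps (assumption)."""
    return baseline_steps / dm1_steps

# Baselines use 32-128 steps; DM1 uses 5 steps in this table.
lift_speedup_low = step_speedup(32, 5)     # 32 / 5 = 6.4
lift_speedup_high = step_speedup(128, 5)   # 128 / 5 = 25.6

# Success-rate gain on Lift, in percentage points: 99 - 85 = 14.
lift_gain_points = 99 - 85

print(lift_speedup_low, lift_speedup_high, lift_gain_points)
```

Under this reading, the "+14%" entry for Lift is a gain of 14 percentage points, in line with the wording of the TL;DR.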
RQ3: How do different dispersive regularization variants compare?

Answer: Among the four dispersive regularization variants (InfoNCE-L2, InfoNCE-Cosine, Hinge, Covariance-based), InfoNCE-Cosine performs best. Figure 4 below shows the analysis across different regularization weights:

Figure 4: Analysis of success rate versus weight configurations across different tasks. This addresses RQ3 by comparing different dispersive regularization variants, showing InfoNCE-Cosine's superior performance.

Key Findings:

Real-World Deployment

RQ4: Does DM1 transfer to real-world robotic systems?

Answer: Yes. We validated DM1 on a Franka Emika Panda robot with an eye-in-hand RGB camera (96×96×3 images) and an NVIDIA RTX 2080 GPU, demonstrating robust sim-to-real transfer.

Real-world deployment on Franka Panda manipulator: grasping tasks showing MeanFlow success (green checkmarks) vs. ShortCut Flow failures (red crosses). Top row: wrist-mounted camera views (actual visual input to policy). Middle row: successful executions. Bottom row: failure cases.
Lift Task - Success ✅
Lift Task - Failure ❌
Can Task - Success ✅
Can Task - Failure ❌

Key Results

Latency Breakdown (Real Robot)

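Per-stage latency breakdown (ms) for the Lift task on the physical robot. MF: MeanFlow (ours), SC: ShortCut, RF: ReFlow. Numbers in parentheses indicate denoising steps. Columns prefixed with T- give the total latency per control cycle when the corresponding policy is used, that is, the sum of the camera, state, preprocessing, inference, planning, and send stages.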

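| Planner | Camera | State | Prep. | MF(1) | MF(5) | SC(32) | RF(128) | Planning | Send | T-MF(1) | T-MF(5) | T-SC(32) | T-RF(128) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cartesian | 5.4 | 0.1 | 0.4 | 2.4 | 10.5 | 76.5 | 305.3 | 1.7 | 1.1 | 11.1 | 19.2 | 85.2 | 314.1 |
| BiT-RRT* | 7.6 | 0.2 | 0.5 | 2.4 | 11.4 | 77.0 | 306.9 | 89.9 | 2.5 | 103.1 | 112.1 | 177.7 | 407.6 |
| RRTConnect* | 8.4 | 0.2 | 0.4 | 2.5 | 13.8 | 79.1 | 312.8 | 152.4 | 2.8 | 166.7 | 178.0 | 243.3 | 477.0 |
| RRT* | 8.7 | 0.2 | 0.5 | 2.4 | 12.2 | 78.0 | 308.6 | 604.8 | 2.6 | 619.3 | 629.0 | 694.8 | 925.4 |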
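The totals in the table can be cross-checked from the per-stage numbers. The sketch below assumes that each T- column is the sum of the camera, state, preprocessing, inference, planning, and send stages; the dictionary layout and helper names are illustrative only.

```python
# Per-stage latencies (ms) copied from the table above.
STAGES = {
    "Cartesian":   {"camera": 5.4, "state": 0.1, "prep": 0.4, "planning": 1.7,   "send": 1.1},
    "BiT-RRT*":    {"camera": 7.6, "state": 0.2, "prep": 0.5, "planning": 89.9,  "send": 2.5},
    "RRTConnect*": {"camera": 8.4, "state": 0.2, "prep": 0.4, "planning": 152.4, "send": 2.8},
    "RRT*":        {"camera": 8.7, "state": 0.2, "prep": 0.5, "planning": 604.8, "send": 2.6},
}
INFERENCE = {
    "Cartesian":   {"MF(1)": 2.4, "MF(5)": 10.5, "SC(32)": 76.5, "RF(128)": 305.3},
    "BiT-RRT*":    {"MF(1)": 2.4, "MF(5)": 11.4, "SC(32)": 77.0, "RF(128)": 306.9},
    "RRTConnect*": {"MF(1)": 2.5, "MF(5)": 13.8, "SC(32)": 79.1, "RF(128)": 312.8},
    "RRT*":        {"MF(1)": 2.4, "MF(5)": 12.2, "SC(32)": 78.0, "RF(128)": 308.6},
}
REPORTED_TOTAL = {
    "Cartesian":   {"MF(1)": 11.1,  "MF(5)": 19.2,  "SC(32)": 85.2,  "RF(128)": 314.1},
    "BiT-RRT*":    {"MF(1)": 103.1, "MF(5)": 112.1, "SC(32)": 177.7, "RF(128)": 407.6},
    "RRTConnect*": {"MF(1)": 166.7, "MF(5)": 178.0, "SC(32)": 243.3, "RF(128)": 477.0},
    "RRT*":        {"MF(1)": 619.3, "MF(5)": 629.0, "SC(32)": 694.8, "RF(128)": 925.4},
}

def total_latency(planner, policy):
    """Sum of all non-inference stages plus the chosen policy's inference time."""
    return sum(STAGES[planner].values()) + INFERENCE[planner][policy]

def planning_share(planner, policy):
    """Fraction of the control cycle spent in motion planning."""
    return STAGES[planner]["planning"] / total_latency(planner, policy)

# Largest disagreement between recomputed and reported totals (ms).
max_gap = max(
    abs(total_latency(p, k) - REPORTED_TOTAL[p][k])
    for p in REPORTED_TOTAL
    for k in REPORTED_TOTAL[p]
)
# Inference-only ratio between 128-step ReFlow and 5-step MeanFlow.
rf_vs_mf5 = INFERENCE["Cartesian"]["RF(128)"] / INFERENCE["Cartesian"]["MF(5)"]

print(f"max gap: {max_gap:.2f} ms, RF(128)/MF(5): {rf_vs_mf5:.1f}x")
print(f"planning share, RRT* + MF(1): {planning_share('RRT*', 'MF(1)'):.0%}")
```

The recomputed sums agree with the reported totals to within rounding, and the RF(128)-to-MF(5) inference ratio of about 29× falls inside the 20-40× range quoted earlier. With sampling-based planners such as RRT*, planning rather than policy inference accounts for most of the cycle time once MeanFlow is used.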

Key Observations:

Citation

@misc{zou2025dm1meanflowdispersiveregularization,
  title={DM1: MeanFlow with Dispersive Regularization for 1-Step Robotic Manipulation},
  author={Guowei Zou and Haitao Wang and Hejun Wu and Yukun Qian and Yuhang Wang and Weibing Li},
  year={2025},
  eprint={2510.07865},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2510.07865}
}

Related Work