DM1: MeanFlow with Dispersive Regularization

TL;DR: We introduce DM1, a flow matching framework that integrates dispersive regularization into MeanFlow to enable one-step action generation while preventing representation collapse. On robotic manipulation tasks, DM1 achieves 20-40× faster inference and improves success rates by 10-20 percentage points, and its policies transfer from simulation to a physical robot.

Overview

Visualization of the effect of dispersive regularization in DM1: (a) Example rollouts showing how similar observations can lead to incorrect vs. correct grasps; (b–c) Feature distributions without and with dispersive loss; (d) Method landscape illustrating the speed–quality trade-off; (e) Quantitative comparison of success rate versus inference time.

Approach Overview

DM1 addresses two fundamental challenges in flow-based robotic control:

🚀 Challenge 1: One-Step Efficiency

Diffusion policies typically require 50-100 neural function evaluations (NFEs), which hinders real-time deployment. Flow-based models reduce the number of steps but still rely on iterative ODE integration.

Our Solution: MeanFlow enables true 1-NFE generation by directly predicting average velocity fields, achieving 20-40× speedup.
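The one-step idea can be illustrated with a minimal sketch. It assumes the MeanFlow convention in which t=1 is Gaussian noise and t=0 is the target action, so that a sample is obtained as z_r = z_t - (t - r) · u(z_t, r, t), where u is the average velocity. The function names, the fixed `TARGET_ACTION`, and the closed-form `toy_avg_velocity` are hypothetical stand-ins for a trained network, not the DM1 implementation.

```python
def one_step_sample(avg_velocity, noise, r=0.0, t=1.0):
    """Apply z_r = z_t - (t - r) * u(z_t, r, t) with a single call to u."""
    u = avg_velocity(noise, r, t)
    return [z - (t - r) * u_i for z, u_i in zip(noise, u)]


def multi_step_sample(avg_velocity, noise, num_steps):
    """Apply the same update over num_steps equal sub-intervals from t=1 to t=0."""
    z = list(noise)
    for k in range(num_steps):
        t = 1.0 - k / num_steps
        r = 1.0 - (k + 1) / num_steps
        z = one_step_sample(avg_velocity, z, r, t)
    return z


# Hypothetical toy setup: every noise sample flows along a straight line to one
# fixed action, so the average velocity has the closed form (z_t - action) / t.
TARGET_ACTION = [0.3, -0.2, 0.5]


def toy_avg_velocity(z_t, r, t):
    return [(z - a) / t for z, a in zip(z_t, TARGET_ACTION)]


noise = [1.2, -0.7, 0.1]
action_1 = one_step_sample(toy_avg_velocity, noise)                  # 1 NFE
action_5 = multi_step_sample(toy_avg_velocity, noise, num_steps=5)   # 5 NFEs
print(action_1)
print(action_5)
```

In this toy case both calls recover the target action up to floating-point error. With a learned network the two would generally differ, which is why the evaluation reports both 1-step and 5-step settings.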

⚠️ Challenge 2: Representation Collapse

One-step generation methods can suffer from representation collapse, in which distinct observations map to nearly identical embeddings. This degrades performance on complex tasks.

Our Solution: Dispersive regularization encourages feature diversity across multiple embedding layers without architectural modifications.

DM1 Framework Architecture: Integrating one-step action generation with dispersive regularization across multimodal inputs (vision + state).

Key Technical Contributions

One-Step MeanFlow: Direct transformation from Gaussian noise to target actions through learned average velocity fields
Multi-Layer Dispersive Regularization: Applied to temporal, noise, and conditional embeddings to prevent collapse
Four Regularization Variants: InfoNCE-L2, InfoNCE-Cosine, Hinge Loss, and Covariance-based methods
No Architectural Changes: Simple regularization mechanism requiring no additional network modules
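The two InfoNCE variants listed above can be sketched as follows. This assumes the common InfoNCE-style dispersive form, log of the mean of exp(-D(z_i, z_j) / τ) over pairs in a batch, with D being either squared L2 distance or cosine distance. The temperature value, the exclusion of self-pairs, and the plain summation over layers are assumptions made for illustration and may differ from DM1's actual formulation.

```python
import math


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b, eps=1e-8):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)


def infonce_dispersive(features, distance, temperature=0.5):
    """log mean_{i != j} exp(-D(z_i, z_j) / temperature); lower means more spread."""
    n = len(features)
    terms = [
        math.exp(-distance(features[i], features[j]) / temperature)
        for i in range(n)
        for j in range(n)
        if i != j  # assumption: self-pairs are excluded
    ]
    return math.log(sum(terms) / len(terms))


def multi_layer_dispersive(layer_features, distance, temperature=0.5):
    """Sum the dispersive term over several embedding layers (assumed aggregation)."""
    return sum(
        infonce_dispersive(feats, distance, temperature)
        for feats in layer_features.values()
    )


collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]   # identical embeddings
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]     # well-separated embeddings

print(infonce_dispersive(collapsed, squared_l2), infonce_dispersive(spread, squared_l2))
print(multi_layer_dispersive(
    {"temporal": spread, "noise": spread, "conditional": collapsed},
    cosine_distance,
))
```

Collapsed embeddings give the highest possible value (zero for the L2 variant), so minimizing this term alongside the flow objective pushes embeddings apart. No extra network module is needed because the term is computed on activations that already exist.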

Performance Evaluation

We evaluate DM1 against four research questions:

RQ1: Does DM1 achieve one-step generation efficiency?

Answer: Yes. DM1 achieves 20-40× faster inference compared to baseline methods while maintaining competitive performance. With only 5 denoising steps, DM1 attains:

Simulation: 0.07s per action (DM1) vs. 2-3.5s (baselines) - 28-50× speedup
Network Inference: 10.5-13.8ms (5 steps) vs. 76.5-79.1ms (ShortCut 32 steps) vs. 305.3-312.8ms (ReFlow 128 steps)
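The quoted speedups are plain ratios, which the short sketch below reproduces from the numbers stated above. The helper name `speedup` is hypothetical. The step-count ratios are included only to show how much of the gain could be attributed to running fewer denoising steps.

```python
def speedup(slow, fast):
    """How many times faster `fast` is than `slow` (both in the same units)."""
    return slow / fast


# Simulation: 0.07 s per action for DM1 vs. 2-3.5 s for the baselines.
dm1_seconds = 0.07
baseline_seconds = (2.0, 3.5)
sim_speedups = [speedup(b, dm1_seconds) for b in baseline_seconds]

# Denoising steps: 5 for DM1 vs. 32 (ShortCut) and 128 (ReFlow).
dm1_steps = 5
baseline_steps = {"ShortCut": 32, "ReFlow": 128}
step_ratios = {name: speedup(n, dm1_steps) for name, n in baseline_steps.items()}

print([round(s, 1) for s in sim_speedups])  # [28.6, 50.0]
print(step_ratios)
```

The step-count ratios come out to 6.4 and 25.6, the same range reported as the Lift speedup in the results table.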

RQ2: Does dispersive regularization prevent representation collapse?

Answer: Yes. Dispersive regularization significantly improves success rates across all tasks by preventing representation collapse. As shown in Figure 3 below, MeanFlow with dispersive regularization (MF+Disp) consistently outperforms vanilla MeanFlow (MF), especially on complex tasks like Transport.

Figure 3: Comprehensive evaluation across varying denoising steps and weight configurations on four robotic manipulation tasks (Lift, Can, Square, Transport). The results validate RQ2 by showing that dispersive regularization prevents collapse and improves performance.

Task	Baseline (32-128 steps)	DM1 (5 steps)	Improvement (percentage points)	Speedup
Lift	~85%	99%	+14	6.4-25.6×
Can	Variable	High success	+10-20	20-40×
Square	Moderate	Improved	+15-25	20-40×
Transport	Low	Significantly improved	+20-30	20-40×

RQ3: How do different dispersive regularization variants compare?

Answer: Among the four dispersive regularization variants (InfoNCE-L2, InfoNCE-Cosine, Hinge, Covariance-based), InfoNCE-Cosine performs best. Figure 4 below shows the analysis across different regularization weights:

Figure 4: Analysis of success rate versus weight configurations across different tasks. This addresses RQ3 by comparing different dispersive regularization variants, showing InfoNCE-Cosine's superior performance.

Key Findings:

InfoNCE-Cosine: Maximizes angular diversity - best overall performance
InfoNCE-L2: Maximizes pairwise Euclidean distances - good but slightly lower than Cosine
Hinge Loss: Enforces minimum separation margin - effective for simple tasks
Covariance-Based: Encourages decorrelation - moderate performance
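The hinge and covariance-based variants can be sketched in the same spirit. The exact formulas used in DM1 are not given here, so the squared hinge on pairwise Euclidean distance, the margin value, and the off-diagonal covariance penalty below are assumptions chosen to match the one-line descriptions above.

```python
import math


def hinge_dispersive(features, margin=1.0):
    """Penalize pairs of embeddings that are closer than `margin` (assumed squared hinge)."""
    n = len(features)
    penalties = [
        max(0.0, margin - math.dist(features[i], features[j])) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sum(penalties) / len(penalties)


def covariance_dispersive(features):
    """Penalize squared off-diagonal covariance so feature dimensions decorrelate."""
    n, dim = len(features), len(features[0])
    means = [sum(f[k] for f in features) / n for k in range(dim)]
    centered = [[f[k] - means[k] for k in range(dim)] for f in features]
    penalty = 0.0
    for a in range(dim):
        for b in range(dim):
            if a != b:
                cov = sum(row[a] * row[b] for row in centered) / (n - 1)
                penalty += cov ** 2
    return penalty / dim


clustered = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]    # all pairs inside the margin
separated = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]    # all pairs beyond the margin
correlated = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]   # dimensions move together
decorrelated = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]

print(hinge_dispersive(clustered), hinge_dispersive(separated))
print(covariance_dispersive(correlated), covariance_dispersive(decorrelated))
```

The hinge term vanishes once every pair is at least `margin` apart, which may explain why it is described as effective mainly on simple tasks: it stops providing gradient as soon as the margin is satisfied.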

Real-World Deployment

RQ4: Does DM1 transfer to real-world robotic systems?

Answer: Yes. We validated DM1 on a Franka Emika Panda robot with an eye-in-hand RGB camera (96×96×3) and an NVIDIA RTX 2080 GPU, demonstrating robust sim-to-real transfer.

Real-world deployment on Franka Panda manipulator: grasping tasks showing MeanFlow success (green checkmarks) vs. ShortCut Flow failures (red crosses). Top row: wrist-mounted camera views (actual visual input to policy). Middle row: successful executions. Bottom row: failure cases.

Lift Task - Success ✅

Lift Task - Failure ❌

Can Task - Success ✅

Can Task - Failure ❌

Key Results

✅ Real-time control: 19.2ms total latency with Cartesian planning and MF(5), enabling a control frequency above 50Hz
✅ Robust execution: Natural, smooth motions without jittery behavior
✅ Practical viability: Successfully completed manipulation tasks on which baseline methods failed
✅ Sim-to-real transfer: Policies trained in simulation transfer effectively to physical hardware

Latency Breakdown (Real Robot)

Per-stage latency breakdown (ms) for Lift task on physical robot. MF: MeanFlow (Ours), SC: ShortCut, RF: ReFlow. Numbers in parentheses indicate denoising steps.

Planner	Camera	State	Prep.	MF(1)	MF(5)	SC(32)	RF(128)	Planning	Send	T-MF(1)	T-MF(5)	T-SC(32)	T-RF(128)
Cartesian	5.4	0.1	0.4	2.4	10.5	76.5	305.3	1.7	1.1	11.1	19.2	85.2	314.1
BiT-RRT*	7.6	0.2	0.5	2.4	11.4	77.0	306.9	89.9	2.5	103.1	112.1	177.7	407.6
RRTConnect*	8.4	0.2	0.4	2.5	13.8	79.1	312.8	152.4	2.8	166.7	178.0	243.3	477.0
RRT*	8.7	0.2	0.5	2.4	12.2	78.0	308.6	604.8	2.6	619.3	629.0	694.8	925.4

Key Observations:

Network Inference: MF(5) requires only 10.5-13.8ms vs. 76.5-79.1ms (ShortCut) vs. 305.3-312.8ms (ReFlow). Per planner, that is roughly 6-7× faster than ShortCut and 23-29× faster than ReFlow
Best Configuration: Cartesian planning + MF(5) = 19.2ms total latency enabling 50Hz+ control
Planning Bottleneck: Motion planning time ranges from 1.7ms (Cartesian) to 604.8ms (RRT*) and dominates total latency for the sampling-based planners
Practical Insight: Efficient MeanFlow inference enables real-time control when paired with a lightweight planner such as Cartesian planning
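The totals and the control-frequency claim can be checked by summing the per-stage latencies. The sketch below uses the Cartesian row of the table. The dictionary and function names are hypothetical, and the calculation assumes the stages run sequentially with no overlap.

```python
# Per-stage latencies (ms) from the Cartesian row of the table above.
STAGES_MS = {"camera": 5.4, "state": 0.1, "prep": 0.4, "planning": 1.7, "send": 1.1}
INFERENCE_MS = {"MF(1)": 2.4, "MF(5)": 10.5, "SC(32)": 76.5, "RF(128)": 305.3}


def total_latency_ms(method):
    """Sum all stages plus network inference, assuming strictly sequential execution."""
    return sum(STAGES_MS.values()) + INFERENCE_MS[method]


def control_rate_hz(total_ms):
    """Maximum control frequency if one action is produced per pipeline pass."""
    return 1000.0 / total_ms


for method in INFERENCE_MS:
    total = total_latency_ms(method)
    print(f"{method}: {total:.1f} ms total, {control_rate_hz(total):.1f} Hz")
```

For MF(5) this gives 19.2ms and about 52Hz, consistent with the 50Hz+ figure. The summed ReFlow total comes out 0.1ms below the tabulated 314.1ms, presumably because the table entries were rounded individually.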

Citation

@misc{zou2025dm1meanflowdispersiveregularization,
  title={DM1: MeanFlow with Dispersive Regularization for 1-Step Robotic Manipulation},
  author={Guowei Zou and Haitao Wang and Hejun Wu and Yukun Qian and Yuhang Wang and Weibing Li},
  year={2025},
  eprint={2510.07865},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2510.07865}
}

Related Work

Diffusion Policy (CoRL 2023): Pioneered diffusion models for visuomotor control
ReinFlow (2025): Flow matching with online RL for robotic manipulation
MeanFlow (NeurIPS 2025): Mean flows for one-step generative modeling
FlowPolicy (AAAI 2025): 3D flow-based policy via consistency flow matching
D2PPO (2025): Diffusion Policy Policy Optimization with Dispersive Loss
π_0.5 (2025): a Vision-Language-Action Model with Open-World Generalization