Benchmark Results

RQ1: Can CoFlow match or exceed strong offline MARL baselines across continuous and discrete benchmarks?

Answer: Yes. CoFlow-C is best on most table entries, while CoFlow-D remains competitive under decentralized execution. The gains are most visible on coordination-heavy MPE tasks.

Main Result Tables

Full result tables are reproduced as screenshots from the paper. Bold marks the best method in each row and underline marks the second-best; on narrow screens the tables scroll horizontally.

Table 1: MPE

Continuous-action cooperative benchmark on Spread, Tag, and World.

Full MPE performance table from the CoFlow paper

Table 2: SMAC

Discrete-action StarCraft benchmark with partial observability.

Full SMAC performance table from the CoFlow paper

Table 3: MA-MuJoCo

Continuous-action locomotion benchmark on 2xAnt and 4xAnt.

Full MA-MuJoCo performance table from the CoFlow paper

Coordination Evidence

RQ2: Do the return gains flow through inter-agent coordination rather than stronger per-agent capacity?

Answer: Yes. Scaling the CVA gate produces a dose-response effect in both reward and landmark coverage, and disabling CVA hurts every tested MPE task.

CVA coordination evidence
CVA gate scans show monotonic improvements in reward and landmark coverage. Direct CVA-off ablations confirm that the gains come from cross-agent information flow.
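The gate-scan ablation can be pictured with a minimal sketch: a scalar gate blends each agent's feature with an attention-weighted mix of the other agents' features, so setting the gate to zero reproduces the CVA-off condition. The function name, the dot-product attention, and the residual form are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cva_mix(h, lam):
    """Gated cross-agent mixing (illustrative, not the paper's exact CVA).
    h:   (n_agents, d) per-agent features
    lam: scalar gate in [0, 1]; lam = 0 disables cross-agent flow (CVA-off)
    """
    scores = h @ h.T / np.sqrt(h.shape[1])       # dot-product attention logits
    np.fill_diagonal(scores, -np.inf)            # attend only to *other* agents
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    return h + lam * (attn @ h)                  # residual, gated mixing

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 8))                      # 3 agents, 8-dim features
scan = {lam: cva_mix(h, lam) for lam in (0.0, 0.5, 1.0)}  # gate scan
```

At `lam = 0.0` the output equals the input features exactly, which is why a monotonic trend across the scan points at cross-agent information flow rather than per-agent capacity.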
Learned CVA attention weights
Learned attention matrices adapt to task structure: symmetric mixing in Spread, predator-focused exchange in Tag, and role-aware blocks in SMAC.
CoFlow centralized training and decentralized execution diagram
CTDE view from the paper: centralized training uses all agents' observations, while decentralized execution masks other agents and preserves coordination through learned CVA patterns.
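The CTDE split above can be sketched as a single forward function whose execution mode decides what the agent sees. Zero-masking the other agents' observation slots is a common CTDE convention assumed here for illustration; CoFlow's exact masking scheme is not specified in this summary.

```python
import numpy as np

def ctde_forward(obs_all, agent_id, mode):
    """Toy CTDE pass. Centralized training ('train') consumes every agent's
    observation; decentralized execution ('exec') zero-masks all slots except
    the acting agent's. The masking convention is an assumption."""
    x = obs_all.copy()
    if mode == "exec":
        masked = np.zeros_like(x)
        masked[agent_id] = x[agent_id]   # agent keeps only its own observation
        x = masked
    return x.sum(axis=0)                 # stand-in for the shared policy network

obs = np.arange(6.0).reshape(3, 2)       # 3 agents, 2-dim observations
central = ctde_forward(obs, agent_id=0, mode="train")
local = ctde_forward(obs, agent_id=0, mode="exec")
```

Because the network weights are shared between the two modes, coordination patterns learned centrally can survive the mask at execution time.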

Few-Step Inference

RQ3: When is single-pass inference enough?

Answer: On most configurations, a single denoising step already lands within the best performance range observed over 1--10 steps. CoFlow-base, which removes the finite-difference consistency surrogate, needs more refinement steps and is less stable at k = 1.

Inference Budget

CoFlow generally saturates within 1--3 steps, including both centralized and decentralized execution variants.

Efficiency Driver

The finite-difference surrogate compresses multi-step refinement into the learned averaged velocity field.
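The intuition can be sketched with a k-step Euler sampler over a learned velocity field on t in [0, 1]. If the field has been trained toward the *averaged* velocity, as the surrogate encourages, then the one-step result coincides with the many-step result; the constant field below is an idealized stand-in, and all names are illustrative rather than taken from the paper.

```python
import numpy as np

def euler_sample(v, x0, k):
    """k-step Euler integration of a velocity field v(x, t) over t in [0, 1].
    With a perfectly averaged (here: constant) field, k = 1 matches k >> 1."""
    x, dt = np.array(x0, dtype=float), 1.0 / k
    for i in range(k):
        x = x + dt * v(x, i * dt)        # one Euler refinement step
    return x

avg_v = lambda x, t: np.array([1.0, -2.0])   # idealized averaged field
one_step = euler_sample(avg_v, [0.0, 0.0], k=1)
ten_step = euler_sample(avg_v, [0.0, 0.0], k=10)
```

When the learned field only approximates the average, extra steps still buy a little accuracy, which matches the observed saturation within 1--3 steps.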

MPE denoising-step sweep
MPE denoising-step sweep across tasks and data qualities.
Per-configuration k-step reward versus best observed reward
Per-configuration scatter from the paper: most CoFlow-C and CoFlow-D points sit near the diagonal already at low denoising budgets.