Benchmark Results

RQ1: Can CoFlow match or exceed strong offline MARL baselines across continuous and discrete benchmarks?

Answer: Yes. CoFlow-C is best on most table entries, while CoFlow-D remains competitive under decentralized execution. The gains are most visible on coordination-heavy MPE tasks.

Main Result Tables

Full result tables are reproduced as screenshots from the paper. Bold marks the best method in each row and underline marks the second-best; on narrow screens the tables scroll horizontally.

Table 1: MPE

Continuous-action cooperative benchmark on Spread, Tag, and World.

Full MPE performance table from the CoFlow paper

Table 2: SMAC

Discrete-action StarCraft benchmark with partial observability.

Full SMAC performance table from the CoFlow paper

Table 3: MA-MuJoCo

Continuous-action locomotion benchmark on 2xAnt and 4xAnt.

Full MA-MuJoCo performance table from the CoFlow paper

Coordination Evidence

RQ2: Do the return gains flow through inter-agent coordination rather than stronger per-agent capacity?

Answer: Yes. Scaling the CVA gate produces a dose-response effect in both reward and landmark coverage, and disabling CVA hurts every tested MPE task.

CVA coordination evidence
CVA gate scans show monotonic improvements in reward and landmark coverage. Direct CVA-off ablations confirm that the gains come from cross-agent information flow.
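The gate-scan ablation can be pictured with a minimal sketch: a scalar gate blends each agent's feature with an attention-weighted mix of the other agents' features, so setting the gate to zero reproduces the CVA-off condition. The function name, the dot-product attention, and the residual form are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cva_mix(h, lam):
    """Gated cross-agent mixing (illustrative, not the paper's exact CVA).
    h:   (n_agents, d) per-agent features
    lam: scalar gate in [0, 1]; lam = 0 disables cross-agent flow (CVA-off)
    """
    scores = h @ h.T / np.sqrt(h.shape[1])       # dot-product attention logits
    np.fill_diagonal(scores, -np.inf)            # attend only to *other* agents
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    return h + lam * (attn @ h)                  # residual, gated mixing

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 8))                      # 3 agents, 8-dim features
scan = {lam: cva_mix(h, lam) for lam in (0.0, 0.5, 1.0)}  # gate scan
```

At `lam = 0.0` the output equals the input features exactly, which is why a monotonic trend across the scan points at cross-agent information flow rather than per-agent capacity.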
Learned CVA attention weights
Learned attention matrices adapt to task structure: symmetric mixing in Spread, predator-focused exchange in Tag, and role-aware blocks in SMAC.
CoFlow centralized training and decentralized execution diagram
CTDE view from the paper: centralized training uses all agents' observations, while decentralized execution masks other agents and preserves coordination through learned CVA patterns.
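The CTDE split above can be sketched as a single forward function whose execution mode decides what the agent sees. Zero-masking the other agents' observation slots is a common CTDE convention assumed here for illustration; CoFlow's exact masking scheme is not specified in this summary.

```python
import numpy as np

def ctde_forward(obs_all, agent_id, mode):
    """Toy CTDE pass. Centralized training ('train') consumes every agent's
    observation; decentralized execution ('exec') zero-masks all slots except
    the acting agent's. The masking convention is an assumption."""
    x = obs_all.copy()
    if mode == "exec":
        masked = np.zeros_like(x)
        masked[agent_id] = x[agent_id]   # agent keeps only its own observation
        x = masked
    return x.sum(axis=0)                 # stand-in for the shared policy network

obs = np.arange(6.0).reshape(3, 2)       # 3 agents, 2-dim observations
central = ctde_forward(obs, agent_id=0, mode="train")
local = ctde_forward(obs, agent_id=0, mode="exec")
```

Because the network weights are shared between the two modes, coordination patterns learned centrally can survive the mask at execution time.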

Few-Step Inference

RQ3: When is single-pass inference enough?

Answer: On most configurations, a single denoising step already lands within the best performance range observed over 1--10 steps. CoFlow-base, which removes the finite-difference consistency surrogate, needs more refinement steps and is less stable at k = 1.

Inference Budget

CoFlow generally saturates within 1--3 steps, including both centralized and decentralized execution variants.

Efficiency Driver

The finite-difference surrogate compresses multi-step refinement into the learned averaged velocity field.
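The intuition can be sketched with a k-step Euler sampler over a learned velocity field on t in [0, 1]. If the field has been trained toward the *averaged* velocity, as the surrogate encourages, then the one-step result coincides with the many-step result; the constant field below is an idealized stand-in, and all names are illustrative rather than taken from the paper.

```python
import numpy as np

def euler_sample(v, x0, k):
    """k-step Euler integration of a velocity field v(x, t) over t in [0, 1].
    With a perfectly averaged (here: constant) field, k = 1 matches k >> 1."""
    x, dt = np.array(x0, dtype=float), 1.0 / k
    for i in range(k):
        x = x + dt * v(x, i * dt)        # one Euler refinement step
    return x

avg_v = lambda x, t: np.array([1.0, -2.0])   # idealized averaged field
one_step = euler_sample(avg_v, [0.0, 0.0], k=1)
ten_step = euler_sample(avg_v, [0.0, 0.0], k=10)
```

When the learned field only approximates the average, extra steps still buy a little accuracy, which matches the observed saturation within 1--3 steps.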

MPE denoising-step sweep
MPE denoising-step sweep across tasks and data qualities.
Per-configuration k-step reward versus best observed reward
Per-configuration scatter from the paper: most CoFlow-C and CoFlow-D points sit near the diagonal already at low denoising budgets.