
From Imitation to Exploration to World Models

By Guowei Zou | December 31, 2025

Here's something I've been thinking about: the way robots learn and the way we learn aren't all that different.

Learning to Copy First

Anyone who works on robot policy learning knows this: let a model explore randomly from scratch, and it's going to be a disaster. You have to show it how an expert does things first. That's behavior cloning, pretty standard stuff in the field.
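If you've never looked at it, the core idea is simpler than the name suggests. Here's a rough sketch in PyTorch (the network, names, and hyperparameters are just illustrative, not from any particular codebase): behavior cloning is plain supervised learning on expert state-action pairs.

```python
# Minimal behavior-cloning sketch (illustrative only):
# treat policy learning as supervised regression from expert states to expert actions.
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state):
        return self.net(state)

def behavior_cloning(policy, expert_states, expert_actions, epochs=100, lr=1e-3):
    """Fit the policy to expert demonstrations with a simple MSE loss."""
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        pred_actions = policy(expert_states)
        loss = loss_fn(pred_actions, expert_actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy
```

No exploration, no reward signal. Just "do what the expert did." That's also why it's such a solid starting point and such a limited endpoint.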

Think about how we pick up new skills. Speaking, writing, riding a bike. We all start by watching someone else and copying what they do. Even in grad school, your first paper is basically imitating the structure of a few good ones you've read.

Imitation doesn't sound glamorous, but it works. It gives you a baseline that actually runs, so you don't crash and burn right out of the gate.

When to Break Free

The thing is, imitation alone won't get you very far.

I run into this all the time in experiments: a model does fine within the training distribution, then falls apart as soon as the scenario changes. It learned what to do, but not why. So when things shift even slightly, it's lost.

I've been through similar phases myself. Early on, I kept looking for the "right" methodology, thinking I just needed to follow the recipe. Then I realized most of those "right ways" come with assumptions baked in. Change the problem, and they might not apply anymore.

So when should you move from imitation to exploration? I think there's a signal: when you start questioning the standard answer.

Not just to be contrarian, but because you notice the answer rests on assumptions, and your situation might be different.

Exploration Can Go Wrong Too

Of course, being willing to explore doesn't mean you'll explore in the right direction.

In RL, there's this concept called the world model: the agent's internal picture of how things work. If that picture is wrong, the more you explore, the further off track you go. It's like navigating with a bad map. The farther you walk, the more lost you get.
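To make that concrete, here's a rough sketch (again, the names and architecture are my own illustration, not from any specific paper): the agent learns a model that predicts the next state, then "explores" by rolling its policy forward inside that model instead of the real world. If the model is wrong, every imagined step drifts a little further from reality.

```python
# Illustrative world-model sketch (assumed architecture, not a specific method):
# the agent learns a dynamics model s_next ~ f(s, a) and plans against it.
# If f is wrong, every plan built on top of it inherits the error.
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.dynamics = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        # Predict the next state from the current state and action.
        return self.dynamics(torch.cat([state, action], dim=-1))

def imagined_rollout(model, policy, start_state, horizon=10):
    """Roll the policy forward inside the learned model, not the real world.
    Prediction errors compound step by step: a bad map gets worse the farther you walk."""
    state, trajectory = start_state, []
    for _ in range(horizon):
        action = policy(state)
        state = model(state, action)  # imagined transition, not a real one
        trajectory.append(state)
    return trajectory
```

The longer the rollout, the more those small prediction errors compound. That's the bad-map problem in code form.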

Same goes for people. If someone grows up surrounded by bad information, their mental model is already flawed. All the "independent thinking" in the world won't help if you're just spinning inside a broken framework.

So maybe the real question isn't "should I explore?" It's: Is what you're learning actually true? Are your sources reliable? Is the feedback you're getting pointing you in the right direction?

Finding Your Own Path

At the end of the day, whether you're training a robot or figuring out what to do with your life, the core question is similar: when to follow, and when to try things on your own.

There's no formula for that. But if you keep questioning, keep updating your mental model based on what you see, you're probably headed somewhere good.

Finding your own path isn't about knowing the destination upfront. It's about correcting the map as you go.
