Formalization of intelligence

December 2025

Why reinforcement learning feels like the “formalization of intelligence.”

I’ve been working through the Google DeepMind × UCL Deep Learning Lecture Series. I’ve already moved on to later lectures, but I wanted to pause and reflect on how the first lecture on reinforcement learning changed how I think about intelligence itself.

The reward hypothesis

Unlike supervised learning, reinforcement learning is not about extracting patterns from static data. It is the science of decision-making under uncertainty.

The reward hypothesis says that any goal can be framed as the maximization of expected cumulative reward. If you can define the reward signal, you have defined the problem. This is deceptively powerful.
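To make "expected cumulative reward" concrete, here is a minimal sketch of the discounted return an agent maximizes; the discount factor and the sample reward trajectory are my own illustrative choices, not values from the lecture:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by how far in the future it arrives:
    G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# A trajectory of per-step rewards: a small payoff now, a big one later.
rewards = [1.0, 0.0, 0.0, 10.0]
print(round(discounted_return(rewards), 2))  # 1 + 0.9**3 * 10 = 8.29
```

The discount `gamma` encodes how much the agent cares about the future; as it approaches 1, distant rewards count almost as much as immediate ones.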

Reward vs. Value

Reward is immediate feedback. Value is a long-term prediction of future returns. This distinction explains why optimal behavior often requires sacrificing short-term reward to maximize long-term value — the kind of “delayed gratification” we associate with intelligent agents.
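A toy example of my own (not from the lecture) makes the distinction tangible: compare two policies in a tiny two-step episode, one greedy for immediate reward, one patient:

```python
def episode_return(rewards, gamma=0.9):
    """Discounted sum of the rewards collected along one episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

greedy = episode_return([1.0])         # grabs reward 1 now; episode ends
patient = episode_return([0.0, 10.0])  # takes 0 now, reaches 10 next step

# The patient policy has lower immediate reward but higher value:
# 0 + 0.9 * 10 = 9.0 > 1.0
assert patient > greedy
```

The greedy policy wins on reward at the first step; the patient policy wins on value, which is exactly the delayed-gratification behavior described above.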

The structure of an agent

An agent can be decomposed into a policy (how it acts), a value function (how good it predicts states to be), and a model (how it believes the world responds to its actions). The distinction between learning through interaction and planning through internal reasoning is especially compelling to me.
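The three components, and the interaction/planning split, can be sketched in code. Everything here is a hypothetical illustration: the class layout, the TD-style update, and the Dyna-like replay loop are assumptions I've chosen to make the decomposition concrete, not the lecture's implementation:

```python
class Agent:
    """A toy agent with the three components: policy, value function, model."""

    def __init__(self, actions, gamma=0.9, alpha=0.5):
        self.actions = actions
        self.gamma, self.alpha = gamma, alpha
        self.value = {}  # value function: state -> predicted return
        self.model = {}  # model: (state, action) -> (reward, next_state)

    def policy(self, state):
        """Act greedily with respect to what the model and values predict."""
        def score(action):
            reward, nxt = self.model.get((state, action), (0.0, state))
            return reward + self.gamma * self.value.get(nxt, 0.0)
        return max(self.actions, key=score)

    def learn(self, state, action, reward, next_state):
        """Learning through interaction: remember the transition and
        nudge the value estimate toward the observed outcome."""
        self.model[(state, action)] = (reward, next_state)
        old = self.value.get(state, 0.0)
        target = reward + self.gamma * self.value.get(next_state, 0.0)
        self.value[state] = old + self.alpha * (target - old)

    def plan(self, n=10):
        """Planning through internal reasoning: replay remembered
        transitions from the model without touching the real world."""
        for (s, _), (r, nxt) in list(self.model.items())[:n]:
            old = self.value.get(s, 0.0)
            target = r + self.gamma * self.value.get(nxt, 0.0)
            self.value[s] = old + self.alpha * (target - old)
```

A short usage sketch: after experiencing that `"right"` pays off from state `"s0"` and `"left"` does not, the greedy policy prefers `"right"`:

```python
agent = Agent(actions=["left", "right"])
agent.learn("s0", "right", 1.0, "s1")
agent.learn("s0", "left", 0.0, "s0")
agent.plan()
print(agent.policy("s0"))  # right
```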

It feels less like programming a machine and more like designing the conditions under which intelligence can emerge. That’s what makes it feel like a formalization of something deeper.