snake — reinforcement learning, live

hunger clock 0 / 0

Each step without eating adds 1 to the hunger clock. When the clock reaches its maximum, the snake dies 💀

Map: 12
episode: 0
score: 0
best: 0
length: 3
ε (explore): 1.000
avg score (50): 0.00


🧠 the network, thinking

This is the snake's entire mind. On the left are the only 11 things it can sense: whether there is danger one step ahead, to the right, or to the left (3 inputs); which way it is heading (4 inputs); and roughly where the food is (4 inputs). Those signals flow through two hidden layers of 128 neurons each (brighter = more active) and arrive as three Q-values on the right: the network's estimated payoff for going straight, turning right, or turning left. The brightest one (green) is the move it makes. Early on the outputs are near-random noise; as it learns, watch the active inputs start to drive a clear, confident choice.
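To make the shape of that network concrete, here is a minimal sketch of an 11 → 128 → 128 → 3 forward pass in plain Python. The layer sizes come from the description above; the ReLU activation, the random initialisation, and every name in the code (`init_layer`, `dense`, `q_values`, `greedy_action`) are assumptions for illustration, not the demo's actual code.

```python
import random

N_INPUTS, N_HIDDEN, N_ACTIONS = 11, 128, 3
ACTIONS = ["fwd", "rgt", "lft"]  # go straight / turn right / turn left


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (illustrative init)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, relu):
    """One fully connected layer; ReLU is an assumed activation."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def q_values(state, layers):
    """11 inputs -> 128 -> 128 -> 3 Q-values (linear output layer)."""
    h = dense(state, layers[0], relu=True)
    h = dense(h, layers[1], relu=True)
    return dense(h, layers[2], relu=False)


def greedy_action(q):
    """Index of the largest Q-value (the 'brightest' output)."""
    return max(range(len(q)), key=lambda i: q[i])


rng = random.Random(0)
layers = [
    init_layer(N_INPUTS, N_HIDDEN, rng),
    init_layer(N_HIDDEN, N_HIDDEN, rng),
    init_layer(N_HIDDEN, N_ACTIONS, rng),
]

# An arbitrary example state: 3 danger flags, 4 heading flags, 4 food flags.
state = [0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0]
q = q_values(state, layers)
best = greedy_action(q)
print(ACTIONS[best], q)
```

With untrained weights the three Q-values are effectively noise, which matches the near-random early behaviour described above.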

abbreviation key

fwd! rgt! lft!: danger ahead / right / left
hd↑ hd→ hd↓ hd←: heading direction
fd← fd→ fd↑ fd↓: food direction
fwd rgt lft: go straight / turn right / left
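One way the 11 inputs in the key could be computed from the board is sketched below. The input order follows the key; the coordinate convention (x grows rightward, y grows downward), the function name `encode_state`, and the exact danger test are assumptions, since the page does not show how the demo builds its state.

```python
def encode_state(snake, heading, food, size):
    """Build the 11 binary inputs, in the order used by the abbreviation key.

    snake   : list of (x, y) cells, head first
    heading : (dx, dy) unit step, with y growing downward (assumed convention)
    food    : (x, y) cell of the food
    size    : board is size x size
    """
    head = snake[0]
    dx, dy = heading

    def danger(step):
        x, y = head[0] + step[0], head[1] + step[1]
        off_board = not (0 <= x < size and 0 <= y < size)
        # Conservative simplification: the tail cell counts as blocked
        # even though it would move away on the next step.
        return off_board or (x, y) in snake

    features = [
        danger((dx, dy)),        # fwd!  danger straight ahead
        danger((-dy, dx)),       # rgt!  danger after turning right
        danger((dy, -dx)),       # lft!  danger after turning left
        heading == (0, -1),      # hd↑
        heading == (1, 0),       # hd→
        heading == (0, 1),       # hd↓
        heading == (-1, 0),      # hd←
        food[0] < head[0],       # fd←
        food[0] > head[0],       # fd→
        food[1] < head[1],       # fd↑
        food[1] > head[1],       # fd↓
    ]
    return [1.0 if f else 0.0 for f in features]


# Snake in the open, heading right, food up and to the right.
open_state = encode_state([(5, 5), (4, 5), (3, 5)], (1, 0), (8, 2), size=12)

# Same snake pressed against the right wall of a 12 x 12 board.
wall_state = encode_state([(11, 5), (10, 5), (9, 5)], (1, 0), (8, 2), size=12)
```

Every feature is relative to the head and the heading rather than to absolute board coordinates, so an encoding like this would not change meaning when the board is resized.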

How the learning unfolds

Starvation. A random brain wanders aimlessly. The hunger clock (under the board) keeps climbing, and when it fills the snake dies without ever eating — so doing nothing useful is fatal. This is the first thing it's pressured to fix.
Survival. The cheapest lesson to learn is "don't walk into a wall or your own tail" — the danger inputs light up right before death, so the network quickly learns to avoid them. Episodes get longer; score is still ~0.
Eating. It discovers that reaching the food (+10) resets the hunger clock and keeps it alive. It starts steering by the food inputs and the score climbs.
Efficiency. A small −0.01 penalty every step means dawdling costs reward, so it stops taking scenic routes and heads more directly toward food.
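The four stages above fall out of a fairly small reward rule. The sketch below uses the two values the text states (+10 for food, −0.01 per step) and the hunger clock behaviour described earlier. The death reward of −10, the choice to apply the step penalty on the eating step too, and the names `step_outcome` and `hunger_max` are assumptions, since the page does not give them.

```python
FOOD_REWARD = 10.0     # stated in the text
STEP_PENALTY = -0.01   # stated in the text
DEATH_REWARD = -10.0   # ASSUMPTION: the page does not state the death penalty


def step_outcome(hit_obstacle, ate_food, hunger, hunger_max):
    """Return (reward, new_hunger, done) for one move.

    hit_obstacle : the move ran into a wall or the snake's own body
    ate_food     : the move landed on the food
    hunger       : steps since the last meal, before this move
    hunger_max   : hunger value at which the snake starves (hypothetical parameter)
    """
    if hit_obstacle:
        return DEATH_REWARD, hunger, True
    if ate_food:
        # Eating resets the hunger clock; the step penalty still applies
        # here by assumption.
        return FOOD_REWARD + STEP_PENALTY, 0, False
    hunger += 1
    if hunger >= hunger_max:
        # Starvation: the clock filled up without a meal.
        return DEATH_REWARD, hunger, True
    return STEP_PENALTY, hunger, False


# A plain step, a meal, and a starvation step with a hypothetical limit of 100.
plain = step_outcome(False, False, hunger=3, hunger_max=100)
meal = step_outcome(False, True, hunger=57, hunger_max=100)
starved = step_outcome(False, False, hunger=99, hunger_max=100)
```

Under these assumptions, wandering until the clock fills collects many small penalties and then a death penalty, while a direct route to food collects few penalties and a +10, which is the pressure behind the Efficiency stage.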

ε is the exploration rate, the probability of making a random move instead of the network's preferred one. It starts high (mostly random moves, needed to discover what works) and decays toward greedy play as the snake gains experience.

Use ×2 / ×3 to speed up. Progress autosaves, so you can pause or refresh and it resumes. Resize the Map (5–20) while paused; the same brain keeps playing because its 11 senses don't depend on board size.
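For the exploration rate ε, a common way to implement this behaviour is ε-greedy action selection with a decaying ε. The sketch below starts at ε = 1.0, matching the readout at the top of the page. The multiplicative decay rate of 0.995 per episode, the floor of 0.01, and the function names are assumptions, because the page does not state the demo's schedule.

```python
import random


def choose_action(q, epsilon, rng):
    """With probability epsilon pick a random action, else the best Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda i: q[i])


def decay(epsilon, rate=0.995, floor=0.01):
    """ASSUMED schedule: shrink epsilon each episode, never below the floor."""
    return max(floor, epsilon * rate)


rng = random.Random(0)
q = [0.1, 0.9, 0.3]  # example Q-values for fwd, rgt, lft

# Fully greedy: always the largest Q-value (index 1 here).
greedy_pick = choose_action(q, 0.0, rng)

# Fully exploratory: any of the three actions can come up.
random_picks = [choose_action(q, 1.0, rng) for _ in range(200)]

# Epsilon starts at 1.0 and shrinks over 1000 hypothetical episodes.
epsilon = 1.0
for _ in range(1000):
    epsilon = decay(epsilon)
```

A nonzero floor keeps a little exploration alive even late in training. Whether the demo does this is not stated.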