🧠 the network, thinking
This is the snake's entire mind. On the left are the only 11 things it can sense: is there danger one step ahead/right/left, which way it's heading, and roughly where the food is. Those signals flow through two hidden layers of 128 neurons each (brighter = more active) and arrive as three Q-values on the right: its estimated payoff for going straight, turning right, or turning left. The brightest one (green) is the move it makes. Early on the outputs are near-random noise; as it learns, watch the active inputs start to drive a clear, confident choice.
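As a rough illustration of the picture above, here is a minimal sketch of an 11 → 128 → 128 → 3 forward pass followed by the "brightest output wins" pick. The layer sizes and the action order come from the description; the ReLU activation, the random (untrained) weights, the split of the 11 inputs into 3 danger + 4 heading + 4 food flags, and all function names are assumptions made for the example, not the demo's actual code.

```python
import random

N_INPUTS, N_HIDDEN, N_ACTIONS = 11, 128, 3
ACTIONS = ["straight", "right", "left"]  # order of the three Q-values


def make_layer(n_in, n_out, rng):
    """One dense layer with random weights and zero biases (an untrained brain)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu):
    """Weighted sum per neuron; ReLU (an assumed activation) on hidden layers only."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def q_values(state, layers):
    """Push the 11 senses through every layer; the last layer stays linear."""
    h = state
    for i, layer in enumerate(layers):
        h = dense(h, layer, relu=(i < len(layers) - 1))
    return h


def greedy_action(q):
    """Index of the largest Q-value, i.e. the output drawn brightest."""
    return max(range(len(q)), key=lambda i: q[i])


rng = random.Random(0)
layers = [
    make_layer(N_INPUTS, N_HIDDEN, rng),
    make_layer(N_HIDDEN, N_HIDDEN, rng),
    make_layer(N_HIDDEN, N_ACTIONS, rng),
]

# Assumed input layout: 3 danger flags, 4 one-hot heading flags, 4 food flags.
state = [0, 0, 1,  0, 1, 0, 0,  1, 0, 0, 1]
q = q_values(state, layers)
print(ACTIONS[greedy_action(q)], [round(v, 3) for v in q])
```

With random weights the printed choice is arbitrary, which matches the "near-random noise" stage; training only changes the numbers inside `layers`, not this flow.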
How the learning unfolds
- Starvation. A random brain wanders aimlessly. The hunger clock (under the board) keeps climbing, and if it fills before the snake eats, the snake dies. Doing nothing useful is therefore fatal, and this is the first thing it's pressured to fix.
- Survival. The cheapest lesson to learn is "don't walk into a wall or your own tail": the danger inputs light up right before death, so the network quickly learns not to move in a direction whose danger input is lit. Episodes get longer, but the score is still ~0.
- Eating. It discovers that reaching the food (+10) resets the hunger clock and keeps it alive. It starts steering by the food inputs and the score climbs.
- Efficiency. A small −0.01 penalty every step means dawdling costs reward, so it stops taking scenic routes and heads more directly toward food.
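The four pressures above can be sketched as a single per-step scoring rule. The +10 food reward, the −0.01 step penalty, and the hunger clock that resets on eating are taken from the text; the size of the death penalty, the hunger limit, and every name below are hypothetical values chosen only so the example runs.

```python
from dataclasses import dataclass

FOOD_REWARD = 10.0     # from the text: reaching food is worth +10
STEP_PENALTY = -0.01   # from the text: small cost on every step
DEATH_REWARD = -10.0   # assumption: the text gives no value for dying
HUNGER_LIMIT = 100     # assumption: steps without food before starvation


@dataclass
class StepResult:
    reward: float
    done: bool
    hunger: int


def score_step(hunger, ate_food, crashed):
    """Reward, episode-over flag, and updated hunger clock for one move."""
    if crashed:  # hit a wall or its own tail
        return StepResult(STEP_PENALTY + DEATH_REWARD, True, hunger)
    if ate_food:  # eating resets the hunger clock
        return StepResult(STEP_PENALTY + FOOD_REWARD, False, 0)
    hunger += 1
    if hunger >= HUNGER_LIMIT:  # clock is full: starvation
        return StepResult(STEP_PENALTY + DEATH_REWARD, True, hunger)
    return StepResult(STEP_PENALTY, False, hunger)


# A snake that only wanders: it never crashes, never eats, and starves.
hunger, total, steps = 0, 0.0, 0
while True:
    result = score_step(hunger, ate_food=False, crashed=False)
    hunger, total, steps = result.hunger, total + result.reward, steps + 1
    if result.done:
        break
print(steps, round(total, 2))
```

Under these assumed numbers, aimless wandering only accumulates penalties until the clock ends the episode, which is why eating is the only way to a positive return.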
ε is the exploration rate. When it is high the snake makes mostly random moves, which it needs in order to discover what works; as it learns, ε decays and the snake increasingly takes its greedy (highest-Q) move. Use ×2 / ×3 to speed up. Progress autosaves, so you can pause or refresh and it resumes. Resize the Map (5–20) while paused; the same brain keeps playing because its 11 senses don't depend on board size.
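For readers curious how an exploration rate is typically applied, here is a minimal ε-greedy sketch. The starting value, floor, and multiplicative decay factor are assumptions (the demo's real schedule is not stated), and the helper names are hypothetical.

```python
import random

EPS_START = 1.0    # assumption: begin fully random
EPS_MIN = 0.01     # assumption: never stop exploring entirely
EPS_DECAY = 0.995  # assumption: multiplicative decay applied once per episode


def choose_action(q, epsilon, rng):
    """With probability epsilon pick a random move, otherwise the highest-Q move."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda i: q[i])


def decayed(epsilon):
    """Shrink epsilon toward its floor."""
    return max(EPS_MIN, epsilon * EPS_DECAY)


rng = random.Random(0)
q = [0.1, 0.9, -0.3]  # example Q-values for straight, right, left

epsilon = EPS_START
for episode in range(2000):
    action = choose_action(q, epsilon, rng)  # one illustrative decision per episode
    epsilon = decayed(epsilon)

print(epsilon)                      # has reached the EPS_MIN floor
print(choose_action(q, 0.0, rng))   # epsilon of 0 is purely greedy: index 1
```

A floor above zero is a common design choice so the agent keeps occasionally trying non-greedy moves even late in training.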