Tetris Neural AI Analysis Report

Date: 2026-02-07
Target File: src/apps/tetris/neuralAI.ts
Context: neuralNetwork.ts, aiStrategies.ts, ai_heuristics.md (Knowledge Item)

Executive Summary

The neuralAI.ts file acts as the bridge between the raw Neural Network engine (neuralNetwork.ts) and the game logic. It implements a "Self-Reinforcing Imitation" learning strategy, where the AI learns to predict the heuristic weights that led to successful games.

The implementation is architecturally sound and closely follows the design documented in the Knowledge Items. The versioning system for the network itself is robust.


Detailed Analysis

1. Learning Paradigm: "Success Imitation"

Status: Valid Strategy

  • Mechanism: The train method in neuralNetwork.ts scales the error gradients by the reward (delta.scale(example.reward)); see the sketch after this list.
  • Implication:
    • High Reward (Good Games): The network strongly updates to match the weights used in that game.
    • Zero Reward (Bad Games): The network effectively ignores these examples. It does not "learn from mistakes" (negative reinforcement); it only "learns to repeat success" (positive reinforcement).
  • Assessment: This is a safe and stable approach for this problem domain. It avoids negative reinforcement (learning "what NOT to do"), which can be sample-efficient but unstable, and focuses purely on repeating success ("what TO do").
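
To make the update rule concrete, here is a minimal sketch of reward-scaled imitation on a single linear layer. It is illustrative only: the real neuralNetwork.ts uses a multi-layer network and a Matrix class, and all names below are assumptions.

```typescript
type Example = { inputs: number[]; targets: number[]; reward: number };

const inputSize = 3;
const outputSize = 2;
const learningRate = 0.1;

// weights[o][i]: connection from input i to output o, small random init.
const weights: number[][] = Array.from({ length: outputSize }, () =>
  Array.from({ length: inputSize }, () => Math.random() * 0.2 - 0.1)
);

function predict(inputs: number[]): number[] {
  return weights.map(row => row.reduce((sum, w, i) => sum + w * inputs[i], 0));
}

function train(example: Example): void {
  const prediction = predict(example.inputs);
  for (let o = 0; o < outputSize; o++) {
    // Error gradient, scaled by the reward: bad games (reward ≈ 0) contribute
    // essentially nothing; good games dominate the update.
    const delta = (example.targets[o] - prediction[o]) * example.reward;
    for (let i = 0; i < inputSize; i++) {
      weights[o][i] += learningRate * delta * example.inputs[i];
    }
  }
}

// Usage: a high-reward game pulls the network toward the weights it used,
// while a zero-reward game is effectively ignored.
train({ inputs: [0.5, 0.2, 0.1], targets: [0.8, 0.3], reward: 1.0 });
train({ inputs: [0.9, 0.7, 0.6], targets: [0.1, 0.9], reward: 0.0 });
```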

2. Feature Normalization & Inputs

Status: Correct (Tetris-Specific)

The normalization constants in extractBoardFeatures are well calibrated for standard Tetris (a sketch follows the list):

  • linesCleared / 4: Matches the maximum number of lines clearable by a single placement (a Tetris).
  • holesCreated / 5: A reasonable upper bound for a single placement; even an awkward "I" piece placement rarely creates more than 4 holes.
  • rowTransitions / 50: A standard board has 10 columns and 20 rows, with at most ~10 transitions per row; summed over the relevant surface rows, 50 is a safe normalization ceiling.
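
A feature extractor consistent with these constants might look as follows. Only the divisors (4, 5, 50) come from the code under review; the field names and function shape are assumptions about extractBoardFeatures.

```typescript
interface PlacementStats {
  linesCleared: number;   // lines cleared by this placement
  holesCreated: number;   // new covered empty cells
  rowTransitions: number; // filled/empty changes across rows
}

function extractBoardFeatures(stats: PlacementStats): number[] {
  return [
    stats.linesCleared / 4,    // 0..1: a Tetris clears at most 4 lines
    stats.holesCreated / 5,    // rarely exceeds 1 for a single placement
    stats.rowTransitions / 50, // safe ceiling for a 10x20 board
  ];
}
```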

3. Output Scaling (Weights Configuration)

Status: Intentional Bias

As noted in the Strategies report, the outputToWeights function applies specific scalars that bias the AI towards survival (sketched after the list):

  • The lineCleared weight is capped at 1000 (vs 10,000 in manual defaults).
  • The holesCreated penalty is capped at 3000 (vs 800 in manual defaults).
  • Correction: This confirms the Neural AI is designed as a "Survivalist" agent, prioritizing board cleanliness over aggressive scoring. This is optimal for consistency, though it may achieve lower high scores than a risky greedy algorithm.
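
A hedged sketch of this scaling: only the caps (1,000 and 3,000) and the function name outputToWeights come from the review; the output layout and sign conventions are assumptions.

```typescript
interface NeuralWeights {
  lineCleared: number;  // reward weight for clearing lines
  holesCreated: number; // penalty weight for creating holes
}

function outputToWeights(outputs: number[]): NeuralWeights {
  // Network outputs are assumed to be normalized to [0, 1].
  return {
    lineCleared: outputs[0] * 1000,  // capped at 1,000 (manual default: 10,000)
    holesCreated: outputs[1] * 3000, // capped at 3,000 (manual default: 800)
  };
}
```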

4. Versioning & Data Safety

Status: Robust

  • Versioning: The NETWORK_VERSION_KEY assertion ensures that if the code structure changes (e.g., v5 -> v6), old incompatible localStorage data is wiped.
  • Manual Overrides: The system correctly prioritizes MANUAL_OVERRIDE_KEY, allowing developers to forcibly guide the AI during debugging without corrupting the trained network. A sketch of this load order follows.
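
A sketch of the load order, assuming the stored data is JSON in localStorage; the key values, version string, and helper shape are placeholders, not the actual neuralAI.ts constants.

```typescript
const NETWORK_VERSION_KEY = "tetris-nn-version";  // assumed key value
const MANUAL_OVERRIDE_KEY = "tetris-nn-override"; // assumed key value
const NETWORK_DATA_KEY = "tetris-nn-weights";     // assumed key value
const CURRENT_VERSION = "v6";                     // assumed

function loadStoredNetwork(): unknown {
  // 1. Manual overrides take priority, so debugging never touches the trained net.
  const override = localStorage.getItem(MANUAL_OVERRIDE_KEY);
  if (override) return JSON.parse(override);

  // 2. Wipe stored data whose version no longer matches the code structure.
  if (localStorage.getItem(NETWORK_VERSION_KEY) !== CURRENT_VERSION) {
    localStorage.removeItem(NETWORK_DATA_KEY);
    localStorage.setItem(NETWORK_VERSION_KEY, CURRENT_VERSION);
    return null; // caller re-initializes a fresh network
  }

  const stored = localStorage.getItem(NETWORK_DATA_KEY);
  return stored ? JSON.parse(stored) : null;
}
```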

5. Code Quality & Performance

Status: Efficient

  • Batching: Training is performed in batches of 10 (neuralAI.ts: batchSize = 10), preventing the UI thread from freezing during backpropagation.
  • Memory: The Matrix class in neuralNetwork.ts is a lightweight wrapper around standard arrays, suitable for the small network size (10 -> 16 -> 12 -> 10 ≈ 470 connections).
  • Reward Function: Math.min(score / 100000, 1) ** 0.5. The square-root dampening compresses high scores, preventing improved late-stage performance from causing exploding gradients (sketched below).
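
The reward shaping and batching could be sketched as below. The 100,000 divisor, the 0.5 exponent, and batchSize = 10 come from the review; the queue/flush structure and function names are assumptions.

```typescript
type Sample = { inputs: number[]; targets: number[]; reward: number };

const batchSize = 10; // from neuralAI.ts per this report
const pending: Sample[] = [];

function computeReward(score: number): number {
  // Normalize the final score to [0, 1], then apply square-root dampening.
  return Math.min(score / 100000, 1) ** 0.5;
}

function recordGame(inputs: number[], targets: number[], score: number): void {
  pending.push({ inputs, targets, reward: computeReward(score) });
  if (pending.length >= batchSize) {
    // Flush one small batch at a time so backpropagation never blocks the UI for long.
    const batch = pending.splice(0, batchSize);
    batch.forEach(trainOnSample);
  }
}

function trainOnSample(sample: Sample): void {
  // Stand-in for NeuralNetwork.train; see the reward-scaled update sketch in section 1.
}
```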

Potential Edge Cases

1. The "Cold Start" Initial Loop

If the initialized random network outputs weights that cause immediate game over (Reward ≈ 0), the network will never learn because delta.scale(0) kills the gradient.

  • Current Mitigation: The StrategyManager's HallOfFame could seed sensible weights, but it likely has no data yet at this point.
  • Risk: The first few generations rely purely on the luck of random initialization.
  • Recommendation: Ensure NeuralNetwork initialization (randomize) produces weights in a range that allows some play (e.g., a non-zero heightAdded penalty), or pre-seed the network with "decent" manual weights (a sketch follows).
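
One possible seeding approach, assuming a NeuralNetwork with randomize() and train(); the placeholder targets and iteration count are illustrative, not values from the codebase.

```typescript
// "Decent" placeholder targets, one per network output; real values would come
// from the manual default heuristic weights.
const DECENT_MANUAL_TARGETS = Array.from({ length: 10 }, () => 0.5);

function seedNetwork(net: {
  randomize(): void;
  train(ex: { inputs: number[]; targets: number[]; reward: number }): void;
}): void {
  net.randomize();
  // Briefly pre-train toward known-playable weights with full reward so the
  // very first generations survive long enough to produce real training data.
  for (let i = 0; i < 50; i++) {
    const inputs = Array.from({ length: 10 }, () => Math.random()); // 10 normalized features
    net.train({ inputs, targets: DECENT_MANUAL_TARGETS, reward: 1 });
  }
}
```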

2. Reward Saturation

The reward function is capped at 1.0.

  • The normalization term is score / 100,000, clamped to 1.
  • If the AI becomes very good (score > 100,000), the reward flattens to 1.0.
  • The AI then stops distinguishing between "Great" (150k) and "Godlike" (500k) games.
  • Recommendation: Monitor whether the AI plateaus around 100k points. If so, increase the normalization divisor (see the worked example below).
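
A quick worked example of the saturation, reusing the reward formula from section 5; the larger divisor is hypothetical.

```typescript
// reward(score) with the current and a hypothetical larger divisor.
const reward = (score: number, divisor = 100000) => Math.min(score / divisor, 1) ** 0.5;

reward(150000);          // 1.0 -> "Great" game
reward(500000);          // 1.0 -> "Godlike" game, indistinguishable from the above
reward(150000, 500000);  // ≈ 0.55 with a larger (hypothetical) divisor
reward(500000, 500000);  // 1.0 -> the distinction is restored
```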

Conclusion

The Neural AI implementation is solid. The "Success Imitation" paradigm is well implemented, and the rigorous version checking in neuralAI.ts protects against data corruption. The only minor long-term risk is the reward saturation cap at a score of 100k.