Course Description:
This course guides students through the fascinating journey from classical Shatranj (ancient chess) to modern chess AI engines. It starts with Python programming, Object-Oriented Programming (OOP), and control flow, then gradually moves into chess AI algorithms through classic puzzles and techniques: the Knight’s (Horse) Tour, the Eight Queens, Minimax and Alpha-Beta, and Suli’s Diamond.
In the advanced sections, students will explore Reinforcement Learning, Deep Q-Learning, Policy Gradient, Monte Carlo Tree Search, and the architecture behind AlphaZero. They will also learn how to customize Shatranj-compatible AI engines and implement self-play training on MiniChess and Connect-4 projects.
By the end of this course, students will be able to:
- Create games and AI simulations using Python and OOP
- Apply Minimax, Alpha-Beta pruning, and adversarial search techniques
- Solve chess puzzles like the Knight’s Tour and Eight Queens
- Understand Reinforcement Learning and the AlphaZero architecture
- Build their own MiniChess or Connect-4 AI projects
Who this course is for:
- Chess enthusiasts curious about AI
- Students eager to learn Python programming
- Anyone interested in game theory and Reinforcement Learning
Section 1: Foundations & Python Basics (Lessons 1–6)
- Lesson 1 – Project and Curriculum Scope and Priorities
Introduces the Shatranj.AI project, its Erasmus+ foundations, partner organizations, and digital platforms.
- Project vision, Erasmus KA2 context
- Partner institutions and cultural heritage focus
- Overview of platforms (editor, LMS, code tools)
- Teacher roles and student outcomes
- Curriculum structure overview
- Python/Jupyter introduction
- Lesson 2 – Introduction to Computing & Python Setup
Basic computing concepts and Python/Jupyter installation.
- Students learn core computing concepts and set up Python/Jupyter.
- CPU, RAM, I/O basics
- Bits, bytes, binary representation
- JupyterLab installation
- First Python notebook execution
- Variables, simple expressions
- Access to Drive folders
- Lesson 3 – Python Data Types
Numbers, strings, lists, and essential data handling in Python.
- Covers Python’s built‑in data types and basic operations.
- Integers, floats, strings, booleans
- Type conversion
- Lists and indexing
- Mutability concepts
- Chess-pieces-as-strings exercises
- Lesson 4 – Conditionals, Loops, Control Flow
If/else logic, loops, and control flow fundamentals.
- Introduces logic, loops, and interactive programs.
- If/elif/else logic
- Boolean operations
- For/while loops
- Break/continue
- Simple input programs
- Lesson 5 – Functions, Scope, Lambda
Writing functions, parameters, returns, and variable scope.
- Teaches modular code with functions.
- Defining functions
- Parameters and returns
- Local/global scope
- Lambdas
- Small functional project (piece-value calculator)
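The lesson’s small functional project can be sketched as follows; the function name and the exact dictionary are illustrative, not the course’s actual code:

```python
# A minimal piece-value calculator using the conventional chess point values.
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_score(pieces):
    """Sum the point values of a list of piece names (case-insensitive)."""
    return sum(PIECE_VALUES.get(p.lower(), 0) for p in pieces)

# Example: two rooks and a queen
print(material_score(["rook", "rook", "queen"]))  # 19
```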
- Lesson 6 – Files, Exceptions, Libraries, Testing
Reading/writing files, handling errors, and using libraries.
- Working with files and robust code.
- File read/write
- Try/except
- Importing libraries
- Simple testing
- Handling invalid inputs
Section 2: Object-Oriented Programming & Board Game Modeling
Section 3: Chess Foundations & Engine Code
- Lesson 8 – Chess & Shatranj Board Representation
Board representation and basic piece movement logic.
- Students start building engine components.
- Board representations
- Piece movement basics
- UTF-8 rendering
- Chess board editor tools
- Printing and moving pieces
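One simple take on the board representation and UTF-8 rendering described above, using an 8×8 list of lists; the helper names are illustrative, not the course’s actual engine code:

```python
# Board as an 8x8 grid of UTF-8 chess glyphs, "." for empty squares.
EMPTY = "."

def empty_board():
    return [[EMPTY] * 8 for _ in range(8)]

def square_to_index(square):
    """Algebraic square like 'e4' -> (row, col), row 0 = rank 8."""
    return 8 - int(square[1]), ord(square[0]) - ord("a")

def place(board, square, glyph):
    r, c = square_to_index(square)
    board[r][c] = glyph

def move(board, src, dst):
    r1, c1 = square_to_index(src)
    r2, c2 = square_to_index(dst)
    board[r2][c2], board[r1][c1] = board[r1][c1], EMPTY

def render(board):
    return "\n".join(" ".join(row) for row in board)

board = empty_board()
place(board, "e2", "♙")   # white pawn
move(board, "e2", "e4")
print(render(board))
```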
- Lesson 9 – Piece Movement, Game State Updates, and Terminal Conditions
Applying piece moves, updating the game state, and detecting terminal positions.
- Legal piece movement and move application
- Game state updates after each move
- Terminal conditions: checkmate, stalemate, and draws
Section 4: Classical and Adversarial Search Algorithms
- Lesson 10 – Search Problems and Graph Traversal (DFS, BFS, UCS)
DFS, BFS, UCS, and A* search foundations.
- Core AI search algorithms for games.
- Search problem structure
- DFS, BFS, UCS
- A* and heuristics
- Minimax introduction
- Pacman/chess examples
- Graph-tracing exercises
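The graph-traversal idea above can be sketched with a breadth-first search that returns a shortest path; the toy graph is illustrative:

```python
from collections import deque

# BFS over a small move graph: a FIFO queue of paths guarantees that the
# first path reaching the goal is a shortest one (in number of edges).
def bfs(graph, start, goal):
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None                          # goal unreachable

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
print(bfs(graph, "a", "e"))   # ['a', 'b', 'd', 'e']
```

Swapping the queue for a stack turns this into DFS; replacing it with a priority queue ordered by path cost gives UCS, and adding a heuristic gives A*.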
- Lesson 11 – Heuristic Search and Adversarial Game Trees (A*, minimax, expectiminimax, alpha–beta)
Heuristic search and adversarial game trees with pruning.
- Search problem structure
- DFS, BFS, UCS
- A* and heuristics
- Minimax introduction
- Expectiminimax
- Alpha-Beta pruning
- Pacman/chess examples
- Graph-tracing exercises
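The progression from plain minimax to alpha–beta can be sketched on a toy game tree; leaves are static evaluations, internal nodes are lists of children:

```python
import math

# Minimax with alpha-beta pruning: alpha tracks the best value MAX can
# guarantee so far, beta the best MIN can guarantee; when beta <= alpha,
# the remaining siblings cannot affect the result and are pruned.
def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    if not isinstance(node, list):        # leaf: static evaluation
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if beta <= alpha:             # prune remaining siblings
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if beta <= alpha:
            break
    return value

tree = [[3, 5], [2, 9], [0, 7]]           # root is a MAX node
print(alphabeta(tree, True))              # 3
```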
Section 5: Solving Ancient Chess Puzzles with AI Algorithms
- Lesson 12 – Horse Tour
Recursion, backtracking, and heuristic solutions.
- Explores the Knight’s Tour with recursion and heuristics.
- Knight graph movement
- Open/closed tours
- Backtracking with DFS
- Warnsdorff heuristic
- Connection to TSP
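Warnsdorff’s heuristic from the list above can be sketched as a greedy tour builder: always jump to the reachable square with the fewest onward moves. Function names are illustrative.

```python
# Knight's (Horse) Tour via Warnsdorff's heuristic. The greedy rule
# usually completes a full 8x8 tour without any backtracking.
JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
         (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def onward(square, visited, n=8):
    x, y = square
    return [(x + dx, y + dy) for dx, dy in JUMPS
            if 0 <= x + dx < n and 0 <= y + dy < n
            and (x + dx, y + dy) not in visited]

def warnsdorff_tour(start=(0, 0), n=8):
    tour, visited = [start], {start}
    while len(tour) < n * n:
        moves = onward(tour[-1], visited, n)
        if not moves:
            return None                    # heuristic got stuck (rare)
        nxt = min(moves, key=lambda s: len(onward(s, visited, n)))
        tour.append(nxt)
        visited.add(nxt)
    return tour

tour = warnsdorff_tour()
print(len(tour) if tour else "stuck")
```

The backtracking DFS version simply tries every onward move in turn and undoes a jump when it dead-ends; Warnsdorff’s ordering makes that search dramatically shorter.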
- Lesson 13 – Eight Queens Puzzle
Backtracking and constraint-based problem solving.
- Constraint satisfaction with backtracking.
- Queen attack logic
- Recursive search
- Optimization techniques
- Historical queen references
- Notebook implementations
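The recursive backtracking search above fits in a few lines: place one queen per column and backtrack whenever every row of the next column is attacked.

```python
# Eight Queens by backtracking. cols[i] is the row of the queen in
# column i; a new queen conflicts if it shares a row or a diagonal
# (equal row distance and column distance) with any placed queen.
def solve_queens(n=8, cols=()):
    if len(cols) == n:
        return [cols]
    solutions = []
    for row in range(n):
        if all(row != r and abs(row - r) != len(cols) - c
               for c, r in enumerate(cols)):
            solutions += solve_queens(n, cols + (row,))
    return solutions

print(len(solve_queens(8)))   # 92 solutions on the classic 8x8 board
```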
- Lesson 14 – Wheat & Chessboard Problem, Rook Polynomials, Smullyan and Other Logic Puzzles, Magic Squares
Exponential growth and classic logic puzzles related to the chessboard.
- Chess origin story: wheat and the chessboard
- Rook polynomials
- Smullyan logic puzzles and retroanalysis
- Magic squares
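The wheat-and-chessboard story reduces to a two-line computation: one grain on the first square, doubling on each of the 64 squares.

```python
# Doubling grains over 64 squares: the total is 2**64 - 1, a vivid
# illustration of exponential growth (over 18 quintillion grains).
total = sum(2**square for square in range(64))
print(total)                 # 18446744073709551615
```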
- Lesson 15 – Minimax, Alpha-Beta, Heuristics, Mate Search, and Rumi and Dilaram Mates
Advanced search, pruning, and checkmate logic.
- Deep adversarial search and chess endgames.
- Minimax computation
- Alpha-beta pruning
- Opposition, triangulation
- Historical sources (al-Adli ar-Rumi, Rumi’s mate from Alfonso’s book, and Kitab ash-Shatranj)
Section 7: The Intertwined History of Artificial Intelligence and Modern Chess Software
Section 8: Reinforcement Learning
- Lesson 18 – Reinforcement Learning Foundations: Gridworld, Dynamic Programming, and Complexity
Introduces reinforcement learning (RL) by solving a small gridworld exactly when the rules are known, then shows why this “all‑knowing” approach breaks for large games like chess.
· Agent–environment loop; states, actions, rewards, episodes; discount factor γ.
· Policy evaluation (“drifting robot”) and value iteration (“treasure hunter”) using Bellman backups.
· Visual value propagation and deriving an optimal policy from the value function.
· The curse of dimensionality: state‑space vs game‑tree complexity; Shannon number motivation.
· Historical “giant games” (e.g., Tamerlane chess, Go) as context for why learning is needed.
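The value-iteration idea above can be sketched on a one-dimensional gridworld (a “treasure hunter” flavour); the layout and rewards are illustrative:

```python
# Value iteration on a tiny corridor: states 0..4, the treasure at
# state 4 pays reward 1 and ends the episode; actions are left/right.
GAMMA = 0.9
N, GOAL = 5, 4

def step(s, a):                      # deterministic transition
    s2 = max(0, min(N - 1, s + a))
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward

V = [0.0] * N
for _ in range(100):                 # repeated Bellman backups
    V = [0.0 if s == GOAL else
         max(r + GAMMA * V[s2]
             for s2, r in (step(s, a) for a in (-1, +1)))
         for s in range(N)]

print([round(v, 3) for v in V])      # values grow toward the goal
```

Reading the value function, the optimal policy is simply “move toward the higher-valued neighbour” — the same value-propagation picture the lesson visualizes.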
- Lesson 19 – The Frozen Rook: Tabular Q-Learning on FrozenLake
Moves from planning to learning: the agent starts with no map and learns a policy by trial and error using tabular Q-learning.
· Formulate FrozenLake/Frozen Rook as an MDP: S, A, R, P, terminal states, γ.
· Q-learning update rule and ε-greedy exploration (exploration→exploitation schedule).
· Train an agent in Gymnasium FrozenLake; compare deterministic vs slippery transitions.
· Inspect what was learned via Q-table heatmaps / policy arrows; tune α, γ, ε and episode counts.
· Scaling lessons: sparse rewards, delayed credit, and why larger maps are harder.
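The Q-learning update and ε-greedy exploration from the bullets above can be sketched on a five-state corridor standing in for FrozenLake’s deterministic mode; the environment and hyperparameters are illustrative:

```python
import random
random.seed(0)

# Tabular Q-learning: Q[s][a] += alpha * (r + gamma*max_a' Q[s'][a'] - Q[s][a]).
# States 0..4, goal at 4 (reward 1); actions: 0 = left, 1 = right.
N, GOAL = 5, 4
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2
Q = [[0.0, 0.0] for _ in range(N)]

for episode in range(500):
    s = 0
    while s != GOAL:
        # epsilon-greedy: explore with probability EPS, else exploit
        if random.random() < EPS:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda act: Q[s][act])
        s2 = max(0, min(N - 1, s + (1 if a else -1)))
        r = 1.0 if s2 == GOAL else 0.0
        target = r + GAMMA * max(Q[s2]) * (s2 != GOAL)
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

policy = [max((0, 1), key=lambda act: Q[s][act]) for s in range(N)]
print(policy[:4])    # the greedy policy heads right, toward the goal
```

Inspecting Q row by row is the list-sized analogue of the lesson’s Q-table heatmaps and policy arrows.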
- Lesson 20 – Two Rooks vs Lone King: Learning Checkmate with Temporal-Difference Q-Learning
Applies Q-learning to a small chess endgame and makes the RL codebase “real” by separating the experiment notebook from the learning and training modules.
· Temporal-Difference (TD) learning: identify the TD error inside the Q-learning update; why TD updates during play.
· Why Monte Carlo learning is too slow for chess-like, delayed-reward games.
· Engineering stack: rl.py (Q-memory + TD update), trainer.py (episode loop, exploration schedule), notebook as the lab.
· Encode chess positions as machine-readable state (FEN) and train a tabular agent on a bounded endgame/puzzle state space.
· Limits: why tabular methods fail for full chess (curse of dimensionality) and the need for function approximation.
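The TD error inside the Q-learning update, with states keyed by FEN strings as described above, can be isolated in a few lines; the function name, position, and hyperparameters are illustrative:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99
Q = defaultdict(float)                 # (fen, move) -> value

def td_update(fen, move, reward, next_fen, next_moves):
    best_next = max((Q[(next_fen, m)] for m in next_moves), default=0.0)
    td_error = reward + GAMMA * best_next - Q[(fen, move)]   # the TD error
    Q[(fen, move)] += ALPHA * td_error    # applied during play, not at game end
    return td_error

# A two-rooks mate-in-one: reward 1 arrives on the mating move.
err = td_update("k7/6R1/7R/8/8/8/8/K7 w - - 0 1", "Rh8#", 1.0, "terminal", [])
print(round(err, 2))   # 1.0 on the first visit, shrinking as Q converges
```

Because the update fires on every move, value flows backward from the mate one step per visit — exactly why TD beats Monte Carlo waiting for the episode to end in delayed-reward games.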
- Lesson 21 – Deep Q-Networks: From Q-Tables to Neural Networks
Introduces function approximation for RL by replacing the Q-table with a neural network (DQN) and applying it to several small board games.
· Why Q-tables don’t scale: too many states; generalization requires a model that can “guess” values for unseen positions.
· Deep Q-Network (DQN) training loop: replay buffer, target network, mini-batch updates, ε-decay.
· Implement and experiment with DQN on games such as Connect-4 (4Connect), Fox & Hounds, and Othello/Reversi.
· Diagnostics: learning curves, stability issues (overestimation, divergence) and practical mitigations.
· Compare approaches: DQN vs NNUE-style evaluation and handcrafted evaluation (HCE) to discuss architecture tradeoffs.
- States → value
- States → probability distribution over moves
- Explain why tabular methods fail in full chess
- Show that we need function approximation
- Introduce the NNUE concept here:
What is NNUE?
– an Efficiently Updatable Neural Network
– a neural net replacing hand-coded evaluation, outputting a value (who is better)
– AlphaZero-style nets add a second head: a policy (what moves are likely good)
Students don’t need deep math, only intuition.
Deep Q-Learning (DQN) with simple games
- Use Snake or Catch
- Teach replay buffer, target network
- Show why DQN does NOT scale to chess (action space too large), preparing the ground for AlphaZero.
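The two DQN stabilizers named above — the replay buffer and the target network — can be sketched framework-free; the dict stands in for real network weights:

```python
import random
from collections import deque

# Replay buffer: stores transitions and samples them out of order,
# decorrelating consecutive experiences before each mini-batch update.
class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):            # (s, a, r, s2, done)
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

online = {"w": 0.0}                         # trainable "weights"
target = dict(online)                       # frozen copy used for TD targets
SYNC_EVERY = 100

buf = ReplayBuffer()
for t in range(250):
    buf.push((t, 0, 1.0, t + 1, False))     # dummy transitions
    online["w"] += 0.01                     # pretend gradient step
    if t % SYNC_EVERY == 0:
        target = dict(online)               # periodic hard sync

batch = buf.sample(32)
print(len(batch), target["w"] < online["w"])   # 32 True
```

The target network lags the online one between syncs, which is what keeps the bootstrapped TD targets from chasing their own updates.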
Policy gradient & actor-critic basics
(Short overview, not too mathematical)
- Policy gradient: learn policy directly
- Actor-critic: policy (actor) + value estimate (critic)
- Prepare students for AlphaZero’s architecture (policy + value output).
Monte Carlo Tree Search (full lesson)
- Expand nodes
- Simulate
- Backup values
- Choose action using visit counts
- Show how policy prior improves MCTS
- Show how MCTS improves the policy network → the AlphaZero training loop emerges
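The selection step of the MCTS loop above is driven by the UCT rule: average value plus an exploration bonus that shrinks as a child is visited. The statistics below are illustrative.

```python
import math

# UCT: exploit (average rollout result) + explore (uncertainty bonus).
def uct(value_sum, visits, parent_visits, c=1.41):
    if visits == 0:
        return math.inf                   # unvisited children are tried first
    exploit = value_sum / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

children = {                               # move -> (value_sum, visits)
    "a": (6.0, 10),    # strong and well explored
    "b": (1.0, 2),     # weak so far, barely explored
    "c": (0.0, 0),     # never tried
}
parent_visits = 12
best = max(children, key=lambda m: uct(*children[m], parent_visits))
print(best)   # "c": expansion reaches unvisited moves first
```

Replacing the uniform exploration term with one weighted by a learned policy prior is exactly the PUCT upgrade that leads into AlphaZero.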
Putting it all together: the AlphaZero algorithm
- Self-play generates games
- MCTS guided by neural net creates improved policy
- Neural net trained on (state, policy_targets, value)
- Iteration of self-play → training → stronger MCTS → stronger games
- Show how AlphaZero solves simple mini-chess (4×4, 5×5)
- Connect back to the two-rook checkmate lesson (Lesson 20)
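How self-play produces the (state, policy_targets, value) triples mentioned above can be sketched in a few lines: the MCTS root visit counts are normalized into the policy target π, and the game outcome z labels every position of that game. The counts below are illustrative.

```python
# Visit counts -> policy target pi; a temperature < 1 sharpens the target.
def policy_target(visit_counts, temperature=1.0):
    weights = {m: n ** (1.0 / temperature) for m, n in visit_counts.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

root_visits = {"e2e4": 60, "d2d4": 30, "g1f3": 10}
pi = policy_target(root_visits)
z = 1.0                        # final outcome from the current player's view
print(pi["e2e4"], z)           # (state, pi, z) becomes one training example
```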
Students now understand:
– how RL can solve small tasks
– why full chess needs powerful neural evaluation (NNUE-style networks), MCTS, and self-play
Build a mini AlphaZero for Connect-4 or MiniChess
- Connect-4 board is perfect for classroom AlphaZero demo
- Implement:
– neural net (small CNN or MLP)
– MCTS
– self-play training
Lc0 (Leela Chess Zero) is an open-source engine directly inspired by DeepMind’s AlphaZero project, and it has gone on to far exceed AlphaZero’s published strength in chess.
https://lczero.org/
- Lesson 22 – Monte Carlo Rollouts and MCTS on Qirkat
Builds a complete Qirkat environment and then progresses from random rollouts to full Monte Carlo Tree Search (MCTS) with UCT selection.
· Implement Qirkat rules backbone (5×5 board, C3 empty) and the maximum-capture rule that forces capture sequences.
· Move generation that enumerates capture lines, enforces compulsory capture, and filters to maximum-length captures.
· Monte Carlo baselines: random rollouts and flat Monte Carlo move evaluation before adding tree reuse.
· MCTS pipeline: selection, expansion, rollout/evaluation, backpropagation; UCT/visit-count final move choice.
· Reproducible game logs and audit tooling for step-by-step playback and debugging.
Section 9: AlphaZero
- Lesson 23 – AlphaZero on Reversi / Othello
Develop AlphaZero for Othello and have it play against a heuristic minimax opponent at progressively deeper search depths.
- Lesson 24 – AlphaZero on Qirkat: PUCT, Policy/Value Nets, and Self-Play
Upgrades MCTS into AlphaZero-style search by adding a neural network that supplies a policy prior and a value estimate, then trains through self-play.
· Bridge intuition with a tiny ‘Connect2’ AlphaZero demo, then transfer the ideas to Qirkat.
· Replace UCT with PUCT: combine visit statistics with a learned policy prior to guide exploration.
· Neural network heads: policy (move probabilities) and value (position evaluation) used in place of random rollouts.
· AlphaZero loop: self-play → training targets (π, z) → network update → repeat; evaluate via tournament matches/logs.
· Path-aware move encoding for variable-length capture sequences so different capture paths remain distinct.
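The PUCT rule mentioned above reweights UCT’s exploration bonus by the network’s policy prior P(s, a); the numbers below are illustrative.

```python
import math

# PUCT score: value estimate plus a prior-weighted exploration bonus
# that decays as the child accumulates visits.
def puct(q, prior, child_visits, parent_visits, c=1.5):
    return q + c * prior * math.sqrt(parent_visits) / (1 + child_visits)

# Two unvisited moves: the learned prior alone decides which is tried first.
strong_prior = puct(q=0.0, prior=0.6, child_visits=0, parent_visits=25)
weak_prior   = puct(q=0.0, prior=0.1, child_visits=0, parent_visits=25)
print(strong_prior > weak_prior)   # True: the prior steers exploration
```

This is how the policy head guides search even before any rollout or value estimate has been collected for a move.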
- Lesson 25 – Turkish Checkers (Dama): Alpha-Beta, PUCT-guided MCTS, AlphaZero
Implements Turkish Checkers and compares classical search (alpha–beta) with MCTS using a reusable match runner and batch simulation logs.
· Game engine: board representation, legal moves with multi-jump captures, and move-path encoding.
· Evaluation function plus Negamax/Alpha-Beta search agent; depth vs strength tradeoffs.
· MCTS agent for Turkish Checkers and head-to-head comparisons against alpha–beta.
· Universal match runner (play_game) and batch simulation utilities for reproducible experiments.
· Exportable logs (zipped) for classroom review and debugging.
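A universal match runner in the spirit of the play_game utility above can be sketched as follows; the Game protocol, agent callables, and the toy game (subtraction Nim: take 1–3 stones, taking the last stone wins) are illustrative, not the course’s actual API:

```python
# Generic runner: any two agents, any game exposing initial/is_over/apply.
def play_game(game, agents, log=None):
    state, player = game["initial"], 0
    while not game["is_over"](state):
        move = agents[player](state)
        state = game["apply"](state, move)
        if log is not None:
            log.append((player, move, state))   # reproducible game log
        player = 1 - player
    return 1 - player                    # the player who moved last wins

nim = {
    "initial": 10,
    "is_over": lambda s: s == 0,
    "apply": lambda s, m: s - m,
}
optimal = lambda s: s % 4 or 1           # leave a multiple of 4 if possible
greedy = lambda s: min(3, s)             # always take as much as allowed

log = []
winner = play_game(nim, [optimal, greedy], log)
print(winner)   # 0: the optimal player wins from 10 stones
```

Batch simulation is then just calling play_game in a loop and collecting the logs, which is what makes head-to-head comparisons (alpha–beta vs MCTS) reproducible.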




